MBA Core · Module 03

Statistics & Analytics

How to turn messy data into confident decisions. The goal isn't math for its own sake — it's answering "is this real, or just noise?" so you can bet the business on the right number.

1 · The big idea

Signal vs. noise

Every dataset has two parts: the real underlying signal and random noise. Statistics is the toolkit for separating them — so you don't mistake a lucky streak for a real effect.

Descriptive: Summarise data you have: averages, spread, charts. "What happened?"
Inferential: Draw conclusions about a whole population from a sample. "What's likely true in general?"
Two key words: Population = everyone/everything you care about. Sample = the subset you actually measure. We use Greek letters (μ, σ) for population values and Latin letters (x̄, s) for sample values.
Memory hook 🧠 Descriptive describes, inferential infers. One looks back at your data; the other reaches beyond it.
2 · Measures of center

Mean, Median & Mode

Three ways to answer "what's typical?" — and they disagree when data is skewed, which is the whole point.

Mean (x̄): The average. Sum ÷ count. Sensitive to outliers: one billionaire wrecks an average income.
Median: The middle value when sorted. Robust to outliers, so it is better for skewed data like income or house prices.
Mode: The most frequent value. Useful for categories ("most common plan sold").
The mean
$\bar{x} = \frac{\sum x_i}{n}$   =   (sum of all values) ÷ (how many)
[Figure: right-skewed distribution with mode, median and mean marked from left to right, and a long tail to the right]
In a right-skewed distribution the tail drags the mean rightward, past the median and mode.
Memory hook 🧠 Skew direction = where the tail points. The mean chases the tail; the median sits calmly in the middle.
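To see the three measures disagree, here is a minimal sketch using Python's standard `statistics` module. The income figures are made up for illustration; the last value plays the billionaire.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes in $ thousands; the last value is the outlier.
incomes = [40, 45, 45, 50, 55, 60, 1000]

avg = mean(incomes)      # sum of all values / how many
mid = median(incomes)    # middle value once sorted
common = mode(incomes)   # most frequent value

print(f"mean={avg:.1f}  median={mid}  mode={common}")
# Right skew: the tail drags the mean far past the median and the mode.
print("mode < median < mean:", common < mid < avg)
```

Drop the outlier from the list and the mean falls back close to the median, which is the practical test for "is one extreme value driving my average?"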
3 · Measures of spread

Variance & Standard Deviation

Center isn't enough — you need to know how spread out the data is. Two teams can average the same sales but one is wildly inconsistent. That risk lives in the spread.

Variance & standard deviation
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$     $\sigma = \sqrt{\sigma^2}$
Variance = average squared distance from the mean. Standard deviation (σ) = its square root, back in the original units. Bigger σ = more spread = more risk.
Why square it? Distances above and below the mean would cancel to zero. Squaring makes them all positive (and punishes big deviations more). The square root then returns us to readable units.
Memory hook 🧠 Standard deviation = the "typical distance" from average. Small σ = tightly clustered. Large σ = scattered.
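The two-teams example can be sketched in a few lines. The sales numbers below are hypothetical, and `population_variance` is just an illustrative helper that follows the population formula above (divide by N).

```python
from statistics import mean

def population_variance(values):
    """Average squared distance from the mean (divides by N)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std(values):
    """Square root of the variance, back in the original units."""
    return population_variance(values) ** 0.5

# Hypothetical monthly sales for two teams with the same average.
team_a = [98, 100, 102, 100, 100]
team_b = [60, 140, 100, 80, 120]

for name, sales in (("A", team_a), ("B", team_b)):
    print(f"Team {name}: mean={mean(sales)}  "
          f"variance={population_variance(sales):.1f}  "
          f"sd={population_std(sales):.1f}")
```

Both teams average 100, but team B's standard deviation is far larger, so its results are much less predictable.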
4 · The most important shape

The Normal Distribution

Nature's favourite shape: heights, test scores and measurement errors all cluster into this symmetric "bell." When data is approximately normal, the 68–95–99.7 rule tells you roughly what share of values falls within 1, 2 and 3 standard deviations of the mean.

[Figure: bell curve centred on μ with bands at ±1σ, ±2σ and ±3σ covering 68%, 95% and 99.7% of values]
68% of values fall within ±1σ of the mean · 95% within ±2σ · 99.7% within ±3σ.
The z-score — your universal ruler
$z = \frac{x - \mu}{\sigma}$
"How many standard deviations is this value from the mean?" Converts any normal data to a common scale so you can compare apples to oranges.
Memory hook 🧠 "68, 95, 99.7": chant it. Almost everything (95%) lives within 2σ; anything beyond 3σ is genuinely rare (0.3%).
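A quick way to check the 68–95–99.7 rule and practise z-scores is Python's built-in `statistics.NormalDist`. The helper names and the exam and sales figures below are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def z_score(x, mu, sigma):
    """How many standard deviations x sits from the mean."""
    return (x - mu) / sigma

for k in (1, 2, 3):
    print(f"within ±{k}σ: {share_within(k):.1%}")

# Apples to oranges: an exam score vs. a monthly sales figure (made-up numbers).
z_exam = z_score(85, mu=70, sigma=10)     # 1.5 standard deviations above average
z_sales = z_score(130, mu=100, sigma=15)  # 2.0 standard deviations above average
print(f"exam z={z_exam}, sales z={z_sales}")
```

On the common z scale, the sales figure is the more unusual result even though the two raw numbers can't be compared directly.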
5 · Quantifying uncertainty

Probability Basics

Probability is just a number from 0 (impossible) to 1 (certain) describing how likely something is. A few rules cover most business cases:

AND (independent): P(A and B) = P(A) × P(B). Both must happen → multiply.
OR: P(A or B) = P(A) + P(B) − P(A and B). Either one → add, then subtract the overlap. For mutually exclusive events the overlap is zero, so P(A or B) = P(A) + P(B).
Conditional: P(A | B) = probability of A given B already happened. The basis of Bayesian updating.
Complement: P(not A) = 1 − P(A). Often the easy way in ("at least one" = 1 − "none").
Memory hook 🧠 AND → multiply (narrows odds), OR → add (widens odds). "And" makes things rarer; "or" makes them likelier.
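The four rules fit in a few lines of arithmetic. This is a small sketch with invented probabilities; the events (email opens, ad clicks, sales calls) are assumptions made up for the example, and A and B are assumed independent.

```python
# Hypothetical, independent events.
p_a = 0.30  # P(A): customer opens the email
p_b = 0.20  # P(B): customer clicks the ad

p_and = p_a * p_b             # AND (independent): multiply
p_or = p_a + p_b - p_and      # OR: add, then subtract the overlap
p_not_a = 1 - p_a             # complement
p_a_given_b = p_and / p_b     # conditional: P(A | B) = P(A and B) / P(B)

# "At least one" = 1 - "none": five independent sales calls, each closing 10% of the time.
p_close = 0.10
n_calls = 5
p_at_least_one = 1 - (1 - p_close) ** n_calls

print(f"P(A and B)={p_and:.2f}  P(A or B)={p_or:.2f}  P(not A)={p_not_a:.2f}")
print(f"P(A | B)={p_a_given_b:.2f}  P(at least one close)={p_at_least_one:.3f}")
```

Because A and B are independent here, P(A | B) comes out equal to P(A): knowing B happened tells you nothing new about A.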
6 · Why samples work

Sampling & the Central Limit Theorem

You can't measure everyone, so you take a sample. The Central Limit Theorem (CLT) is the near-magic result that makes inference possible:

Central Limit Theorem: If you take many samples and plot their means, that distribution of means is approximately Normal even if the original data isn't, as long as each sample is big enough (a common rule of thumb is n ≥ ~30).
Standard error — spread of the sample mean
$SE = \frac{\sigma}{\sqrt{n}}$
As sample size n grows, SE shrinks → your estimate of the mean gets tighter and more reliable. Quadruple n to halve the error.
Memory hook 🧠 Averages are calmer than individuals. Single data points are wild; their averages bunch into a tidy bell. Bigger n = calmer still.
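You can watch the CLT at work with a small simulation. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and σ = 1, chosen only for illustration), draws many samples of n = 40, and compares the spread of the sample means with the SE formula.

```python
import random
from math import sqrt
from statistics import mean, stdev

def standard_error(sigma, n):
    """Spread of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

random.seed(42)    # fixed seed so the run is repeatable
n = 40             # size of each sample
num_samples = 2000

# Each sample is n draws from a right-skewed population (mean 1, sigma 1).
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_se = stdev(sample_means)
theoretical_se = standard_error(1.0, n)

print(f"average of sample means: {mean(sample_means):.3f}")
print(f"observed SE: {observed_se:.3f}   formula SE: {theoretical_se:.3f}")
# Quadrupling n halves the standard error.
print(f"SE at n=160: {standard_error(1.0, 4 * n):.3f}")
```

The individual draws are heavily skewed, yet the sample means cluster around 1 with a spread close to σ/√n.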
7 · A range, not a guess

Confidence Intervals

A single estimate ("avg spend is $52") hides its uncertainty. A confidence interval gives an honest range: "we're 95% confident the true average is between $48 and $56."

95% confidence interval for a mean
$CI = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$
x̄ = sample mean · z = 1.96 for 95% confidence · the second term is the margin of error. Higher confidence → wider interval.
What 95% really means: not "95% chance the truth is in this one interval." It means that if you repeated the study many times, about 95% of the intervals you'd build would contain the true value.
Memory hook 🧠 Want more confidence? Pay for it with a wider net. Want a tighter net? Collect more data (bigger n shrinks the margin).
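Here is a minimal sketch of the interval formula. The sample mean of $52 comes from the example above; σ = 20.4 and n = 100 are assumed values, picked so the result lands near the $48 to $56 range, and `confidence_interval` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (low, high) for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)                    # margin of error
    return sample_mean - margin, sample_mean + margin

# Assumed inputs: average spend $52, sigma $20.4, 100 customers sampled.
low, high = confidence_interval(52, sigma=20.4, n=100)
print(f"95% CI: ${low:.2f} to ${high:.2f}")

# More confidence -> wider net. More data -> tighter net.
low99, high99 = confidence_interval(52, sigma=20.4, n=100, confidence=0.99)
low_big, high_big = confidence_interval(52, sigma=20.4, n=400)
print(f"99% CI: ${low99:.2f} to ${high99:.2f}")
print(f"95% CI with n=400: ${low_big:.2f} to ${high_big:.2f}")
```

In practice σ is rarely known; with a sample standard deviation s and a small n, a t-based interval is the usual substitute, which this sketch does not cover.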
8 · Is the effect real?

Hypothesis Testing

The core engine of "is this real or just luck?" You assume there's no effect, then check whether your data is too surprising for that to be true.

1. State H₀ & H₁. Null (H₀): "no effect / no difference." Alternative (H₁): "there is an effect."
2. Pick α (significance level), usually 0.05: your tolerance for a false alarm.
3. Compute the p-value: the probability of seeing data at least this extreme if H₀ were true.
4. Decide. If p < α → reject H₀ ("statistically significant"). If p ≥ α → fail to reject H₀.
[Figure: bell curve under H₀ with a central "fail to reject H₀" region (plausible under null) and two "reject H₀" tails, each of area α/2]
Land in the shaded tails (very unlikely under the null) → reject H₀. The tails together hold α (e.g. 5%).

The two error types

| | H₀ actually TRUE | H₀ actually FALSE |
|---|---|---|
| You reject H₀ | Type I error: false alarm (α). You "found" an effect that isn't real. | Correct ✓ You caught a real effect (power). |
| You keep H₀ | Correct ✓ Rightly found nothing. | Type II error: missed it (β). A real effect slipped past you. |
Memory hook 🧠 Type I = "false positive" (crying wolf). Type II = "false negative" (missing the wolf). One sees ghosts; the other goes blind.
Watch out: "Fail to reject H₀" ≠ "H₀ is true." Absence of evidence isn't evidence of absence; you just didn't find enough to overturn the default.
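The four steps can be sketched as a two-sided z-test on a mean. This is one simple test among many, assuming σ is known; the scenario (did average spend move away from $50?) and all numbers are invented, and `two_sided_z_test` is an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: true mean == null_mean."""
    se = sigma / sqrt(n)                  # standard error of the mean
    z = (sample_mean - null_mean) / se    # how surprising is the sample?
    # Two-sided p-value: chance of a result at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Step 1: H0 says average spend is $50; H1 says it isn't.
# Step 2: alpha = 0.05 (the default above).
# Steps 3 and 4, with a hypothetical sample mean of $52 and sigma of $20:
z_big, p_big, reject_big = two_sided_z_test(52, 50, sigma=20, n=400)
z_small, p_small, reject_small = two_sided_z_test(52, 50, sigma=20, n=100)

print(f"n=400: z={z_big:.2f}  p={p_big:.4f}  reject H0: {reject_big}")
print(f"n=100: z={z_small:.2f}  p={p_small:.4f}  reject H0: {reject_small}")
```

The same $2 difference is significant with 400 customers and not with 100. The smaller study didn't show H₀ is true; it just lacked the data to overturn it.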
9 · Do two things move together?

Correlation

The correlation coefficient r measures how strongly two variables move together, on a scale from −1 to +1.

[Figure: three scatter plots: r ≈ +1 (positive), r ≈ 0 (none), r ≈ −1 (negative)]
The cardinal rule: Correlation ≠ causation. Ice cream sales correlate with drownings because both rise in summer (heat), not because ice cream causes drowning. Always ask: is there a hidden third variable?
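As a sketch, here is the standard Pearson formula for r written out by hand, applied to made-up numbers for the ice cream example. Both series are invented to rise with temperature, the hidden third variable.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, from -1 to +1."""
    mx, my = mean(xs), mean(ys)
    co_movement = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical weekly data, both driven by temperature.
temperature = [10, 15, 20, 25, 30]
ice_cream_sales = [20, 30, 40, 50, 60]
drownings = [1, 2, 2, 4, 5]

r = pearson_r(ice_cream_sales, drownings)
print(f"r(ice cream, drownings) = {r:.2f}")  # strong, but not causal
print(f"r(temperature, drownings) = {pearson_r(temperature, drownings):.2f}")
```

A high r between ice cream and drownings says nothing about cause; in this toy data temperature explains both.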
10 · Predict & quantify

Linear Regression

Correlation says "they move together"; regression draws the actual line of best fit so you can predict one variable from another and quantify the relationship.

[Figure: scatter plot of Y against X with the fitted line ŷ = a + bx and one residual marked]
The line minimises the total squared vertical gaps (residuals) between points and line — "least squares."
The regression line
ŷ = a + b·x
a = intercept (Y when X=0) · b = slope (how much Y changes per 1-unit rise in X). R² tells you what % of Y's variation the model explains (0–1, higher = better fit).
R² in one line: R² = % of the variation explained. R² = 0.8 means 80% of the ups and downs in Y are accounted for by X; the other 20% is noise or missing factors.
Tie it together 🧠 r tells you how tight the cloud is; b (slope) tells you how steep the line is; R² tells you how much the line explains. Three different questions, three different numbers.
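A least-squares fit for one predictor is short enough to write by hand. The sketch below uses the standard closed-form slope and intercept; the ad-spend and sales figures are hypothetical, and `fit_line` and `r_squared` are illustrative helper names.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variation in y that the line explains."""
    my = mean(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared residuals
    total = sum((y - my) ** 2 for y in ys)
    return 1 - unexplained / total

# Hypothetical data: ad spend ($k) vs. sales ($k).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 4, 6, 7]

a, b = fit_line(ad_spend, sales)
r2 = r_squared(ad_spend, sales, a, b)
print(f"y-hat = {a:.1f} + {b:.1f}x   R² = {r2:.2f}")
print(f"predicted sales at x = 6: {a + b * 6:.1f}")
```

Here each extra $1k of ad spend goes with about $0.9k more sales, and the line explains 81% of the variation. With a single predictor, R² equals r squared (0.9² = 0.81).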

🎯 Active recall

Cover the answer, say it aloud, then check. Re-draw the bell curve (with 68-95-99.7) and a regression scatter from memory. Revisit today, +3 days, +1 week.

Mean vs median — which is robust to outliers, and why use it?
The median. It's the middle value, so extreme outliers don't drag it. Use it for skewed data like income or house prices.
What does standard deviation tell you in plain words?
The typical distance of a data point from the mean. Small σ = tightly clustered (low risk); large σ = scattered (high risk).
State the 68-95-99.7 rule.
For normal data: ~68% of values fall within ±1σ of the mean, ~95% within ±2σ, ~99.7% within ±3σ.
What's a z-score and its formula?
How many standard deviations a value sits from the mean. z = (x − μ) / σ. Lets you compare values from different scales.
In one sentence, what does the Central Limit Theorem say?
The distribution of sample means is approximately normal even if the underlying data isn't — provided the sample is large enough (n ≥ ~30).
What is a p-value?
The probability of observing data at least as extreme as yours IF the null hypothesis were true. Small p (< α, usually 0.05) → reject the null.
Type I vs Type II error?
Type I = false positive: rejecting a true null (crying wolf, rate = α). Type II = false negative: failing to reject a false null (missing a real effect, rate = β).
Why is "correlation ≠ causation"? Give the classic example.
Two variables can move together because of a hidden third factor. Ice cream sales & drownings both rise with summer heat — neither causes the other.
What does R² = 0.7 mean?
The regression model explains 70% of the variation in Y; the remaining 30% is unexplained (noise or missing variables).
Module 3 of your MBA · Re-draw the bell curve and a scatter+line from memory before moving on. 📊