Appendix C — Formula Sheet and Exam Preparation

Status: ported 2026-05-19. Reviewed by editor: pending.

NoteHow to use this guide

Each topic has (1) a formula cheat sheet, (2) a common mistakes alert, (3) a quick decision rule, and (4) one fully solved exam-style exercise in R. Try the exercise on paper before unfolding the worked solution. Print this page for an offline reference.

C.1 C.1 — Topic 1: Univariate descriptive statistics

C.1.1 Key formulas

Measure Formula When to use
Arithmetic mean \(\bar{x} = \frac{1}{n}\sum x_i n_i\) Symmetric data, no outliers
Median \(Me = L_i + \frac{n/2 - N_{i-1}}{n_i} \cdot a_i\) Skewed data, outliers
Mode Interval with highest \(h_i = n_i / a_i\) Categorical / grouped data
Variance (\(S^2\), divisor \(n\)) \(S^2 = \frac{\sum x_i^2 n_i}{n} - \bar{x}^2\) Always (primary dispersion)
Coefficient of variation \(CV = S / \bar{x}\) Comparing across scales
Skewness \(g_1 = m_3 / S^3\) Detect asymmetry
Kurtosis \(g_2 = m_4 / S^4 - 3\) Tail heaviness
Gini index \(G = 1 - \sum(p_i - p_{i-1})(q_i + q_{i-1})\) Inequality
Linear transform If \(Y = a + bX\): \(\bar{y} = a + b\bar{x}\), \(S_Y = \lvert b \rvert\, S_X\) Currency, unit change

C.1.2 Decision rule

Which central-tendency measure? Nominal \(\to\) Mode. Ordinal \(\to\) Median. Ratio/interval, symmetric \(\to\) Mean. Skewed or outliers \(\to\) Median.

WarningCommon mistakes (Topic 1)
  1. Using frequency as histogram height with unequal intervals. Use density \(h_i = n_i / a_i\).
  2. Forgetting class marks (\(c_i\), midpoints) for grouped data.
  3. Comparing variances across different scales. Use \(CV\) instead.
  4. Computing Gini without sorting values from smallest to largest first.
  5. Confusing \(S^2\) (divisor \(n\)) with \(\hat{\sigma}^2\) (divisor \(n-1\)). This course uses \(S^2 = \frac{1}{n}\sum(x_i - \bar{x})^2\).

C.1.3 Worked exam exercise — Monthly electricity bills

Monthly electricity bills (EUR) for 50 households are grouped into intervals \([30,50), [50,70), [70,90), [90,110), [110,130)\) with frequencies \(n_i = 6, 14, 18, 8, 4\). Compute the mean, variance, \(CV\), median, and skewness. Plot the density-scale histogram.

Code
intervals <- c(30, 50, 70, 90, 110, 130)
ni <- c(6, 14, 18, 8, 4)
n  <- sum(ni)
ci <- (intervals[-length(intervals)] + intervals[-1]) / 2   # class marks
ai <- diff(intervals)                                       # widths

# Mean
x_bar <- sum(ci * ni) / n
cat("Mean =", round(x_bar, 2), "EUR\n")
#> Mean = 76 EUR
Code
# Variance and SD
S2 <- sum(ci^2 * ni) / n - x_bar^2
S  <- sqrt(S2)
cat("Variance =", round(S2, 2), ", SD =", round(S, 2), "\n")
#> Variance = 480 , SD = 21.91
Code
# CV
CV <- S / x_bar
cat("CV =", round(CV, 4), "(", round(CV * 100, 1), "%)\n")
#> CV = 0.2883 ( 28.8 %)
Code
# Median
Ni <- cumsum(ni)
med_idx <- which(Ni >= n / 2)[1]
Me <- intervals[med_idx] + (n / 2 - c(0, Ni)[med_idx]) / ni[med_idx] * ai[med_idx]
cat("Median =", round(Me, 2), "EUR\n")
#> Median = 75.56 EUR
Code
# Skewness
m3 <- sum(ni * (ci - x_bar)^3) / n
g1 <- m3 / S^3
cat("Skewness g1 =", round(g1, 3),
    ifelse(g1 > 0, "(right-skewed)",
           ifelse(g1 < 0, "(left-skewed)", "(symmetric)")), "\n")
#> Skewness g1 = 0.219 (right-skewed)
Code
# Histogram (density scale)
barplot(ni / ai,
        names.arg = paste0("[", intervals[-6], ",", intervals[-1], ")"),
        col = "steelblue", ylab = "Density", xlab = "Electricity bill (EUR)",
        main = "Histogram (density scale)")

Interpretation. The distribution is slightly right-skewed (\(g_1 > 0\)), so the mean exceeds the median. A \(CV\) of about 30% indicates moderate dispersion.

C.2 C.2 — Topic 2: Bivariate descriptive statistics

C.2.1 Key formulas

Measure Formula
Covariance \(S_{XY} = \overline{xy} - \bar{x}\bar{y}\)
Correlation \(r = S_{XY} / (S_X \cdot S_Y)\), always in \([-1, 1]\)
OLS slope \(b = S_{XY} / S_X^2\)
OLS intercept \(a = \bar{y} - b\bar{x}\)
Coefficient of determination \(R^2 = r^2\), fraction of variance explained
Statistical independence \(f_{ij} = f_{i\cdot} \cdot f_{\cdot j}\) for all \(i, j\)
Exponential fit \(\log y = \log a + x \log b\); OLS on \((x, \log y)\)

C.2.2 Decision rule

Linear or nonlinear? Plot first. Straight pattern \(\to\) linear OLS. Curved and accelerating \(\to\) exponential (\(\log y\)). Curved and decelerating \(\to\) power (\(\log x, \log y\)). U-shape \(\to\) quadratic.

WarningCommon mistakes (Topic 2)
  1. Confusing correlation with causation. A high \(r\) does not prove \(X\) causes \(Y\).
  2. Extrapolating beyond the data range. The line is only meaningful within observed \(x\) values.
  3. Only checking one cell for independence. You must verify \(f_{ij} = f_{i\cdot} f_{\cdot j}\) for all cells. One failure \(\Rightarrow\) dependent.
  4. Reporting \(R^2\) on log-transformed variables as if it applied to the original scale.
  5. Forgetting that \(b\) changes if you swap \(X\) and \(Y\). The regression of \(Y\) on \(X\) is not the same as \(X\) on \(Y\).

C.2.3 Worked exam exercise — Study hours vs exam score

For ten students with study hours \(x = (2,3,4,5,6,7,8,9,10,12)\) and scores \(y = (45,52,58,61,70,72,78,82,88,95)\), compute \(S_{XY}\), \(S_X^2\), \(S_Y^2\), the OLS line \(y = a + bx\), the correlation \(r\), \(R^2\), and predict the score for 6.5 hours.

Code
hours <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 12)
score <- c(45, 52, 58, 61, 70, 72, 78, 82, 88, 95)

# Scatter plot
plot(hours, score, pch = 19, col = "steelblue", cex = 1.3,
     xlab = "Study hours", ylab = "Exam score",
     main = "Study hours vs exam score")

# By-hand computation
n     <- length(hours)
x_bar <- mean(hours)
y_bar <- mean(score)
Sxy   <- mean(hours * score) - x_bar * y_bar
Sx2   <- mean(hours^2) - x_bar^2
Sy2   <- mean(score^2) - y_bar^2

b  <- Sxy / Sx2
a  <- y_bar - b * x_bar
r  <- Sxy / (sqrt(Sx2) * sqrt(Sy2))
R2 <- r^2

cat("x_bar =", x_bar, ", y_bar =", y_bar, "\n")
#> x_bar = 6.6 , y_bar = 70.1
Code
cat("Sxy =", round(Sxy, 3), ", Sx2 =", round(Sx2, 3), "\n")
#> Sxy = 46.24 , Sx2 = 9.24
Code
cat("Slope b =", round(b, 3), "\n")
#> Slope b = 5.004
Code
cat("Intercept a =", round(a, 3), "\n")
#> Intercept a = 37.071
Code
cat("r =", round(r, 4), ", R^2 =", round(R2, 4), "\n")
#> r = 0.9955 , R^2 = 0.991
Code
abline(a, b, col = "red", lwd = 2)
legend("topleft",
       legend = paste0("y = ", round(a, 1), " + ", round(b, 2), "x"),
       col = "red", lwd = 2, bty = "n")

Code
# Verify with lm()
fit <- lm(score ~ hours)
cat("\nlm() verification:\n"); print(coef(fit))
#> 
#> lm() verification:
#> (Intercept)       hours 
#>   37.071429    5.004329
Code
cat("\nPredicted score for 6.5 hours:", round(a + b * 6.5, 1), "\n")
#> 
#> Predicted score for 6.5 hours: 69.6

Interpretation. Each additional study hour increases the expected score by roughly 5 points. With \(R^2 \approx 0.99\), study hours explain about 99% of score variability.

C.3 C.3 — Topic 3: Introduction to probability

C.3.1 Key formulas

Rule Formula
Complement \(P(\bar{A}) = 1 - P(A)\)
Addition \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
Conditional \(P(A \mid B) = P(A \cap B) / P(B)\)
Product \(P(A \cap B) = P(A \mid B) \cdot P(B)\)
Independence \(P(A \cap B) = P(A) \cdot P(B)\)
Total probability \(P(B) = \sum_i P(B \mid A_i) P(A_i)\)
Bayes’ theorem \(P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum_j P(B \mid A_j) P(A_j)}\)

C.3.2 Decision rule

Which formula? Single event \(\to\) complement or Laplace. Two events \(\to\) check independent vs mutually exclusive. Sequential events \(\to\) tree diagram with the product rule. “Given that…” \(\to\) conditional probability. “What caused the observed outcome?” \(\to\) Bayes’ theorem.

WarningCommon mistakes (Topic 3)
  1. Confusing independent and mutually exclusive. Mutually exclusive events (\(A \cap B = \emptyset\)) are dependent — knowing \(A\) occurred tells you \(B\) did not.
  2. Applying Laplace’s rule when outcomes are NOT equally likely (loaded dice, biased coins).
  3. Ignoring base rates (the base-rate fallacy). A 99%-accurate test on a rare disease still produces mostly false positives.
  4. Multiplying probabilities without checking independence. \(P(A \cap B) = P(A) P(B)\) holds only under independence.
  5. Forgetting that posteriors must sum to 1 after applying Bayes’ theorem.

C.3.3 Worked exam exercise — Bayes’ theorem (two factories)

Factory A supplies 60% of stock with a 3% defect rate; Factory B supplies 40% with a 7% defect rate. A defective item is found. Compute \(P(\text{Defective})\), \(P(A \mid D)\) and \(P(B \mid D)\), and verify by simulation.

Code
P_A   <- 0.60;  P_B   <- 0.40
P_D_A <- 0.03;  P_D_B <- 0.07

# Total probability of defect
P_D <- P_D_A * P_A + P_D_B * P_B
cat("P(D) =", P_D, "\n")
#> P(D) = 0.046
Code
# Bayes
P_A_D <- (P_D_A * P_A) / P_D
P_B_D <- (P_D_B * P_B) / P_D
cat("P(A|D) =", round(P_A_D, 4), "\n")
#> P(A|D) = 0.3913
Code
cat("P(B|D) =", round(P_B_D, 4), "\n")
#> P(B|D) = 0.6087
Code
cat("Check: sum =", P_A_D + P_B_D, "\n")
#> Check: sum = 1
Code
# Monte-Carlo check
n_sim <- 100000
factory <- sample(c("A", "B"), n_sim, replace = TRUE, prob = c(0.60, 0.40))
defect  <- ifelse(factory == "A",
                  rbinom(n_sim, 1, 0.03),
                  rbinom(n_sim, 1, 0.07))
defectives <- factory[defect == 1]
cat("\nSimulation (", n_sim, "items):\n")
#> 
#> Simulation ( 100000 items):
Code
cat("P(A|D) simulated =", round(mean(defectives == "A"), 4), "\n")
#> P(A|D) simulated = 0.3957
Code
cat("P(B|D) simulated =", round(mean(defectives == "B"), 4), "\n")
#> P(B|D) simulated = 0.6043

Interpretation. Although Factory A supplies 60% of stock, it is responsible for only about 39.1% of defectives. Factory B, with its higher defect rate, is the more likely source.

C.4 C.4 — Topic 4: Random variables

C.4.1 Key formulas

Concept Discrete Continuous
Mean \(\mu = \sum x_i p_i\) \(\mu = \int x f(x)\, dx\)
Variance \(\sigma^2 = \sum x_i^2 p_i - \mu^2\) \(\sigma^2 = \int x^2 f(x)\, dx - \mu^2\)
CDF \(F(x) = P(X \le x) = \sum_{x_i \le x} p_i\) \(F(x) = \int_{-\infty}^x f(t)\, dt\)
\(P(a < X \le b)\) \(F(b) - F(a)\) \(\int_a^b f(x)\, dx\)
Linear transform \(\mathbb{E}[aX + b] = a\mu + b\) \(\operatorname{Var}(aX + b) = a^2 \sigma^2\)
Covariance \(\sigma_{XY} = \mathbb{E}[XY] - \mu_X \mu_Y\) Independence \(\Rightarrow \rho = 0\) (not vice versa)

C.4.2 Decision rule

Discrete or continuous? Countable values (0, 1, 2, …) \(\to\) discrete: use sums and PMF \(p_i\). Uncountable values (an interval) \(\to\) continuous: use integrals and density \(f(x)\). For \(\operatorname{Var}(X)\) always use \(\mathbb{E}[X^2] - \mu^2\); for any linear function use \(\mathbb{E}[aX + b] = a\mu + b\), \(\operatorname{Var}(aX + b) = a^2 \sigma^2\).

WarningCommon mistakes (Topic 4)
  1. Forgetting that \(P(X = a) = 0\) for continuous RVs. Probability is area under \(f\), not a height.
  2. Writing \(\mathbb{E}[X^2] = (\mathbb{E}[X])^2\). They are not equal; \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \ge 0\).
  3. Assuming zero covariance implies independence. \(\rho = 0\) only rules out linear dependence.
  4. Forgetting to check that \(\sum p_i = 1\) (or \(\int f(x)\, dx = 1\)) before computing expectations.
  5. Subtracting variances when computing \(\operatorname{Var}(X - Y)\). The correct formula is \(\operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)\).

C.4.3 Worked exam exercise — A discrete PMF

Let \(X\) be daily product demand with PMF \(P(X = x)\) given by \(p = (0.10, 0.25, 0.35, 0.20, 0.10)\) on \(x = 0, 1, 2, 3, 4\). Verify the PMF, compute \(\mathbb{E}[X]\), \(\mathbb{E}[X^2]\), \(\operatorname{Var}(X)\), the CDF, and \(P(1 \le X \le 3)\).

Code
x <- c(0, 1, 2, 3, 4)
p <- c(0.10, 0.25, 0.35, 0.20, 0.10)

cat("Sum of probabilities:", sum(p), "\n")
#> Sum of probabilities: 1
Code
EX   <- sum(x * p)
EX2  <- sum(x^2 * p)
VarX <- EX2 - EX^2
cat("E[X]   =", EX, "\n")
#> E[X]   = 1.95
Code
cat("E[X^2] =", EX2, "\n")
#> E[X^2] = 5.05
Code
cat("Var(X) =", VarX, ", SD =", round(sqrt(VarX), 3), "\n")
#> Var(X) = 1.2475 , SD = 1.117
Code
# CDF
Fx <- cumsum(p)
knitr::kable(
  data.frame(x = x, `P(X=x)` = p, `F(x)` = Fx, check.names = FALSE),
  digits = 2)
x P(X=x) F(x)
0 0.10 0.10
1 0.25 0.35
2 0.35 0.70
3 0.20 0.90
4 0.10 1.00
Code
cat("\nP(1 <= X <= 3) = F(3) - F(0) =", Fx[4] - Fx[1], "\n")
#> 
#> P(1 <= X <= 3) = F(3) - F(0) = 0.8
Code
par(mfrow = c(1, 2))
barplot(p, names.arg = x, col = "steelblue",
        ylab = "P(X = x)", xlab = "x", main = "PMF")
plot(stepfun(x, c(0, Fx)), pch = 19, col = "firebrick", lwd = 2,
     main = "CDF", xlab = "x", ylab = "F(x)",
     xlim = c(-1, 5), ylim = c(0, 1))

Code
par(mfrow = c(1, 1))

Interpretation. With \(\mathbb{E}[X] = 1.95\) and \(\operatorname{Var}(X) = 1.2475\), average daily demand is two units with standard deviation about 1.12.

C.5 C.5 — Topic 5: Discrete distributions

C.5.1 Key formulas

Distribution PMF \(\mu\) \(\sigma^2\) Use when
Binomial \(B(n, p)\) \(\binom{n}{k} p^k (1-p)^{n-k}\) \(np\) \(np(1-p)\) Fixed \(n\) trials, constant \(p\)
Poisson \(\mathcal{P}(\lambda)\) \(\dfrac{e^{-\lambda}\lambda^k}{k!}\) \(\lambda\) \(\lambda\) Counting events in an interval
Hypergeometric \(\dfrac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) \(np\) \(np(1-p)\dfrac{N-n}{N-1}\) Sampling without replacement
Geometric (failures) \((1-p)^k p\) \(\dfrac{1-p}{p}\) \(\dfrac{1-p}{p^2}\) Trials until the first success

C.5.2 Decision rule

Three questions to identify the distribution:

  1. Counting events in a fixed interval? \(\to\) Poisson.
  2. Waiting for the first success? \(\to\) Geometric.
  3. Fixed \(n\) trials, count successes? With replacement \(\to\) Binomial. Without replacement \(\to\) Hypergeometric.
WarningCommon mistakes (Topic 5)
  1. Using Binomial when sampling WITHOUT replacement from a small population. Use Hypergeometric.
  2. Confusing \(P(X = k)\) with \(P(X \le k)\). “Exactly 3” vs “at most 3” vs “more than 3”.
  3. Forgetting the Poisson rate adjustment. If \(\lambda = 3\) per hour and you want \(P\) in two hours, use \(\lambda' = 6\).
  4. Two conventions for the Geometric. We count failures before the first success (\(k = 0, 1, 2, \ldots\)). Many textbooks count trials until success (\(k = 1, 2, 3, \ldots\)).
  5. Skipping the Poisson approximation conditions. Rule of thumb: \(n \ge 30\) and \(p \le 0.10\).

C.5.3 Worked exam exercise — Poisson and Binomial-Poisson approximation

A bakery receives an average of \(\lambda = 4\) complaints per week (Poisson). Compute \(P(X = 0)\), \(P(X \le 2)\), \(P(X > 6)\). Then check the Poisson approximation to \(B(200, 0.01)\) over \(k = 0, \ldots, 8\).

Code
lambda <- 4
cat("=== Poisson(lambda = 4) ===\n")
#> === Poisson(lambda = 4) ===
Code
cat("P(X = 0) =", round(dpois(0, lambda), 4), "\n")
#> P(X = 0) = 0.0183
Code
cat("P(X <= 2) =", round(ppois(2, lambda), 4), "\n")
#> P(X <= 2) = 0.2381
Code
cat("P(X > 6) =", round(1 - ppois(6, lambda), 4), "\n")
#> P(X > 6) = 0.1107
Code
n <- 200; p_burn <- 0.01
cat("\n=== Binomial B(200, 0.01) vs Poisson(2) ===\n")
#> 
#> === Binomial B(200, 0.01) vs Poisson(2) ===
Code
k        <- 0:8
binom_p  <- dbinom(k, n, p_burn)
pois_p   <- dpois(k, n * p_burn)
comparison <- data.frame(k = k,
                         Binomial   = round(binom_p, 5),
                         Poisson    = round(pois_p, 5),
                         Difference = round(binom_p - pois_p, 5))
knitr::kable(comparison)
k Binomial Poisson Difference
0 0.13398 0.13534 -0.00136
1 0.27067 0.27067 0.00000
2 0.27203 0.27067 0.00136
3 0.18136 0.18045 0.00091
4 0.09022 0.09022 0.00000
5 0.03572 0.03609 -0.00037
6 0.01173 0.01203 -0.00030
7 0.00328 0.00344 -0.00015
8 0.00080 0.00086 -0.00006
Code
barplot(rbind(binom_p, pois_p), beside = TRUE, names.arg = k,
        col = c("steelblue", "firebrick"),
        legend.text = c("Bin(200, 0.01)", "Pois(2)"),
        xlab = "k", ylab = "P(X = k)",
        main = "Binomial–Poisson approximation")

Interpretation. The approximation is excellent — the bars are visually indistinguishable, and the maximum difference is below 0.001.

C.6 C.6 — Topic 6: Index numbers

C.6.1 Key formulas

Index Formula Interpretation
Elementary \(I_{t/0} = \frac{Y_t}{Y_0} \times 100\) Single-variable change
Laspeyres (price) \(L_P = \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100\) Cost of old basket at new prices
Paasche (price) \(P_P = \frac{\sum p_t q_t}{\sum p_0 q_t} \times 100\) Cost of current basket comparison
Fisher \(F = \sqrt{L_P \times P_P}\) Geometric-mean compromise
Laspeyres (quantity) \(L_Q = \frac{\sum q_t p_0}{\sum q_0 p_0} \times 100\) Real consumption change
Value decomposition \(V = L_P \cdot P_Q / 100 = P_P \cdot L_Q / 100\) Price + quantity effects
Deflation \(Y^{\text{real}} = Y^{\text{nominal}} / (\text{CPI}/100)\) Remove inflation
Linking \(I_{t/\text{new}} = I_{t/\text{old}} / I_{\text{new}/\text{old}} \times 100\) Change base period
Average growth rate \(\bar{T} = (Y_T/Y_0)^{1/T} - 1\) Geometric mean of factors

C.6.2 Decision rule

Which weights do I use? Price index with base-period quantities \(\to\) Laspeyres. Price index with current-period quantities \(\to\) Paasche. Need a symmetric compromise \(\to\) Fisher. To remove inflation, divide nominal by \(\text{CPI}/100\). To rebase to a new period, divide all values by \(I_{\text{new}/\text{old}}\) and multiply by 100.

WarningCommon mistakes (Topic 6)
  1. Averaging rates of variation arithmetically. Use the geometric mean of the factors, not the arithmetic mean of the rates.
  2. Dividing by the CPI instead of \(\text{CPI}/100\) when deflating. If CPI = 115, divide by 1.15.
  3. Confusing Laspeyres and Paasche weights. Laspeyres uses base-period quantities; Paasche uses current-period quantities.
  4. Forgetting to verify the value decomposition. Always check \(V = L_P \cdot P_Q / 100\).
  5. Applying the wrong linking formula. To convert old base to new: divide by \(I_{\text{new}/\text{old}}\) and multiply by 100.

C.6.3 Worked exam exercise — Three-good basket and deflation

Three goods (bread, milk, eggs) have base-period prices and quantities \(p_0 = (1.20, 0.90, 2.50)\), \(q_0 = (100, 200, 50)\) and current values \(p_t = (1.50, 1.10, 2.80)\), \(q_t = (90, 210, 45)\). Compute \(L_P\), \(P_P\), Fisher \(F\), \(L_Q\), \(P_Q\), the value index \(V\), and verify the decomposition. Then deflate a five-year nominal series with the given CPI.

Code
goods <- c("Bread (kg)", "Milk (L)", "Eggs (dozen)")
p0 <- c(1.20, 0.90, 2.50);  q0 <- c(100, 200, 50)
pt <- c(1.50, 1.10, 2.80);  qt <- c(90, 210, 45)

basket <- data.frame(Good = goods, p0, q0, pt, qt,
                     p0q0 = p0 * q0, ptq0 = pt * q0,
                     p0qt = p0 * qt, ptqt = pt * qt)
knitr::kable(basket, digits = 1)
Good p0 q0 pt qt p0q0 ptq0 p0qt ptqt
Bread (kg) 1.2 100 1.5 90 120 150 108.0 135
Milk (L) 0.9 200 1.1 210 180 220 189.0 231
Eggs (dozen) 2.5 50 2.8 45 125 140 112.5 126
Code
Lp <- sum(pt * q0) / sum(p0 * q0) * 100
Pp <- sum(pt * qt) / sum(p0 * qt) * 100
Fp <- sqrt(Lp * Pp)
Lq <- sum(qt * p0) / sum(q0 * p0) * 100
Pq <- sum(qt * pt) / sum(q0 * pt) * 100
V  <- sum(pt * qt) / sum(p0 * q0) * 100

cat("Laspeyres price =", round(Lp, 2), "\n")
#> Laspeyres price = 120
Code
cat("Paasche price   =", round(Pp, 2), "\n")
#> Paasche price   = 120.15
Code
cat("Fisher price    =", round(Fp, 2), "\n")
#> Fisher price    = 120.07
Code
cat("Laspeyres qty   =", round(Lq, 2), "\n")
#> Laspeyres qty   = 96.35
Code
cat("Paasche qty     =", round(Pq, 2), "\n")
#> Paasche qty     = 96.47
Code
cat("Value index     =", round(V, 2), "\n")
#> Value index     = 115.76
Code
cat("\nDecomposition check:\n")
#> 
#> Decomposition check:
Code
cat("  Lp * Pq / 100 =", round(Lp * Pq / 100, 2), "\n")
#>   Lp * Pq / 100 = 115.76
Code
cat("  Pp * Lq / 100 =", round(Pp * Lq / 100, 2), "\n")
#>   Pp * Lq / 100 = 115.76
Code
cat("  Value index   =", round(V, 2), "\n")
#>   Value index   = 115.76
Code
# Deflation example
cat("\n=== Deflation ===\n")
#> 
#> === Deflation ===
Code
years   <- 2020:2024
nominal <- c(1500, 1560, 1620, 1700, 1800)
cpi     <- c(100, 103, 108, 115, 121)
real    <- nominal / (cpi / 100)
knitr::kable(data.frame(Year = years, Nominal = nominal,
                        CPI = cpi, Real = round(real, 1)))
Year Nominal CPI Real
2020 1500 100 1500.0
2021 1560 103 1514.6
2022 1620 108 1500.0
2023 1700 115 1478.3
2024 1800 121 1487.6
Code
cat("Nominal growth:", round((1800 / 1500 - 1) * 100, 1), "%\n")
#> Nominal growth: 20 %
Code
cat("Real growth   :", round((real[5] / real[1] - 1) * 100, 1), "%\n")
#> Real growth   : -0.8 %

Interpretation. The value index decomposes exactly into either Laspeyres-price × Paasche-quantity or Paasche-price × Laspeyres-quantity (over 100). After deflating, real growth is markedly lower than nominal — most of the headline increase reflects inflation, not real gain.

C.7 C.7 — Topic 7: Time series

C.7.1 Key formulas

Step Multiplicative model Additive model
Model \(Y_t = T_t \cdot E_t \cdot \varepsilon_t\) \(Y_t = T_t + E_t + \varepsilon_t\)
OLS trend \(\hat{T}_t = a + b t\) same
Trend per season \(b / s\) same
Seasonal index \(\text{IVE}_i = \dfrac{\operatorname{avg}(Y_t / \hat{T}_t)}{\text{global avg}}\) \(E_i = \operatorname{avg}(Y_t - \hat{T}_t)\), normalised to sum to 0
Deseasonalised \(Y^* = Y_t / \text{IVE}_t\) \(Y^* = Y_t - E_t\)
Forecast \(\hat{Y}_t = \hat{T}_t \cdot \text{IVE}_t\) \(\hat{Y}_t = \hat{T}_t + E_t\)

C.7.2 Decision rule

Additive or multiplicative? Plot the data. Constant seasonal swings \(\to\) Additive. Swings grow with the trend \(\to\) Multiplicative.

WarningCommon mistakes (Topic 7)
  1. Forgetting to centre the MA(4). Even-order moving averages fall between time points and must be centred.
  2. Seasonal indices not normalised. They must sum to \(s\) (multiplicative) or to 0 (additive). If not, rescale.
  3. Confusing annual trend with per-season trend. If \(b\) is per year and the data are quarterly, trend per quarter \(= b/4\).
  4. Forecasting too far ahead. Univariate decomposition assumes the past pattern continues.
  5. Mixing additive IVEs with a multiplicative model (or vice versa). Constant amplitude \(\to\) additive; growing amplitude \(\to\) multiplicative.

C.7.3 Worked exam exercise — Full multiplicative decomposition

For quarterly sales over 4 years (2020–2023) given by \(Y_t = (38, 52, 65, 42; 41, 58, 72, 46; 45, 63, 80, 50; 48, 68, 88, 54)\), fit an OLS trend, compute the multiplicative seasonal indices \(\text{IVE}_i\) (Q1–Q4), deseasonalise, and forecast 2024.

Code
t_idx <- 1:16
sales <- c(38, 52, 65, 42,   41, 58, 72, 46,
           45, 63, 80, 50,   48, 68, 88, 54)

par(mfrow = c(2, 2))

# 1. Plot
plot(t_idx, sales, type = "o", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Quarter", ylab = "Sales", main = "1. Original series")

# 2. OLS trend
fit   <- lm(sales ~ t_idx)
trend <- fitted(fit)
a <- round(coef(fit)[1], 2); b <- round(coef(fit)[2], 3)
lines(t_idx, trend, col = "red", lwd = 2, lty = 2)
legend("topleft",
       c("Data", paste0("Trend: ", a, " + ", b, "t")),
       col = c("steelblue", "red"), lwd = 2, lty = c(1, 2),
       bty = "n", cex = 0.8)
cat("Trend: T =", a, "+", b, "* t\n")
#> Trend: T = 45.12 + 1.382 * t
Code
cat("Trend per quarter:", round(b, 3), "\n\n")
#> Trend per quarter: 1.382
Code
# 3. Seasonal indices
ratio       <- sales / trend
ratio_mat   <- matrix(ratio, ncol = 4, byrow = TRUE)
colnames(ratio_mat) <- paste0("Q", 1:4)
avg_ratio   <- colMeans(ratio_mat)
global_mean <- mean(avg_ratio)
IVE         <- avg_ratio / global_mean

cat("Average ratios:", round(avg_ratio, 4), "\n")
#> Average ratios: 0.7869 1.0737 1.3238 0.8153
Code
cat("IVE          :", round(IVE, 4), "\n")
#> IVE          : 0.7869 1.0738 1.3239 0.8154
Code
cat("Sum of IVE   :", round(sum(IVE), 4), "(should be 4)\n\n")
#> Sum of IVE   : 4 (should be 4)
Code
barplot(IVE * 100, names.arg = paste0("Q", 1:4),
        col = c("skyblue", "orange", "tomato", "lightgreen"),
        ylab = "IVE (%)", main = "3. Seasonal indices", ylim = c(0, 140))
abline(h = 100, lty = 2, col = "red")

# 4. Deseasonalise
deseas <- sales / rep(IVE, 4)
plot(t_idx, sales, type = "o", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Quarter", ylab = "Sales",
     main = "4. Original vs deseasonalised")
lines(t_idx, deseas, type = "o", pch = 17, col = "darkgreen", lwd = 2)
lines(t_idx, trend, col = "red", lwd = 2, lty = 2)
legend("topleft", c("Original", "Deseasonalised", "Trend"),
       col = c("steelblue", "darkgreen", "red"),
       pch = c(19, 17, NA), lwd = 2, lty = c(1, 1, 2),
       bty = "n", cex = 0.8)

# 5. Forecast 2024
t_fc     <- 17:20
trend_fc <- a + b * t_fc
forecast <- trend_fc * IVE

cat("=== 2024 forecasts ===\n")
#> === 2024 forecasts ===
Code
fc_df <- data.frame(Quarter  = paste0("Q", 1:4),
                    Trend    = round(trend_fc, 2),
                    IVE      = round(IVE, 4),
                    Forecast = round(forecast, 1))
knitr::kable(fc_df)
Quarter Trend IVE Forecast
Q1 Q1 68.61 0.7869 54.0
Q2 Q2 70.00 1.0738 75.2
Q3 Q3 71.38 1.3239 94.5
Q4 Q4 72.76 0.8154 59.3
Code
par(mfrow = c(1, 1))

Interpretation. Q3 is the peak season (IVE \(\approx\) 132.4%, about 32% above trend); Q1 is the weakest. The deseasonalised series tracks the linear trend closely, supporting the multiplicative model.

C.8 C.8 Notation bridge

Notation used across all seven topics.

Concept Descriptive (T1–T2) Probabilistic (T4–T5) Time series (T7)
Mean \(\bar{x}\) \(\mu = \mathbb{E}[X]\) \(\hat{T}_t\) (trend)
Variance \(S^2\) \(\sigma^2 = \operatorname{Var}(X)\)
Std deviation \(S\) \(\sigma\)
Covariance \(S_{XY}\) \(\sigma_{XY} = \operatorname{Cov}(X, Y)\)
Correlation \(r\) \(\rho = \operatorname{Corr}(X, Y)\)
Relative frequency \(f_i\) \(p_i\) (probability)
Cum. relative frequency \(F_i\) \(F(x)\) (CDF)

C.9 C.9 Final checklist

Before the exam, make sure you can:

  • Build a frequency table from raw data and compute all descriptive statistics.
  • Interpret the \(CV\) to compare dispersion across different scales.
  • Compute a regression line by hand and interpret slope, intercept, and \(R^2\).
  • Explain why correlation \(\neq\) causation with a concrete example.
  • Apply Bayes’ theorem using a tree diagram or a \(2 \times 2\) frequency table.
  • Identify the correct probability distribution from a problem description.
  • Compute Laspeyres, Paasche, and Fisher indices and interpret the decomposition.
  • Deflate a nominal series using the CPI.
  • Carry out a complete time-series decomposition (trend \(\to\) IVE \(\to\) deseasonalise \(\to\) forecast).
  • Explain the difference between additive and multiplicative models.