Teaching guide — R&R (MSA) Tutorial

1. Learning objectives

By the end of the session, the learner must be able to:

Explain in one minute what an R&R study is and why it precedes any capability study (Cp/Cpk) or any SPC deployment.
Decompose the total observed variation into three sources: repeatability (EV), reproducibility (AV), and part variation (PV).
Compute a Gauge R&R by ANOVA on a 3 × 10 × 3 dataset (operators × parts × repetitions).
Interpret a %Study Variation and an NDC (number of distinct categories), then issue a verdict (accepted / conditionally acceptable / rejected).
Distinguish an instrument problem (EV dominant) from a human/method problem (AV dominant).
Design a valid R&R protocol (number of parts, operators, repetitions, randomisation, blinding).

2. Prerequisites

Descriptive statistics basics: mean, standard deviation, variance.
Familiarity with one-way ANOVA is desirable (not essential: the tutorial includes a reminder).
A product tolerance available, ideally from a client drawing.

If the audience has never seen ANOVA, plan 15 extra minutes for the Hypothesis Tests tutorial as an introduction.

3. Material

1 workstation per pair with access to the tutorial (/en/tools/rr-msa) — works offline once the page is loaded (Pyodide embedded).
Ideally: 1 calliper (resolution 0.01 mm) + 1 micrometer (resolution 0.001 mm) + 10 parts machined on site (critical dimension chosen by the trainer).
Printed collection sheet (AIAG template).

4. Session plan (1 h 30)

Phase 1 — Hook (10 min)

Open question to the group:

“You run SPC on a critical dimension. Your X̄-R chart triggers 3 alerts a week. Your operators spend 20 minutes adjusting the machine each time. What if the instrument were the one lying?”

Goal: install the idea that measurement is never neutral. Any data is contaminated by the measurement chain.

Phase 2 — Theory (20 min)

Walk through the tutorial section by section:

Open the σ²_total decomposition video (loop at the top of the page).
Read the “Understand R&R in 2 minutes” box together (already open by default).
Stress the EV / AV / PV definitions with concrete examples:
- EV = “I measure the same part 3 times with the same calliper → the values still vary.”
- AV = “My colleague measures the same parts → his means are offset from mine.”
- PV = “The parts are genuinely different from each other (this is what we want to see).”
Acceptance rules: %GRR below 10% is accepted, 10% to 30% is acceptable depending on criticality, above 30% is rejected; NDC ≥ 5.
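
The acceptance rules above can be written as a small decision function. This is a minimal sketch, not the tutorial's own code: the function name `grr_verdict` and the way a low NDC is appended to the verdict are assumptions made for illustration.

```python
def grr_verdict(pct_grr: float, ndc: int) -> str:
    """Classify a measurement system from %GRR and NDC.

    Thresholds: under 10% accepted, 10% to 30% acceptable depending on
    criticality, above 30% rejected. Flagging NDC < 5 as a separate
    warning is an illustrative choice, not a rule from the tutorial.
    """
    if pct_grr < 10.0:
        verdict = "accepted"
    elif pct_grr <= 30.0:
        verdict = "acceptable"
    else:
        verdict = "rejected"
    if ndc < 5 and verdict != "rejected":
        verdict += " (NDC below 5: check part spread)"
    return verdict


print(grr_verdict(8.0, 12))   # accepted
print(grr_verdict(25.0, 6))   # acceptable
print(grr_verdict(35.0, 2))   # rejected
```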

Phase 3 — Guided manipulation (20 min)

Load the default dataset (already in place). Click “Analyse”.

Comment on the output together:

Global verdict (teal/coral badge).
ANOVA table: where are the p-values? Interpretation of the significant part effect (the instrument does see the parts, which is what we want).
Variation components chart: where is the mass? Here EV and AV are low, PV dominates.
X̄ chart: the 3 curves should follow the same profile (operators aligned). A systematic vertical offset between operators = method bias.
R chart: all points below UCL = repeatability under control per operator.
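
To show learners where the R chart limit comes from, the sketch below computes the range of each operator/part cell and compares it with UCL = D4 × R̄. It assumes 3 repetitions per cell (D4 is about 2.574 for subgroups of 3); the function name and the small dataset are invented for illustration and are not the tutorial's default data.

```python
D4_N3 = 2.574  # standard range-chart constant for subgroups of 3 readings


def range_chart(data):
    """data[operator][part] is the list of repeated readings for that cell."""
    ranges = [[max(cell) - min(cell) for cell in op_rows] for op_rows in data]
    flat = [r for row in ranges for r in row]
    r_bar = sum(flat) / len(flat)          # average range over all cells
    ucl = D4_N3 * r_bar                    # upper control limit
    flagged = [(o, p) for o, row in enumerate(ranges)
               for p, r in enumerate(row) if r > ucl]
    return r_bar, ucl, flagged


# Made-up readings: 2 operators x 3 parts x 3 repetitions
demo = [
    [[10.0, 10.1, 10.2], [9.0, 9.1, 9.0], [11.0, 11.0, 11.1]],
    [[10.1, 10.1, 10.2], [9.0, 9.0, 9.1], [11.0, 11.9, 11.1]],
]
r_bar, ucl, flagged = range_chart(demo)
print(round(r_bar, 3), round(ucl, 3), flagged)
```

A flagged cell (here the second operator on the third part) points to a single erratic reading rather than to a general repeatability problem.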

Phase 4 — Exercises (30 min, pairs)

Distribute exercises EX2, EX3, EX5 (intermediate to advanced) via the “Load a preset” menu.

For each exercise, ask pairs to:

Before clicking “Analyse”, formulate a hypothesis about the dominant source (EV or AV) from the context.
Run the calculation.
Confirm or refute the hypothesis.
Open the solution (“See the solution”) and compare the interpretation.

Collective correction of the 3 cases at the end of the phase.

Phase 5 — Debrief and action (10 min)

Round table: what will they check tomorrow morning on their line?

Suggested directions:

Identify 1 critical dimension currently in SPC but not audited in R&R.
Launch an in-house R&R study at zero cost (parts already measured for capability).
Check the last COFRAC calibration date of the critical instruments.

5. Student FAQ

“Why 10 parts, 3 operators, 3 repetitions?”

AIAG convention: 10 parts cover the process variation range; 3 operators capture human variability; 3 repetitions give a reliable repeatability estimate without inflating the study time. It is a proven trade-off: fewer measurements give a less reliable estimate, more bring only a marginal gain.

“What if I have fewer than 3 operators (automated line, no human contact)?”

Run a Type 1 gauge study (pure repeatability plus bias against a reference standard) instead of a crossed R&R. The current tutorial covers only the crossed case.

“My part is destructive (crash test, fatigue). How do I run R&R?”

Special case: Nested R&R. The same part cannot be remeasured, so we measure several parts “from the same batch” that are assumed identical. The tutorial does not cover this case: use a specialised tool (Minitab Nested) or carry out the decomposition manually.

“Why ANOVA and not the older AIAG Xbar-R method?”

ANOVA is the preferred method since MSA 3rd ed. (2002): it estimates the Op×Part interaction as a separate term instead of leaving it hidden in the other components. The Xbar-R method remains easy to compute by hand but sometimes underestimates AV when the interaction is significant. The tutorial applies ANOVA and pools the interaction with the error when p > 0.25 (AIAG rule).
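
For trainers who want to show the mechanics, here is a minimal sketch of the crossed ANOVA decomposition for balanced data, in plain Python. It is not the tutorial's implementation: the function name, the dictionary keys and the `pool_interaction` flag are assumptions, and the F-test that produces the p-value for the pooling decision is deliberately left out (the caller decides).

```python
import math


def gauge_rr_anova(data, pool_interaction=False):
    """Crossed gauge R&R by two-way ANOVA on balanced data.

    data[i][j] is the list of repeated readings of operator i on part j.
    pool_interaction (hypothetical flag): when True, the Op x Part term is
    merged into the error; the p-value test that drives it is not shown.
    """
    o, p, r = len(data), len(data[0]), len(data[0][0])

    def mean(xs):
        return sum(xs) / len(xs)

    everything = [y for op in data for cell in op for y in cell]
    grand = mean(everything)
    op_means = [mean([y for cell in op for y in cell]) for op in data]
    part_means = [mean([y for op in data for y in op[j]]) for j in range(p)]
    cell_means = [mean(cell) for op in data for cell in op]

    # Sums of squares for a balanced crossed design
    ss_op = p * r * sum((m - grand) ** 2 for m in op_means)
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_cells = r * sum((m - grand) ** 2 for m in cell_means)
    ss_total = sum((y - grand) ** 2 for y in everything)
    ss_int = ss_cells - ss_op - ss_part
    ss_err = ss_total - ss_cells

    df_int, df_err = (o - 1) * (p - 1), o * p * (r - 1)
    ms_op, ms_part = ss_op / (o - 1), ss_part / (p - 1)
    if pool_interaction:
        ms_err = (ss_int + ss_err) / (df_int + df_err)
        ms_int = ms_err  # no separate interaction term after pooling
    else:
        ms_err, ms_int = ss_err / df_err, ss_int / df_int

    # Variance components, each bounded at zero
    var_rep = ms_err
    var_int = max(0.0, (ms_int - ms_err) / r)
    var_op = max(0.0, (ms_op - ms_int) / (p * r))
    var_part = max(0.0, (ms_part - ms_int) / (o * r))
    var_grr = var_rep + var_op + var_int
    var_total = var_grr + var_part

    pct_grr = 100.0 * math.sqrt(var_grr / var_total) if var_total > 0 else 0.0
    ndc = math.floor(1.41 * math.sqrt(var_part / var_grr)) if var_grr > 0 else None
    return {"var_rep": var_rep, "var_op": var_op, "var_int": var_int,
            "var_part": var_part, "pct_grr": pct_grr, "ndc": ndc}


# Tiny made-up dataset (2 operators x 2 parts x 2 repetitions)
demo = [
    [[10.0, 10.2], [12.0, 12.2]],  # operator A
    [[10.2, 10.4], [12.2, 12.4]],  # operator B
]
result = gauge_rr_anova(demo)
print(round(result["pct_grr"], 1), result["ndc"])
```

On this toy dataset the operators differ by a constant offset and there is no interaction, so the reproducibility component comes entirely from the operator term.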

“My NDC is 3 but the GR&R is 9%. Accepted or rejected?”

Both criteria are complementary. This combination can only occur when the 9% is a %Tolerance: against study variation, a 9% GR&R would imply an NDC of about 15. A low NDC may indicate that your 10 parts are too homogeneous (all close to target), not that the instrument is bad. First check that the sample properly covers the tolerance range before rejecting the instrument.
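
The link between the two criteria can be checked with a few lines of arithmetic. The sketch below assumes the usual definitions NDC = 1.41 × PV / GRR (rounded down) and TV² = GRR² + PV²; the function name is invented for illustration.

```python
import math


def ndc_from_pct_grr(pct_grr_sv: float) -> int:
    """NDC implied by a %GRR expressed against total study variation."""
    g = pct_grr_sv / 100.0                  # GRR / TV
    pv_over_tv = math.sqrt(1.0 - g * g)     # because TV^2 = GRR^2 + PV^2
    return math.floor(1.41 * pv_over_tv / g)


print(ndc_from_pct_grr(9.0))    # 15
print(ndc_from_pct_grr(30.0))   # 4
```

When both figures are relative to study variation they move together, so a conflict between them is a hint that one of them was computed against the tolerance.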

“Why does the operator variance sometimes come out negative?”

When MS_operator < MS_op×part, the formula gives a negative variance estimate, which is impossible for a true variance. By convention, we bound it at zero (no reproducibility detectable above the noise). The tutorial applies max(0, ...), consistent with MSA practice.
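
A short numeric illustration of the bounding, assuming the balanced crossed design where the operator variance is (MS_operator − MS_op×part) / (parts × repetitions). The function name and the mean-square values are made up.

```python
def operator_variance(ms_operator, ms_op_part, n_parts, n_reps):
    """Return (bounded estimate, raw estimate) of the operator variance."""
    raw = (ms_operator - ms_op_part) / (n_parts * n_reps)
    return max(0.0, raw), raw


bounded, raw = operator_variance(ms_operator=0.010, ms_op_part=0.016,
                                 n_parts=10, n_reps=3)
print(bounded, raw)  # the raw estimate is negative, the bounded one is 0.0
```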

“Pyodide: is it safe? Does my data leave for a server?”

No. Pyodide is a Python runtime compiled to WebAssembly that runs entirely in your browser. No data is sent to a BCUB3 or third-party server. You can even disconnect from the internet after the initial load: computation continues offline.

6. Classic pitfalls

Pitfall 1 — Parts picked at random on the line

The 10 parts must cover the expected variation range (min, max, target). If you take 10 consecutive parts from a well-centred batch, PV will be artificially low, so the %GR&R (a ratio to total variation) will look worse than the instrument deserves. Rule: select the parts in advance, ideally from a historical SPC sample.

Pitfall 2 — Operators warned in advance

If an operator knows their technique is being assessed, they measure more carefully than usual. Result: the measured AV is too optimistic compared with production reality. Correct protocol: do not tell operators it is an R&R. Present it as a “routine quality check”.

Pitfall 3 — Non-random repetitions

If each operator takes all three readings of part 1 in a row, then part 2, and so on, they “learn” the measurement and remember their previous values. Randomise the order (a randomly chosen part for each measurement) or leave enough time between repetitions.
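
One possible way to prepare a randomised collection sheet is sketched below. The scheme (each operator measures all parts once per round, in a fresh random order) and the function name are illustrative choices, not a prescribed protocol.

```python
import random


def randomised_run_order(n_parts=10, n_operators=3, n_reps=3, seed=None):
    """Return a list of (repetition, operator, part) in measurement order."""
    rng = random.Random(seed)  # a fixed seed makes the sheet reproducible
    plan = []
    for rep in range(1, n_reps + 1):
        for op in range(1, n_operators + 1):
            parts = list(range(1, n_parts + 1))
            rng.shuffle(parts)  # new random part order for every round
            plan.extend((rep, op, part) for part in parts)
    return plan


plan = randomised_run_order(seed=1)
print(len(plan))   # 90 measurements in total
print(plan[:3])    # first three lines of the collection sheet
```

Because a full round separates two readings of the same part by the same operator, the previous value is harder to remember.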

Pitfall 4 — Forgotten temperature

A micrometer expands thermally. An R&R study started in the morning (workshop at 15 °C) and finished at 2 p.m. (workshop at 22 °C) embeds thermal drift in EV. Solution: control the workshop temperature, or log it continuously and correct the readings.
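
An order of magnitude helps convince the audience. The sketch below applies the linear expansion formula ΔL = α × L × ΔT, assuming a steel length of 50 mm and an expansion coefficient of about 11.5 × 10⁻⁶ per °C; both values are illustrative assumptions, not data from the tutorial.

```python
def linear_expansion_mm(length_mm, delta_t_c, alpha_per_c=11.5e-6):
    """Length change in mm; alpha defaults to a typical value for steel."""
    return alpha_per_c * length_mm * delta_t_c


drift = linear_expansion_mm(50.0, 22.0 - 15.0)
print(f"{drift * 1000:.1f} µm")  # roughly 4 µm over the 7 °C rise
```

A drift of a few micrometres is several times the 0.001 mm resolution of a micrometer, which is why it shows up in the study. If the part and the instrument are made of the same material and share the same temperature, their expansions largely cancel; the error appears when materials or temperatures differ.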

Pitfall 5 — Product tolerance / study variation confusion

Two different metrics:

%Study Variation = 100 × σ_GRR / σ_TV (σ_TV = total variation observed on the 10 parts).
%Tolerance = 100 × k × σ_GRR / (USL − LSL), with k = 6 (99.73% spread; older AIAG editions used 5.15).

AIAG prefers %Study Variation for an unstable process, %Tolerance for a stable process where the tolerance is fixed. The tutorial computes both if you provide the tolerance.
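
The two metrics can give different verdicts for the same instrument, as this small sketch shows. The function names and the numbers are made up, and k = 6 is assumed as the spread multiplier.

```python
def pct_study_variation(sigma_grr, sigma_tv):
    """GR&R as a percentage of the total variation seen in the study."""
    return 100.0 * sigma_grr / sigma_tv


def pct_tolerance(sigma_grr, usl, lsl, k=6.0):
    """GR&R spread (k sigma) as a percentage of the tolerance width."""
    return 100.0 * k * sigma_grr / (usl - lsl)


sigma_grr, sigma_tv = 0.004, 0.020      # mm, illustrative values
print(round(pct_study_variation(sigma_grr, sigma_tv), 1))        # 20.0
print(round(pct_tolerance(sigma_grr, usl=10.05, lsl=9.95), 1))   # 24.0
```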

Pitfall 6 — Attribute R&R confused with Continuous R&R

If your “measurement” is a visual rating (1-10), a classification (OK/NOK), or a categorical size (S/M/L), it is not a continuous R&R. It is an Attribute R&R (Cohen's kappa for 2 judges, Fleiss' kappa for 3 or more). The tutorial above does not apply to this case (see exercise EX8, which stages this pitfall).
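
To make the contrast concrete, here is a minimal sketch of Cohen's kappa for two judges rating the same parts. It is not part of the tutorial; the function name and the ratings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Agreement between two judges, corrected for chance agreement."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of each judge's marginal rate per category
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges always give the same single category
    return (observed - expected) / (1.0 - expected)


judge_a = ["OK", "OK", "OK", "OK", "NOK", "NOK", "OK", "NOK", "OK", "OK"]
judge_b = ["OK", "OK", "OK", "NOK", "NOK", "NOK", "OK", "OK", "OK", "OK"]
print(round(cohen_kappa(judge_a, judge_b), 3))  # 0.524
```

The judges agree on 8 parts out of 10, yet kappa is only about 0.52 because much of that agreement would be expected by chance when most parts are OK.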

7. Evaluation (optional)

End-of-session questions (quick MCQ, 5 min):

A 25% GR&R means: (a) the instrument is rejected, (b) accepted depending on criticality, (c) excellent, (d) insufficient without complement. → Answer: b
NDC = 3 on a stable process means: (a) the instrument distinguishes 3 levels, (b) the instrument classifies parts into too few categories, (c) both. → Answer: c
High EV and low AV → (a) replace the instrument, (b) retrain operators, (c) revise the procedure. → Answer: a (instrument or resolution)
The ANOVA method is preferred over Xbar-R because: (a) it is faster, (b) it handles the Op×Part interaction, (c) it requires fewer parts. → Answer: b

8. Normative and bibliographic references

AIAG MSA — 4th edition (2010). Automotive Industry Action Group. Global reference for R&R, linearity, stability, bias studies. Chapter III section B for the ANOVA GR&R method.
ISO 22514-7:2021 — Statistical methods in process management — Capability and performance — Part 7: Capability of measurement processes. Equivalent ISO standard, more neutral than AIAG (adopted outside automotive).
ISO 10012:2003 — Measurement management systems. Organisational requirements for industrial metrology.
VDA Band 5 — German automotive equivalent of MSA. Slightly different approach, stricter on thresholds (%GRR < 15% instead of 30% AIAG for critical processes).
NIST/SEMATECH e-Handbook of Statistical Methods — section 2.4 (Gauge studies). Free online, excellent pedagogical reference.
Montgomery, D. C. — Introduction to Statistical Quality Control, 8th ed. Wiley, 2020. Chapter 8 covers R&R with rigorous statistical presentation.
Burdick, Borror, Montgomery — Design and Analysis of Gauge R&R Studies. ASA-SIAM, 2005. Reference work for advanced cases (nested, expanded GRR, unbalanced variance).

9. Going further

Once R&R is mastered, move on to:

Capability + SPC Chart — uses the measured variation. Cpk only makes sense if the R&R study is first validated.
Hypothesis Tests — the 1-factor ANOVA mechanics that run under the hood of R&R.
DOE Builder — the next step: identify the factors that move the dimension, once measurement is reliable.

10. Contact

For an R&R audit on your site (IATF 16949-compliant protocol, client deliverable report), contact BCUB3. We work on critical safety characteristics, complex measuring instruments (3D scanners, tomographs), and Attribute R&R cases (visual, auditory, tactile).