-A-

Alpha Level: In significance testing, the alpha level (sometimes called the
significance level) is a probability that specifies how extraordinary the observed
result must be for us to reject the null hypothesis. Typical alpha levels are 0.10,
0.05, and 0.01, but any value may be used provided it is specified before the data
are examined. In practice, if we observe a statistic whose p-value computed based
on the null hypothesis is less than the pre-determined alpha-level, then we reject the
null hypothesis. Because the alpha level defines a "rare event," and rare events do
happen sometimes, the alpha level determines the probability of a Type I error.

Alpha Risk: The chance or probability of making a Type I Error. This probability is
always greater than zero. The researcher establishes a maximum acceptable risk,
usually 5%, of deciding to reject Ho when it is actually true.

Alternative Hypothesis (Ha): Statement of change, difference, or a relationship.
This statement is considered true if Ho is rejected.

ANOVA: ANOVA stands for ANalysis Of VAriance. It compares the means of a
quantitative variable for different levels of one or more categorical variables. The null
hypothesis tested by an ANOVA is that the means are equal for the different levels.
The technique partitions the total variance into components:
- Between treatment levels (factors or X)
- Within Treatment (Error)
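
The partition can be illustrated with a small one-factor example. This is only a sketch: the data values and the names (groups, ss_between, ss_within) are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
# Hypothetical responses for three levels of a single factor.
groups = {
    "level_1": [4.0, 5.0, 6.0],
    "level_2": [6.0, 7.0, 8.0],
    "level_3": [8.0, 9.0, 10.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

# Between-level sum of squares: level means versus the grand mean.
ss_between = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
)
# Within-level (error) sum of squares: observations versus their own level mean.
ss_within = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
)
# Total sum of squares: observations versus the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, ss_total, f_statistic)
```

The between and within pieces add up to the total, and the ratio of their mean squares is the F-statistic used to test whether the level means are equal.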

Attribute Data: Quality data that typically reflects the number of conforming or
non-conforming units on a go/no-go or pass/fail basis.

Attribute Gage R&R: A methodology to assess attribute measurement systems.
The gage R&R will quantify measurement error due to repeatability, reproducibility,
and calibration.

Average: The average of a collection of values is the sum of all the values divided
by the number of values, otherwise called the mean. The term "average" is
sometimes taken to mean any summary of the center of a distribution, but this
usage is not common in statistics.

Axis: In graphing on the Cartesian plane (as in common x-y plots), the vertical and
horizontal lines that give the coordinates are called axes. By convention, the y-axis
runs up and down, and the x-axis runs side-to-side. Displays such as dotplots and
boxplots have a single axis, usually the vertical y-axis, while displays like histograms
use only the horizontal x-axis.

-B-

Balanced Design: A design in which the same number of runs are performed at
each treatment combination.

Bar Chart: A bar chart displays a batch of numbers with side-by-side bars, usually
spaced evenly along a horizontal base. Each bar corresponds to a category of data.
The height of each bar is proportional to the number of cases in its category. Bar
charts differ from histograms because they graph categorical variable values. One
easy way to spot the difference is that the bars in a bar chart should have some
space between them, while the bars in a histogram should touch each other.

Beta Risk: The chance or probability of making a Type II Error. This probability is
always greater than zero. The researcher establishes a maximum acceptable risk,
usually 10%, of overlooking a source of variation.

Bias: Bias in sampling occurs when observations favor some individuals in the
population over others. Sampling bias is difficult to identify, and probability theory
cannot be used to offset these sampling errors.

Bimodal: A distribution shape is bimodal if it has two modes. Sometimes it is not
clear whether a bump on a histogram is a second mode or just a local bump. One
way to decide is to rescale the histogram. If the bump is present at many different
scales, then it can reasonably be called a second mode. If not, then it may be just an
accident of the plot scale.

Binomial: Literally "two names". A binomial setting is one in which observations
take on one of only two possible values. By convention, these are often called
"success" and "failure" although succeeding or failing need not be involved. Also by
convention, the probability of a "success" is usually denoted p. The probability of a
"failure" must then be 1 - p.

Blocking: An experimental technique used to account for the variability due to
nuisance or noise variables. An example would include using blocking to address
multiple batches of raw materials in which parts from a single batch are likely to be
more uniform than parts from different batches. Material batch would be the blocking
variable.

Box Plot: A boxplot displays the 5-number summary of a variable. Boxplots graph
against a single, usually vertical, axis. Boxplots show a central box running from the
first to the third quartiles, marked by a line at the median. They then extend
whiskers either to the maximum and minimum data values, or to the most extreme
values within 1.5 IQRs of each quartile. If any data values are outside these limits,
they are displayed individually because they may be outliers.
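
The 5-number summary and the 1.5 IQR rule can be sketched with Python's standard library. The data values are hypothetical, and the "inclusive" quartile method is an assumption: statistical packages use slightly different quartile conventions and may place the fences in slightly different positions.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]

# Quartiles (Python 3.8+); other quartile conventions give slightly different values.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 IQRs beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are plotted individually as possible outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme values that are still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary, outliers, whisker_low, whisker_high)
```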

-C-

Capability Analysis: A statistical measure of inherent variation for a given event in
a stable process.

Categorical: A categorical variable records into which of several categories or levels
an individual falls. Categorical variables need not be numeric; they can simply name
the category. Contrast with quantitative variables.

Causation: Variable A can be said to cause variable B if:
1. Whenever A changes, there is a corresponding change in B
2. B could not cause A, and
3. There is no other variable, C, that could cause both A and B.
Scientists usually require that in addition, there be a plausible mechanism connecting
A to B. Causation is best demonstrated with a designed experiment. In particular,
the existence of a correlation or association between two variables does not
demonstrate a causal relationship between them.

C Chart: Number of defects. Must have equal subgroup size.

Center: The center of a distribution is a general concept referring to the middle of
the values in the distribution. Common measures of the center of a distribution are
the mean, the median, and the mid-quartile. Some texts present the mode as a
measure of center, but it in fact measures something else entirely.

Central Limit Theorem: The central limit theorem (CLT) states that the sampling
distribution of the sample mean approaches the normal distribution as the sample
size, n, increases regardless of the distribution of the population. It is common to
include with the CLT the related facts that this sampling distribution has a mean
equal to the population mean and a standard deviation equal to the population
standard deviation divided by the square root of n.
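
A small simulation can make these facts concrete. This is a sketch only: the exponential population (mean 1, standard deviation 1), the sample size, and the number of repetitions are all arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # number of samples drawn

# Draw many samples from a strongly right-skewed population and record each mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be near the population mean, 1.0
sd_of_means = statistics.stdev(sample_means)    # should be near sigma / sqrt(n)
expected_sd = 1.0 / n ** 0.5

print(mean_of_means, sd_of_means, expected_sd)
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is far from normal.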

Central Tendency: A term used in some introductory texts where center is
intended. There is no "tendency" involved in the center of a distribution. It is true,
however, that averages tend toward the population mean as the sample size grows,
as stated by the law of large numbers. This may be the original source of confusion.

Chi-square: The chi-square statistic for testing independence in a two-way table is
found by summing the squares of the standardized residuals. It follows a chi-square
distribution on (#Rows-1)*(#Cols-1) degrees of freedom. The chi-square distribution
is skewed to the right, and the test for independence is a one-sided test in which
larger values of the chi-square statistic correspond to smaller p-values.
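
The calculation can be sketched for a 2 x 2 table. The counts are hypothetical. Each standardized residual is taken here as (observed - expected) / sqrt(expected), with expected counts computed from the row and column totals under independence.

```python
# Hypothetical two-way table of observed counts (rows x columns).
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if rows and columns were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        standardized_residual = (obs - expected) / expected ** 0.5
        chi_square += standardized_residual ** 2

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi_square, degrees_of_freedom)
```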

Confidence: Confidence is a declaration of probability for some claim. Most often,
"confidence" refers to the probability that a confidence interval includes the intended
population parameter value. Confidence probabilities are usually found by referring
to the sampling distribution of the statistic being used to estimate the population
parameter.

Confidence Bounds: The confidence bounds are the lower and upper limits of a
confidence interval.

Confidence Interval: A level C confidence interval for a population parameter is an
interval of values, usually of the form statistic ± margin of error, found from data in
such a way that it has probability C of including the parameter value in the interval.
The data must be representative of the population either because of appropriate
sample design or by resulting from a well-designed randomized experiment.
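
As an illustration of the statistic ± margin of error form, here is a sketch of a large-sample 95% interval for a mean. The summary numbers are hypothetical, and the normal critical value is an assumption that suits large samples; with a small sample a t critical value would normally be used instead.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics from a large sample.
sample_mean = 100.0
sample_sd = 15.0
n = 100
confidence_level = 0.95

# Critical value z* that leaves (1 - C) / 2 in each tail of the standard normal.
z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

margin_of_error = z_star * sample_sd / math.sqrt(n)
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(round(lower_bound, 2), round(upper_bound, 2))
```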

Confidence Level: In constructing a confidence interval, the confidence level is the
probability that an interval constructed in this manner (that is, based on a sample of
this size drawn from this population, for example) will cover the true population
parameter.

Confounding: One or more effects that cannot unambiguously be attributed to a
single factor or factor interaction.

Common Cause Variation: Variation that occurs naturally and is inherent and
expected in a stable process. Such variation can be attributed to “chance” or random
causes.

Continuous: Continuous values are numbers that can fall anywhere in a range of
values. Contrast with discrete values.

Continuous Random Variable: A continuous random variable is a random variable
that can take on any numeric value within a range of values. The range may be
infinite or bounded at either or at both ends. The relationship between the intervals
within the range and probabilities is given by a density curve. Contrast with discrete
random variables.

Control Chart: Control charts graph statistics measured on a process to track trends
and detect changes. Extraordinary values or patterns in a control chart often indicate
a problem with the process. Control charts usually include control lines to help
identify processes that are out of control.

Control Limits: Statistically based limits to detect a shift in the process. Control
limits exist for the critical statistic of interest (e.g. mean, variance, etc.).

Correlation: A statistical tool used to quantify the strength of the linear relationship
between a continuous input and a continuous output. Correlation is a number
between -1 and +1 that measures the degree of linear association between two
variables. In its most useful form, its square (commonly called r-squared) gives the
fraction of the variability of one variable accounted for by a least squares regression
on the other variable.
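
The correlation and its square can be computed directly from the definition. This is a sketch with hypothetical data; the function name pearson_r is illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient: a measure of linear association between -1 and +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical input (x) and output (y) measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # fraction of the variability in y accounted for by a line in x

print(round(r, 4), round(r_squared, 4))
```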

-D-

Data: Data are values or names that record information about individuals in an
orderly fashion, together with a context. Most data record the same information
about many individuals.

Defective: Units deemed to be defective contain one or more defects.

Defects: Failures associated with a single unit. A defective unit can contain several
defects.

Degrees of freedom: Degrees of freedom indicate the amount of information
provided by the data in a way that often parameterizes a sampling distribution. In
this course, we can usually refer to the "Rule of Thumb" that degrees of freedom are
equal to the sample size, n, minus the number of parameters that have been
estimated. However, this "rule" is violated by the two-sample t procedure, for which
a special approximation provides better results, and the chi-square statistic for
two-way tables, which finds degrees of freedom from the table dimensions rather than
the sample size.

Density Curve: A density curve is a curve that is never negative (that is, it is
always above the horizontal axis) and that has an area of exactly 1.0 under the
curve. Density curves relate ranges of data values to the relative frequency of the
values in those ranges. Density curves provide a mathematical model to describe
the shapes shown by histograms.

Dependent variable: The dependent variable is also referred to as the response
variable.

Design of Experiment (DOE): A method that adds structure to the experiment to ensure
all experimental objectives are achieved and the experimental assumptions are
satisfied. Designed experiments are often referred to as DOEs or factorial experiments.

Discrete: Discrete values are named individually or fall into named "bins". Contrast
them with continuous values.

Distribution: The distribution of a variable gives the values the variable can take on
and tells how frequently it takes on each of these values.

Dotplot: A display of a single variable in which points are arranged along a (usually)
vertical axis. Dotplots show the spread of the data, any clustering or local modes,
and any outliers. Some dotplots spread out overlapping points; others do not.
Dotplots are related to boxplots.

-E-

Experiment: A test or series of tests in which X’s are purposefully varied and the
impact on Y is quantified.

Experimental run / Treatment Combinations: A single combination of factor
levels that yields one observation of the output variable. If temperature is being
evaluated at 10 and 20 degrees and pressure at 50 and 100 p.s.i., an example of an
experimental run would be 20 degrees and 50 p.s.i.

-F-

F distribution: The F distribution is one of the common sampling distributions used
in statistics. Formally, a random variable with an F distribution can be found as the
ratio of two independent random variables with chi-square distributions, each divided
by its degrees of freedom.

F-statistic: A test statistic that follows the F distribution. Typically F-statistics arise
as the ratio of two values, each following the chi-square distribution. Common tests
based on F-statistics test whether the numerator and denominator of the ratio are
equal.

Factors: Experimental factors are the X’s or inputs being studied.

Factor Level: Factor levels are the settings of the X’s being evaluated. For
example, if the factor temperature is being studied at 10 and 20 degrees, 10 and 20
are factor levels. If a qualitative factor such as shift is being evaluated, shifts one,
two and three are the factor levels.

FMEA (Failure Mode and Effects Analysis): A structured approach to:
- Predict failures and prevent their occurrence in manufacturing and other
  functional areas which generate defects.
- Identify the ways in which a process can fail to meet critical customer
  requirements (Y).
- Estimate the severity, detection and occurrence of defects.
- Evaluate the current control plan for preventing these failures from
  occurring.
- Prioritize the actions that should be taken to improve the process.

First Time Yield (FTY): Number of non-defective units (units that pass the first
time, without rework) divided by the total number of units sampled.

-G-

Gage R&R Study: A study to ensure that our measurement systems are
statistically sound.

Gage accuracy or Bias: Accuracy or bias is the difference between the observed
average of the measurement and the reference value (Metrology lab reference).

Gage Stability: Gage stability refers to the difference in the average of at least two
sets of measurements obtained with the same gage on the same parts taken at different
times.

Gage Linearity: Gage linearity is the difference in the accuracy values throughout
the measurement range in which the gage is intended to be used.

Gage Repeatability: Gage repeatability is the variation in measurements obtained
with one measurement instrument used several times by one appraiser while
measuring the identical characteristic on the same part.

Gage Reproducibility: Gage reproducibility is the variation in the average of the
measurements made by different appraisers using the same measuring instrument
when measuring the identical characteristic on the same part. Reproducibility can
also include the evaluation of several pieces of equipment using the same operator
on the same part.

-H-

Histogram: A histogram displays a batch of numbers with adjacent, side-by-side bars. Each bar
corresponds to a range of data values. The size of each bar is proportional to the
number of data values falling in its range. Histograms differ from bar charts because
they graph quantitative variable values.

Hypothesis: A theory about the nature of relationships between input and output
variables.

-I-

Independent variable: Independent variables are also referred to as predictor
variables and explanatory variables. In ANOVA, they are commonly called factors.
These may be better terms because they are less likely to be confused with the
probability-based concept of independence.

Inference: To infer is to reach a conclusion by reasoning from something known or
assumed. In statistics, inference usually refers to formal procedures for combining
assumptions about unknown population characteristics with observed facts about
data to reach a conclusion.

Interaction: An interaction exists when the effect of a factor depends on the
level of another factor. For example, if a 10 unit change in the response occurs when
pressure is varied from 50 to 100 psi at a temperature of 10 degrees, but only a 1 unit
change occurs at a temperature of 20 degrees when the pressure is varied, an
interaction has occurred between temperature and pressure.
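
The temperature and pressure example can be put into numbers. The response values below are hypothetical, chosen only to reproduce the 10 unit and 1 unit changes, and halving the difference of the two effects is one common convention for reporting a two-level interaction.

```python
# Hypothetical mean responses keyed by (temperature in degrees, pressure in psi).
response = {
    (10, 50): 20.0,
    (10, 100): 30.0,
    (20, 50): 25.0,
    (20, 100): 26.0,
}

def pressure_effect(temperature):
    """Change in response when pressure goes from 50 to 100 psi at one temperature."""
    return response[(temperature, 100)] - response[(temperature, 50)]

effect_at_10 = pressure_effect(10)  # 10 unit change
effect_at_20 = pressure_effect(20)  # 1 unit change

# The pressure effect differs by temperature, so the two factors interact.
interaction = (effect_at_20 - effect_at_10) / 2

print(effect_at_10, effect_at_20, interaction)
```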

Interquartile Range: Difference between the 75th percentile and the 25th percentile.

-K-

Kurtosis: The kurtosis measures the "peakedness" of a histogram or density curve.
It is commonly adjusted so that the kurtosis of the normal density is zero. Flatter
densities such as the uniform density have a kurtosis less than zero. More peaked,
heavier-tailed densities, such as those of t distributions with few degrees of freedom,
have a kurtosis greater than zero. You can think of the kurtosis as measuring how much
the spread changes as we move from the middle of the density to the tails. Densities
with large kurtosis give the impression of ever greater spread as we move to the
tails.
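
The adjusted version is often called excess kurtosis: the fourth central moment divided by the squared variance, minus 3. The sketch below uses two hypothetical data sets, an evenly spaced grid standing in for a uniform density and a spike with two far-out values standing in for a peaked, long-tailed shape.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (zero for a normal density)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

# Evenly spaced values approximate a flat (uniform) density: kurtosis below zero.
uniform_like = [i / 1000 for i in range(1001)]
flat_kurtosis = excess_kurtosis(uniform_like)

# A sharp central spike with two extreme values: kurtosis well above zero.
peaked = [0.0] * 98 + [-10.0, 10.0]
peaked_kurtosis = excess_kurtosis(peaked)

print(round(flat_kurtosis, 3), round(peaked_kurtosis, 3))
```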

-L-

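Linear regression: Linear regression is the most common form of regression. It
predicts or describes a response variable in terms of a linear equation with one or
more predictor variables. It therefore has an equation of the form
y = b0 + b1*x1 + ... + bk*xk, where b0 is the intercept and b1 through bk are the
coefficients of the k predictor variables.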
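
For a single predictor the linear equation is y = b0 + b1*x, and the least squares estimates of b0 and b1 have a closed form. The sketch below uses hypothetical data; the names b0, b1, fitted and residuals are illustrative.

```python
# Hypothetical predictor (x) and response (y) values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept for one predictor.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Fitted values from the equation, and residuals (observed minus fitted).
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(round(b0, 4), round(b1, 4))
```

With an intercept in the model, the least squares residuals sum to zero (up to rounding).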

-M-

Main Effect: The change in average response (Y) observed during a change in the
factor (X) from one level to another.

Mean: Most common statistic used to describe the center of the data. The mean is
also commonly referred to as the average. The mean is a parameter describing a
density curve. Intuitively, it describes the middle or center of the distribution.

Median: The median is the middle value of a collection of values. An equal number
of values in the collection are greater than and less than the median.

Mode: The observation that occurs most often in a data set. Mostly used for skewed
distributions.

Mistake Proofing: Process improvement efforts often falter during implementation
of new operating methods learned in the improvement phase. Without an overall
strategy and workable control tactics to guarantee permanency, sustainable
improvements cannot be achieved. Mistake Proofing seeks to gain permanency by
eliminating or rigidly controlling human intervention in a process. Mistake Proofing
uses wisdom and ingenuity to create devices that eliminate defects.

Multi-Vari: A graphical tool which, through logical sub-grouping, analyzes the
effects of discrete X’s on Y’s. The X’s must be discrete or made to be discrete in
order to use the multi-vari technique.

Multiple Linear Regression: Regression model in which two or more predictor
variables predict or describe a single response variable.

-N-

NP Chart: Number of defective units. Must have equal subgroup size.

Normal Curve: The normal curve is a bell-shaped, unimodal, symmetric shape
depicting the density curve of the normal distribution. Normal distributions are
characterized by their mean and standard deviation.

Normal Distribution: The normal distribution is the most common distribution
shape in statistics. It serves as a standard of comparison for other distribution
shapes. Each combination of mean and standard deviation generates a unique
normal curve.

Normality: The property of having a normal distribution. Normality is an assumption
required for many statistical tests. It is often best checked with a normal probability
plot.

Null Hypothesis (Ho): Statement of no change, no difference, or no relationship.
Refers to the true parameters (e.g. μ, σ, p) of a population. This statement is
assumed true until sufficient evidence (sample data) is presented to reject it.

-O-

Observation: An individual measurement included in a sample.

One-sided: A hypothesis test in which only deviations from the null hypothesis in
one specified direction (typically thought of as to the right or to the left on the
density curve of the test statistic's sampling distribution) are considered. One-sided
tests (sometimes called one-tailed tests) typically have a smaller P-value for a given
statistic value than the corresponding two-sided test, but the choice of a one-sided
test must be defended on substantive grounds prior to computing the statistic.

One-tailed: A hypothesis test in which only deviations from the null hypothesis in
one specified direction (typically thought of as to the right or to the left on the
density curve of the test statistic's sampling distribution) are considered. One-tailed
tests (sometimes called one-sided tests) typically have a smaller P-value for a given
statistic value than the corresponding two-tailed test, but the choice of a one-tailed
test must be defended on substantive grounds prior to computing the statistic.

Orthogonal: A property that ensures all experimental factors are independent of
each other. No correlation exists between X’s.

Outcome: The outcome of an event is the value measured, observed, or reported
for an individual instance of that event.

Outlier: An outlier is a value that deviates from the others in a dataset. Outliers
may be extraordinarily large or small, or differ by having some exceptional
combination of values on two or more variables. Ordinarily, it is a good idea to set
outlier values aside, analyze the data without them, and then consider why the
outlier value was so extraordinary.

-P-

P-value: In a significance test, the P-value is the probability, computed assuming
the null hypothesis is true, of observing a value for a test statistic at least as far from
the hypothesized value as the statistic value actually observed. A small P-value indicates
either that the observation is improbable or that the probability calculation was
based on incorrect assumptions. The assumed truth of the null hypothesis is the
assumption under suspicion.
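
A P-value calculation and the comparison against a pre-determined alpha level can be sketched for a two-sided z-test of a mean. All numbers are hypothetical, and the example assumes the population standard deviation is known, which is rarely true in practice.

```python
import math
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """P-value for a two-sided z-test of a mean with known population sigma."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    # Probability of a statistic at least this far from the null value, in either direction.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

alpha = 0.05  # chosen before looking at the data

# Hypothetical study: Ho says the mean is 50; a sample of 36 gives a mean of 51.2.
p_value = two_sided_p_value(sample_mean=51.2, null_mean=50.0, sigma=3.0, n=36)
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)
```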

Pareto Chart: A representation of the relative importance of the process causes or
defects.

P Chart: Proportion of defective units. Can have unequal subgroup size.

Pie chart: A pie chart displays the distribution of a categorical variable by showing
the fraction of the cases falling within each of the categories. Since all the categories
account for 100% of the cases, a pie chart divides up a circle into wedges (pie slices)
that add to 100% of the circle. Many statisticians dislike pie charts because they
think it difficult to judge the size of a wedge. However, pie charts are widely used.

Population: All the statistics that exist today about a process plus what will exist in
the future. Usually we cannot afford to observe or measure the entire population, so we
select a representative sample.

Power: The sensitivity of a statistical test to detect a difference when there really is
one, or the probability of being correct in rejecting Ho. Power = 1 - Beta.
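
The relationship between power and Beta can be sketched for a one-sided z-test of a mean. The numbers are hypothetical, and the example assumes a known population standard deviation and a true mean that lies above the null value.

```python
import math
from statistics import NormalDist

# Hypothetical test of Ho: mean = 50 against Ha: mean > 50.
alpha = 0.05
null_mean = 50.0
true_mean = 51.0  # the difference we would like to detect
sigma = 3.0       # assumed known population standard deviation
n = 36

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha)  # rejection cutoff on the z scale

# How far the true mean sits from the null mean, in standard-error units.
shift = (true_mean - null_mean) / (sigma / math.sqrt(n))

beta = std_normal.cdf(z_alpha - shift)  # probability of a Type II error
power = 1 - beta                        # probability of correctly rejecting Ho

print(round(beta, 3), round(power, 3))
```

Increasing n increases the shift and therefore the power, which is why sample size is usually chosen with a target power in mind.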

Practical Significance: Used when the magnitude of change, difference, or
relationship is important to the researcher.

Precision: A confidence interval is precise if it is narrow. Precision refers to pinning
down an estimated value, but not necessarily to the accuracy of that value.

Probability: The probability of an outcome is a number between 0 and 1 that
reports the relative frequency of the outcome's occurrence in a long series of trials.

Process: A process generates data over time. Typically, we think of a process as
having no beginning or end, but rather generating a potentially infinite stream of
data. A process is different from a sample because process data do not represent an
unseen population. The chief concern in process analysis is whether the process is in
control.

Process Map: A graphical representation of how the process is actually performed.
All rework operations and non-value-added movements must be included.

-Q-

Qualitative: Another word for categorical. Used most often in contrast to quantitative in
describing a variable whose values name categories.

Quantitative: A quantitative variable takes on a continuous range of values. Its
units should reflect how it has been measured. Contrast with categorical variable.

Quartile: Quartiles are roughly the 25% and 75% points of a variable. That is,
about a quarter of the data values fall below the first (25%) quartile, and about a
quarter of the data values fall above the third (75%) quartile. The second quartile,
corresponding to the 50% point, is the median.

-R-

R-chart: An R-Chart is a control chart that displays the ranges of samples from a
process along with suitable control lines.

Random: A phenomenon is random if the values of individual outcomes of that
phenomenon are uncertain, but nonetheless a regular distribution of outcome values
can be seen over a large number of repetitions.

Random Sample: In a random sample, individuals are selected from a population by
chance in an attempt to reduce bias and make the sample representative of the
population.

Random Sampling: Random sampling occurs when a sample is drawn from a
population and every individual has an equal chance of being selected.

Random Variable: A random variable is a variable whose value is the outcome of a
random phenomenon. When a random phenomenon is described by a random
variable, the sample space is just the possible values of the random variable. A
discrete random variable takes on a finite (or countable) number of possible values. A
continuous random variable can take on any value in a range of values.

Randomization: Randomization involves running the experimental runs in random
order. Randomization is a key component in ensuring that the experimental
assumptions are valid.

Range: Range is the difference between the largest observation and the smallest
observation in the data set.

Regression: A technique for investigating and modeling the relationship
between input and output variables. It yields a prediction equation, Y = f(X), which
allows the output to be predicted using the input.

Residual: Difference between the observed or measured value and the fitted value
predicted by the equation.

Replicate: When performing an experiment, it is important to observe the
consequences of any treatment many times. A single observation may be unusual.
Replicating and then combining the many resulting observations reduces the
variance of the observed outcomes.

Replication: Repeating the entire experiment. All treatment combinations are
repeated. Two benefits to replication:
- Provides an estimate of experimental error.
- Provides a better estimate of the variable or interaction effect.

Response: In experiments, we apply a treatment to individual subjects and observe
or measure their response. The response variable is thus the attribute of the subjects
that we anticipate will be changed by the treatment.

Response variable: The response variable is the variable whose values are
predicted or described in terms of the explanatory variables or factors. In an
experiment, the response variable should be identified before the experiment is
performed to prevent invalid hunts for possible responses that may show a
significant difference. Surveys may yield several possible response variables. The
convention in statistics is to denote the response variable "y" and display it on the
vertical axis of a plot.

Roll throughput yield (RTY): The probability of a unit traveling through all process
steps defect free.
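
Rolled throughput yield is commonly computed as the product of the yields of the individual steps. The sketch below uses hypothetical step yields and assumes the steps fail independently of one another.

```python
import math

# Hypothetical first-pass yields for three consecutive process steps.
step_yields = [0.98, 0.95, 0.99]

# Probability that a unit passes every step defect free (steps assumed independent).
rolled_throughput_yield = math.prod(step_yields)

print(round(rolled_throughput_yield, 5))
```

Even with every step above 95%, the chance of passing all three steps defect free is only about 92%.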

-S-

S-Chart: An S-Chart is a control chart that displays the standard deviations of
samples from a process along with suitable control lines.

Sample: A subset of the population taken over time or at a single sampling.

Sample size: The number of cases in the sample. Larger sample sizes permit more
reliable and more accurate estimation of population parameters, so the sample size
is an important consideration in designing studies. The sample size is denoted n by
common convention.

Sampling: Sampling is the act of drawing a sample or portion of a larger population
to represent that population so that observations or measurements on the sample
can provide information about the population. The simplest common method of
sampling is a simple random sample.

Scatter Plot: Scatter plots display the relationship between two variables. By
convention, the (vertical) y-axis displays the values of the variable that we hope to
predict or explain, and the (horizontal) x-axis displays the values of the variable on
which we will base our prediction or explanation. If neither variable is to be predicted
or explained by the other, then either can appear on either axis.

Sigma: A statistic used to quantify the variability in a process. It quantifies the
spread about the mean.

Sigma Level: A statistic used to describe the performance of a process relative to the
specification limits. The sigma level is the number of standard deviations from the
specification limits to the mean of the process.
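
The definition can be turned into a small calculation. The process values and specification limits below are hypothetical, and reporting the distance to the nearer limit is an assumption made for this sketch; some programs apply additional conventions when reporting sigma levels.

```python
# Hypothetical process summary and specification limits.
process_mean = 10.0
process_sd = 0.5
lower_spec_limit = 8.0
upper_spec_limit = 11.5

# Distance from the mean to each limit, in standard deviations.
sigmas_to_upper = (upper_spec_limit - process_mean) / process_sd
sigmas_to_lower = (process_mean - lower_spec_limit) / process_sd

# Assumption: report the nearer limit, since it drives most of the defects.
sigma_level = min(sigmas_to_upper, sigmas_to_lower)

print(sigmas_to_lower, sigmas_to_upper, sigma_level)
```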

Shape: The shape of a density curve or histogram is usually described in terms of
whether it is symmetric or skewed, how many modes it has, and whether it has long
or short tails.

Significance Level: In significance testing, the significance level, commonly
denoted α and called the alpha level, is a probability that specifies how extraordinary
the observed result must be for us to reject the null hypothesis. Typical significance
levels are .10, .05, and .01, but any value may be used provided it is specified
before the data are examined. In practice, if we observe a statistic whose P-value
computed based on the null hypothesis is less than the pre-determined significance
level then we reject the null hypothesis. Because the significance level defines a
"rare event," and because rare events do happen sometimes, the significance level
determines the probability of a Type I error.

Significance Test: A formal inference method in which we assert a null hypothesis,
which specifies a value for a population parameter. We then assess the likelihood (if
the null hypothesis were true) of observing the statistic value we have indeed found.
A low probability leads us to doubt the null hypothesis.
(Also called a hypothesis test)

Simple Random Sample: A simple random sample (SRS) of size n is one in which
each possible sample of n individuals has an equal chance of selection. An alternative
(and equivalent) definition is that each individual in the population has an equal and
independent chance of being selected. The randomness applied deliberately in simple
random sampling is what allows the derivation of a sampling distribution for statistics
and thus what allows statistical inference.

Simple Random Sampling: A simple random sample (SRS) of size n is one in
which each possible sample of n individuals has an equal chance of selection. An
alternative (and equivalent) definition is that each individual in the population has an
equal and independent chance of being selected.

Standard Deviation: The square root of the variance. The standard deviation has
the same units as the original data.

SPC: Statistical Process Control. SPC is a statistically based graphing technique that
compares current process data to a set of stable control limits established from
normal process variation.

Skewed: A histogram or density curve is said to be skewed if one tail stretches
substantially farther from the center than the other. The skewness is said to be "to
the right" or "to the left" according to which tail (right or left, respectively) is the
longer one. Shapes that are not skewed are said to be symmetric.

Skewness: A histogram or density curve is said to be skewed if one tail stretches
substantially farther from the center than the other. The skewness is said to be "to
the right" or "to the left" according to which tail (right or left, respectively) is the
longer one. Shapes that are not skewed are said to be symmetric.

Standardize: A variable is said to be standardized when it has been adjusted to
have a mean of zero and a standard deviation of one. Usually this is accomplished
by subtracting the sample mean and dividing by the sample standard deviation.
Standardized values are sometimes called z-scores.
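
The adjustment described above can be sketched in a few lines. The function name
standardize and the input values are hypothetical; the sample standard deviation
(n-1 divisor) is assumed.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the sample mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical raw values.
z = standardize([12.0, 15.0, 9.0, 18.0, 11.0])
print([round(v, 3) for v in z])
print(round(mean(z), 6), round(stdev(z), 6))  # approximately 0 and 1
```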

Stratified sampling: A method of sampling in which the population is divided into
two or more strata and samples are selected from each. Members of a stratum should
share characteristics. If we know the true population proportions of these
characteristics, we can arrange that the sample represent the population by having
the correct proportion of each stratum.

Statistical Significance: A term used when rejecting Ho, i.e., when the sample data are
too different from Ho to be reasonably attributed to chance (sampling error).

-T-

t distribution: The t distributions are a family of unimodal, symmetric density
curves that use an additional parameter, the degrees of freedom. The t distributions
are especially important because they describe the sampling distribution of many
statistics, and thus are useful for inference. For example, the sampling distribution of
the sample mean when it has been standardized by the observed standard deviation
follows a t distribution on n-1 degrees of freedom (assuming the data come from a
Normal population). The t distribution with infinite degrees of freedom is the normal
distribution, and in practice any t distribution with more than 100 degrees of freedom
(or so) is not discernibly different from the normal. t distributions commonly arise as
the ratio of a variable that follows a standard Normal distribution to the square root
of an independent chi square variable divided by its degrees of freedom.
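
The standardized sample mean mentioned above can be computed as follows. This is only
a sketch: the function name t_statistic, the data, and the null mean of 5.0 are
hypothetical, and the Python standard library has no t distribution, so the code stops
at the statistic and its degrees of freedom rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, null_mean):
    """Standardize the sample mean by the observed (sample) standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - null_mean) / standard_error
    return t, n - 1  # the statistic is referred to a t distribution on n-1 df


# Hypothetical measurements.
t, df = t_statistic([5.1, 4.9, 5.6, 5.3, 5.2, 4.8], null_mean=5.0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```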

Test Statistic: A standardized value (Z, t, F, etc.) calculated from sample data.
It is used to calculate the risk of decision error by comparison with a known
distribution that applies if Ho is true.
* A hypothesis is an a priori theory relating to differences between variables.
  A statistical test, or hypothesis test, is performed to support or refute the theory.
* A hypothesis test converts the practical problem into a statistical problem.
  - Since relatively small sample sizes are used to estimate population
    parameters, there is always a chance of collecting a non-representative
    sample.
  - Inferential statistics allows us to estimate the probability of getting a
    non-representative sample.

Time Series Chart: A graph that plots performance data over time for a process,
usually as a line chart.

Type I Error: The error of rejecting Ho when it is in fact true (e.g., saying there is
a difference when there really is not).
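
Because the alpha level sets the probability of a Type I error, a simulation in which
Ho is true by construction should reject about alpha of the time. The sketch below
assumes a two-sided z test with known standard deviation; the function name
type_i_error_rate, the sample size, the number of trials, and the seed are all
hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def type_i_error_rate(alpha=0.05, n=20, trials=2000, seed=1):
    """Estimate how often a two-sided z test rejects Ho when Ho is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean 0 and sd 1, so Ho: mean == 0 is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a rejection here is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")  # should be near alpha
```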

Type II Error: The error of failing to reject Ho when it is in fact false (e.g., saying
there is no difference when there really is).

-U-

U Chart: A control chart of the number of defects per unit. U charts can have unequal
subgroup sizes.
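
A minimal sketch of how unequal subgroup sizes affect a U chart. The glossary does not
give a formula; the code assumes the conventional 3-sigma limits u-bar +/- 3*sqrt(u-bar/n)
for count data, so the limits change with each subgroup's size n. The function name
u_chart and the defect counts are hypothetical.

```python
from math import sqrt


def u_chart(defects, units):
    """Return the center line and (u, lower limit, upper limit) for each subgroup.

    Assumes conventional 3-sigma limits: u_bar +/- 3 * sqrt(u_bar / n).
    """
    u_bar = sum(defects) / sum(units)  # overall defects per unit (center line)
    points = []
    for d, n in zip(defects, units):
        half_width = 3 * sqrt(u_bar / n)
        lower = max(0.0, u_bar - half_width)  # a defect rate cannot be negative
        points.append((d / n, lower, u_bar + half_width))
    return u_bar, points


# Hypothetical data: defects found and units inspected in each subgroup.
defects = [4, 6, 3, 8]
units = [10, 20, 10, 25]
u_bar, points = u_chart(defects, units)
for u, lcl, ucl in points:
    print(f"u = {u:.3f}, limits = ({lcl:.3f}, {ucl:.3f})")
```

Larger subgroups get narrower limits, which is why the chart tolerates unequal sizes.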

Unimodal: A distribution shape is unimodal if it has a single mode. Distributions
with two modes are called bimodal. Because few statisticians can count above two,
distributions with three or more modes are called multimodal.

Uncontrolled or Special Cause Variation: Variation that occurs when an abnormal
action enters a process and produces unexpected and unpredictable results.

-V-

Variable: A variable is a collection of measurements or observations of the same
aspect about related individuals. The individuals are commonly called subjects or
cases. Variables may hold numbers, in which case they are called quantitative, or
they may hold names, in which case they are called categorical.

Variation: Data vary. Even attempts to reproduce an experiment or observation
exactly cannot reproduce the exact outcome. The observation-to-observation
differences are variation. Statistics studies ways to minimize, control, and
understand variation, and ways to draw intelligent conclusions from data despite
their variation.

Variable Gage R&R: A methodology to assess measurement error in continuous
measurement systems. Total observed variability includes both part and
measurement system variation. Measurement system analysis addresses error due
to accuracy and precision. Gage R&R focuses only on the precision component.

Variance: The variance is the square of the standard deviation. Although the
variance is a poor measure of spread (for example, because it is not in the same
units as the data), standard deviations are most often combined by squaring them
(to obtain variances) and then combining the variances.

-X-

Xi – MR Chart: Individuals and Moving Range Chart. Can be used for tracking both X
and Y.

X-bar chart: An x-bar control chart plots the means of samples taken from a
process (usually at regular intervals), and compares them to a specified mean and
standard deviation.
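
The comparison described above can be sketched as follows. The entry does not say how
the specified mean and standard deviation become limits; the code assumes the
conventional 3-sigma limits for a subgroup mean, mean +/- 3*sigma/sqrt(n). The function
name xbar_limits, the target values, and the subgroup data are hypothetical.

```python
from math import sqrt
from statistics import mean


def xbar_limits(target_mean, sigma, subgroup_size, k=3.0):
    """Control limits for subgroup means, assuming k-sigma limits (k=3 by convention)."""
    half_width = k * sigma / sqrt(subgroup_size)
    return target_mean - half_width, target_mean + half_width


# Hypothetical subgroups of four measurements each, taken at regular intervals.
subgroups = [
    [10.1, 9.9, 10.0, 10.2],
    [10.4, 10.3, 10.6, 10.5],
    [9.8, 10.0, 9.9, 10.1],
]
lcl, ucl = xbar_limits(target_mean=10.0, sigma=0.2, subgroup_size=4)
means = [mean(g) for g in subgroups]

# Indexes of subgroups whose mean falls outside the limits.
out_of_control = [i for i, m in enumerate(means) if not lcl <= m <= ucl]
print(f"limits = ({lcl:.2f}, {ucl:.2f}), out of control: {out_of_control}")
```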

x-variable: "x-variable" is a colloquial term for the variable plotted on the x-axis in
a scatterplot or used as the predictor variable in a regression. It may also be called
the factor.

-Y-

y-variable: "y-variable" is a colloquial term for the variable plotted on the y-axis in
a scatterplot or used as the dependent variable in a regression. It may also be called
the response.

YX Diagram: A tool used to identify/collate potential X’s and assess their relative
impact on multiple Y’s (all Y’s which are customer focused).

-Z-

Z-scores: Z-scores are values that have been standardized to have a mean of zero
and a standard deviation of one. They often appear with reference to the standard
normal distribution.
