Alpha Level: In significance testing, the alpha level (sometimes called the
significance level) is a probability that specifies how extraordinary the observed
result must be for us to reject the null hypothesis. Typical alpha levels are 0.10,
0.05, and 0.01, but any value may be used provided it is specified before the data
are examined. In practice, if we observe a statistic whose p-value computed based
on the null hypothesis is less than the pre-determined alpha-level, then we reject the
null hypothesis. Because the alpha level defines a "rare event," and rare events do
happen sometimes, the alpha level determines the probability of a Type I error.
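The decision rule above can be sketched in a few lines. This is a minimal illustration that assumes a two-sided z-test; the function names `two_sided_p_value` and `reject_null` and the example statistic z = 2.0 are hypothetical and do not come from this glossary.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

def reject_null(p_value, alpha):
    """Reject Ho when the p-value is below the pre-specified alpha level."""
    return p_value < alpha

p = two_sided_p_value(2.0)      # roughly 0.0455
print(reject_null(p, 0.05))     # True
print(reject_null(p, 0.01))     # False
```

The same statistic leads to different decisions at different alpha levels, which is why the alpha level must be fixed before the data are examined.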
Alpha Risk: The chance or probability of making a Type I Error. This probability is
always greater than zero. The researcher establishes a maximum acceptable risk,
usually 5%, of deciding to reject Ho when it is actually true.
Attribute Data: Quality data that typically reflect the number of conforming or
non-conforming units on a go/no-go or pass/fail basis.
Average: The average of a collection of values is the sum of all the values divided
by the number of values, otherwise called the mean. The term "average" is
sometimes taken to mean any summary of the center of a distribution, but this
usage is not common in statistics.
Axis: In graphing on the Cartesian plane (as in common x-y plots), the vertical and
horizontal lines that give the coordinates are called axes. By convention, the y-axis
runs up and down, and the x-axis runs side-to-side. Displays such as dotplots and
boxplots have a single axis, usually the vertical y-axis, while displays like histograms
lay out the data values along the horizontal x-axis.
-B-
Balanced Design: A design in which the same number of runs are performed at
each treatment combination.
Bar Chart: A bar chart displays a batch of numbers with side-by-side bars, usually
spaced evenly along a horizontal base. Each bar corresponds to a category of data.
The height of each bar is proportional to the number of cases in its category. Bar
charts differ from histograms because they graph categorical variable values. One
easy way to spot the difference is that the bars in a bar chart should have some
space between them, while the bars in a histogram should touch each other.
Beta Risk: The chance or probability of making a Type II Error. This probability is
always greater than zero. The researcher establishes a maximum acceptable risk,
usually 10%, of overlooking a source of variation.
Box Plot: A boxplot displays the 5-number summary of a variable. Boxplots graph
against a single, usually vertical, axis. Boxplots show a central box running from the
first to the third quartiles, marked by a line at the median. They then extend
whiskers either to the maximum and minimum data values, or to the most extreme
values within 1.5 IQRs of each quartile. If any data values are outside these limits,
they are displayed individually because they may be outliers.
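The quantities a boxplot draws can be computed directly. The sketch below is one possible reading of the description above: it assumes the "inclusive" quartile method of Python's `statistics.quantiles` (other quartile conventions give slightly different values), and the function name `boxplot_summary` and the data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """5-number summary plus whisker ends and outliers using the 1.5 IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary["q1"], summary["median"], summary["q3"])  # 3.25 5.5 7.75
print(summary["whisker_high"], summary["outliers"])     # 9 [30]
```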
-C-
Central Limit Theorem: The central limit theorem (CLT) states that the sampling
distribution of the sample mean approaches the normal distribution as the sample
size, n, increases regardless of the distribution of the population. It is common to
include with the CLT the related facts that this sampling distribution has a mean
equal to the population mean and a standard deviation equal to the population
standard deviation divided by the square root of n.
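A small simulation can make the theorem concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a deliberately skewed choice); the function name `simulate_sample_means`, the sample size of 30, and the seed are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from Exp(1) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

n = 30
means = simulate_sample_means(n, num_samples=5000)
print(statistics.fmean(means))   # close to the population mean, 1.0
print(statistics.stdev(means))   # close to 1.0 / sqrt(30), about 0.183
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly right-skewed.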
Confidence Bounds: The confidence bounds are the lower and upper limits of a
confidence interval.
Common Cause Variation: Variation that occurs naturally and is inherent and
expected in a stable process. Such variation can be attributed to “chance” or random
causes.
Continuous: Continuous values are numbers that can fall anywhere in a range of
values. Contrast with discrete values.
Control Chart: Control charts graph statistics measured on a process to track trends
and detect changes. Extraordinary values or patterns in a control chart often indicate
a problem with the process. Control charts usually include control lines to help
identify processes that are out of control.
Control Limits: Statistically based limits to detect a shift in the process. Control
limits exist for the critical statistic of interest (e.g., the mean or the variance).
Correlation: A statistical tool used to quantify the strength of the linear relationship
between a continuous input and a continuous output. Correlation is a number
between -1 and +1 that measures the degree of linear association between two
variables. In its most useful form, its square (commonly called r-squared) gives the
fraction of the variability of one variable accounted for by a least squares regression
on the other variable.
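The link between the correlation and r-squared can be checked numerically. This is a minimal sketch with made-up data; `pearson_r` and `fraction_explained` are hypothetical helper names, and the second function fits an ordinary least squares line of y on x.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fraction_explained(xs, ys):
    """1 - SSE/SST for the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))                           # 0.7746
print(round(r ** 2, 4))                      # 0.6
print(round(fraction_explained(xs, ys), 4))  # 0.6
```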
-D-
Data: Data are values or names that record information about individuals in an
orderly fashion, together with a context. Most data record the same information
about many individuals.
Defects: Failures associated with a single unit. A defective unit can contain several
defects.
Density Curve: A density curve is a curve that is never negative (that is, it is
always above the horizontal axis) and that has an area of exactly 1.0 under the
curve. Density curves relate ranges of data values to the relative frequency of the
values in those ranges. Density curves provide a mathematical model to describe
the shapes shown by histograms.
Distribution: The distribution of a variable gives the values the variable can take on
and tells how frequently it takes on each of these values.
Dotplot: A display of a single variable in which points are arranged along a (usually)
vertical axis. Dotplots show the spread of the data, any clustering or local modes,
and any outliers. Some dotplots spread out overlapping points; others do not.
Dotplots are related to boxplots.
-E-
Experiment: A test or series of tests in which the X’s are purposefully varied and
their impact on Y is quantified.
-F-
F-statistic: A test statistic that follows the F distribution. Typically F-statistics arise
as the ratio of two variance estimates, each a chi-square distributed quantity divided
by its degrees of freedom. Common tests based on F-statistics test whether the
numerator and denominator of the ratio estimate the same variance.
Factor Level: The Factor levels are the settings of the X’s being evaluated. For
example if the factor temperature is being studied at 10 and 20 degrees, 10 and 20
are factor levels. If a qualitative factor is being evaluated such as shift, shift one, two
and three are the factor levels.
First Time Yield (FTY): Number of defect-free units divided by the total number of
units sampled.
-G-
Gage R&R Study: A study to ensure that our measurement systems are
statistically sound.
Gage Accuracy or Bias: Accuracy or bias is the difference between the observed
average of the measurements and the reference value (Metrology lab reference).
Gage Stability: Gage stability refers to the difference in the average of at least two
sets of measurements obtained with the same gage on the same parts taken at different
times.
Gage Linearity: Gage linearity is the difference in the accuracy values throughout
the measurement range in which the gage is intended to be used.
-H-
Histogram: A histogram displays a batch of numbers with adjacent, side-by-side bars. Each bar
corresponds to a range of data values. The size of each bar is proportional to the
number of data values falling in its range. Histograms differ from bar charts because
they graph quantitative variable values.
Hypothesis: A theory about the nature of the relationships between input and output
variables.
-I-
Interaction: An interaction exists when the effect of a factor depends on the
level of another factor. For example, if a 10 unit change occurs when
pressure is varied from 50 to 100 psi at a temperature of 10 degrees, but only a 1 unit
change occurs when the pressure is varied at a temperature of 20 degrees, an
interaction has occurred between temperature and pressure.
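The temperature and pressure example can be worked through numerically. The four average responses below are made up so that the pressure effect is 10 units at 10 degrees and 1 unit at 20 degrees; halving the difference between the two effects is a common convention for two-level designs and is an assumption here, not something this glossary specifies.

```python
# Hypothetical average responses, keyed by (temperature, pressure in psi).
response = {
    (10, 50): 20.0, (10, 100): 30.0,
    (20, 50): 25.0, (20, 100): 26.0,
}

def pressure_effect(temperature):
    """Change in response when pressure goes from 50 to 100 psi."""
    return response[(temperature, 100)] - response[(temperature, 50)]

effect_at_10 = pressure_effect(10)   # 10.0
effect_at_20 = pressure_effect(20)   # 1.0

# The pressure effect differs by temperature, so the factors interact.
# Convention assumed here: interaction = half the difference of the effects.
interaction = (effect_at_20 - effect_at_10) / 2
print(effect_at_10, effect_at_20, interaction)   # 10.0 1.0 -4.5
```

If the two pressure effects were equal, the interaction term would be zero and the factors could be studied separately.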
Interquartile Range: Difference between the 75th percentile and the 25th percentile.
-K-
-L-
-M-
Main Effect: The change in average response (Y) observed during a change in the
factor (X) from one level to another.
Mean: Most common statistic used to describe the center of the data. The mean is
also commonly referred to as the average. The mean is a parameter describing a
density curve. Intuitively, it describes the middle or center of the distribution.
Median: The median is the middle value of a collection of values. An equal number
of values in the collection are greater than and less than the median.
Mode: The observation that occurs most often in a data set. Mostly used for skewed
distributions.
-N-
-O-
One-sided: A hypothesis test in which only deviations from the null hypothesis in
one specified direction (typically thought of as to the right or to the left on the
density curve of the test statistic's sampling distribution) are considered. One-sided
tests (sometimes called one-tailed tests) typically have a smaller P-value for a given
statistic value than the corresponding two-sided test, but the choice of a one-sided
test must be defended on substantive grounds prior to computing the statistic.
One-tailed: A hypothesis test in which only deviations from the null hypothesis in
one specified direction (typically thought of as to the right or to the left on the
density curve of the test statistic's sampling distribution) are considered. One-tailed
tests (sometimes called one-sided tests) typically have a smaller P-value for a given
statistic value than the corresponding two-tailed test, but the choice of a one-tailed
test must be defended on substantive grounds prior to computing the statistic.
Outlier: An outlier is a value that deviates from the others in a dataset. Outliers
may be extraordinarily large or small, or differ by having some exceptional
combination of values on two or more variables. Ordinarily, it is a good idea to set
outlier values aside, analyze the data without them, and then consider why the
outlier value was so extraordinary.
-P-
Pie chart: A pie chart displays the distribution of a categorical variable by showing
the fraction of the cases falling within each of the categories. Since all the categories
account for 100% of the cases, a pie chart divides up a circle into wedges (pie slices)
that add to 100% of the circle. Many statisticians dislike pie charts because they
think it difficult to judge the size of a wedge. However, pie charts are widely used.
Population: All the data that exist today about a process plus what will exist in the
future. Usually we cannot afford to observe or measure the entire population, so we
select a representative sample.
Power: The sensitivity of a statistical test to detect a difference when there really is
one, or the probability of being correct in rejecting Ho. Power = 1 - Beta.
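As a worked illustration of Power = 1 - Beta, the sketch below computes the power of a one-sided (upper-tail) z-test with a known standard deviation. The test type, the function name `z_test_power`, and the numbers (a shift of 0.5, sigma of 1, n of 25) are assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided (upper-tail) z-test to detect a mean shift delta."""
    std_normal = NormalDist()
    # Critical value: reject Ho when the z statistic exceeds z_crit.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centered at delta * sqrt(n) / sigma.
    return 1 - std_normal.cdf(z_crit - delta * sqrt(n) / sigma)

power = z_test_power(delta=0.5, sigma=1.0, n=25)
beta = 1 - power
print(round(power, 2), round(beta, 2))   # 0.8 0.2
```

When the true shift is zero the function returns alpha, since rejecting Ho is then a Type I error.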
-Q-
Quartile: Quartiles are roughly the 25% and 75% points of a variable. That is,
about a quarter of the data values fall below the first (25%) quartile, and about a
quarter of the data values fall above the third (75%) quartile. The second quartile,
corresponding to the 50% point, is the median.
-R-
R-chart: An R-Chart is a control chart that displays the ranges of samples from a
process along with suitable control lines.
Residual: Difference between the observed or measured value and the fitted value
predicted by the equation.
Response variable: The response variable is the variable whose values are
predicted or described in terms of the explanatory variables or factors. In an
experiment, the response variable should be identified before the experiment is
performed to prevent invalid hunts for possible responses that may show a
significant difference. Surveys may yield several possible response variables. The
convention in statistics is to denote the response variable "y" and display it on the
vertical axis of a plot.
Rolled throughput yield (RTY): The probability of a unit traveling through all process
steps defect free.
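RTY is commonly computed as the product of the individual step yields. The sketch below assumes the steps are independent and that each step yield is the fraction of units passing that step defect free; the three yields and the function name `rolled_throughput_yield` are hypothetical.

```python
import math

def rolled_throughput_yield(step_yields):
    """Probability of passing every step defect free, assuming independent steps."""
    return math.prod(step_yields)

# Hypothetical yields for a three-step process.
rty = rolled_throughput_yield([0.98, 0.95, 0.99])
print(round(rty, 5))   # 0.92169
```

Even when every step looks good on its own, the overall yield is lower than the weakest step.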
-S-
Sample size: The number of cases in the sample. Larger sample sizes permit more
reliable and more accurate estimation of population parameters, so the sample size
is an important consideration in designing studies. The sample size is denoted n by
common convention.
Scatter Plot: Scatter plots display the relationship between two variables. By
convention, the (vertical) y-axis displays the values of the variable that we hope to
predict or explain, and the (horizontal) x-axis displays the values of the variable on
which we will base our prediction or explanation. If neither variable is to be predicted
or explained by the other, then either can appear on either axis.
Simple Random Sample: A simple random sample (SRS) of size n is one in which
each possible sample of n individuals has an equal chance of selection. An alternative
(and equivalent) definition is that each individual in the population has an equal and
independent chance of being selected. The randomness applied deliberately in simple
random sampling is what allows the derivation of a sampling distribution for statistics
and thus what allows statistical inference.
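Drawing a simple random sample is a one-line operation in most languages. The sketch below uses `random.sample` from the Python standard library, which selects without replacement so that every subset of size n is equally likely; the population of unit numbers 1 to 100, the seed, and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = list(range(1, 101))   # hypothetical unit IDs 1..100
sample = simple_random_sample(population, n=10, seed=42)
print(len(sample), len(set(sample)))   # 10 10
```

Fixing the seed makes the draw reproducible for auditing; omit it to get a fresh random sample each time.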
Standard Deviation: The square root of the variance. The standard deviation has
the same units as the original data.
SPC: Statistical Process Control. SPC is a statistically based graphing technique that
compares current process data to a set of stable control limits established from
normal process variation.
-T-
Test Statistic: A standardized value (Z, t, F, etc.) calculated from sample data.
Used to calculate risk of decision error by comparison with a known distribution if Ho
is true.
* A hypothesis is an a priori theory relating to differences between variables.
A statistical test or hypothesis test is performed to support or refute the theory.
* A hypothesis test converts the practical problem into a statistical problem.
- Since relatively small sample sizes are used to estimate population
parameters, there is always a chance of collecting a non-representative
sample.
- Inferential statistics allows us to estimate the probability of getting a
non-representative sample.
Time Series Chart: A graph that plots performance data over time for a process,
usually as a line chart.
Type I Error: The error in rejecting Ho when it is in fact true (e.g., saying there is
a difference when there really is not).
Type II Error: The error in failing to reject Ho when it is in fact false (e.g., saying
there is no difference when there really is).
-U-
U Chart: A control chart of the number of defects per unit. Subgroups can have unequal sizes.
-V-
Variance: The variance is the square of the standard deviation. Although the
variance is a poor measure of spread (for example, because it is not in the same
units as the data), standard deviations are most often combined by squaring them
(to obtain variances) and then combining the variances.
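The remark about combining standard deviations can be shown with a short calculation. The sketch assumes the quantities being added are independent, in which case their variances add; the function name `combined_sd` and the values 3 and 4 are illustrative.

```python
import math

def combined_sd(*sds):
    """SD of a sum of independent quantities: add variances, then take the root."""
    return math.sqrt(sum(sd ** 2 for sd in sds))

print(combined_sd(3.0, 4.0))   # 5.0, not 3.0 + 4.0 = 7.0
```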
-X-
Xi – MR Chart: Individuals and Moving Range Chart. Can be used for tracking both X
and Y.
X-bar chart: An x-bar control chart plots the means of samples taken from a
process (usually at regular intervals), and compares them to a specified mean and
standard deviation.
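This entry does not say how the control lines are derived. One widely used convention, assumed here, places them at the grand mean plus or minus A2 times the average subgroup range, where A2 = 0.577 is the standard tabulated constant for subgroups of size 5. The subgroup data and the function name `xbar_chart_limits` are hypothetical.

```python
import statistics

A2_FOR_N5 = 0.577   # tabulated x-bar chart constant for subgroups of size 5

def xbar_chart_limits(subgroups, a2=A2_FOR_N5):
    """Return subgroup means, lower limit, center line, and upper limit."""
    means = [statistics.fmean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = statistics.fmean(means)    # grand mean
    r_bar = statistics.fmean(ranges)    # average subgroup range
    return means, center - a2 * r_bar, center, center + a2 * r_bar

# Hypothetical measurements: four subgroups of five units each.
subgroups = [
    [10, 12, 11, 13, 14],
    [11, 11, 12, 13, 13],
    [9, 10, 12, 12, 12],
    [12, 13, 13, 14, 13],
]
means, lcl, center, ucl = xbar_chart_limits(subgroups)
out_of_control = [m for m in means if m < lcl or m > ucl]
print(means)            # [12.0, 12.0, 11.0, 13.0]
print(out_of_control)   # []
```

With these data the center line is 12 and the limits are roughly 10.41 and 13.59, so no subgroup mean signals an out-of-control condition.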
x-variable: "x-variable" is a colloquial term for the variable plotted on the x-axis in
a scatterplot or used as the predictor variable in a regression. It may also be called
the factor.
-Y-
y-variable: "y-variable" is a colloquial term for the variable plotted on the y-axis in
a scatterplot or used as the dependent variable in a regression. It may also be called
the response.
YX Diagram: A tool used to identify/collate potential X’s and assess their relative
impact on multiple Y’s (all Y’s which are customer focused).
-Z-
Z-scores: Z-scores are values that have been standardized to have a mean of zero
and a standard deviation of one. They often appear with reference to the standard
normal distribution.
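Standardizing is a matter of subtracting the mean and dividing by the standard deviation. The sketch below uses the population standard deviation (`statistics.pstdev`) as an assumption; the sample standard deviation is an equally common choice. The data and the function name `z_scores` are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # use statistics.stdev for the sample SD
    return [(v - mean) / sd for v in values]

scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, population SD 2
print(scores[0], scores[-1])   # -1.5 and 2.0 (up to floating-point rounding)
```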