
Exploratory Factor Analysis (Part 1)

1. Glossary
Anti-image correlation matrix: Matrix of partial correlations among variables after factor analysis, representing the degree to which the factors explain each other in the results. The diagonal contains the measures of sampling adequacy for each variable, and the off-diagonal values are partial correlations among variables.

Bartlett test of sphericity: Statistical test for the overall significance of all correlations within a correlation matrix.

Cluster analysis: Multivariate technique with the objective of grouping respondents or cases with similar profiles on a defined set of characteristics. Similar to Q factor analysis.

Common factor analysis: Factor model in which the factors are based on a reduced correlation matrix. That is, communalities are inserted in the diagonal of the correlation matrix, and the extracted factors are based only on the common variance, with specific and error variance excluded.

Common variance: Variance shared with other variables in the factor analysis.

Communality: Total amount of variance an original variable shares with all other variables included in the analysis.

Component analysis: Factor model in which the factors are based on the total variance. With component analysis, unities are used in the diagonal of the correlation matrix; this procedure computationally implies that all the variance is common or shared.

Composite measure: See summated scales.

Conceptual definition: Specification of the theoretical basis for a concept that is represented by a factor.

Content validity: Assessment of the degree of correspondence between the items selected to constitute a summated scale and its conceptual definition.

Correlation matrix: Table showing the intercorrelations among all variables.

Cronbach's alpha: Measure of reliability that ranges from 0 to 1, with values of 0.60 to 0.70 deemed the lower limit of acceptability.

Cross-loading: A variable has two or more factor loadings exceeding the threshold value deemed necessary for inclusion in the factor interpretation process.

Dummy variable: Binary metric variable used to represent a single category of a nonmetric variable.

Eigenvalue: Column sum of squared loadings for a factor; also referred to as the latent root. It represents the amount of variance accounted for by a factor.

EQUIMAX: One of the orthogonal factor rotation methods that is a compromise between the VARIMAX and QUARTIMAX approaches, but is not widely used.

Error variance: Variance of a variable due to errors in data collection or measurement.

Face validity: See content validity.

Factor: Linear combination (variate) of the original variables. Factors also represent the underlying dimensions (constructs) that summarize or account for the original set of observed variables.

Factor indeterminacy: Characteristic of common factor analysis such that several different factor scores can be calculated for a respondent, each fitting the estimated factor model. It means the factor scores are not unique for each individual.

Factor loadings: Correlation between the original variables and the factors, and the key to understanding the nature of a particular factor. Squared factor loadings indicate what percentage of the variance in an original variable is explained by a factor.

Factor matrix: Table displaying the factor loadings of all variables on each factor.

Factor pattern matrix: One of the two factor matrices found in an oblique rotation that is most comparable to the factor matrix in an orthogonal rotation.

Factor rotation: Process of manipulating or adjusting the factor axes to achieve a simpler and pragmatically more meaningful factor solution.

Factor score: Composite measure created for each observation on each factor extracted in the factor analysis. The factor weights are used in conjunction with the original variable values to calculate each observation's score. The factor score then can be used to represent the factor(s) in subsequent analyses. Factor scores are standardized to have a mean of 0 and a standard deviation of 1.

Factor structure matrix: A factor matrix found in an oblique rotation that represents the simple correlations between variables and factors, incorporating the unique variance and the correlations between factors. Most researchers prefer to use the factor pattern matrix when interpreting an oblique solution.

Indicator: Single variable used in conjunction with one or more other variables to form a composite measure.

Latent root: See eigenvalue.

Measure of sampling adequacy (MSA): Measure calculated both for the entire correlation matrix and for each individual variable, evaluating the appropriateness of applying factor analysis. Values above 0.50 for either the entire matrix or an individual variable indicate appropriateness.

Measurement error: Inaccuracies in measuring the true variable values due to the fallibility of the measurement instrument (e.g., inappropriate response scales), data entry errors, or respondent errors.

Multicollinearity: Extent to which a variable can be explained by the other variables in the analysis.

Oblique factor rotation: Factor rotation computed so that the extracted factors are correlated. Rather than arbitrarily constraining the factor rotation to an orthogonal solution, the oblique rotation identifies the extent to which each of the factors is correlated.

Orthogonal: Mathematical independence (no correlation) of factor axes to each other (i.e., at right angles, or 90 degrees).

Orthogonal factor rotation: Factor rotation in which the factors are extracted so that their axes are maintained at 90 degrees. Each factor is independent of, or orthogonal to, all other factors. The correlation between the factors is determined to be 0.

Q factor analysis: Forms groups of respondents or cases based on their similarity on a set of characteristics.

QUARTIMAX: A type of orthogonal factor rotation method focusing on simplifying the rows of a factor matrix. Generally considered less effective than the VARIMAX rotation.

R factor analysis: Analyzes relationships among variables to identify groups of variables forming latent dimensions (factors).

Reliability: Extent to which a variable or set of variables is consistent in what it is intended to measure. If multiple measurements are taken, reliable measures will all be consistent in their values. It differs from validity in that it does not relate to what should be measured, but instead to how it is measured.

Reverse scoring: Process of reversing the scores of a variable, while retaining the distributional characteristics, to change the relationships (correlations) between two variables. Used in summated scale construction to avoid a canceling out between variables with positive and negative factor loadings on the same factor.

Specific variance: Variance of each variable unique to that variable and not explained or associated with other variables in the factor analysis.

Summated scales: Method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement. In most instances, the separate variables are summed and then their total or average score is used in the analysis.

Surrogate variable: Selection of a single variable with the highest factor loading to represent a factor in the data reduction stage instead of using a summated scale or factor score.

Trace: Represents the total amount of variance on which the factor solution is based. The trace is equal to the number of variables, based on the assumption that the variance in each variable is equal to 1.

Unique variance: See specific variance.

Validity: Extent to which a measure or set of measures correctly represents the concept of study, i.e., the degree to which it is free from any systematic or nonrandom error. Validity is concerned with how well the concept is defined by the measure(s), whereas reliability relates to the consistency of the measure(s).

Variate: Linear combination of variables formed by deriving empirical weights applied to a set of variables specified by the researcher.

VARIMAX: The most popular orthogonal factor rotation method, focusing on simplifying the columns in a factor matrix. Generally considered superior to other orthogonal factor rotation methods in achieving a simplified factor structure.

2. What is Factor Analysis?


1. Factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis.
2. Variables are the building blocks of relationships.
3. As multivariate techniques are employed, the number of variables increases to tens, hundreds, or even thousands.
4. When the number of variables is low, all variables can be unique, distinct, and different.
5. But as the number of variables rises, the overlap (correlation) between variables increases significantly.
6. In some instances, the researcher may even want ways to group the variables together into new composite measures that represent each of these groups.
7. Factor analysis provides the tools for analyzing the structure of the interrelationships (correlations) among a large number of variables by defining sets of variables that are highly correlated, known as factors.
8. These groups are, by definition, highly correlated and are assumed to represent dimensions within the data.
9. Two options:

a. If we are trying to reduce the number of variables, then the dimensions can guide us in creating new composite measures.
b. However, if we have a conceptual basis for understanding the relationships between variables, then the dimensions may actually have meaning for what they collectively represent. These dimensions may correspond to concepts that cannot be adequately described by a single measure (e.g., a person's health is defined by various indicators that are measured separately but are all interrelated).
10. Factor analysis presents several ways of representing these groups of variables for use in other multivariate techniques.
11. Two approaches:
a. Exploratory approach: useful in searching for structure among a set of variables or as a data reduction method. In this approach, factor analysis takes into consideration no a priori (i.e., pre-existing) constraints on the estimation of components or the number of components to be extracted.
b. Confirmatory approach: here the researcher has preconceived notions on the actual structure of the data, based on theoretical support or prior research. (For instance, if prior information suggests the variables should form seven groups, the researcher uses factor analysis to confirm whether seven factors actually emerge.) The researcher may wish to test hypotheses involving issues such as which variables should be grouped together on a factor or the precise number of factors. In this approach, the researcher requires that factor analysis take a confirmatory approach, that is, assess the degree to which the data meet the expected structure.

3. A Hypothetical Example of Factor Analysis


1. A retail firm intends to identify how consumers' patronage of its various stores and their services is determined.
2. The retailer wants to understand how consumers make decisions but feels that it cannot evaluate 80 separate characteristics or develop action plans for this many variables, because they are too specific. (Developing strategies around 80 separate variables is difficult to plan, execute, control, and evaluate, and an intervention is beneficial only if the underlying relationship is real.)

3. Instead, it would like to know whether consumers think in more general evaluative dimensions rather than in just the specific items.

4. For example, consumers may consider "salespersons" to be a more general evaluative dimension that is composed of many more specific characteristics, such as knowledge, courtesy, likeability, sensitivity, friendliness, helpfulness, and so on. (The overall impression matters most; no intervention is 100 percent solid, and intervening on six variables separately requires separate resources and increases the chances of error.)
5. To identify these broader dimensions, the retailer could commission a survey asking for consumer evaluations on each of the 80 specific items.
6. Factor analysis would then be used to identify the broader underlying evaluative dimensions. (These dimensions are more general; they were already there in the data, just not visible at a glance.)
7. Specific items that correlate highly are assumed to be members of that broader dimension. (Each of the 80 measured variables is called an item.)
8. These dimensions become composites of specific variables, which in turn allow the dimensions to be interpreted and described.
9. In our example, the factor analysis might identify such dimensions as product assortment, product quality, prices, store personnel, service, and store atmosphere as the broader evaluative dimensions used by the respondents.
10. Each of these dimensions contains specific items that are a facet of the broader evaluative dimension.
11. From these findings, the retailer may then use the dimensions (factors) to define broad areas for planning and action. (The analysis yields broad directions, so planning can focus on seven or eight dimensions instead of 80 items.)
12. An illustrative example of a simple application of factor analysis is shown in Figure 1, which represents the correlation matrix for nine store image elements.

Figure 1: Original Correlation Matrix

13. Included in this set are measures of the product offering, store personnel, price levels, and in-store service and experiences.

14. Question: are all of these elements separate in their evaluative properties, or do they group into some more general areas of evaluation?
15. For example, do all of the product elements group together?
16. Where does price level fit, or is it separate?
17. How do the in-store features like store personnel, service, and atmosphere relate to each other?
18. Visual inspection of the original correlation matrix (Figure 1) does not necessarily reveal any specific pattern.
19. Among scattered high correlations, variable groupings are not apparent.
20. The application of factor analysis results in the grouping of the variables as reflected in Figure 2. (The shaded values mark the groups, and the order of the variables has been changed so that grouped variables sit adjacent to one another; for example, V3, V8, V9, and V2 form one group.)

Figure 2: Correlation Matrix of Variables after Grouping According to Factor Analysis

21. Here some interesting patterns emerge:
a. First, four variables all relating to the in-store experience of shoppers are grouped together.
b. Second, three variables describing the product assortment and availability are grouped together.
c. Lastly, product quality and price levels are grouped.
22. Each group represents a set of highly interrelated variables that may reflect a more general evaluative dimension. (Now one strategy can be developed per dimension.)
23. In this case, we might label the three variable groupings: in-store experience, product offering, and value.
24. This simple example of factor analysis demonstrates its basic objective of grouping highly intercorrelated variables into distinct sets (factors). (The advantage of grouping: because the variables in a group are highly correlated, acting on one of them will affect the others as well.)

25. In many situations, these factors can provide a wealth of information about the interrelationships of the variables.
26. In this example, factor analysis identified for store management a smaller set of concepts to consider in any strategic or tactical marketing plans, while still providing insight into what constitutes each general area (i.e., the individual variables defining each factor). Management can now make strategic decisions that are likely to be effective.
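To make the grouping mechanics concrete, here is a minimal Python sketch, not from the original text, that simulates nine store-image items driven by three latent dimensions and lets factor analysis recover the groups. The variable names (V1 to V9), the loading values, and the sample size are illustrative assumptions.

```python
# Illustrative sketch only: simulated store-image data, not the data behind
# Figures 1 and 2. Assumes NumPy and scikit-learn are available.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 3))          # three hypothetical evaluative dimensions
loadings = np.zeros((9, 3))
loadings[[0, 1, 2], 0] = 0.8              # items assumed to tap dimension 1
loadings[[3, 4, 5], 1] = 0.8              # items assumed to tap dimension 2
loadings[[6, 7, 8], 2] = 0.8              # items assumed to tap dimension 3
X = latent @ loadings.T + 0.5 * rng.normal(size=(n, 9))

fa = FactorAnalysis(n_components=3).fit(X)
# Assign each observed item to the factor on which it loads most heavily,
# mirroring the regrouped correlation matrix in Figure 2.
groups = np.abs(fa.components_).argmax(axis=0)
for f in range(3):
    items = [f"V{i + 1}" for i in np.where(groups == f)[0]]
    print(f"Factor {f + 1}: {items}")
```

On data with this structure, the printed groups recover the three blocks of correlated items, just as the shaded blocks in Figure 2 do.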

4. Factor Analysis Decision Process


1. There are seven stages in factor analysis:
a. Objectives of factor analysis
b. Designing a factor analysis
c. Assumptions in factor analysis
d. Deriving factors and assessing overall fit
e. Interpreting the factors
f. Validation of factor analysis
g. Additional uses of factor analysis results
2. Each stage merits a detailed discussion.

4.1. Stage 1: Objectives of Factor Analysis


1. The starting point in factor analysis is the research problem (i.e., define the research problem).
2. The general purpose of factor analysis is to find a way to condense (or summarize) the information contained in a number of original variables into a smaller set of new, composite dimensions or variates (factors) with a minimum loss of information.
3. There are four key issues here:
a. Specifying the unit of analysis (what is being analyzed)
b. Achieving data summarization and/or data reduction
c. Variable selection
d. Using factor analysis results with other multivariate techniques

4.1.1. Specifying the Unit of Analysis
1. Up to this point, we defined factor analysis solely in terms of identifying structure among a set of variables.
2. Factor analysis is actually a more general model in that it can identify the structure of relationships among either variables or respondents by examining either the correlations between the variables or the correlations between the respondents. (Instead of variables, we can group respondents on the basis of their evaluation criteria.)
3. If the objective of the research were to summarize the characteristics, factor analysis would be applied to a correlation matrix of the variables (a.k.a. R factor analysis). This technique analyzes a set of variables to identify the dimensions that are latent (not easily observed). (When the variables are the ones to be grouped, it is R factor analysis; when respondents are to be grouped, it is Q factor analysis.)
4. Factor analysis may also be applied to a correlation matrix of the individual respondents based on their characteristics (a.k.a. Q factor analysis). This method combines large numbers of people

into distinctly different groups within a large population. This method is not widely used because of computational difficulties.

4.1.2. Achieving Data Summarization versus Data Reduction
1. Factor analysis provides the researcher with two distinct, but interrelated, outcomes:
a. Data summarization: FA derives underlying dimensions that describe the data in a much smaller number of concepts than the original individual variables.
b. Data reduction: the concept of data summarization is extended by deriving an empirical value (factor score) for each dimension (factor) and then substituting this value for the original values. (For example, averaging every two values to get one value, and then averaging those 250 averages in turn, is a second level of summarization.)
2. Data summarization:
a. The fundamental concept involved in data summarization is the definition of structure.
b. The structure will be used to view the set of variables at various levels of generalization, ranging from the most detailed level (the individual variables themselves) to the more generalized level, where individual variables are grouped and then viewed not for what they represent individually, but for what they represent collectively in expressing a concept.
c. For example, variables at the individual level might be: "I shop for specials," "I usually look for the lowest prices," "I shop for bargains," "National brands are worth more than store brands."
d. Collectively, these variables can be used to classify people as either price conscious or bargain hunters.
e. Remember that FA is an interdependence technique in which all variables are simultaneously considered, with no distinction as to dependent or independent variables.
f. FA still employs the concept of the variate (the linear composite of variables; in regression, Y was the variate), but in FA the variates (factors) are formed to maximize their explanation of the entire variable set, not to predict a dependent variable(s).
g. The goal of data summarization is achieved by defining a small number of factors that adequately represent the original set of variables. (We aggregate many individual variables into a few factors because summarizing in this way increases the chances of success.)
3. Data reduction:
a. FA can also be used to achieve data reduction by
i. identifying representative variables from a much larger set of variables for use in subsequent multivariate analyses, or
ii. creating an entirely new set of variables, much smaller in number, to partially or completely replace the original set of variables (e.g., generating one value out of ten combined variables).

b. In both instances, the purpose is to reduce the number of variables without losing the nature and character of the individual variables. (We should not lose too large a share of the information.)
c. The identification of the underlying dimensions or factors is an end result of data summarization.
d. Estimates of the factors and the contributions of each variable to the factors (termed loadings) are all that is required for the analysis.
e. Data reduction relies on the factor loadings as well, but uses them as the basis for either (1) identifying variables for subsequent analysis with other techniques, or (2) making estimates of the factors themselves (factor scores or summated scales), which then replace the original variables in subsequent analyses. (A minimal sketch of this second use appears at the end of this stage.)
f. The method of calculating and interpreting factor loadings is discussed later.

4.1.3. Variable Selection
1. In both uses of factor analysis, the researcher implicitly specifies the potential dimensions that can be identified through the character and nature of the variables submitted to factor analysis. For example, in assessing the dimensions of store image, if no question on store personnel were included, factor analysis would not be able to identify this dimension.
2. The researcher must also remember that FA will always produce factors. Thus, factor analysis is always susceptible to GIGO (garbage in, garbage out). If the researcher includes a large number of variables hoping that FA will "figure it out," the possibility of poor results is high.
3. The quality and meaning of the derived factors reflect the conceptual underpinnings of the variables included in the analysis.
4. Even when used solely for data reduction, FA is most efficient when conceptually defined dimensions can be represented by the derived factors.
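As flagged in point (e) above, here is a minimal sketch of data reduction via factor scores, using scikit-learn's FactorAnalysis on simulated survey data. The 300-respondent, 80-item matrix and the choice of 7 factors are illustrative assumptions, not prescriptions from the text.

```python
# Minimal sketch: replace many observed variables with a handful of factor
# scores for use in subsequent analyses. X here is simulated; in practice it
# would be the respondents-by-items data matrix.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 80))             # stand-in for 300 respondents x 80 items
X_std = StandardScaler().fit_transform(X)  # work with standardized variables

fa = FactorAnalysis(n_components=7)        # assume 7 underlying dimensions
scores = fa.fit_transform(X_std)           # factor scores: 300 x 7
print(scores.shape)                        # subsequent analyses use these 7 columns
```

The original 80 columns are thus replaced by 7 factor-score columns, which is exactly the substitution described in the data reduction discussion.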

4.2. Stage 2: Designing a Factor Analysis


1. Three basic decisions:
a. Calculation of the input data (a correlation matrix) to meet the specified objectives of grouping variables or respondents
b. Design of the study in terms of the number of variables, the measurement properties of the variables, and the types of variables allowed (how many variables, how they will be measured, and which types are allowed and which are not)
c. The sample size necessary, both in absolute terms and as a function of the number of variables in the analysis

4.2.1. Correlations among Variables or Respondents
1. Remember that there are two forms of FA: R-type (variables) and Q-type (respondents).
2. Both types utilize a correlation matrix as the basic data input.

3. With R-type factor analysis, the researcher would use a traditional correlation matrix (correlations among variables) as input.
4. But the researcher could also elect to derive the correlation matrix from the correlations between individual respondents.
5. In this Q-type FA, the results would be a factor matrix that identifies similar individuals.

4.2.2. Variable Selection and Measurement Issues
1. Two specific questions must be answered at this point:
a. What type of variables can be used in factor analysis?
b. How many variables should be included?
2. In terms of the types of variables included, the primary requirement is that a correlation value can be calculated among all variables.
3. Metric variables are easily measured by several types of correlations.
4. Nonmetric variables are more problematic because they cannot use the same types of correlation measures used by metric variables.
5. Although some specialized methods calculate correlations among nonmetric variables, the most prudent approach is to avoid nonmetric variables.
6. If a nonmetric variable must be included, one approach is to define dummy variables to represent categories of the nonmetric variable.
7. If all variables are dummy variables, then specialized forms of FA, such as Boolean factor analysis, are more appropriate.
8. The researcher should also attempt to minimize the number of variables included but still maintain a reasonable number of variables per factor.
9. If a study is being designed to assess a proposed structure, the researcher should be sure to include several variables (five or more) that may represent each proposed factor.
10. The strength of FA lies in finding patterns among groups of variables, and it is of little use in identifying factors composed of only a single variable.
11. Finally, when designing a study to be factor analyzed, the researcher should, if possible, identify several key variables that closely reflect the hypothesized underlying factors.
12. This identification will aid in validating the derived factors and assessing whether the results have practical significance.

4.2.3. Sample Size
1. Regarding the sample size question, the researcher generally would not factor analyze a sample of fewer than 50 observations, and preferably the sample size should be 100 or larger.
2. As a general rule, the minimum is to have at least five times as many observations as the number of variables to be analyzed, and the more acceptable sample size would have a 10:1 ratio.
3. Some researchers even propose a minimum of 20 cases for each variable (e.g., 50 variables x 20 cases = 1,000 observations).
4. One must remember, however, that 30 variables, for example, require computing 435 correlations in factor analysis.
5. At a 0.05 significance level, perhaps even 20 of those correlations would be deemed significant and appear in the factor analysis just by chance.
6. The researcher should always try to obtain the highest cases-per-variable ratio to minimize the chances of overfitting the data, i.e., deriving factors that are sample-specific with little generalizability.

7. In order to do so, the researcher may employ the most parsimonious set of variables, guided by conceptual and practical considerations, and then obtain an adequate sample size for the number of variables examined.
8. When dealing with smaller sample sizes and/or a lower cases-to-variables ratio, the researcher should always interpret any findings cautiously.
9. The issue of sample size will also be addressed in a later section on interpreting factor loadings. (The arithmetic behind these rules of thumb is sketched below.)
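The arithmetic behind these rules of thumb is simple enough to sketch directly; the 30-variable example matches the one in the text.

```python
# Worked arithmetic for the sample-size guidelines discussed above.
p = 30                                      # number of variables, as in the text
n_correlations = p * (p - 1) // 2           # unique pairwise correlations: 435
expected_by_chance = 0.05 * n_correlations  # ~21.8 "significant" by chance at alpha = .05
print(n_correlations, round(expected_by_chance, 1))

for ratio in (5, 10, 20):                   # cases-per-variable rules of thumb
    print(f"{ratio}:1 rule -> minimum sample size = {ratio * p}")
```

With 30 variables, 0.05 x 435 is roughly 22 correlations expected to look significant by chance alone, which is why higher cases-per-variable ratios guard against overfitting.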

4.3. Stage 3: Assumptions in Factor Analysis


1. The critical assumptions underlying factor analysis are more conceptual than statistical.
2. The researcher is always concerned with meeting the statistical requirements for any multivariate technique, but in FA the overriding concerns center as much on the character and composition of the variables included in the analysis as on their statistical qualities.

4.3.1. Conceptual Issues
1. The conceptual assumptions underlying factor analysis relate to the set of variables selected and the sample chosen.
2. A basic assumption of FA is that some underlying structure does exist in the set of selected variables.
3. The presence of correlated variables and the subsequent definition of factors do not guarantee relevance, even if they meet the statistical requirements.
4. It is the responsibility of the researcher to ensure that the observed patterns are conceptually valid and appropriate to study with factor analysis, because the technique has no means of determining appropriateness other than the correlations among variables.
5. For example, mixing dependent and independent variables in a single factor analysis and then using the derived factors to support dependence relationships is inappropriate.
6. The researcher must also ensure that the sample is homogeneous with respect to the underlying factor structure.
7. It is inappropriate to apply FA to a sample of males and females for a set of items known to differ because of gender.
8. When the two subsamples (male and female) are combined, the resulting correlations and factor structure will be a poor representation of the unique structure of each group.
9. Thus, whenever differing groups are expected in the sample, separate factor analyses should be performed, and the results should be compared to identify differences not reflected in the results of the combined sample.

4.3.2. Statistical Issues
1. Some degree of multicollinearity is desirable, because the objective is to identify interrelated sets of variables.
2. Assuming the researcher has met the conceptual requirements for the variables included in the analysis, the next step is to ensure that the variables are sufficiently intercorrelated to produce representative factors.
3. As we will see, we can assess this degree of interrelatedness from both overall and individual-variable perspectives.
4. The following are several empirical measures to aid in diagnosing the factorability of the correlation matrix.

4.3.2.1. Overall Measures of Intercorrelation
1. The researcher must ensure that the data matrix has sufficient correlations to justify the application of factor analysis.
2. If it is found that all of the correlations are low, or that all of the correlations are equal (denoting that no structure exists to group the variables), then the researcher should question the application of factor analysis.
3. To this end, several techniques are available:
a. Partial correlation
b. Bartlett's test of sphericity
c. Measure of sampling adequacy (MSA)

4.3.2.1.1. Partial Correlation
1. If visual inspection reveals no substantial number of correlations greater than 0.30, then FA is probably inappropriate.
2. The correlations among variables can also be analyzed by computing the partial correlations among variables.
3. A partial correlation is the correlation that remains unexplained when the effects of other variables are taken into account.
4. If true factors exist in the data, the partial correlations should be small, because each variable can be explained by the variables loading on the factors.
5. If the partial correlations are high, indicating no underlying factors, then FA is not appropriate.
6. The researcher is looking for a pattern of high partial correlations, denoting a variable not correlated with a large number of other variables in the analysis.
7. The one exception occurs when two variables are highly correlated and have substantially higher loadings than other variables on that factor. Then their partial correlation may be high because they are not explained to any great extent by the other variables, but do explain each other. This exception is also to be expected when a factor has only two variables loading highly.
8. A rule of thumb would be to consider partial correlations above 0.7 to be high.
9. SPSS provides the anti-image correlation matrix, which is just the negative value of the partial correlation, whereas BMDP directly provides the partial correlations.
10. In each case, high partial or anti-image correlations are indicative of a data matrix perhaps not suited to FA.

4.3.2.1.2. Bartlett Test of Sphericity
1. Another method analyzes the entire correlation matrix.
2. The Bartlett test is one such measure.
3. It provides the statistical significance that the correlation matrix has significant correlations among at least some of the variables.
4. A statistically significant Bartlett's test has a significance of less than 0.05 and indicates that sufficient correlations exist among the variables to proceed.
5. The researcher should note that increasing the sample size causes the Bartlett test to become more sensitive in detecting correlations among the variables.

4.3.2.1.3. Measure of Sampling Adequacy (MSA)
1. This index ranges from 0 to 1.
2. It reaches 1 when each variable is perfectly predicted without error by the other variables.

3. The measure can be interpreted with the following guidelines:
a. 0.80 or above: meritorious
b. 0.70 or above: middling
c. 0.60 or above: mediocre
d. 0.50 or above: miserable
e. Below 0.50: unacceptable
4. MSA increases as:
a. the sample size increases,
b. the average correlations increase,
c. the number of variables increases, or
d. the number of factors decreases.
5. If the overall MSA value falls below 0.50, then the variable-specific MSA values can identify variables for deletion to achieve an overall value of 0.50.

4.3.2.2. Variable-Specific Measures of Intercorrelation
1. We can extend the MSA to evaluate individual variables.
2. Variables with MSA values less than 0.50 should be omitted from the factor analysis one at a time, with the variable having the smallest value omitted each time.
3. After deleting each culprit variable, recalculate the MSA values.
4. Keep deleting until all remaining variables, and the overall matrix, reach an MSA of at least 0.50. (A computational sketch of these diagnostics follows.)
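Here is a from-scratch sketch of these diagnostics in Python; the function names are mine, not SPSS or BMDP output. It uses the standard Bartlett chi-square statistic based on the determinant of the correlation matrix, and forms the partial (anti-image) correlations from the inverse of the correlation matrix to compute the overall and per-variable MSA (also known as KMO).

```python
# Sketch of the adequacy diagnostics described above, assuming R is a
# correlation matrix (p x p) computed from n cases.
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that R differs from an identity matrix."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)    # test statistic, p-value

def msa(R):
    """Overall and per-variable measures of sampling adequacy (KMO/MSA)."""
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d                        # partial (anti-image) correlations
    np.fill_diagonal(partial, 0)
    r2 = R**2 - np.eye(R.shape[0])              # squared simple correlations, off-diagonal
    p2 = partial**2                             # squared partial correlations
    overall = r2.sum() / (r2.sum() + p2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return overall, per_variable
```

In line with the deletion rule above, variables whose per-variable MSA falls below 0.50 would be dropped one at a time, smallest first, recomputing after each deletion.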

4.4. Stage 4: Deriving Factors and Assessing Overall Fit


1. Once the variables are specified and the correlation matrix is prepared, the researcher is ready to apply factor analysis to identify the underlying structure of relationships.
2. In doing so, decisions must be made concerning:
a. the method of extracting the factors (common factor analysis vs. component analysis), and
b. the number of factors selected to represent the underlying structure in the data.

4.4.1. Selecting the Factor Extraction Method
1. The researcher can choose from two similar, yet unique, methods for defining (extracting) the factors to represent the structure of the variables in the analysis.
2. This decision on the method to use must combine the objectives of the factor analysis with knowledge about some basic characteristics of the relationships between variables.
3. Before discussing the two methods, a brief introduction to partitioning a variable's variance is presented.

4.4.1.1. Partitioning the Variance of a Variable
1. Knowledge of how to partition or divide the variance of a variable is necessary to proceed to the selection of the factor extraction technique.
2. Variance is a value (the square of the standard deviation) that represents the total amount of dispersion of values for a single variable about its mean.
3. When a variable is correlated with another variable, we often say it shares variance with the other variable, and the amount of sharing between the two is simply the squared correlation.
4. For example, if two variables have a correlation of 0.5, each variable shares 25 percent (0.5²) of its variance with the other variable.

5. In factor analysis, we group variables by their correlations, such that variables in a group (factor) have high correlations with each other.
6. Thus, for the purposes of FA, it is important to understand how much of a variable's variance is shared with other variables in that factor versus what cannot be shared (i.e., is unexplained).
7. The total variance of any variable can be divided (partitioned) into three types of variance:
a. Common: variance that is shared with all other variables in the analysis. This variance is accounted for (shared) based on a variable's correlations with all other variables in the analysis. A variable's communality is the estimate of its shared, or common, variance among the variables as represented by the derived factors.
b. Specific: variance associated with only a specific variable. This variance cannot be explained by the correlations to the other variables but is still associated uniquely with a single variable.
c. Error: variance that cannot be explained by correlations with other variables but is due to unreliability in the data-gathering process, measurement error, or a random component in the measured phenomenon.
8. Thus, the total variance of any variable is composed of its common, unique, and error variances.
9. As a variable becomes more highly correlated with one or more variables, the common variance (communality) increases.
10. However, if unreliable measures or other sources of extraneous error variance are introduced, then the amount of possible common variance and the ability to relate the variable to any other variable are reduced.

4.4.1.2. Common Factor Analysis versus Component Analysis
1. With a basic understanding of how variance can be partitioned, the researcher is ready to address the differences between the two methods.
2. The selection of one method over the other is based on two criteria:
a. the objective of the factor analysis, and
b. the amount of prior knowledge about the variance in the variables.
3. Component analysis is used when the objective is to summarize most of the original information (variance) in a minimum number of factors for prediction purposes.
4. In contrast, common factor analysis is used primarily to identify underlying factors or dimensions that reflect what the variables share in common.
5. The most direct comparison between the two methods is by their use of explained versus unexplained variance.
6. Component analysis:
a. Also known as principal component analysis.
b. It considers the total variance and derives factors that contain small proportions of unique variance and, in some instances, error variance.
c. However, the first few factors do not contain enough unique or error variance to distort the overall factor structure.
d. Specifically, with component analysis, unities are inserted in the diagonal of the correlation matrix, so that the full variance is brought into the factor matrix.
7. Common factor analysis:
a. It considers only common or shared variance, assuming that both the unique and error variance are not of interest in defining the structure of the variables.
b. To employ only common variance in the estimation of the factors, communalities (instead of unities) are inserted in the diagonal.
c. Thus, factors resulting from common factor analysis are based only on the common variance.
d. Common factor analysis excludes a portion of the variance included in a component analysis.
8. How is the researcher to choose between the two methods?
a. First, common factor analysis and component analysis are both widely used.
b. As a practical matter, the components model is the typical default method of most statistical programs when performing factor analysis.
c. Beyond the program defaults, distinct instances indicate which of the two methods is most appropriate.
9. Component analysis is most appropriate when:
a. data reduction is a primary concern, focusing on the minimum number of factors needed to account for the maximum portion of the total variance represented in the original set of variables, and
b. prior knowledge suggests that specific and error variance represent a relatively small proportion of the total variance.
10. Common factor analysis is most appropriate when:
a. the primary objective is to identify the latent dimensions or constructs represented in the original variables, and
b. the researcher has little knowledge about the amount of specific and error variance and therefore wishes to eliminate this variance.
11. Common factor analysis, with its more restrictive assumptions and use of only the latent dimensions (shared variance), is often viewed as more theoretically based.
12. Although theoretically sound, however, common factor analysis has several problems:
a. First, common factor analysis suffers from factor indeterminacy, which means that for any individual respondent, several different factor scores can be calculated from a single factor model result; thus, no single unique solution is found.
b. The second issue involves the calculation of the estimated communalities used to represent the shared variance. Sometimes the communalities are not estimable or may be invalid (e.g., values greater than 1 or less than 0), requiring the deletion of the variable from the analysis.
13. The complications of common factor analysis have contributed to the widespread use of component analysis.
14. Proponents of the common factor model argue otherwise.
15. Although considerable debate remains over which factor model is the more appropriate, empirical research demonstrates similar results in many instances.
16. In most applications, both methods arrive at essentially identical results if the number of variables exceeds 30 or the communalities exceed 0.60 for most variables. (A computational sketch of the diagonal difference between the two methods follows.)
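The following is a simplified numerical sketch of the diagonal difference described above. It is my own illustration: it uses squared multiple correlations as one common initial communality estimate, whereas production software typically refines the communalities iteratively.

```python
# Sketch: unities vs. communalities in the diagonal of the correlation matrix R.
import numpy as np

def extract_loadings(R, n_factors):
    """Loadings from an eigendecomposition of R (or of a reduced R)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]           # largest roots first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

def component_loadings(R, k):
    return extract_loadings(R, k)                           # unities on the diagonal

def common_factor_loadings(R, k):
    R_reduced = R.copy()
    smc = 1 - 1 / np.diag(np.linalg.inv(R))                 # squared multiple correlations
    np.fill_diagonal(R_reduced, smc)                        # communalities on the diagonal
    return extract_loadings(R_reduced, k)
```

The only difference between the two functions is what sits on the diagonal of R before the decomposition, which is exactly the computational distinction the text draws between the two models.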

4.4.2. Criteria for the Number of Factors to Extract
1. How do we decide on the number of factors to extract?
2. Both methods are interested in the best linear combination of variables, best in the sense that the particular combination of original variables accounts for more of the variance in the data as a whole than any other linear combination of variables.

3. Therefore, the first factor may be viewed as the single best summary of the linear relationships exhibited in the data.
4. The second factor is defined as the second-best linear combination of the variables, subject to the constraint that it is orthogonal to the first factor.
5. To be orthogonal to the first factor, the second factor must be derived from the variance remaining after the first factor has been extracted.
6. Thus, the second factor may be defined as the linear combination of variables that accounts for the most variance that is still unexplained after the effect of the first factor has been removed from the data.
7. The process continues extracting factors accounting for smaller and smaller amounts of variance until all the variance is explained.
8. For example, the components method actually extracts n factors, where n is the number of variables in the analysis.
9. Thus, if 30 variables are in the analysis, 30 factors are extracted.
10. So, what is gained by factor analysis?
11. Although our example contains 30 factors, the first few can represent a substantial portion of the total variance across all the variables.
12. Hopefully, the researcher can retain or use only a small number of the factors and still adequately represent the entire set of variables.
13. Thus, the key question is: how many factors to extract or retain?
14. In deciding when to stop factoring, the researcher must combine a conceptual foundation (how many factors should be in the structure?) with some empirical evidence (how many factors can be reasonably supported?).
15. The researcher generally begins with some predetermined criteria, such as the general number of factors plus some general thresholds of practical relevance (e.g., a required percentage of variance explained).
16. These criteria are combined with empirical measures of the factor structure.
17. An exact quantitative basis for deciding the number of factors to extract has not been developed.
18. However, the following stopping criteria for the number of factors to extract are currently being utilized.

4.4.2.1. Latent Root Criterion
1. The most commonly used technique.
2. Simple to apply to either component analysis or common factor analysis.
3. Rationale: any individual factor should account for the variance of at least a single variable if it is to be retained for interpretation.
4. With component analysis, each variable contributes a value of 1 to the total eigenvalue.
5. Thus, only the factors having latent roots or eigenvalues greater than 1 are considered significant.
6. All factors with latent roots less than 1 are considered insignificant and are disregarded.
7. Using the eigenvalue for establishing a cutoff is most reliable when the number of variables is between 20 and 50.
8. If the number of variables is less than 20, the tendency is for this method to extract a conservative number of factors (too few).
9. If the number of variables is greater than 50, it is not uncommon for too many factors to be extracted.
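A short sketch of the latent root criterion on stand-in data follows; via the cumulative sum it also illustrates the percentage-of-variance criterion discussed in section 4.4.2.3 below. The simulated data, the 12-variable size, and the 60 percent target are illustrative assumptions.

```python
# Latent root (eigenvalue > 1) and cumulative-variance checks on stand-in data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 12))                          # 150 cases, 12 variables (illustrative)
R = np.corrcoef(X, rowvar=False)

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]      # latent roots, largest first
n_latent_root = int((eigenvalues > 1.0).sum())          # factors retained by the criterion

cumulative = np.cumsum(eigenvalues) / len(eigenvalues)  # share of the trace explained
n_variance = int(np.searchsorted(cumulative, 0.60) + 1) # factors to reach 60 percent
print(n_latent_root, n_variance)
```

Because the trace of a correlation matrix equals the number of variables, each eigenvalue's share of the trace is the proportion of total variance that factor explains.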

4.4.2.2. A Priori Criterion
1. The a priori criterion is a simple yet reasonable criterion under certain circumstances.
2. When applying it, the researcher already knows how many factors to extract before undertaking the factor analysis.
3. The researcher simply instructs the computer to stop the analysis when the desired number of factors has been extracted.
4. This approach is useful when testing a theory or hypothesis about the number of factors to be extracted.
5. It can also be justified in attempting to replicate another researcher's work and extract the same number of factors that was previously found.

4.4.2.3. Percentage of Variance Criterion
1. The percentage of variance criterion is an approach based on achieving a specified cumulative percentage of total variance extracted by successive factors.
2. The purpose is to ensure practical significance for the desired factors by ensuring that they explain at least a specified amount of variance.
3. No absolute threshold has been adopted for all applications.
4. However, in the natural sciences the factoring procedure usually should not be stopped until the extracted factors account for at least 95 percent of the variance or until the last factor accounts for only a small portion (less than 5 percent).
5. In contrast, in the social sciences, where information is often less precise, it is not uncommon to consider a solution that accounts for 60 percent of the total variance (and in some instances even less) as satisfactory.
6. A variant of this criterion involves selecting enough factors to achieve a prespecified communality for each of the variables.
7. If theoretical or practical reasons require a certain communality for each variable, then the researcher will include as many factors as necessary to adequately represent each of the original variables.
8. This approach differs from focusing on just the total amount of variance explained, which neglects the degree of explanation for the individual variables.

4.4.2.4. Scree Test Criterion
1. Recall that with the component analysis factor model, the later factors extracted contain both common and unique variance.
2. Although all factors contain at least some unique variance, the proportion of unique variance is substantially higher in later factors.
3. The scree test is used to identify the optimum number of factors that can be extracted before the amount of unique variance begins to dominate the common variance structure.
4. The scree test is derived by plotting the latent roots against the number of factors in their order of extraction, and the shape of the resulting curve is used to evaluate the cutoff point.


Figure 3: The Scree Plot

5. The figure above plots the first 18 factors extracted in a study.
6. Starting with the first factor, the plot slopes steeply downward initially and then slowly becomes an approximately horizontal line.
7. The point at which the curve first begins to straighten out is considered to indicate the maximum number of factors to extract.
8. In the present curve, the first 10 factors would qualify.
9. Beyond 10, too large a proportion of unique variance would be included; thus these factors would not be acceptable.
10. Note that using the latent root criterion, only 8 factors would have been considered. (A sketch of how such a plot is produced appears at the end of this stage.)

4.4.2.5. Heterogeneity of the Respondents
1. Shared variance among variables is the basis for both common and component factor models.
2. An underlying assumption is that shared variance extends across the entire sample.
3. If the sample is heterogeneous with regard to at least one subset of the variables, then the first factors will represent those variables that are more homogeneous across the entire sample.
4. Variables that are better discriminators between the subgroups of the sample will load on later factors, many times those not selected by the criteria discussed previously.
5. When the objective is to identify factors that discriminate among the subgroups of a sample, the researcher should extract additional factors beyond those indicated by the methods just discussed and examine the additional factors' ability to discriminate among the groups.
6. If they prove less beneficial in discrimination, the solution can be run again and these later factors eliminated.
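As referenced in the scree test discussion above, here is a minimal matplotlib sketch of how a plot like Figure 3 is produced. The simulated data stand in for the study's actual 18 latent roots.

```python
# Sketch of a scree plot: latent roots in order of extraction, with the
# eigenvalue-of-1 line shown for comparison with the latent root criterion.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 18))                     # stand-in data: 200 cases, 18 variables
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.axhline(1.0, linestyle="--")                   # latent root cutoff, for comparison
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue (latent root)")
plt.title("Scree plot")
plt.show()
```

The elbow, where the curve first flattens into an approximately horizontal line, marks the scree test's suggested cutoff, which (as the text notes) often retains more factors than the eigenvalue-of-1 rule.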

4.5. Stage 5: Interpreting the Factors


1. Although no unequivocal processes or guidelines determine the interpretation of factors, the researcher with a strong conceptual foundation for the anticipated structure and its rationale has the greatest chance of success.
2. We cannot state strongly enough the importance of a strong conceptual foundation, whether it comes from prior research, theoretical paradigms, or commonly accepted principles.
3. As we will see, the researcher must repeatedly make subjective judgments in such decisions as how many factors to retain, what relationships are sufficient to warrant grouping variables, and how these groupings can be identified.
4. As the experienced researcher can attest, almost anything can be uncovered if one tries long and hard enough (e.g., using different factor models, extracting different numbers of factors, and using various forms of rotation).
5. It is therefore left up to the researcher to be the final arbiter of the form and appropriateness of a factor solution, and such decisions are best guided by conceptual rather than empirical bases.
6. To assist in the process of interpreting a factor structure and selecting a final factor solution, three fundamental processes are described.
7. Within each process, several substantive issues (factor rotation, factor-loading significance, and factor interpretation) are encountered.
8. Thus, after each process is briefly described, each of these processes will be discussed in more detail.

END OF PART 1
