
Segmentation and Profiling using SPSS for Windows

Kate Grayson

Why Segmentation?
- Used by, e.g., retail and consumer product companies trying to learn about and describe their customers' buying habits, gender, age, income level, etc.
- These companies tailor their marketing and product development strategies to each consumer group to increase sales and build brand loyalty
- A valuable approach in market research, and SPSS offers some useful tools to facilitate this commercial process

Segmentation in SPSS
- Most of the techniques for segmentation and profiling are exploratory
- There is no right or wrong answer, and the results are open to interpretation
- Trying to make sense of the data or find patterns
- Iterative techniques
- If it does not make business sense then it is not a good model!

Segmentation in SPSS
Techniques include:

- Factor Analysis / Principal Components Analysis
- Hierarchical Clustering
- K-Means Cluster
- Non-Linear Principal Components Analysis (PRINCALS/CATPCA)
- The new Two-Step Cluster

Which Technique to Use?


[Diagram: the techniques laid out as exploratory or confirmatory: Cluster Analysis, Categories, Factor Analysis, Discriminant Analysis, AnswerTree]

Which Test to use?


- Factor Analysis - to find patterns within variables
- Categories - use if the data doesn't fit the assumptions for Factor Analysis
- Cluster Analysis - to find patterns between individuals
- Two-Step Cluster - to use with both categorical and continuous variables
- Discriminant Analysis - to look for differences between groups and try to predict a target variable
- AnswerTree - uses combinations of data to predict a target

Multivariate Analysis
- These techniques are inter-related, but you don't have to use all of them
- You can use a combination of these techniques to segment the data

Main Considerations
- Looking for patterns or trying to make predictions?
- Levels of measurement of the data (categorical or continuous)
- Sample size
- Missing values
- Does the data fulfil the assumptions for the test?

Before you start... Check your data!

Handling Missing Data


- Check before analysis for any patterns within missing data
- Check before analysis that missing values are defined as missing - otherwise they may compromise the model
- Be aware that most segmentation techniques ignore any cases with missing values - so you may have less usable data than you think!
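As a rough illustration of the last point, the sketch below counts how many cases survive once every case with a missing value is dropped, and how the missing answers are spread across variables. The ratings and the variable names `q1` to `q3` are hypothetical, and this is plain Python rather than SPSS syntax.

```python
# Hypothetical ratings for five respondents; None marks a missing answer.
cases = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": 5, "q2": 1, "q3": None},
    {"q1": 3, "q2": 3, "q3": 3},
    {"q1": None, "q2": 2, "q3": 5},
]


def complete_cases(rows):
    """Keep only rows with no missing value (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def missing_counts(rows):
    """Count missing answers per variable, to help spot patterns."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


usable = complete_cases(cases)
print(len(usable), "of", len(cases), "cases usable")
print(missing_counts(cases))
```

Here only one answer is missing per variable, yet three of the five cases are lost, which is why the check is worth doing before modelling.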

Variable and Value Labels.


- It is worth checking the labels on your file
- SPSS may truncate long variable and value labels in the output, making it difficult to interpret the output
- Make sure all the useful information is at the beginning of the variable and value labels - so even if they are truncated, the output is still easy to read

Data Coding
- Check the direction of the coding scheme, and consider re-coding the data if the codes are counter-intuitive, e.g. if you have a rating scale that ranges from high to low rather than low to high
- It can be difficult to interpret output, factor scores etc. once the data has been through several transformations
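A minimal sketch of reversing a rating scale, assuming a hypothetical 1 to 5 scale (the source does not state the scale used). The recoded value is simply low + high - value, so 1 becomes 5, 2 becomes 4 and so on.

```python
def reverse_code(value, low=1, high=5):
    """Flip a rating so that the scale runs in the opposite direction."""
    if not low <= value <= high:
        raise ValueError(f"rating {value} is outside the {low}-{high} scale")
    return low + high - value


# Hypothetical ratings where 1 = strongly agree and 5 = strongly disagree.
ratings = [1, 2, 5, 4, 3]
recoded = [reverse_code(r) for r in ratings]
print(recoded)  # [5, 4, 1, 2, 3]
```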

Sample Data

Data = usage of underarm deodorants for men
Three brands tested:
- Rambo: the current market leader
- Brad: second most popular
- Clint: recently launched product

Brand usually use: frequency within sample (valid cases)

                    Frequency   Percent
Rambo AP Spray           1013      39.5
Rambo AP Roll-on          624      24.3
Brad AP Spray             441      17.2
Brad AP Roll-on           140       5.5
Clint AP Spray            278      10.8
Clint AP Roll-on           71       2.8
Total                    2567     100.0
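The percentages in the sample frequency table follow directly from the counts. The short check below recomputes them from the reported frequencies, and also adds up the two Clint formats; the combined Clint share is derived here, not reported in the original output.

```python
# Frequencies as reported for "Brand usually use".
frequencies = {
    "Rambo AP Spray": 1013,
    "Rambo AP Roll-on": 624,
    "Brad AP Spray": 441,
    "Brad AP Roll-on": 140,
    "Clint AP Spray": 278,
    "Clint AP Roll-on": 71,
}

total = sum(frequencies.values())
percents = {brand: round(100 * n / total, 1) for brand, n in frequencies.items()}

# Combined share of both Clint formats (a derived figure).
clint_share = round(
    100 * (frequencies["Clint AP Spray"] + frequencies["Clint AP Roll-on"]) / total, 1
)

for brand, pct in percents.items():
    print(f"{brand:<18}{frequencies[brand]:>6}{pct:>8.1f}")
print(f"{'Total':<18}{total:>6}")
print("Clint (both formats):", clint_share)
```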

Profiling the Customers..


Clint isn't selling as well as was hoped, so the research aims to find out:
- Who is buying Clint?
- What sort of characteristics do they share?
- Who is buying the other deodorants tested?
- How might the marketing campaign be changed to ensure that the correct market is targeted?

Data Collected
- Ratings of a range of lifestyle attribute questions (34 of these), e.g. "I tend to own the most up-to-date products", "My family is the most important thing in my life", "I prefer to dress and entertain casually" etc.
- Demographics: age, type of work, exercise etc.
- Brand of deodorant (D/O) usually used
- How you see yourself in relation to others, e.g. "What makes you distinctive from your friends"

Segmentation - the steps


1. Run Principal Components Analysis on the attribute rating questions, to see if there are any underlying dimensions in the variables
2. Check using Discriminant Analysis to see if these dimensions help predict the brand used
3. Run Cluster Analysis to see if similarities can be found between cases
4. Decide if other variables need to be included, e.g. categorical demographics
5. Run Two-Step Cluster using all variables

Factor Analysis

Factor Analysis: what is it?


- Looks for relationships between continuous variables (based on correlations), in this case the attribute rating questions
- Derives underlying constructs or dimensions in the data
- Tries to reduce a large number of variables to a small number of factors which explain most of the variance in the data
- If you can't interpret the resulting solution then it is no good!
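To make the idea concrete, here is a small sketch of the first principal component of a correlation matrix, found by power iteration. It is only an illustration of the principle, not the SPSS procedure: the three items and six respondents are hypothetical, there is no rotation, and only the largest component is extracted. The eigenvalue divided by the number of variables gives the share of total variance that the component explains.

```python
import math


def standardize(column):
    """Convert one variable to z-scores (mean 0, sample standard deviation 1)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns):
    """Pearson correlations between every pair of variables."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


# Hypothetical 1-5 ratings from six respondents on three attribute items.
up_to_date = [5, 4, 4, 2, 1, 2]
tries_new = [5, 5, 3, 2, 2, 1]
family_first = [3, 5, 2, 4, 3, 4]

corr = correlation_matrix([up_to_date, tries_new, family_first])
eigenvalue, weights = first_component(corr)
share = eigenvalue / len(corr)  # proportion of total variance explained

print("eigenvalue:", round(eigenvalue, 2))
print("weights:", [round(w, 2) for w in weights])
print("variance explained:", round(share, 2))
```

In this made-up data the first two items are strongly correlated, so they carry almost all the weight on the first component while the third item contributes very little; that is the kind of pattern which would then be given a label such as "Likes new products".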

Run Principal Components Analysis on 34 rated attributes

Factor Analysis Results


The best solution produced 9 factors, interpreted below:
- F1: High computer use
- F2: Rules, need to conform
- F3: Party animal
- F4: Family man
- F5: Likes new products, experiments
- F6: Likes pampering, pays more for trusted brands
- F7: Cautious, follower rather than leader for new products
- F8: Relaxed, casual
- F9: Home loving

Do these factors help?


Run Discriminant Analysis to see if the factors can predict the deodorant (D/O) used
[Figure: Combined Groups Plot of discriminant Function 1 (horizontal axis) against Function 2 (vertical axis), with cases and group centroids marked by brand usually used: Rambo AP Spray, Rambo AP Roll-on, Brad AP Spray, Brad AP Roll-on, Clint AP Spray, Clint AP Roll-on]

Factor Analysis Results


- The factors are good at predicting Rambo usage, but not at differentiating between Brad and Clint
- So instead try investigating relationships between cases using Cluster Analysis
- Options for clustering are: Hierarchical Cluster, K-Means Cluster, Two-Step Cluster

Hierarchical Cluster
- This is often thought of as the "proper" cluster method
- Looks for natural groupings within the data
- Bases groupings upon the similarity or dissimilarity between cases, rather than variables
- Very iterative technique - time consuming!

Clustering Data - Diagram


[Diagram: scatter of markers, where each marker is a data point, i.e. one case]

Decisions before Cluster:


- Which variables to use?
- Which distance measures between cases to use?
- Which criteria for creating clusters to choose?
NB:
- The quality of the analysis will always depend upon the variables used
- Cluster Analysis will always find a solution!
- It is not possible to assess in the analysis itself how appropriate a variable is

Stages of Hierarchical Cluster:


- Select variables for analysis (carefully!)
- Build and assess model
- Save cluster membership
- If required, create cluster matrix for K-Means
NB Because the method is based on cases, make sure the data is measured on the same scale - if not, the data should be standardized
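The stages above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SPSS procedure: the two variables (age in years and a 1 to 5 rating) and their values are hypothetical, the distance is Euclidean, and clusters are joined by average linkage, which is only one of several possible criteria. Standardizing first stops age, with its much larger numbers, from dominating the distances.

```python
import math


def zscores(column):
    """Standardize one variable so all variables share the same scale."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def euclidean(a, b):
    """Straight-line distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, group_a, group_b):
    """Mean distance over all pairs of cases drawn from the two groups."""
    total = sum(euclidean(points[i], points[j]) for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))


def hierarchical_cluster(points, n_clusters):
    """Start with one cluster per case and merge the closest pair each round."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        candidates = [
            (average_linkage(points, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(candidates)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: age in years and a 1-5 rating for six cases.
ages = [22, 25, 24, 61, 65, 63]
ratings = [5, 4, 5, 1, 2, 1]
points = list(zip(zscores(ages), zscores(ratings)))

clusters = hierarchical_cluster(points, n_clusters=2)

# "Save cluster membership": one cluster number per case.
membership = {case: number + 1 for number, members in enumerate(clusters) for case in members}
print(clusters)
print(membership)
```

Every pair of clusters is compared in every round, which is one reason the method becomes slow on large files.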

Run Hierarchical Cluster Analysis on Saved Factor Variables

Decision with D/O Data


- I can't get a very good (i.e. useful to the business) model from Hierarchical Cluster analysis
- Also, I want to be able to include both categorical and continuous variables in the same model
- So I decide to use Two-Step Cluster instead

Two-Step Cluster

Two-Step Cluster
The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. The algorithm employed by this procedure has several features that differentiate it from traditional clustering techniques:
- The ability to create clusters based on both categorical and continuous variables
- Automatic selection of the number of clusters
- The ability to analyze large data files efficiently

TwoStep Cluster
- Uses a scalable cluster analysis algorithm
- This algorithm can handle both continuous and categorical variables or attributes and requires only one data pass in the procedure
- The first step of the procedure pre-clusters the records into many small sub-clusters
- Then it clusters the sub-clusters created in the pre-cluster step into the desired number of clusters
- If the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters
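The two-step idea can be sketched roughly as follows. This is a deliberately simplified stand-in and not the SPSS algorithm: it handles continuous variables only, uses plain Euclidean distance with a hypothetical `threshold` for the pre-cluster step, merges sub-clusters by closest centroid, and does not attempt automatic selection of the number of clusters. The records are made up.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(cluster):
    """Mean of a cluster, recovered from its stored count and sums."""
    return [s / cluster["n"] for s in cluster["sum"]]


def precluster(records, threshold):
    """Step 1: a single pass over the data, building many small sub-clusters."""
    subclusters = []  # each holds only a count and per-variable sums
    for record in records:
        best = None
        for sub in subclusters:
            d = distance(record, centroid(sub))
            if d <= threshold and (best is None or d < best[0]):
                best = (d, sub)
        if best is None:
            subclusters.append({"n": 1, "sum": list(record)})
        else:
            sub = best[1]
            sub["n"] += 1
            sub["sum"] = [s + x for s, x in zip(sub["sum"], record)]
    return subclusters


def merge_subclusters(subclusters, n_clusters):
    """Step 2: merge the two closest sub-clusters until n_clusters remain."""
    clusters = [{"n": s["n"], "sum": list(s["sum"])} for s in subclusters]
    while len(clusters) > n_clusters:
        pairs = [
            (distance(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        merged = {
            "n": clusters[i]["n"] + clusters[j]["n"],
            "sum": [a + b for a, b in zip(clusters[i]["sum"], clusters[j]["sum"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical records with two continuous variables.
records = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1),
]

subclusters = precluster(records, threshold=0.2)
final = merge_subclusters(subclusters, n_clusters=3)
print(len(subclusters), "sub-clusters ->", len(final), "clusters")
print(sorted(c["n"] for c in final))
```

Because step 2 works on the summaries from step 1 rather than on the raw records, the expensive pairwise comparisons are made over far fewer items, which is what lets this kind of approach scale to large files.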

Two-Step Cluster
- This is unlike other clustering methods in SPSS - if the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters
- Or you can pre-specify the number of clusters required - flexibility

Run Two-Step Cluster Analysis on Saved Factor Variables and Categorical Variables

Link to more information


More useful information about Two-Step Cluster can be found at the following website:
http://www.rrz.unihamburg.de/RRZ/Software/SPSS/Algorith.120/twostep_cluster.pdf
NB This was the handout for the talk, with algorithm etc.

Also useful:
http://www.spss.com/pdfs/S115AD8-1202A.pdf
http://www.norusis.com/pdf/SPC_v13.pdf

Some of the output produced by the Two-Step Cluster Analysis is reproduced in the next few slides

Brand usually use by Cluster

Percent of each brand's users falling in each cluster:

Cluster     Rambo AP   Rambo AP   Brad AP   Brad AP   Clint AP   Clint AP
            Spray      Roll-on    Spray     Roll-on   Spray      Roll-on
1             0.0%      70.3%      0.0%      0.0%      0.0%       0.0%
2            18.1%      29.7%      0.0%      3.6%      0.4%      14.3%
3            52.3%       0.0%      0.0%      0.0%      0.0%       0.0%
4             0.0%       0.0%    100.0%     96.4%      0.0%      85.7%
5            29.6%       0.0%      0.0%      0.0%      0.0%       0.0%
6             0.0%       0.0%      0.0%      0.0%     99.6%       0.0%
Combined    100.0%     100.0%    100.0%    100.0%    100.0%     100.0%

Clint spray seems to be associated with Cluster 6, with the roll-on version being associated with Clusters 4 and 2
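One way to read the brand-by-cluster percentages systematically is to list, for each brand, the clusters that hold a meaningful share of its users. The figures below are the reported percentages; the 10% cut-off and the helper name `main_clusters` are arbitrary assumptions for illustration only.

```python
# Percent of each brand's users in TwoStep clusters 1 to 6, as reported.
brand_by_cluster = {
    "Rambo AP Spray":   [0.0, 18.1, 52.3, 0.0, 29.6, 0.0],
    "Rambo AP Roll-on": [70.3, 29.7, 0.0, 0.0, 0.0, 0.0],
    "Brad AP Spray":    [0.0, 0.0, 0.0, 100.0, 0.0, 0.0],
    "Brad AP Roll-on":  [0.0, 3.6, 0.0, 96.4, 0.0, 0.0],
    "Clint AP Spray":   [0.0, 0.4, 0.0, 0.0, 0.0, 99.6],
    "Clint AP Roll-on": [0.0, 14.3, 0.0, 85.7, 0.0, 0.0],
}


def main_clusters(shares, minimum=10.0):
    """Cluster numbers holding at least `minimum` percent, largest share first."""
    ranked = sorted(enumerate(shares, start=1), key=lambda item: item[1], reverse=True)
    return [number for number, share in ranked if share >= minimum]


for brand, shares in brand_by_cluster.items():
    print(f"{brand:<18}{main_clusters(shares)}")
```

Read this way, Cluster 4 is shared by Brad spray, Brad roll-on and Clint roll-on users, which fits the earlier finding that the factors do not separate Brad from Clint well.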

Employment Status by Cluster

Percent of each employment group falling in each cluster:

Cluster     Full time    Part-time    Not        Student   Retired
            employment   employment   employed
1             24.5%        11.9%        0.0%       0.0%      0.0%
2              2.3%        61.9%       79.3%      91.9%     61.1%
3             29.7%         4.8%        0.0%       0.0%      0.0%
4             13.8%         2.4%        9.8%       1.0%     33.3%
5             16.8%         0.0%        0.0%       0.0%      5.6%
6             12.9%        19.0%       10.9%       7.1%      0.0%
Combined     100.0%       100.0%      100.0%     100.0%    100.0%

Cluster 2 (Clint roll-on) is largely made up of part-time, retired and not working respondents. Cluster 4 also has a high number of retired respondents, while Cluster 6 (Clint spray) also has a high percentage of part-time and unemployed.

Age Group by Cluster

Percent of each age group falling in each cluster:

Cluster     Under 18   18-24    25-34    35-44    45-54    55-64    65 or over
1             0.0%      0.0%    18.3%    23.0%    29.7%    15.5%      0.0%
2            96.8%     57.2%    11.1%     3.8%     5.5%    38.7%     68.8%
3             0.0%      0.0%    47.4%    44.6%     0.0%     0.0%      0.0%
4             3.2%      6.9%    10.5%    14.2%    13.1%    15.2%     18.8%
5             0.0%     27.6%     0.0%     0.0%    39.5%    19.1%      0.0%
6             0.0%      8.3%    12.7%    14.4%    12.2%    11.6%     12.5%
Combined    100.0%    100.0%   100.0%   100.0%   100.0%   100.0%    100.0%

Cluster 2 (Clint roll-on) is largely made up of the younger and older age groups. Cluster 4 also has a high percentage of older respondents. Cluster 6 is more from 25 years upwards.

TwoStep Cluster Number = 4

Cluster 4 (Clint roll-on) has below average computer use and need to conform, and is above average on Home Loving and Family Man

[Chart: Cluster 4 by variable. Variables shown: F1: High computer use; F8: Relaxed, casual; F2: Rules, need to conform; F9: Home loving; F4: Family man; F7: Cautious, follower rather than leader for new products; F5: Likes new products, experiments; F6: Likes pampering, pays more for trusted brands; F3: Party animal]

TwoStep Cluster Number = 6

Cluster 6 (Clint spray) has above average scores on Relaxed, Casual but not much else - this is Mr Laid Back!

[Chart: Cluster 6 by variable. Variables shown: F1: High computer use; F2: Rules, need to conform; F8: Relaxed, casual; F7: Cautious, follower rather than leader for new products; F4: Family man; F6: Likes pampering, pays more for trusted brands; F9: Home loving; F5: Likes new products, experiments; F3: Party animal]

Summary of Findings
- Profiling of this data suggests that Clint is not targeting the expected market
- Clint is often not seen as sufficiently different from Brad; it has no perceived USP
- Clint is being used by a high percentage of older, retired, and part-time or not employed consumers, which may be a result of the aggressive product launch campaign with free samples, discounted prices etc.
- Clint marketing needs some more work!

Summary of Segmenting and Profiling this data using SPSS


- Principal Components Analysis helped investigate relationships between the rated attribute variables
- Hierarchical Cluster was used to try to find similarities between cases, using the factors derived from PCA
- Two-Step Cluster was then used to enable clustering of both continuous and categorical variables in the same model
- Useful conclusions were drawn about the market positioning of Clint deodorant
