
GISC9216 Deliverable 1 Introduction to Supervised Classification

Shannon Graup

1/28/2014

Shannon Graup
47 Notley Place
Toronto, Ontario M4B 2M7
647-921-4930
shannon.graup@gmail.com

January 28th, 2014

GISC 9216-D1

Janet Finlay
Program Coordinator
Niagara College
135 Taylor Road
Niagara-on-the-Lake, ON L0S 1J0

Dear Ms. Finlay,

Re: GISC9216-D1 Introduction to Supervised Classification

Please accept this letter as my formal submission of Deliverable 1, Introduction to Supervised Classification, for GISC9216 Digital Image Processing.

The purpose of the deliverable was to complete an unsupervised and a supervised classification on a Niagara Region Landsat subset. Once the classifications were completed, they were analysed and compared to identify the advantages and disadvantages of each type of classification.

If there are any technical issues with the deliverable .img files, or if you have any questions about the assignment submission, please feel free to contact me by phone (647-921-4930) or e-mail at your convenience. I look forward to receiving your feedback and suggestions.

Sincerely,

Shannon Graup, BAH
GIS GM Candidate

S.G/s.g

Enclosures:
1) GISC9216 Deliverable 1 Introduction to Supervised Classification

Shannon Graup shannon.graup@gmail.com

Table of Contents
List of Figures
List of Appendices
1.0 Introduction
2.0 Methodology
2.1 Unsupervised Classification
2.2 Supervised Classification
3.0 Discussion
3.1 Frequency Histogram
3.2 Supervised Classification Methods
3.3 Supervised vs. Unsupervised Classification
3.4 Comparison between Classification Output Images
4.0 Conclusion
Bibliography

List of Figures
Figure 1: Water Frequency Histogram, Band 1
Figure 2: Forest Frequency Histogram, Band 3
Figure 3: Minimum Distance Supervised Classification, Barrie, Ontario Region
Figure 4: Maximum Likelihood Classification, Barrie, Ontario Region
Figure 5: Mahalanobis Distance Classification, Barrie, Ontario Region
Figure 6: Unsupervised Classification subset
Figure 7: Supervised Classification subset
Figure 8: Barrie, Ontario Region Supervised Classification
Figure 9: Barrie, Ontario Region Unsupervised Classification
Figure 10: Barrie, Ontario Region subset
Figure 11: Barrie, Ontario Region Supervised Classification
Figure 12: Barrie, Ontario Region Unsupervised Classification

List of Appendices
Appendix 1: Barrie, Ontario Region Minimum Distance Supervised Classification
Appendix 2: Barrie, Ontario Region Unsupervised Classification
Appendix 3: Barrie, Ontario Region Subset


1.0 Introduction
For this deliverable, image classification was completed on an image subset of the Niagara Region. Both an unsupervised and a supervised classification were completed on a subset of the initial image. Lillesand, Kiefer, & Chipman (2008) state that the overall objective of image classification is to automatically categorize all pixels in an image into themes or land cover classes. Pixels are classified based on their DN (brightness) values.

In an unsupervised classification the user inputs the number of desired classes and the software groups image pixels based on their brightness values, that is, by their spectral classes (Lillesand, Kiefer, & Chipman, 2008). An unsupervised classification is useful if the user is unsure of what types of land cover the image contains.

In a supervised classification the user must create a signature file that contains training sites for the desired classes. This introduces the possibility of human error: if an incorrect training site is chosen, image pixels could be classified incorrectly. For example, if the user collects training pixels from an agricultural area and labels them as the forest class, all pixels with a brightness resembling that agricultural area will be classified as forest. A supervised classification is best suited to cases where the user has a good idea of the land cover shown in the image.

2.0 Methodology
The initial image file was 7470 by 7092 pixels. Before classification, a 512 x 512 pixel subset of the Barrie, Ontario Region was created in ERDAS. An unsupervised and a supervised classification were then completed on this subset.
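The subsetting step can be pictured as cutting a square window out of a larger pixel grid. The sketch below is only an illustration of that idea and is not the ERDAS procedure used for this deliverable; the function name `extract_subset`, the window position, and the toy image are all assumptions.

```python
def extract_subset(image, top, left, size=512):
    """Return a size x size window from a row-major image (a list of rows)."""
    rows = len(image)
    cols = len(image[0])
    if top < 0 or left < 0 or top + size > rows or left + size > cols:
        raise ValueError("subset window falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 6 x 8 single-band image whose pixel value encodes its row and column.
toy_image = [[r * 10 + c for c in range(8)] for r in range(6)]

# Cut a 4 x 4 window starting at row 1, column 2.
window = extract_subset(toy_image, top=1, left=2, size=4)
```

With the default `size=512` the same function would return a 512 x 512 window, provided the window fits inside the source image.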

2.1 Unsupervised Classification


An unsupervised classification was completed on the subset image. In ERDAS the user must input a number of parameters for an unsupervised classification, including the number of classes and the maximum number of iterations. For this classification 10 classes were chosen, and the output was recoded into six classes (unsuper10recode.img).
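Recoding can be thought of as a lookup table that maps each of the 10 cluster numbers to a land cover class. The sketch below assumes a hypothetical mapping; the actual cluster-to-class assignments used for unsuper10recode.img are not listed in this report, and the class names are borrowed from the Discussion for illustration only.

```python
# Hypothetical lookup table: cluster number (1-10) -> recoded land cover class.
RECODE_TABLE = {
    1: "water",
    2: "forest",
    3: "forest",
    4: "agriculture",
    5: "agriculture",
    6: "residential",
    7: "residential",
    8: "bareground",
    9: "bareground",
    10: "commercial",
}


def recode(cluster_image, table):
    """Replace every cluster number in the image with its recoded class."""
    return [[table[value] for value in row] for row in cluster_image]


# Toy 2 x 3 output of an unsupervised classification (cluster numbers).
clusters = [[1, 2, 3], [8, 10, 6]]
recoded = recode(clusters, RECODE_TABLE)
```

Several clusters may map to the same class, which is how 10 clusters collapse into fewer land cover classes.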

2.2 Supervised Classification


A supervised classification was completed on the subset image. The first step in a supervised classification is to create a signature file, which contains training sites for each class in the image subset. The number of classes and the training site locations are chosen by the user. The user must also choose a classification method: minimum distance, maximum likelihood, or Mahalanobis distance. In this case seven classes were chosen, each with a number of training sites throughout the image. The minimum distance classification was used to create the output file (supermindistance.img).
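Conceptually, a signature summarizes the training pixels of each class with statistics such as the per-band mean. The sketch below shows one way this could be computed; it is an assumption about the general idea and not the ERDAS signature file format. The function name `build_signatures` and the two-band training values are hypothetical.

```python
from statistics import mean, stdev


def build_signatures(training_pixels):
    """Summarize each class by the per-band mean and standard deviation."""
    signatures = {}
    for name, pixels in training_pixels.items():
        # Regroup the pixel tuples into one sequence of DN values per band.
        bands = list(zip(*pixels))
        signatures[name] = {
            "n": len(pixels),
            "mean": [mean(band) for band in bands],
            "stdev": [stdev(band) for band in bands],
        }
    return signatures


# Hypothetical two-band DN values collected from training sites.
TRAINING = {
    "water": [(10, 5), (12, 7), (11, 6)],
    "forest": [(30, 60), (34, 64), (32, 62)],
}

signatures = build_signatures(TRAINING)
```

The class means produced this way are what a minimum distance classifier compares each unknown pixel against.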


3.0 Discussion
3.1 Frequency Histogram
Figure 1 and Figure 2 show brightness value frequency histograms from the signature file used in the supervised classifications shown in Figure 3, Figure 4, Figure 5, and Appendix 1. The x-axis of each histogram represents the range of data values (DN or brightness values), while the y-axis represents the frequency with which each value occurs. Figure 1 shows the histogram of the water class in band 1, while Figure 2 shows the histogram of the forest class in band 3. Both histograms show a roughly normal distribution of data values, which indicates that the training sites for these two classes were appropriate.

Figure 1: Water Frequency Histogram, Band 1

Figure 2: Forest Frequency Histogram, Band 3
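A frequency histogram of this kind can be built by counting how often each DN value occurs among the training pixels of one class in one band. The sketch below is a minimal illustration with hypothetical DN values, not the data behind Figure 1 or Figure 2. The `is_single_peaked` helper is an assumed, rough stand-in for judging the shape by eye and is not a formal test of normality.

```python
from collections import Counter


def dn_histogram(dn_values):
    """Count occurrences of each DN value (x = DN value, y = frequency)."""
    counts = Counter(dn_values)
    return dict(sorted(counts.items()))


def is_single_peaked(histogram):
    """Rough shape check: frequencies rise to one peak and then fall."""
    freqs = list(histogram.values())
    peak = freqs.index(max(freqs))
    rising = all(a <= b for a, b in zip(freqs[:peak], freqs[1:peak + 1]))
    falling = all(a >= b for a, b in zip(freqs[peak:], freqs[peak + 1:]))
    return rising and falling


# Hypothetical band 1 DN values from water training sites.
water_band1 = [60, 61, 61, 62, 62, 62, 63, 63, 64]
water_hist = dn_histogram(water_band1)

# Hypothetical training site that mixes two cover types (two peaks).
mixed_site = [10, 10, 10, 11, 12, 12, 12]
mixed_hist = dn_histogram(mixed_site)
```

A histogram with two separate peaks, like the mixed example, would suggest that a training site includes more than one land cover type.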

3.2 Supervised Classification Methods


When completing the supervised classification on the Barrie, Ontario Region subset, three outputs were created, each with a different classification method: minimum distance, maximum likelihood, and Mahalanobis distance. Figure 3, Figure 4, and Figure 5 show the supervised classification output images, each with seven classes: agriculture, bareground agriculture, commercial, forest, residential, roads, and water.

The supervised classification image that was chosen is shown in Appendix 1: Barrie, Ontario Region Minimum Distance Supervised Classification. It uses the minimum distance classification method, which produced the best classification of the image subset. Forest and agricultural areas were classified correctly, and the bareground agriculture areas were classified well. The roads class was assigned to some areas that were actually bareground agriculture or bareground. This misclassification could be because the mean DN value of the roads training data is close to the DN values of the bareground agriculture and bareground pixels.

Figure 3: Minimum Distance Supervised Classification, Barrie, Ontario Region

Minimum distance classification determines the mean DN value (pixel brightness value) of each class from its training data. Each unknown pixel is assigned to the class whose mean is most similar to the pixel's DN value (Wacker & Landgrebe, 1972). An advantage of minimum distance is that no pixels are left unclassified. A disadvantage is that class variability is not considered, so classes such as urban, which usually contain many mixed pixels, may be misclassified. Figure 3 shows the minimum distance supervised classification of the Barrie, Ontario Region. One class that was classified poorly was the roads class; many pixels that are in reality bareground or bareground agriculture were classified as roads. The mean DN values of roads, bareground, and bareground agriculture can be similar, which could be the reason for the misclassification.
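The minimum distance rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the ERDAS implementation: Euclidean distance is assumed, and the two-band class means are hypothetical values chosen so that roads and bareground sit close together.

```python
import math


def classify_min_distance(pixel, class_means):
    """Assign the pixel to the class whose mean is nearest (Euclidean)."""
    return min(class_means, key=lambda name: math.dist(pixel, class_means[name]))


# Hypothetical two-band class means; roads and bareground are deliberately close.
CLASS_MEANS = {
    "roads": (90.0, 85.0),
    "bareground": (95.0, 92.0),
    "water": (15.0, 8.0),
}

near_roads = classify_min_distance((91.0, 87.0), CLASS_MEANS)
near_bareground = classify_min_distance((93.0, 90.0), CLASS_MEANS)
far_from_all = classify_min_distance((200.0, 200.0), CLASS_MEANS)
```

Two pixels only a few DN values apart end up in different classes because the class means are so close, and even a pixel far from every mean still receives a class, which is why no pixels are left unclassified.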

Figure 4: Maximum Likelihood Classification, Barrie, Ontario Region

Maximum likelihood classification evaluates the probability that an unknown pixel belongs to each class, based on a Gaussian probability model, and assigns the pixel to the most probable class (Ahmad, 2012). The method assumes that each class has a normal (Gaussian) distribution and that prior probabilities are equal for all classes (Ahmad, 2012). Figure 4 shows the maximum likelihood classification for the Barrie, Ontario Region. In this image the roads class was again poorly classified, with many bareground areas classified as roads. Agricultural areas were also misclassified: there is much more agricultural land cover in the maximum likelihood output than in reality. These misclassifications could point to poorly chosen training areas, or to the need for more classes to better distinguish between these land covers. The maximum likelihood classification classifies the commercial and residential areas well, with clearly defined areas of both classes in the urban region surrounding the water body.
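The maximum likelihood rule can be sketched as follows for a two-band image. This is a simplified illustration, not the ERDAS implementation: it assumes Gaussian classes with equal prior probabilities, writes out the 2 x 2 matrix inverse by hand, and uses hypothetical means and covariance matrices.

```python
import math


def gaussian_log_likelihood(pixel, mean, cov):
    """Log of the two-band Gaussian density, without the constant term."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1 = pixel[0] - mean[0]
    d2 = pixel[1] - mean[1]
    # Quadratic form d^T * inverse(cov) * d, written out for a 2 x 2 matrix.
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * (math.log(det) + quad)


def classify_max_likelihood(pixel, signatures):
    """Pick the class with the highest likelihood, assuming equal priors."""
    return max(
        signatures,
        key=lambda name: gaussian_log_likelihood(pixel, *signatures[name]),
    )


# Hypothetical signatures: class -> (mean vector, covariance matrix).
# Water is spectrally tight; residential is highly variable.
SIGNATURES = {
    "water": ((15.0, 8.0), ((4.0, 0.0), (0.0, 4.0))),
    "residential": ((80.0, 70.0), ((400.0, 0.0), (0.0, 400.0))),
}

label = classify_max_likelihood((40.0, 30.0), SIGNATURES)
```

In this toy case the pixel (40, 30) is closer to the water mean in straight-line distance, but it is assigned to residential because that class has a much wider spread. This is the class variability that a minimum distance classifier ignores.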

Figure 5: Mahalanobis Distance Classification, Barrie, Ontario Region

Mahalanobis distance classification is very similar to minimum distance classification, but a covariance matrix is used in the distance calculation (Perumal & Bhaskaran, 2010). This classification method therefore takes data correlations into account (Durak, 2011). Figure 5 shows the Mahalanobis distance supervised classification for the Barrie, Ontario Region. The agriculture class was classified very poorly with this method: many areas that are in reality forest were misclassified as agricultural areas. Pixel DN values for the forest and agricultural classes are similar, which could be the reason for the misclassification. Commercial and residential areas were classified well with this method and can be identified clearly around the water body in Barrie. The output of this classification method could be improved with better chosen training sites.
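The effect of the covariance matrix can be shown with a small two-band sketch. It is an illustration under stated assumptions, not the ERDAS implementation and not a reproduction of the Barrie results: the class means and covariance matrices are hypothetical, and the forest class is given strongly correlated bands on purpose.

```python
import math


def mahalanobis_squared(pixel, mean, cov):
    """Squared Mahalanobis distance for a two-band pixel."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1 = pixel[0] - mean[0]
    d2 = pixel[1] - mean[1]
    # d^T * inverse(cov) * d, written out for a 2 x 2 matrix.
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det


def classify_mahalanobis(pixel, signatures):
    """Assign the pixel to the class with the smallest Mahalanobis distance."""
    return min(
        signatures,
        key=lambda name: mahalanobis_squared(pixel, *signatures[name]),
    )


# Hypothetical signatures: class -> (mean vector, covariance matrix).
SIGNATURES = {
    "forest": ((30.0, 60.0), ((16.0, 12.0), (12.0, 16.0))),
    "agriculture": ((38.0, 60.0), ((16.0, 0.0), (0.0, 16.0))),
}

pixel = (35.0, 66.0)
by_mahalanobis = classify_mahalanobis(pixel, SIGNATURES)
# Plain minimum distance (Euclidean) on the same class means, for comparison.
by_euclidean = min(SIGNATURES, key=lambda name: math.dist(pixel, SIGNATURES[name][0]))
```

For this pixel the two rules disagree: the plain Euclidean rule picks agriculture, while the Mahalanobis rule picks forest because the pixel lies along the direction in which the two forest bands vary together.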

3.3 Supervised vs. Unsupervised Classification


Completing both an unsupervised and a supervised classification on an image can be beneficial for different reasons. A false colour subset can be found in Appendix 3: Barrie, Ontario Region Subset.

An unsupervised classification is often completed on an image with unknown land cover classes. It does not use training data to classify an image; instead, it clusters pixels based on the natural groupings of their brightness values (Lillesand, Kiefer, & Chipman, 2008). If the user is unsure of what kinds of land cover lie in the image, a supervised classification could result in the creation of incorrect training sites. The creation of a signature file opens the supervised process up to human error: if the training pixels chosen for a class actually belong to a different class, the entire classification can yield incorrect results.

If the land cover of an image is known, a supervised classification will often yield better results than an unsupervised classification, because some land cover classes have similar pixel DN values. In that case an unsupervised classification will not be able to tell the classes apart, whereas in a supervised classification specific training sites can be chosen to better guide the classification.

As mentioned in the introduction, during an unsupervised classification the user must input the desired number of classes in the output image. The user must also enter the maximum number of iterations and the convergence threshold. The maximum iterations value tells the ERDAS software how many times it may pass through the data to classify and recluster the pixels. Without this limit, the classification could get stuck in a cycle and never create an output. The convergence threshold tells the software what percentage of pixels must keep the same cluster assignment between iterations for the clustering to stop.
For this classification a convergence threshold of 0.95 was chosen, meaning that the clustering stops once 95% of pixels keep the same cluster assignment between iterations.

During a supervised classification the user must create a signature file that contains numerous training sites for each desired class in the output image. The supervised classification then uses one of the classification methods discussed above: minimum distance, maximum likelihood, or Mahalanobis distance. A supervised classification is subject to user error if training data is chosen incorrectly; incorrect training pixels for one class can cause the classification method to assign more incorrect pixels to that class.

The unsupervised output image shown in Appendix 2: Barrie, Ontario Region Unsupervised Classification was created with 10 classes, 10 maximum iterations, and a convergence threshold of 0.95. The output image was then recoded into six classes: water, forest, agriculture, residential, bareground, and commercial. The unsupervised classification has clearly defined forest and agricultural areas. An unsupervised classification groups pixels based on their spectral similarity; forested areas and agricultural areas have distinct spectral signatures, which is why these two classes are clearly defined in the output.

The supervised classification output image shown in Appendix 1: Barrie, Ontario Region Minimum Distance Supervised Classification has a total of seven classes: water, forest, agriculture, bareground agriculture, residential, commercial, and roads. Training sites for each of the seven classes were chosen and put into a signature file. The supervised classification assigns pixels to classes based on the similarity of their DN values to the DN values of each class's training data. The output image in Appendix 1 uses the minimum distance classification method, in which the mean DN value of each class's training data is used to classify unknown pixels: each unknown pixel is assigned to the class whose training mean is closest to its DN value.
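The roles of the maximum iterations and convergence threshold parameters in the unsupervised classification can be sketched with a simplified clustering loop. This is a k-means-style illustration and an assumption about the general idea; it is not the clustering algorithm ERDAS actually runs, and the function name `cluster_pixels`, the starting means, and the toy pixels are hypothetical.

```python
import math


def cluster_pixels(pixels, initial_means, max_iterations=10, convergence_threshold=0.95):
    """Recluster pixels until enough of them stop changing cluster."""
    means = [tuple(m) for m in initial_means]
    labels = [None] * len(pixels)
    for iteration in range(1, max_iterations + 1):
        # Assign every pixel to its nearest cluster mean.
        new_labels = [
            min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            for p in pixels
        ]
        # Fraction of pixels whose cluster did not change since the last pass.
        unchanged = sum(a == b for a, b in zip(labels, new_labels)) / len(pixels)
        labels = new_labels
        if unchanged >= convergence_threshold:
            break
        # Recompute each cluster mean from its current members.
        for k in range(len(means)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                means[k] = tuple(sum(band) / len(members) for band in zip(*members))
    return labels, means, iteration


# Toy two-band pixels forming two obvious spectral groups.
PIXELS = [(10, 5), (12, 7), (11, 6), (60, 70), (62, 72), (61, 71)]
labels, means, iterations_used = cluster_pixels(PIXELS, [(0, 0), (100, 100)])
```

The loop ends either when the unchanged fraction reaches the threshold or when the iteration limit is hit, so the limit guarantees that an output is always produced.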

3.4 Comparison between Classification Output Images


The main shortcoming of the unsupervised classification output image shown in Appendix 2 is that the roads are not identified correctly. Some roads are classified as agricultural areas, while others are grouped into the residential class. Figure 6 shows part of the urban area from the Barrie, Ontario Region subset, where the roads have been classified into the residential class. Figure 7 shows the same area in the supervised classification output image, where the roads have been separated from the residential class.

Figure 6: Unsupervised Classification subset


Figure 7: Supervised Classification subset

The supervised classification output image shown in Appendix 1 has the extra roads class, but many areas were misclassified as roads when they are in reality bareground or bareground agriculture. This indicates that the training sites chosen for the bareground agriculture class did not cover the full range of pixel DN values for that class. If more training sites that were more representative of the bareground agriculture areas in the subset image had been chosen, fewer of those areas might have been classified as roads.

Both the supervised and unsupervised classification output images classified the large water body areas correctly, because the pixel DN values for the water class do not vary greatly. The outputs for the large water areas can be seen in Figure 8 and Figure 9.

Figure 8: Barrie, Ontario Region Supervised Classification

Figure 9: Barrie, Ontario Region Unsupervised Classification

While the large water areas were classified correctly in both the supervised and unsupervised images, some smaller water areas were not classified as well in the unsupervised output image as they were in the supervised output image. Figure 10 shows the false colour subset of the Barrie, Ontario area with some smaller water areas. Figure 11 shows the same area taken from the supervised classification output image and Figure 12 shows the area from the unsupervised classification output image. The water areas are clearly depicted in the supervised classification output image, while they are difficult to see in the unsupervised classification output image.

Figure 10: Barrie, Ontario Region subset

Figure 11: Barrie, Ontario Region Supervised Classification

Figure 12: Barrie, Ontario Region Unsupervised Classification

Overall, land cover classes were depicted more clearly in the minimum distance supervised classification output image than in the unsupervised classification output image. The roads class was separated from the agriculture and residential classes, and the smaller water areas were identified clearly.

4.0 Conclusion
For this deliverable a subset of Barrie, Ontario was created, and an unsupervised and a supervised classification were completed on this subset. After comparing the two classifications and the three supervised classification methods, it was determined that the minimum distance supervised classification method best classified the Barrie, Ontario Region.

Bibliography
Ahmad, A. (2012). Analysis of Maximum Likelihood Classification on Multispectral Data. Applied Mathematical Sciences, 6(129), 6425-6436.

Durak, B. (2011). A classification algorithm using Mahalanobis distance clustering of data with applications on biomedical data sets. Thesis submitted to The Graduate School of Natural and Applied Science of Middle East Technical University.

Lillesand, T. M., Kiefer, R. W., & Chipman, J. W. (2008). Remote Sensing and Image Interpretation (6th ed.). Daryaganj, New Delhi: John Wiley & Sons.

Perumal, K., & Bhaskaran, R. (2010, February). Supervised Classification Performance of Multispectral Images. Journal of Computing, 2(2), 124-129.

Wacker, A. G., & Landgrebe, D. A. (1972). Minimum Distance Classification in Remote Sensing. Laboratory for Applications of Remote Sensing Technical Reports.


Appendix 1: Barrie, Ontario Region Minimum Distance Supervised Classification



Appendix 2: Barrie, Ontario Region Unsupervised Classification



Appendix 3: Barrie, Ontario Region Subset
