Chukwuemeka
ID#: 000588080, 000609041, 000573536
Group 21
An Application for Student Recruitment: A GIS Case Study
GEOS 459: Applied GIS Capstone Project
1) Introduction
SAIT Polytechnic is one of the top post-secondary institutions in Canada, dedicated to providing students with a relevant, skill-oriented education. To remain a leading institute, SAIT developed its Comprehensive Institutional Plan (CIP) in 2006. The plan outlines 27 strategic techniques and objectives to be achieved by 2016. Recent statistics predict a 6.2% decline in Calgary's population between the ages of 20 and 34, the age group that makes up the majority of SAIT's students. This projected shift has motivated SAIT Polytechnic's Institutional Planning Department to turn to Geographic Information Systems (GIS) to understand where its students come from and what their backgrounds are, so that it can effectively target areas with low enrollment.
2) Project Problem Statement
The project team wants the Institutional Planning Department (our client) to be able to make sense of its existing database and understand the distribution of students, foreign and domestic, which would greatly improve its student recruitment and retention approach. Our client has tabular data on SAIT students but is unable to analyze students by demographics or location, or to identify trends in the distribution of students around the world and, more specifically, in Calgary. Without an analysis of student demography and location, it would be difficult for the Institutional Planning Department to know which areas to focus on for student recruitment and retention. Hence, GIS technology is needed to address these problems. Leaving these issues unaddressed could lead to a decline in the number of SAIT students enrolled each term, which would in turn reduce revenue.
With these problems solved, there would be:
- A better understanding of the distribution of SAIT students, which could be used for further analysis in the future
- Increased revenue generation (based on student retention)
Project Report Page 2
- A better knowledge of areas with fewer students, which call for more student recruitment, and areas with more students, which call for a better student retention strategy.
Considering the above-mentioned issues, the project team came up with a basic line of action to ensure that the problem is solved. The actions include, but are not limited to:
- Geocoding student locations based on their addresses.
- Analyzing the distribution of students at the city, provincial and international levels.
- Producing maps, graphs, and charts that enable our client to see the results of our analysis visually.
- Creating an internet web mapping application that lets the client interact with the student data and add more student data as needed for future analysis.
- Creating a Python script for data clean-up and data export to a GIS environment (a geodatabase).
- Building a model to quickly run distance analysis from a point of interest.
3) Project Technical Approach
The following paragraphs describe the methodology and steps taken by the group to approach the project problem.
Data Collection
- Retrieved student profile data from the client in Excel format.
- Collected Calgary census data (at the city, provincial, national, and international levels) from the Internet in Excel and CSV formats.
- Gathered City of Calgary base features (communities, country boundaries, rivers, schools, major cities, etc.).
Data Management
Data conversion and transformation, which included:
- Data format conversion (from a tabular document or file to a feature class).
- Geocoding SAIT student addresses at the international, provincial and city levels.
- Managing the coordinate system across all datasets.
- Data clean-up, including refinement of individual datasets for a specified analytical purpose.
- Feature class identification for each dataset.
- Geodatabase and feature dataset creation to consolidate all converted and transformed datasets.
- Feature dataset population with the appropriate feature classes.
- Access database creation to store the Excel spreadsheets.
Data Validation
- Reviewed matched and unmatched geocoded addresses in the Interactive Rematch window in ArcMap to obtain the highest possible geocoding accuracy.
- Cross-referenced the postal codes of matched addresses against Google Maps to confirm the addresses.
- Compared geocoded results to the client's initial data to locate any discrepancies.
- Compared Calgary socio-demographic information to the individual community profiles on the City of Calgary website to discover omissions.
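The postal code cross-check lends itself to a quick automated pre-check before geocoding. The following is a minimal Python sketch, not the script used in this project. It assumes the addresses are Canadian and checks only the postal code pattern, not whether the code actually exists. The function name normalize_postal_code and the sample value are hypothetical.

```python
import re

# Canadian postal codes follow the pattern "A1A 1A1". The letters D, F, I, O,
# Q and U are not used, and W and Z do not appear in the first position.
POSTAL_CODE = re.compile(
    r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z]\d[ABCEGHJ-NPRSTV-Z]\d$"
)


def normalize_postal_code(raw):
    """Return the code formatted as 'A1A 1A1', or None if the pattern is invalid."""
    # Drop spaces and hyphens, and use upper case throughout.
    code = re.sub(r"[\s-]+", "", raw.strip().upper())
    if not POSTAL_CODE.match(code):
        return None
    return code[:3] + " " + code[3:]


# Hypothetical input: inconsistent case and no space.
cleaned = normalize_postal_code("t2m0l4")
```

Records for which the function returns None could be set aside for manual review in the rematch step instead of being sent to the geocoder.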
Data Analysis
- Performed analysis based on the student demographic data and the City of Calgary data collected.
- Identified which maps or data are important for the successful completion of each function in the unit.
- Manipulated the captured data to produce the desired output.
Data Output
Based on the project goals and objectives, analysis maps were created using various mapping techniques (such as choropleth, proportional symbol, proportional symbol pie chart, and bivariate mapping). Apart from maps, other analytical outputs include:
- Statistical tables (including correlation and regression coefficient analysis)
- Charts
- Model builder
- Internet web application (ArcGIS Server)
- Python scripts
- Surveys
- Mapbook
4) Scope Description
The project group was assigned the role of developing an innovative and straightforward method to show the distribution of SAIT students at the city, provincial, national and international levels, and to relate the properties of these students (gender, age, nationality, residency status, etc.) to the City of Calgary demographic data. To this end, our client provided us with the student database (in Microsoft Excel). This database contains student profiles at the city, provincial, national, and international levels, including school description, residency status, nationality, and other essential data needed to make this project a success. The project deliverables would be achieved by building a model that analyzes SAIT student profiles and produces output in the form of maps, graphs, charts, and other geospatial functions. This output would in turn help to determine the location of SAIT students before and after gaining admission to the institution. At the end of this project, the deliverables/geospatial applications would be:
- Maps
- Graphs/charts
- Improvements to the student profile data collection method of SAIT's Registrar's Office.
- Creation of a central database to easily access student information.
- Development of a model to run distance analysis of each student from SAIT.
- Python script development to automate data clean-up, loading into the ArcGIS geodatabase, and map book creation.
- Internet web mapping.
5) Assumption and Constraints
Project assumptions and constraints need to be monitored throughout the entire project life cycle, and can be amended or revised at any time if required. Some of our project assumptions are:
- All set milestones and deadlines will be met.
- The project client will readily provide all information needed to complete the project effectively.
- All team members will remain on the team for the entire duration of the project.
- The City of Calgary community profile database is up to date.
Project constraints are limitations imposed on the project, and the team must work within the boundaries set by these constraints. The basic constraints for this project are the project scope, quality, schedule, budget, resources and risk. Of these six constraints, scope, schedule, and budget are the most critical, so special attention needs to be given to them.
- Scope: The client expressed a keen interest in the potential of GIS; however, the scope of the project had to be narrowed to a particular focus area.
- Budget: The team must use data freely available from the City of Calgary and information provided by the client.
- Schedule: The team schedule is extremely condensed; therefore, each task should be completed in the allocated time.
The best way to deal with project constraints is to document them properly and create a plan that satisfies all project limitations.
6) Tasks Description
Streamlining project objectives: defining project scope, requirements and objectives to effectively answer the client's specific demands.
Data collection: collecting Excel spreadsheets containing student profiles as well as community profiles from the City of Calgary.
Data clean-up and management: importing student profile information into a geodatabase and geocoding each profile down to the international city level (if possible).
Data analysis:
- Analyze student profiles to extract information about gender, nationality, number of applications submitted to SAIT, and business stream.
- Compare local community profiles (gender, age distribution, nationality) to the population of SAIT students from Calgary.
Data output: creating maps displaying the distribution and characteristics of SAIT students, and developing an internet mapping application that the client can use to add and remove layers and review the maps digitally.
7) Role Description & Responsibilities
See the resource sheet in the attached MS Project file.
8) Schedule
See the attached MS Project file.
9) Risk Management
Risk 1: The project nature is very transparent.
- Impact: The project nature is simple; therefore, there may not be enough complexity to satisfy capstone project requirements.
- Avoidance: Consult with the capstone project supervisor to seek professional advice and assistance where needed.
- Mitigation: Incorporate as many geospatial applications and visual aids into the project as we can within the project scope.
Risk 2: Data may require clean-up/adjustments.
- Impact: It would have an impact on our set milestones and deadlines.
- Avoidance: This risk may not be avoidable, since the client is not GIS oriented.
- Mitigation: Inform the client of our data needs before he gives out data.
Risk 3: File/database may get corrupted.
- Impact: All data would be lost.
- Avoidance: Keep a back-up of our database, and a back-up of that back-up.
- Mitigation: Save weekly updates of the geodatabase in the BGIS O drive folder.
Risk 4: Data may need to be converted from one file format to another.
- Impact: Attributes and fields of a dataset may be lost during conversion.
- Avoidance: Research proper conversion techniques in advance, before the conversion is carried out.
- Mitigation: Identify the data types that could potentially be lost and exclude them from the data analysis; a separate analysis would then be run on them.
Risk 5: Client may not give out data on time.
- Impact: It would have an impact on our set milestones and deadlines.
- Avoidance: Remind the client frequently of our data needs and the time allotted for project completion, through regular e-mails and phone calls.
- Mitigation: Prepare and clean up the data we already have and export it into an already prepared geodatabase while awaiting the data from the client.
10) Performance Measurement/Quality Plan
See the attached Excel file.
11) Analysis
Before we began analyzing the data received from our client, we had to do an extensive data clean-up (removing non-alphanumeric characters from the headers, maintaining consistency between upper- and lower-case characters, etc.). We also wrote a Python script to aid the data clean-up process, transfer the data directly into a geodatabase, and create a map book. The data was then brought into the ArcGIS environment, which offers extensive functionality and data handling. The following analyses were carried out:
- Nearness of SAIT students to SAIT Polytechnic within a specified distance (using ModelBuilder)
- The Distribution of Apprenticeship Students in Calgary and in Alberta 2012
- The Distribution of Daytime Students in Calgary 2012
- The Distribution of Enrolled Daytime Students in Calgary 2012
- The Distribution of Continuing Education Students in Calgary & Alberta 2012
- Median Income vs Percentage of SAIT Students in the Calgary Communities 2012
- Age Group 20-34 vs SAIT Students in the Calgary Communities 2012
The majority of the maps concentrate on daytime students because they represent 43% of SAIT's student body. The study area is centred on Calgary, where the highest percentage of SAIT students reside (8 out of 10 SAIT students are from Calgary).
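The header clean-up described at the start of the Analysis section could be sketched as follows. This is an illustrative example using only the Python standard library, not the project script itself. It assumes the clean-up strips every character other than letters and digits from the headers and settles on one letter case. The names clean_header and clean_rows, the "F_" prefix, and the sample columns are hypothetical.

```python
import csv
import io
import re


def clean_header(name):
    """Make a column header safe to use as a field name."""
    # Replace each run of characters other than letters and digits with "_".
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_")
    # Use one consistent case so "City" and "CITY" do not become two fields.
    cleaned = cleaned.upper()
    # Prefix names that start with a digit (assumed field-name restriction).
    if cleaned and cleaned[0].isdigit():
        cleaned = "F_" + cleaned
    return cleaned


def clean_rows(lines):
    """Yield rows from CSV text lines with tidy headers and trimmed values."""
    reader = csv.reader(lines)
    yield [clean_header(h) for h in next(reader)]
    for row in reader:
        yield [value.strip() for value in row]


# Hypothetical sample standing in for an exported spreadsheet.
sample = io.StringIO("Student ID#, Postal Code ,city\n 001 ,T2M 0L4, calgary \n")
rows = list(clean_rows(sample))
# rows[0] is ['STUDENT_ID', 'POSTAL_CODE', 'CITY']
```

The cleaned rows could then be written back out with csv.writer and loaded into the geodatabase with whichever import tool the GIS environment provides.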
10) Result
Distribution Analysis Results
Daytime Students
The client provided a total of 20,943 applicant records. We geocoded these records to determine the location of the applicants in Calgary. After geocoding, 20,887 of the 20,943 records were matched (99.7%), leaving 0.3% unmatched. Of these 20,887 applicants, 9,673 (46%) are between the ages of 14 and 64, not counting those aged 20 to 34. Applicants aged 20 to 34 number 11,214, of whom 9,740 are domestic applicants and 1,473 are international applicants.
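The geocoding match rate for the daytime records is simple arithmetic on the two record counts. A minimal sketch, in which the function name match_rate is ours for illustration:

```python
def match_rate(matched, total):
    """Return the share of records that geocoded, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matched / total


total_records = 20943    # daytime applicant records received from the client
matched_records = 20887  # records remaining after geocoding

rate = match_rate(matched_records, total_records)
print(f"matched: {rate:.1f}%  unmatched: {100 - rate:.1f}%")
# prints: matched: 99.7%  unmatched: 0.3%
```

The same helper applies to the apprenticeship and continuing education record counts.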
Enrolled Students
In 2012, a total of 3,202 enrolled students in Canada were aged 20 to 34, out of 6,017 enrolled students across all age ranges. Of the 3,202 students aged 20 to 34, 42 have been enrolled in a program more than once. Also of these 3,202 students, 216 (6.7%) are international students, while 2,986 (93.3%) are domestic students. Of the 2,986 domestic students, 2,669 are Canadian citizens, 316 are Canadian permanent residents, and 1 is a refugee. Among the enrolled domestic students, 2,743 (91.9% of 2,986) are from Alberta, while 243 (8.1% of 2,986) are from other provinces. Of the 2,986 domestic students, 2,190 (73.3%) are from Calgary, while 796 (26.7%) are from outside Calgary.
Faculties                                           Enrolled Students   Total Students   Enrolled
                                                    (ages 20-34)        (all ages)       Percentage (%)
Centre For Academic Learner Services                122                 287              42.5
MacPhail School of Energy                           349                 966              36.1
School of Business                                  424                 1086             39.0
School of Construction                              370                 934              39.6
School of Health Public Safety                      252                 762              33.1
School of Hospitality & Tourism                     126                 498              25.3
School of Information & Communication Technology    344                 839              41.0
School of Manufacturing & Automation                137                 399              34.3
School of Transportation                            66                  246              26.8
Total                                               2190                6017             36.4
Of the 2,190 students, 2,134 (97.4%) are mappable, meaning that a Calgary demographics record exists for the community in which each of them resides.
Apprentices Students
A total of 7,411 records was received, and 7,390 of them (99.7%) were captured after geocoding, leaving 21 unmatched records (0.3%).
Faculties                                           Number of Students   Percentage
School of Manufacturing & Automation                2060                 43%
School of Construction                              1745                 37%
School of Transportation                            730                  15%
School of Hospitality & Tourism                     122                  3%
MacPhail School of Energy                           73                   2%
School of Information & Communication Technology    39                   1%
Total Number of Students                            4769
Continuing Education
Geocoding Accuracy: 100%
# of Records: 36851
Unmatched Records: 7
                                                                          Calgary   Alberta   Canada   World
Number of Continuing Education Students                                   36850     42607     43823    46875
Number of Distant Learning Students                                       6667      9255      9602     11758
Number of Continuing Education (Practicum and Adult Learning Programs)    30191     33335     33903    35117
Faculty                                 Number of Students   Percentage
School of Business                      11110                30%
Centre For Academic Learner Services    7805                 21%
MacPhail School of Energy               4035                 11%
School of Construction                  3675                 10%
School of Info & Comm Tech              2987                 8%
School of Health Public Safety          2426                 7%
Athletics & Recreation                  2137                 6%
School of Hospitality & Tourism         1347                 4%
School of Manufacturing & Automation    792                  2%
Energy: Open Learning Instruct          366                  1%
School of Transportation                152                  0%
Corporate Training                      26                   0%
Total Number of Students                36851
** Refer to Appendix: Analysis map book to see the respective distribution maps.
Least/Common Analysis Results
As a result of the analysis, we were also able to identify the most common and least common traits of each type of SAIT student.
                                    Most Likely Student Characteristics   Least Likely Student Characteristics
Gender                              Male                                  Female
Gender Count                        42614                                 32536
Age (at the time of application)    18                                    20
Age Count                           2453                                  56
Business Stream                     Grant                                 Earned
Business Stream Count               17185                                 3702
School Type                         School of Construction                Centre of Academic Learner
School Type Count                   6005                                  985
Program Code                        BA                                    MDT
Program Count                       1411                                  13
Residency                           Domestic                              International
Residency Count                     18851                                 2036
City                                CALGARY                               Zaria, Barcelona
City Count                          13753                                 1
Province                            AB                                    ZAPOPAN
Province Count                      69276                                 1
Country                             Canada                                Vietnam
Country Count                       16335                                 1
Citizenship                         Canadian Citizen                      Refugee
Citizenship Count                   16302                                 26
Student Type Statistics Breakdown
Apprentices Statistics
Number of Students: 4769
Average Student Age: 27
Most Frequent Age Group: 22
Faculty with the Highest Number of Students: School of Manufacturing & Automation
Number of Students in the Faculty: 2060
Course with the Highest Number of Students: Electrician/Power Syst 2nd Yr
Number of Students in the Course: 393
Continuing Education Statistics
Number of Students: 36858
Average Student Age: 31
Most Frequent Age Group: 25
Faculty with the Highest Number of Students: School of Business
Number of Students in the Faculty: 2060
Course with the Highest Number of Students: Health Care Provider
Number of Students in the Course: 11110
Daytime Students Statistics
Number of Students: 4323
Average Student Age: 24
Most Frequent Age Group: 19
Faculty with the Highest Number of Students: School of Business
Number of Students in the Faculty: 869
Course with the Highest Number of Students: Business Administration
Number of Students in the Course: 524
Median Income vs Daytime SAIT Students Analysis Results
The correlation between the two variables is 0.012, which indicates that income is not a driving factor in student enrollment. This comparison is shown in the bivariate map attached to this report.
** Refer to Appendix: Analysis map book.
Distance Analysis Results
Daytime Students
Driving Time (in Minutes)   Distance (Km)   Number of Students   Percentage
6                           5               675                  16%
12                          10              1373                 32%
18                          15              1525                 36%
24                          20              576                  13%
30                          25              115                  3%
36                          30              1                    0%
Total Number of Students: 4265
Correlation: -0.52
Regression: 0.47
Continuing Education Students
Driving Time (in Minutes)   Distance (Km)   Number of Students   Percentage
6                           5               8866                 24%
12                          10              10755                29%
18                          15              11425                31%
24                          20              4438                 12%
30                          25              1362                 4%
36                          30              0                    0%
Total Number of Students: 36846
Correlation: -0.78
Regression: 0.75
Apprenticeship Students
Driving Time (in Minutes)   Distance (Km)   Number of Students   Percentage
6                           5               856                  18%
12                          10              1479                 31%
18                          15              1431                 30%
24                          20              782                  16%
30                          25              221                  5%
36                          30              0                    0%
Total Number of Students: 4769
Correlation: -0.60
Regression: 0.59
All Students
Driving Time (in Minutes)   Distance (Km)   Number of Students   Percentage
6                           5               10397                22%
12                          10              13607                29%
18                          15              15812                33%
24                          20              5769                 12%
30                          25              1698                 4%
36                          30              1                    0%
Total Number of Students: 47284
Correlation: -0.69
Regression: 0.66
** Driving time was calculated as an indicative measure from the distance buffer, assuming an average driving speed of 50 km/h.
As a result of the analysis, our client was able to pinpoint areas where their student recruitment effort was not yielding the desired result, areas where no student recruitment has been done, and areas where there is a need for student retention. They were also able to get a visual representation of how scattered or clustered their students are, and where the majority of them come from. This would help them develop a better student recruitment and retention approach, which would in turn increase their revenue.
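The driving times in the distance tables follow directly from the buffer distance and the assumed 50 km/h average speed. The sketch below shows that conversion, along with one common way to measure how student counts vary with distance (the Pearson correlation coefficient). It is an illustration only: the report does not state how its correlation and regression figures were computed, so the value produced here need not match the tables. The function names are hypothetical, and the counts are copied from the Daytime Students table.

```python
import math

ASSUMED_SPEED_KMH = 50.0  # average driving speed assumed in the report


def driving_minutes(distance_km, speed_kmh=ASSUMED_SPEED_KMH):
    """Convert a buffer distance to an indicative driving time in minutes."""
    return distance_km * 60.0 / speed_kmh


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Daytime student counts per 5 km band, copied from the distance table.
band_km = [5, 10, 15, 20, 25, 30]
daytime = [675, 1373, 1525, 576, 115, 1]

total = sum(daytime)
for km, count in zip(band_km, daytime):
    share = 100.0 * count / total
    print(f"{km:>2} km  {driving_minutes(km):>3.0f} min  {count:>5}  {share:3.0f}%")

# Negative values mean fewer students in the more distant bands.
r = pearson(band_km, daytime)
```

Because a straight-line buffer ignores the road network, these times are best read as a rough indicator, which is how the report presents them.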
11) Conclusion
From our analysis, we were able to conclude that:
- Income does not affect student enrollment.
- The number of persons (potential SAIT students) between the ages of 20 and 34 has a significant impact on student enrollment.
- Distance to SAIT Polytechnic significantly affects student enrollment (the majority of students live close to the school).
12) Recommendations
Given more resources and time, it would be beneficial to obtain international and national demographic information to compare against SAIT student profiles. Obtaining SAIT student profiles for other academic years may enable another group of students, or the client, to compute the growth rate of SAIT students and compare it to socio-demographic information for Calgary communities. It would also be beneficial to build on the Python scripting portion of the project to automate all data clean-up steps and possibly automate the creation of the Access database from Excel files.
13) References
(2013). Comprehensive Institutional Plan [PDF file]. Retrieved on Jan 20th 2014, available from http://www.sait.ca/Documents/About%20SAIT/Publications/ComprehensiveInstitutionalPlan.pdf
Fahad Usmani (2012). Assumptions and Constraints in Project Management. Available from http://pmstudycircle.com/2012/10/assumptions-and-constraints-in-project-management/
Dataset Alphabetical. Retrieved on Jan 30th 2014, available from https://data.calgary.ca/OpenData/Pages/DatasetListingAlphabetical.aspx
Community Profiles. Retrieved on Jan 30th 2014, available from http://www.calgary.ca/CSPS/CNS/Pages/Research-and-strategy/Community-profiles/Community-Profiles.aspx
Esri. Geocoding addresses in a table and rematching unmatched addresses, ArcGIS Help. Retrieved on March 14th 2014, available from http://resources.arcgis.com/en/help/main/10.1/index.html#//00250000000m000000
(2013). Factors Affecting Student Enrollment [PDF file]. Retrieved on March 30th 2014, available from http://www.aved.gov.bc.ca/ccl_question_scans/documents/1-Enrolment_Patterns.pdf
Victor J. Mora (2004) [PDF file]. Retrieved on April 1st 2014, available from http://onlinelibrary.wiley.com/doi/10.1002/ir.89/pdf