
CSC 4740 / 6740 Fall 2016
Data Mining
Instructor: Yubao Wu

Welcome!
Instructor: Yubao Wu
Office: 25 Park Place Suite 737
Phone: 404-413-6125 (office)
E-mail: ywu@cs.gsu.edu
Website: http://www.robwu.net/teaching
Office Hours: 4:00 pm - 5:30 pm, Wednesday; 3:30 pm - 5:00 pm, Friday; or by appointment

Classroom and Date
Classroom: Petit Science Center 230
Date/Time: Monday/Wednesday, 10:00 am - 11:45 am

Textbook
Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011. (ISBN: 9780123814791)

References
Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN: 0-321-32136-7)
Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)

Course Content
Basic data mining techniques:
  Association Rules Mining
  Sequential Patterns
  Classification and Prediction
  Clustering and Outlier Detection
  Regression
  Pattern Interestingness
  Dimensionality Reduction
Big data mining applications:
  Web Data Mining
  Bioinformatics
  Social Networks
  Text Mining
  Visualization
  Financial Data Analysis
  Software Engineering

Course Requirements
Prerequisite: CSC 3410 Data Structures
The department will strictly enforce all prerequisites. Students without the proper prerequisites will be dropped from the class at any time during the semester, without prior notice.
Course Requirements:
  Assignments
  Mid-term Exam
  Final Exam
  Research Project
Together these cover both basic theoretical principles and practical hands-on experience.

Assignments and Exams
Mid-term Exam: Open Textbook
Final Exam: Open Textbook
The problems for CSC 4740 and CSC 6740 may be different.

Research Projects
CSC 4740: One or two undergraduate students form a group. Each group does a project and submits one project report.
CSC 6740: Each graduate student does a project and submits one project report.

Research Projects
Goal: discover interesting relationships within a significant amount of data.
Some project ideas (these are only examples; it is best to propose your own):
  Statistical Computing (speed up traditional statistical methods, such as correlation computation)
  Data Mining in Business Applications (customer segmentation, accounting, marketing)
  Literature Survey
  Mining Biological Datasets
  Social Network Analysis
  Your own ideas

Research Projects
Project proposal (2-4 pages, ACM SIGKDD or IEEE ICDM template)
  Title, project idea, survey of related work, data source, key algorithms/technology, and what you expect to submit at the end of the semester.
Final report (6-12 pages, ACM SIGKDD or IEEE ICDM template)
  A comprehensive description of your project: project idea, extended survey of related work, detailed algorithm/technology, specific implementation, and key results, including what worked, what did not work, what surprised you, and why.

Research Projects
CSC 4740:
  Project Proposal
  Final Report
  Software, user manual, and sample dataset
CSC 6740:
  Project Proposal
  Final Report
  Software, user manual, and sample dataset
  Slides

Research Projects
Final presentation
  In the last few classes, each graduate student presents his/her project to the rest of the class: about a 15-minute presentation plus 2 minutes of questions.
Checkpoints
  Proposal (due Sep 21): ~1 month
  Final Report (due Dec 5): ~2 months

Class Policy:
Attendance: Students are required to attend all classes.
Academic honesty: Plagiarism will result in a score of zero on the test or project. The instructor has the right to make the final decision.
Assignments and Projects: They must be handed in on time; late submissions will not be accepted.
Withdrawals: Tuesday, Oct 11 is the last day to withdraw and possibly receive a W.
Make-ups: Require the instructor's special permission.

Grading Policy:
Component        CSC 4740   CSC 6740
Mid-term Exam      25%        20%
Final Exam         25%        20%
Assignments        30%        25%
Project            15%        30%
Attendance          5%         5%

A+ [97, 100]   A [93, 97)   A- [90, 93)
B+ [87, 90)    B [83, 87)   B- [80, 83)
C+ [77, 80)    C [73, 77)   C- [70, 73)
D [60, 70)     F [0, 60)

If a student's score is 97 or higher, an A+ will be given.
Scores may be adjusted if the class average is low.
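As a minimal sketch of how the grading weights and letter cutoffs combine, assuming each component is scored out of 100, the final score is the plain weighted sum of the components, and no curve is applied, a calculator might look like the following. The names `WEIGHTS`, `CUTOFFS`, `weighted_score`, and `letter_grade` are hypothetical and are not part of any course tool.

```python
# Hypothetical grade calculator. Weights (in percent) and letter cutoffs follow
# the Grading Policy; component scores are assumed to be on a 0-100 scale.
WEIGHTS = {
    "CSC 4740": {"midterm": 25, "final": 25, "assignments": 30,
                 "project": 15, "attendance": 5},
    "CSC 6740": {"midterm": 20, "final": 20, "assignments": 25,
                 "project": 30, "attendance": 5},
}

# Lower bound of each letter range, from highest to lowest; ranges are [low, next).
CUTOFFS = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
           (80, "B-"), (77, "C+"), (73, "C"), (70, "C-"), (60, "D"), (0, "F")]


def weighted_score(course: str, scores: dict) -> float:
    """Weighted total on a 0-100 scale for one course section."""
    weights = WEIGHTS[course]
    assert sum(weights.values()) == 100, "weights should sum to 100%"
    # Integer percent weights keep the arithmetic exact until the final division.
    return sum(weights[part] * scores[part] for part in weights) / 100


def letter_grade(total: float) -> str:
    """Return the first letter whose lower bound the total reaches."""
    for low, letter in CUTOFFS:
        if total >= low:
            return letter
    return "F"  # only reached for negative totals


scores = {"midterm": 88, "final": 91, "assignments": 95,
          "project": 70, "attendance": 100}
for course in WEIGHTS:
    total = weighted_score(course, scores)
    print(course, total, letter_grade(total))
```

With these sample scores, the CSC 4740 weights give 88.75 (B+) while the CSC 6740 weights give 85.55 (B), since the project counts for twice as much in the graduate section.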

Tentative Course Outline and Schedule:
Chapter 1 Introduction (Aug. 22)
Chapter 2 Getting to Know Your Data and Chapter 3 Data Preprocessing (Aug. 24)
Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods (Aug. 29, 31, Sep. 7)
Chapter 8 Classification: Basic Concepts (Sep. 12)
Chapter 9 Classification: Advanced Methods (Sep. 14, 19, 21)
Project Proposal Due (6 pm Eastern Time, Sep. 21)

Chapter 10 Cluster Analysis: Basic Concepts and Methods (Sep. 26, 28, Oct. 5, 10)
Mid-term Exam (Oct. 3)
Chapter 11 Advanced Cluster Analysis (Oct. 12, 17, 19, 24)
Chapter 13 Data Mining Trends and Research Frontiers (Oct. 26, 31, Nov. 2, 7, 9, 14)
Project Presentations (Nov. 16, 28, 30)
Final Exam (Dec. 5)
Research Project Due (6 pm Eastern Time, Dec. 8)

KDD References
Data mining and KDD
  Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
  Journals: ACM-KDD, Data Mining and Knowledge Discovery, KDD Explorations
Database systems
  Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
AI & Machine Learning
  Conferences: Machine Learning (ICML), AAAI, IJCAI, COLT (Learning Theory), etc.
  Journals: Machine Learning, Artificial Intelligence, etc.

Statistics
  Conferences: Joint Statistical Meetings, etc.
  Journals: Annals of Statistics, etc.
Bioinformatics
  Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc.
  Journals: Journal of Computational Biology, Bioinformatics, PLoS Computational Biology, etc.

Questions?
