
DATA MINING

The Basic Concept


CONTENTS

• Introduction to Data Mining
• Architecture of Data Mining
• Data Mining: On What Kinds of Data
• Data Warehouse
• Applications of Data Mining
• Benefits of Data Mining
• Disadvantages of Data Mining
INTRODUCTION TO DATA MINING
• Definition of data mining
Data mining refers to extracting knowledge from large amounts
of data.
Or
The extraction of useful knowledge from large amounts of data.
DATA + MINING
DATA:
Facts and statistics collected together for reference or analysis.
MINING:
The process of obtaining a required object from raw material.
JUST LIKE
• Gold mining
• Coal mining
EXAMPLE
• AMAZON.COM
ALTERNATIVE NAME
• A commonly used alternative name for data mining is knowledge
discovery in databases (KDD).
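ARCHITECTURE
[Figure: architecture of a data mining system, layers from top to bottom]
• Graphical user interface
• Pattern evaluation
• Data mining engine
• Database or data warehouse server
• Data cleaning, data integration and filtering
• Databases and data warehouse
• Knowledge base (drawn beside the layers)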
DATA SOURCES
• Databases, data warehouses, the World Wide Web (WWW), text files
and other documents are the actual sources of data.
• You need large volumes of historical data for data mining to be
successful. Organizations usually store data in databases or
data warehouses.
• Data warehouses may contain one or more databases, text files,
spreadsheets or other kinds of information repositories. The
World Wide Web, or the Internet, is another big source of data.
DIFFERENT PROCESSES
• Database or Data Warehouse Server
It contains the actual data that is ready to be processed.
The server is responsible for retrieving the relevant data
based on the user's data mining request.
• Data Mining Engine
It is the core component of any data mining system.
It consists of a number of modules for performing data
mining tasks.
• Pattern Evaluation Module
The pattern evaluation module is mainly responsible for
measuring the interestingness of the patterns found.
PROCESSES
• Graphical User Interface
The graphical user interface module handles communication
between the user and the data mining system. It helps the
user use the system easily and efficiently without knowing
the real complexity behind the process.
• Knowledge Base
The knowledge base supports the whole data mining process.
It can be used to guide the search or to evaluate the
interestingness of the resulting patterns.
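The slides do not say which interestingness measure the pattern evaluation module uses. As a minimal sketch, assuming the widely used support and confidence measures for a rule such as "bread -> butter", the calculation could look like this. The transactions and the function names are made up for illustration.

```python
# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) for antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

In a design like this, a knowledge base could hold the minimum support and confidence thresholds that decide which rules count as interesting.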
Data Mining: On What Kinds of Data?
• Relational databases
• Data warehouses
• Advanced databases and information repositories
• Multimedia databases
• Text databases & WWW
DATA WAREHOUSE
• A repository of multiple data sources organized at a single
site under a unified schema.
• Data warehouse processes include:
  • Data cleaning
  • Data integration
  • On-line analytical processing (OLAP)

Data Mining: A KDD Process
• Data mining is the core of the knowledge discovery process.
[Figure: steps of the KDD process, from bottom to top]
Databases -> Data Cleaning and Data Integration -> Data Warehouse
-> Selection -> Task-relevant Data -> Data Mining
-> Pattern Evaluation -> Knowledge
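The KDD steps can be read as a chain of functions, each consuming the output of the previous one. The sketch below is one possible illustration, not the method of any particular tool: the records, the stage functions, and the choice of "items bought together" as the mining task are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer_id, channel, items bought).
raw_records = [
    (1, "store", ["pen", "paper"]),
    (2, "store", ["pen", "paper", "ink"]),
    (3, "web", ["pen", "ink"]),
    (4, "store", []),  # empty basket, removed by cleaning
    (5, "store", ["paper", "pen"]),
]


def clean(records):
    """Cleaning: drop records with no items."""
    return [r for r in records if r[2]]


def select(records, channel):
    """Selection: keep only the task-relevant records (one channel here)."""
    return [r for r in records if r[1] == channel]


def mine(records):
    """Mining: count how often each pair of items is bought together."""
    pair_counts = Counter()
    for _, _, items in records:
        pair_counts.update(combinations(sorted(set(items)), 2))
    return pair_counts


def evaluate(pair_counts, min_count):
    """Pattern evaluation: keep only pairs seen at least min_count times."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


knowledge = evaluate(mine(select(clean(raw_records), "store")), min_count=2)
print(knowledge)
```

Keeping each step as a separate function mirrors the figure: any stage can be swapped out without touching the others.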
CHARACTERISTICS OF DATA WAREHOUSE
• Subject Oriented – A data warehouse is subject oriented
because it provides information about a subject rather than
about the organization's ongoing operations.
• Integrated – A data warehouse is constructed by integrating
data from heterogeneous sources such as relational databases.
This integration makes analysis of the data more effective.
CHARACTERISTICS OF DATA WAREHOUSE
• Time Variant – Data is identified with a particular time
period, so it provides information from a historical point
of view.
• Non-Volatile – Previous data is not removed when new data
is added.
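The time-variant and non-volatile properties can be illustrated with an append-only table in which every row carries a date. This is a minimal sketch under assumptions: the table layout and the helpers `load_snapshot` and `price_on` are hypothetical names, not part of any warehouse product.

```python
from datetime import date

# Hypothetical warehouse table: a list of rows that is only ever appended to.
price_history = []


def load_snapshot(table, product, price, as_of):
    """Append a new dated row; earlier rows are never updated or deleted."""
    table.append({"product": product, "price": price, "as_of": as_of})


def price_on(table, product, day):
    """Return the most recent price recorded on or before the given day."""
    rows = [r for r in table if r["product"] == product and r["as_of"] <= day]
    if not rows:
        return None
    return max(rows, key=lambda r: r["as_of"])["price"]


load_snapshot(price_history, "widget", 9.99, date(2023, 1, 1))
load_snapshot(price_history, "widget", 11.49, date(2024, 1, 1))

print(len(price_history))
print(price_on(price_history, "widget", date(2023, 6, 30)))
```

Because the old row survives the second load, a query can still answer what the price was at any earlier date.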
APPLICATIONS OF DATA MINING
• Market analysis
• Telecommunication
• Intelligence
• Health care
MARKET ANALYSIS AND MANAGEMENT
• Where does the data come from?
  • Credit card transactions, customer complaint calls.
• Target marketing
  • Find clusters of customers who share the same
    characteristics: interests, income level, spending
    habits, etc.
• Customer profiling
  • Which types of customers buy which products.
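Finding clusters of similar customers can be sketched with k-means, one common clustering technique; the slides do not name a specific algorithm, so this choice is an assumption. The customer figures are invented, and the starting centroids are picked by hand so the run is repeatable.

```python
# Hypothetical customers: (annual income in thousands, monthly spend).
customers = [(20, 200), (22, 250), (25, 220), (80, 900), (85, 950), (90, 870)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    return labels, centroids


labels, centroids = k_means(customers, [customers[0], customers[3]])
print(labels)
print(centroids)
```

In practice the features would usually be rescaled first, since here monthly spend dominates the distance simply because its numbers are larger.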
DATA MINING APPLICATIONS FOR HEALTHCARE
• The pharmaceutical industry produces a large number of documents
that are often underutilized. Data mining can improve health
systems and reduce costs:
  • Provide government, regulatory and competitor information
    that can fuel competitive advantage.
  • Support the R&D process and the go-to-market strategy with
    rapid access to information at every phase of the
    development process.
  • Discover the relationships between diseases and the
    effectiveness of treatments to identify new drugs, or to
    ensure that patients receive appropriate, timely care.
  • Support healthcare insurers in detecting fraud and abuse.
DATA MINING APPLICATIONS FOR TELECOMMUNICATION
• Telecommunications companies generate and store large volumes of
call, customer and network data. Data mining extracts hidden
knowledge from this data to better understand customers and
detect fraud:
  • Gain a competitive advantage and reduce customer churn by
    understanding demographic characteristics and predicting
    customer behaviour.
  • Increase customer loyalty and improve profitability by
    providing customized services.
  • Support the customer segmentation strategy by developing
    appropriate marketing campaigns and pricing strategies.
DATA MINING APPLICATIONS FOR INTELLIGENCE
• Data mining helps analyse data and identify how to connect the
dots among different data elements. This is essential for
government agencies:
  • Reveal hidden data related to money laundering, narcotics
    trafficking, corporate fraud, terrorism, etc.
  • Improve intrusion detection, with a strong focus on anomaly
    detection, and identify suspicious activity from day one.
  • Convert text-based crime reports into word processing files
    that can be used to support the crime-matching process.
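Anomaly detection, mentioned above for intrusion detection, can be illustrated with a simple z-score rule: flag any value that lies far from the mean relative to the spread. This is only a sketch under assumptions. The login counts are invented, the threshold of 2.0 is an arbitrary choice, and real intrusion detection systems use far richer models.

```python
from statistics import mean, pstdev

# Hypothetical daily counts of failed logins for one account.
failed_logins = [12, 15, 11, 14, 13, 95, 12]


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - centre) / spread > threshold
    ]


anomalies = find_anomalies(failed_logins)
print(anomalies)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a robust measure such as the median may be a better baseline.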
BENEFITS
• It helps predict future trends:
A common benefit of data mining systems is that they help
predict future trends, drawing on technology and on changes
in people's behaviour.
• It reveals customer habits:
For example, in marketing, data mining systems help one
understand customer behaviour and habits and keep track
of them.
BENEFITS
• Helps in decision making:
Data mining gives us knowledge about customers and the
relevant data, so we can plan for the future more easily.
• Increases company revenue:
People can collect information about marketed products
online, which eventually reduces the cost of the products
and services.
• It depends on market-based analysis:
Data mining is a process in which information is gathered
on the basis of market information.
DISADVANTAGES
• It violates user privacy:
Data mining collects information about people using
market-based techniques and information technology.
A data mining system can violate the privacy of its users,
which is why it falls short on their safety and security.
Eventually, it can create miscommunication between people.
• Additional irrelevant information:
Data mining gives us the information we need, but sometimes
it also produces additional, irrelevant information.
DISADVANTAGES

• Misuse of information:
Because safety and security measures are often minimal,
someone can misuse this information to harm others.
• Accuracy of data:
Mining technology has made information collection easy.
However, one of the main limitations of a data mining system
is that the accuracy of the data it provides is limited.
