
Winter 2012 Master of Computer Application (MCA) Semester 6 MC0088 Data Mining 4 Credits

1. What is operational intelligence?

Operational intelligence (OI) is a category of real-time, dynamic business analytics that delivers visibility and insight into data, streaming events and business operations. Operational Intelligence solutions run queries against streaming data feeds and event data to deliver real-time analytic results. [1] Operational Intelligence provides organizations the ability to make decisions and immediately act on these analytic insights, through manual or automated actions. The purpose of OI is to monitor business activities and to identify and detect situations relating to inefficiencies, opportunities, and threats. Some definitions describe operational intelligence as an event-centric approach to delivering information that empowers people to make better decisions. [2]

OI helps quantify:

the efficiency of the business activities
how the IT infrastructure and unexpected events affect the business activities (resource bottlenecks, system failures, events external to the company, etc.)
how the execution of the business activities contributes to revenue gains or losses.

This is achieved by observing the progress of the business activities, computing several metrics in real-time from these progress events, and publishing the metrics to one or more channels (e.g., a dashboard that can display the metrics as charts and graphs, autonomic software that can receive these updates and fine-tune the processes in real-time, email, mobile, and messaging systems that can notify users, and so on). Thresholds can also be placed on these metrics to create notifications or new events. In addition, these metrics act as the starting point for further analysis (drilling down into details, performing root cause analysis, tying anomalies to specific transactions and steps of the business activity). Sophisticated OI systems also provide the ability to associate metadata with metrics, process steps, channels, etc. With this, it becomes easy to get related information, e.g., "retrieve the contact information of the person that manages the application that executed the step in the business transaction that took 60% more time than the norm," or "view the acceptance/rejection trend for the customer who was denied approval in this transaction," or "launch the application that this process step interacted with."

Features

Different operational intelligence solutions may use many different technologies and be implemented in different ways. This section lists the common features of an operational intelligence solution:

Real-time monitoring
Real-time situation detection
Real-time dashboards for different user roles
Correlation of events
Industry-specific dashboards
Multidimensional analysis
Root cause analysis
Time series and trending analysis

Big Data Analytics: Operational Intelligence is well suited to address the inherent challenges of Big Data. Operational Intelligence continuously monitors and analyzes the variety of high-velocity, high-volume Big Data sources. Often performed in memory, OI platforms and solutions then present the incremental calculations and changes, in real time, to the end-user.
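As a rough illustration of the metric-and-threshold idea described above, the following minimal C++ sketch computes a rolling average processing time over a stream of progress events and raises a notification when a threshold is crossed. The event and class names (ProgressEvent, StepLatencyMetric) are invented for the example and are not taken from any particular OI product.

#include <deque>
#include <iostream>
#include <string>

// Hypothetical progress event emitted by a business activity.
struct ProgressEvent {
    std::string step;     // e.g. "approve-order"
    double durationMs;    // how long the step took
};

// Rolling-window metric with a simple threshold notification.
class StepLatencyMetric {
public:
    StepLatencyMetric(std::size_t window, double thresholdMs)
        : window_(window), thresholdMs_(thresholdMs) {}

    // Called for every incoming event (the "streaming" part).
    void onEvent(const ProgressEvent& e) {
        samples_.push_back(e.durationMs);
        if (samples_.size() > window_) samples_.pop_front();

        double sum = 0.0;
        for (double d : samples_) sum += d;
        double avg = sum / samples_.size();

        // Publish the metric (stdout stands in for a dashboard channel).
        std::cout << "avg latency for " << e.step << ": " << avg << " ms\n";

        // Threshold check creates a notification / new event.
        if (avg > thresholdMs_)
            std::cout << "ALERT: " << e.step << " exceeds " << thresholdMs_ << " ms\n";
    }

private:
    std::size_t window_;
    double thresholdMs_;
    std::deque<double> samples_;
};

int main() {
    StepLatencyMetric metric(3 /*window*/, 100.0 /*threshold ms*/);
    for (double d : {80.0, 90.0, 150.0, 200.0})
        metric.onEvent({"approve-order", d});
}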

2. What is Business Intelligence? Explain the components of BI architecture.

Business intelligence (BI) is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information. BI can handle large amounts of information to help identify and develop new opportunities. Making use of new opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.[1] BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. The goal of modern business intelligence deployments is to support better business decision-making. Thus a BI system can be called a decision support system (DSS). [2] Though the term business intelligence is sometimes a synonym for competitive intelligence (because they both support decision making), BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes while competitive intelligence gathers, analyzes and disseminates information with a topical focus on company competitors. If understood broadly, business intelligence can include the subset of competitive intelligence. [3]

Source Systems: These are the feeder systems and the starting point of data flow in the overall BI architecture.

Staging Area: The staging area is the place where all transformation, cleansing and enrichment is done before data can flow further.

Data Warehouse / Data Mart: The de-normalized Data Warehouse and the Data Marts form the sanitized repository of data which can be accessed for various purposes.

OLAP Server Layer: The OLAP server sits between the Data Warehouse and the end-user tools.

Operational Data Store (ODS): The Operational Data Store is a centralized repository of data maintained for online operational use.

End-User Tools: End-user tools include tools for reporting, analytics, data mining, performance management and decision modelling.

Business Intelligence Metadata Model: Metadata is core to the BI architecture. It provides a map, a catalogue and a reference for data about the business, technical and operational elements of the BI components. It spans business metadata, BI technical metadata and source-system metadata.
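To make the data flow between these components a little more concrete, here is a small, purely illustrative C++ sketch of a staging-area step that cleanses and enriches a raw source record before it is loaded onward into the warehouse. All record fields, values and function names are invented for the example; real staging logic is normally implemented in ETL tools rather than hand-written code.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

// Raw record as it arrives from a source (feeder) system.
struct SourceRecord {
    std::string customerName;   // may have stray whitespace / mixed case
    std::string country;        // free text, e.g. "  india "
    double amount;              // may be negative on bad feeds
};

// Sanitized record as stored in the warehouse / data mart.
struct WarehouseRecord {
    std::string customerName;
    std::string countryCode;    // enriched: ISO-style code
    double amount;
};

static std::string trimAndUpper(std::string s) {
    s.erase(0, s.find_first_not_of(" \t"));
    s.erase(s.find_last_not_of(" \t") + 1);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Staging-area step: cleanse and enrich before loading further.
WarehouseRecord stage(const SourceRecord& src) {
    WarehouseRecord out;
    out.customerName = trimAndUpper(src.customerName);
    std::string country = trimAndUpper(src.country);
    out.countryCode = (country == "INDIA") ? "IN" : "??";  // toy enrichment lookup
    out.amount = std::max(0.0, src.amount);                // reject bad values
    return out;
}

int main() {
    WarehouseRecord w = stage({"  sharma ", "  india ", 125.5});
    std::cout << w.customerName << " " << w.countryCode << " " << w.amount << "\n";
}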

3. Differentiate between database management systems (DBMS) and data mining.

DBMS vs Data Mining

A DBMS (Database Management System) is a complete system used for managing digital databases that allows storage of database content, creation and maintenance of data, search and other functionalities. On the other hand, data mining is a field in computer science which deals with the extraction of previously unknown and interesting information from raw data. Usually, the data used as the input for the data mining process is stored in databases. Users who are inclined toward statistics use data mining: they utilize statistical models to look for hidden patterns in data. Data miners are interested in finding useful relationships between different data elements, which is ultimately profitable for businesses.

DBMS: A DBMS, sometimes just called a database manager, is a collection of computer programs dedicated to the management (i.e. organization, storage and retrieval) of all databases that are installed in a system (i.e. a hard drive or network). There are different types of Database Management Systems, and some of them are designed for the proper management of databases configured for specific purposes. The most popular commercial Database Management Systems are Oracle, DB2 and Microsoft Access. All these products provide means of allocating different levels of privileges to different users, making it possible for a DBMS to be controlled centrally by a single administrator or to be allocated to several different people. There are four important elements in any Database Management System: the modeling language, data structures, the query language and the mechanism for transactions. The modeling language defines the language of each database hosted in the DBMS; currently several popular approaches such as the hierarchical, network, relational and object models are in practice. Data structures help organize the data, such as individual records, files, fields and their definitions, and objects such as visual media. The data query language maintains the security of the database by monitoring login data, access rights of different users, and protocols for adding data to the system.
SQL is a popular query language that is used in Relational Database Management Systems. Finally, the mechanism that allows for transactions helps with concurrency and multi-user access: it makes sure that the same record will not be modified by multiple users at the same time, thus keeping data integrity intact. Additionally, a DBMS provides backup and other facilities as well.

Data Mining: Data mining is also known as Knowledge Discovery in Data (KDD). As mentioned above, it is a field of computer science which deals with the extraction of previously unknown and interesting information from raw data. Due to the exponential growth of data, especially in areas such as business, data mining has become a very important tool for converting this large wealth of data into business intelligence, as manual extraction of patterns has become practically impossible in the past few decades. For example, it is currently being used for applications such as social network analysis, fraud detection and marketing. Data mining usually deals with four tasks: clustering, classification, regression, and association. Clustering is identifying similar groups from unstructured data. Classification is learning rules that can be applied to new data, and will typically include the following steps: preprocessing of data, model design, learning/feature selection and evaluation/validation. Regression is finding functions with minimal error to model the data. Association is looking for relationships between variables. Data mining is usually used to answer questions like: what are the main products that might help to obtain high profit next year in Wal-Mart?

What is the difference between DBMS and Data Mining? A DBMS is a full-fledged system for housing and managing a set of digital databases, whereas data mining is a technique or concept in computer science which deals with extracting useful and previously unknown information from raw data. Most of the time, these raw data are stored in very large databases, so data miners use the existing functionalities of a DBMS to handle, manage and even preprocess raw data before and during the data mining process. However, a DBMS alone cannot be used to analyze data, although some DBMSs at present have inbuilt data-analyzing tools or capabilities.
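As a small illustration of the clustering task mentioned above, the following C++ sketch runs a few iterations of one-dimensional k-means (k fixed at 2, toy data chosen for the example) to group similar values. It is only a sketch of the general idea, not the implementation used by any particular data mining tool.

#include <cmath>
#include <iostream>
#include <vector>

// Toy 1-D k-means with k = 2: groups similar values into two clusters.
int main() {
    std::vector<double> data = {1.0, 1.2, 0.8, 8.9, 9.1, 9.4, 1.1, 9.0};
    double c0 = data.front(), c1 = data.back();   // initial centroids
    std::vector<int> label(data.size(), 0);

    for (int iter = 0; iter < 10; ++iter) {
        // Assignment step: each point goes to the nearer centroid.
        for (std::size_t i = 0; i < data.size(); ++i)
            label[i] = (std::fabs(data[i] - c0) <= std::fabs(data[i] - c1)) ? 0 : 1;

        // Update step: recompute each centroid as the mean of its points.
        double sum0 = 0, sum1 = 0;
        int n0 = 0, n1 = 0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            if (label[i] == 0) { sum0 += data[i]; ++n0; }
            else               { sum1 += data[i]; ++n1; }
        }
        if (n0) c0 = sum0 / n0;
        if (n1) c1 = sum1 / n1;
    }

    std::cout << "centroids: " << c0 << " and " << c1 << "\n";
    for (std::size_t i = 0; i < data.size(); ++i)
        std::cout << data[i] << " -> cluster " << label[i] << "\n";
}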

4. What is a Neural Network? Explain in detail.

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.

Historical background

Neural network simulations appear to be a recent development. However, this field was established before the advent of computers, and has survived at least one major setback and several eras. Many important advances have been boosted by the use of inexpensive computer emulations. Following an initial period of enthusiasm, the field survived a period of frustration and disrepute. During this period, when funding and professional support were minimal, important advances were made by relatively few researchers. These pioneers were able to develop convincing technology which surpassed the limitations identified by Minsky and Papert, who had published a book in 1969 in which they summed up a general feeling of frustration with neural networks among researchers; the book was accepted by most without further analysis. Currently, the neural network field enjoys a resurgence of interest and a corresponding increase in funding. The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts, but the technology available at that time did not allow them to do too much.

Why use neural networks?

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse.

This expert can then be used to provide projections given new situations of interest and answer "what if" questions. Other advantages include:

1. Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
2. Self-organisation: an ANN can create its own organisation or representation of the information it receives during learning time.
3. Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
4. Fault tolerance via redundant information coding: partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Neural networks versus conventional computers

Neural networks take a different approach to problem solving than conventional computers. Conventional computers use an algorithmic approach, i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don't exactly know how to do. Neural networks process information in a similar way to the human brain. The network is composed of a large number of highly interconnected processing elements (neurones) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable. On the other hand, conventional computers use a cognitive approach to problem solving: the way the problem is to be solved must be known and stated in small, unambiguous instructions. These instructions are then converted to a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault. Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency.
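The "learning by adjusting connection weights" idea described above can be illustrated with a minimal sketch. The C++ code below trains a single artificial neuron with the classic perceptron learning rule on the logical AND function; it is a deliberate simplification (one neuron, toy data), not a full multi-layer network.

#include <iostream>

// A single artificial neuron trained with the perceptron rule on the AND function.
int main() {
    // Training examples: two inputs and the desired output (logical AND).
    const int X[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    const int T[4]    = {0, 0, 0, 1};

    double w[2] = {0.0, 0.0};   // connection weights
    double b = 0.0;             // bias ("threshold")
    const double lr = 0.1;      // learning rate

    for (int epoch = 0; epoch < 20; ++epoch) {
        for (int i = 0; i < 4; ++i) {
            double net = w[0] * X[i][0] + w[1] * X[i][1] + b;
            int out = (net > 0.0) ? 1 : 0;          // step activation
            int err = T[i] - out;                   // error signal
            // Weight adjustment: the "learning" step.
            w[0] += lr * err * X[i][0];
            w[1] += lr * err * X[i][1];
            b    += lr * err;
        }
    }

    for (int i = 0; i < 4; ++i) {
        double net = w[0] * X[i][0] + w[1] * X[i][1] + b;
        std::cout << X[i][0] << " AND " << X[i][1] << " -> "
                  << ((net > 0.0) ? 1 : 0) << "\n";
    }
}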

5. What is a partition algorithm? Explain with the help of a suitable example.

Partition Algorithms

These are algorithms for partitions of the set {0, 1, ..., n-1}. We represent partitions abstractly as forests, i.e., a collection of trees, one tree for each block of the partition. We only need the parent information about the tree, so we represent the partition as a vector V with V[i] the parent of i unless i has no parent (and so is a root), in which case V[i] is the negative of the size of the block containing i. In this scheme the least partition would be represented by the vector (-1, -1, ..., -1) and the greatest partition could be represented in many ways, including the vector (-n, 0, ..., 0). [2] contains an elementary discussion of this type of representation of partitions. We say that a vector representing a partition is in normal form if the root of each block is the least element of that block and the parent of each non-root is its root. This form is unique, i.e., two vectors represent the same partition if and only if they have the same normal form. The examples above are in normal form. Algorithm 1 gives a simple recursive procedure for finding the root of any element i. Note that i and j are in the same block if and only if root(i) = root(j).

procedure root(i, V)
    j <- V[i]
    if j < 0 then return(i)
    else return(root(j, V)) endif
endprocedure
Algorithm 1: Finding the root

The running time for root is proportional to the depth of i in its tree, so we would like to keep the depth of the forest small. Algorithm 2 finds the root and at the same time modifies V so that the parent of i is its root, without increasing the order of magnitude of the running time. In many applications you want to build up a partition by starting with the least partition and repeatedly joining blocks together. Algorithm 3 does this. Note that Algorithm 3 always joins the smaller block onto the larger block. This assures us that the resulting partition will have depth at most log2 n, as the next theorem shows.

procedure root(i, V)
    j <- V[i]
    if j < 0 then return(i)
    else V[i] <- root(j, V); return(V[i]) endif
endprocedure
Algorithm 2: Finding the root and compressing V

procedure join-blocks(i, j, V)
    ri <- root(i, V); rj <- root(j, V)
    if ri != rj then
        si <- V[ri]; sj <- V[rj]
        if si < sj then (the block of ri is at least as large, since sizes are stored negated)
            V[rj] <- ri; V[ri] <- si + sj
        else
            V[ri] <- rj; V[rj] <- si + sj
        endif
    endif
    return(V)
endprocedure
Algorithm 3: Join two blocks together

Theorem 1. If Algorithm 3 is applied any number of times starting with the least partition, the depth of the resulting partition will never exceed log2 n.

Proof: Let i be a fixed node. Note that an application of join-blocks increases the depth of i by at most 1 and, if this occurs, the size of the block containing i is at least doubled. Thus the depth of i can be increased (by 1) from its original value of 0 at most log2 n times.

This result shows that the time required to run the join-blocks procedure m times is O(m log2 n). In [1] Tarjan has shown that, if we use the root operation given in Algorithm 2, the time required is O(m α(m)), where α is the pseudo-inverse of the Ackermann function. The Ackermann function is extremely fast growing and so α grows very slowly; in fact, α(m) <= 4 unless m is at least 2^2^...^2 with 65536 2's. By Theorem 1 we may assume that all our (representations of) partitions have depth at most log2 n. The rank of a partition (in the partition lattice Π_n of an n-element set) is n - k, where k is the number of blocks. The join of two partitions U and V can be found by executing join-blocks(i, U[i], V) for each i which is not a root of U. This can be done in time O(rank(U) log2 n) and so in time O(n log2 n). (Actually, such an algorithm should make a copy of V so the original V is not modified.) It is relatively easy to write O(n log2 n) time procedures for putting V into normal form and for testing whether V <= U in Π_n. Finding an O(n log2 n) time algorithm for the meet of two partitions is a little more difficult. Algorithm 4 does this. In this algorithm, HT is a hash table. (In place of a hash table, one could use a balanced tree or some other data structure described in texts on algorithms and data structures.)

procedure meet(V1, V2)
    n <- size(V1)
    for i with 0 <= i < n do
        r1 <- root(i, V1); r2 <- root(i, V2)
        if HT(r1, r2) is defined then
            r <- HT(r1, r2)
            V[r] <- V[r] - 1
            V[i] <- r
        else
            HT(r1, r2) <- i
            V[i] <- -1
        endif
    endfor
    return(V)
endprocedure
Algorithm 4: Meet of two partitions
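For comparison with the pseudocode above, a compact C++ sketch of the same forest representation (union by size together with the path compression of Algorithm 2) might look like the following; the class and function names are my own and are not taken from the quoted text.

#include <iostream>
#include <utility>
#include <vector>

// Forest representation of a partition of {0, 1, ..., n-1}:
// V[i] is the parent of i, or -(block size) if i is a root.
struct Partition {
    std::vector<int> V;
    explicit Partition(int n) : V(n, -1) {}        // least partition: all singletons

    int root(int i) {                              // Algorithm 2: find root, compress path
        if (V[i] < 0) return i;
        return V[i] = root(V[i]);
    }

    void joinBlocks(int i, int j) {                // Algorithm 3: union by size
        int ri = root(i), rj = root(j);
        if (ri == rj) return;
        if (V[ri] > V[rj]) std::swap(ri, rj);      // make ri the larger block (more negative size)
        V[ri] += V[rj];                            // combined (negated) size
        V[rj] = ri;                                // attach smaller root under larger root
    }
};

int main() {
    Partition p(6);
    p.joinBlocks(0, 1);
    p.joinBlocks(2, 3);
    p.joinBlocks(1, 3);        // now {0,1,2,3}, {4}, {5}
    std::cout << std::boolalpha
              << (p.root(0) == p.root(3)) << " "   // true: same block
              << (p.root(0) == p.root(4)) << "\n"; // false: different blocks
}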

Partition algorithm with an example

The following C++ code illustrates the partition step used by quicksort. The ChoosePivot function is declared below and its implementation is left as an exercise; it must place the chosen pivot in A[F].

void ChoosePivot(dataType A[], int F, int L);
// ---------------------------------------------------------
// Chooses a pivot for quicksort's partition algorithm and
// swaps it with the first item in an array.
// Precondition: A[F..L] is an array; F <= L.
// Postcondition: A[F] is the pivot.
// ---------------------------------------------------------
// Implementation left as an exercise.

void Partition(dataType A[], int F, int L, int& PivotIndex)
// ---------------------------------------------------------
// Partitions an array for quicksort.
// Precondition: A[F..L] is an array; F <= L.
// Postcondition: Partitions A[F..L] such that:
//    S1 = A[F..PivotIndex-1] <  Pivot
//         A[PivotIndex]      == Pivot
//    S2 = A[PivotIndex+1..L] >= Pivot
// Calls: ChoosePivot and Swap.
// ---------------------------------------------------------
{
   ChoosePivot(A, F, L);          // place pivot in A[F]
   dataType Pivot = A[F];         // copy pivot

   // initially, everything but pivot is in unknown
   int LastS1 = F;                // index of last item in S1
   int FirstUnknown = F + 1;      // index of first item in unknown

   // move one item at a time until unknown region is empty
   for (; FirstUnknown <= L; ++FirstUnknown)
   {
      // Invariant: A[F+1..LastS1] < Pivot
      //            A[LastS1+1..FirstUnknown-1] >= Pivot

      // move item from unknown to proper region
      if (A[FirstUnknown] < Pivot)
      {  // item from unknown belongs in S1
         ++LastS1;
         Swap(A[FirstUnknown], A[LastS1]);
      }  // end if
      // else item from unknown belongs in S2
   }  // end for

   // place pivot in proper position and mark its location
   Swap(A[F], A[LastS1]);
   PivotIndex = LastS1;
}  // end Partition

void Quicksort(dataType A[], int F, int L)
// ---------------------------------------------------------
// Sorts the items in an array into ascending order.
// Precondition: A[F..L] is an array.
// Postcondition: A[F..L] is sorted.
// Calls: Partition.
// ---------------------------------------------------------
{
   int PivotIndex;

   if (F < L)
   {
      // create the partition: S1, Pivot, S2
      Partition(A, F, L, PivotIndex);

      // sort regions S1 and S2
      Quicksort(A, F, PivotIndex-1);
      Quicksort(A, PivotIndex+1, L);
   }  // end if
}  // end Quicksort
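Since ChoosePivot is left as an exercise above, one possible implementation is sketched below. It uses a simple median-of-three choice; this is only one option, and it assumes dataType supports operator< and reuses the same Swap helper already called by Partition.

void ChoosePivot(dataType A[], int F, int L)
// ---------------------------------------------------------
// One possible pivot choice: the median of A[F], A[Mid], A[L],
// swapped into A[F] as Partition requires.
// ---------------------------------------------------------
{
   int Mid = F + (L - F) / 2;

   // Order A[F], A[Mid], A[L] so that A[Mid] holds the median.
   if (A[Mid] < A[F])   Swap(A[Mid], A[F]);
   if (A[L]   < A[F])   Swap(A[L],   A[F]);
   if (A[L]   < A[Mid]) Swap(A[L],   A[Mid]);

   Swap(A[F], A[Mid]);   // place the median (pivot) in A[F]
}  // end ChoosePivot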

6. Describe the following with respect to Web Mining: a. Categories of Web Mining b. Applications of Web Mining

There are different ways to mine the web. To structurally analyse the field of tension we need to be able to distinguish between those different forms of web mining. The different ways to mine the web are closely related to the different types of web data. We can distinguish the actual data on web pages, web structure data regarding the hyperlink structure within and across web documents, and web log data regarding the users who browsed the web pages. Therefore, in accordance with Madria et al (1999), we shall divide web mining into three categories. First, there is content mining, to analyse the content data available in web documents. This can include images, audio files etc.; however, in this study content mining shall only refer to mining text. Second, there is the category of structure mining, which focuses on link information. It aims to analyse the way in which different web documents are linked together. The third category is called usage mining. Usage mining analyses the transaction data that is logged when users interact with the web. Usage mining is sometimes referred to as 'log mining', because it involves mining the web server logs.

Structure mining is often more valuable when it is combined with content mining of some kind to interpret the hyperlinks' contents. As content and structure mining also share most of the same advantages and disadvantages (as we shall see later on), we shall discuss them together, considering them as one category. It should however be noted that content and structure mining are not the only ones that can be combined in one tool. Mining content data on a web site can, for instance, be of added value to usage mining analyses as well. By combining the different categories of web mining in one tool, the results could become more valuable. Web usage mining, however, is quite distinct in its application. As it is also used for different advantages and threatens values in a different way, we shall discuss it separately. The two remaining categories shall be used to see whether the different kinds of web mining will lead to different beneficial or harmful situations. But first the categories will be illustrated by the following scenarios.

Content and structure mining

Sharon really likes to surf on the web and she loves to read books. On her personal homepage she likes to share information about her hobbies (surfing and reading) and she mentions her membership of a Christian Youth Association. In her list of recommended links she also refers to the web site of the Christian Youth Association. She has included her e-mail address in case someone wants to comment on her homepage. An on-line bookstore decides to use a web mining tool to search the web for personal homepages to identify potential clients. It matches the data provided on homepages to existing customer profiles. After analysing the content and the structure of the mined pages, they discover that people who link to Christian web sites of some kind all show a great interest in reading and generally spend a lot of money on buying books. So if the bookstore then makes a special effort to solicit Christians as customers, it might lead to more buying customers and an increase in profits. The web mining tool not only provides the bookstore with a list of names, but it also clusters people with the same interests and so on. After analysing the results, Sharon is identified as a potential, high-consuming customer. The bookstore decides to send Sharon a special offer by e-mail.
Sharon is somewhat surprised to receive the e-mail from this bookstore. She has never heard of the store before and she wonders how they could have obtained her e-mail address. A bit annoyed, Sharon deletes the e-mail, hoping she will never be bothered by this bookstore again.

Usage mining

Sharon always goes to her 'own' on-line bookstore. She frequently visits its web site to read about the newest publications and to see if there are any interesting special offers. The on-line bookstore analyses its web server logs and notices the frequent visits of Sharon. By analysing her clickstreams and matching her on-line behaviour with profiles of other customers, it is possible to predict whether or not Sharon might be interested in buying certain books, and how much money she is likely to spend on that. Based on their analyses they decide to make sure that a banner is displayed on her browsing window that refers to a special offer on a newly published book that will most likely be of interest to Sharon. She is indeed appealed by the banner and she follows the hyperlink by clicking on it. She decides to accept the special offer and she clicks on the order button. On the on-line ordering form there are a lot of fields to be filled in; some don't really seem to be relevant, but Sharon does not see any harm in providing the information that is asked for. The people at the bookstore who developed the ordering form intend to use the data for marketing intelligence analyses. In the privacy statement that can be found on the bookstore's web site this intended use of the collected information is explained. The statement also contains a declaration that the gathered information shall not be shared with third parties. However, after a while the on-line bookstore discovers that web users who come from a certain provider hardly ever buy anything but do cause a lot of traffic load on their server. They decide to close the adaptive part of their web site to visitors who use that provider. Sharon happens to be one of them, and the banner in her browser window no longer displays special offers when she visits the site of the bookstore.

APPLICATIONS

An outcome of the excitement about the Web in the past few years has been that Web applications have been developed at a much faster rate in the industry than research in Web-related technologies. Many of these are based on the use of Web mining concepts, even though the organizations that developed these applications, and invented the corresponding technologies, did not consider it as such. We describe some of the most successful applications in this section. Clearly, realizing that these applications use Web mining is largely a retrospective exercise. For each application category discussed below, we have selected a prominent representative, purely for exemplary purposes. This in no way implies that all the techniques described were developed by that organization alone. On the contrary, in most cases the successful techniques were developed by a rapid copy-and-improve approach to each other's ideas.

1. Personalized Customer Experience in B2C E-commerce - Amazon.com

Early on in the life of Amazon.com, its visionary CEO Jeff Bezos observed: "In a traditional (brick-and-mortar) store, the main effort is in getting a customer to the store. Once a customer is in the store they are likely to make a purchase - since the cost of going to another store is high - and thus the marketing budget (focused on getting the customer to the store) is in general much higher than the in-store customer experience budget (which keeps the customer in the store). In the case of an on-line store, getting in or out requires exactly one click, and thus the main focus must be on customer experience in the store."

This fundamental observation has been the driving force behind Amazon's comprehensive approach to personalized customer experience, based on the mantra "a personalized store for every customer" [55]. (The truth of this fundamental insight has been borne out by the phenomenon of shopping cart abandonment, which happens frequently in on-line stores but practically never in a brick-and-mortar one.) A host of Web mining techniques, e.g. associations between pages visited, click-path analysis, etc., are used to improve the customer's experience during a store visit. Knowledge gained from Web mining is the key intelligence behind Amazon's features such as instant recommendations, purchase circles, wish-lists, etc. [4].

2. Web Search - Google

Google [30] is one of the most popular and widely used search engines. It provides users access to information from over 2 billion web pages that it has indexed on its servers. The quality and quickness of its search facility make it the most successful search engine. Earlier search engines concentrated on Web content alone to return the relevant pages to a query. Google was the first to introduce the importance of the link structure in mining information from the web. PageRank, which measures the importance of a page, is the underlying technology in all Google search products, and uses structural information of the Web graph to return high-quality results (a small sketch of this idea appears at the end of this answer). The Google Toolbar is another service provided by Google that seeks to make search easier and more informative by providing additional features such as highlighting the query words on the returned web pages. The full version of the toolbar, if installed, also sends the click-stream information of the user to Google. The usage statistics thus obtained are used by Google to enhance the quality of its results. Google also provides advanced search capabilities to search images and find pages that have been updated within a specific date range. Built on top of Netscape's Open Directory project, ...

3. Web-wide tracking - DoubleClick

Web-wide tracking, i.e. tracking an individual across all sites he visits, is one of the most intriguing and controversial technologies. It can provide an understanding of an individual's lifestyle and habits to a level that is unprecedented, which is clearly of tremendous interest to marketers. A successful example of this is DoubleClick Inc.'s DART ad management technology [20]. DoubleClick serves advertisements, which can be targeted on demographic or behavioral attributes, to end-users on behalf of the client, i.e. the Web site using DoubleClick's service. Sites that use DoubleClick's service are part of The DoubleClick Network, and the browsing behavior of a user can be tracked across all sites in the network using a cookie. This allows DoubleClick's ad targeting to be based on very sophisticated criteria. Alexa Research [3] has recruited a panel of more than 500,000 users, who have voluntarily agreed to have their every click tracked, in return for some freebies. This is achieved through a browser bar that can be downloaded by the panelist from Alexa's website, which gets attached to the browser and sends Alexa a complete click-stream of the panelist's Web usage. Alexa was purchased by Amazon for its tracking technology. Clearly Web-wide tracking is a very powerful idea.
However, the invasion of privacy ...

4. Understanding Web communities - AOL

One of the biggest successes of America Online (AOL) has been its sizeable and loyal customer base [5]. A large portion of this customer base participates in various AOL communities, which are collections of users with similar interests. In addition to providing a forum for each such community to interact amongst themselves, AOL provides them with useful information and services. Over time these communities have grown to be well-visited waterholes for AOL users with shared interests. Applying Web mining to the data collected from community interactions provides AOL with a very good understanding of its communities, which it has used for targeted marketing through ads and e-mail solicitation. Recently, it has started the concept of community sponsorship, whereby an organization, say Nike, may sponsor a community called Young Athletic TwentySomethings. In return, consumer survey and new product development experts of the sponsoring organization get to participate in the community, perhaps without the knowledge of other participants. The idea is to treat the community as a highly specialized focus group, understand its needs and opinions on new and existing products, and also test strategies for influencing opinions.

5. Understanding auction behavior - eBay

As individuals in a society where we have many more things than we need, the allure of exchanging our useless stuff for some cash, no matter how small, is quite powerful. This is evident from the success of flea markets, garage sales and estate sales. The genius of eBay's founders was to create an infrastructure that gave this urge a global reach, with the convenience of doing it from one's home PC. In addition, it popularized auctions as a product selling/buying mechanism, which provides the thrill of gambling without the trouble of having to go to Las Vegas. All of this has made eBay one of the most successful businesses of the Internet era. Unfortunately, the anonymity of the Web has also created a significant problem for eBay auctions, as it is impossible to distinguish real bids from fake ones. eBay is now using Web mining techniques to analyze bidding behavior to determine if a bid is fraudulent. Recent efforts are towards understanding participants' bidding behaviors and patterns to create a more efficient auction market.

6. Personalized Portal for the Web - MyYahoo

Yahoo [75] was the first to introduce the concept of a personalized portal, i.e. a Web site designed to have the look-and-feel and content personalized to the needs of an individual end-user. This has been an extremely popular concept and has led to the creation of other personalized portals, e.g. Yodlee [76] for private information such as bank and brokerage accounts. Mining MyYahoo usage logs provides Yahoo valuable insight into an individual's Web usage habits, enabling Yahoo to provide personalized content, which in turn has led to the tremendous popularity of the Yahoo Web site.
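As referenced in the Google discussion above, here is a minimal C++ sketch of the PageRank power-iteration idea on a toy four-page link graph. The graph, damping factor and iteration count are illustrative choices for the example, not Google's actual implementation.

#include <iostream>
#include <vector>

// Toy PageRank: power iteration on a 4-page link graph.
int main() {
    // outLinks[p] lists the pages that page p links to.
    std::vector<std::vector<int>> outLinks = {
        {1, 2},     // page 0 links to pages 1 and 2
        {2},        // page 1 links to page 2
        {0},        // page 2 links to page 0
        {0, 2}      // page 3 links to pages 0 and 2
    };
    const int n = static_cast<int>(outLinks.size());
    const double d = 0.85;                      // damping factor
    std::vector<double> rank(n, 1.0 / n);

    for (int iter = 0; iter < 50; ++iter) {
        std::vector<double> next(n, (1.0 - d) / n);
        for (int p = 0; p < n; ++p) {
            double share = d * rank[p] / outLinks[p].size();
            for (int q : outLinks[p])
                next[q] += share;               // p passes its rank to the pages it links to
        }
        rank = next;
    }

    for (int p = 0; p < n; ++p)
        std::cout << "page " << p << " rank " << rank[p] << "\n";
}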
