What is Data Mining?

Summary by James R. Martin

Data mining is one of a group of concepts and techniques related to business intelligence, or e-business intelligence. Data mining involves drawing on information that is obtained from a variety of sources and stored in a data warehouse. This information becomes the input for various applications that uncover relationships and trends related to customers and processes. Online analytical processing (OLAP) allows a user to view data from many different angles to uncover correlations and relationships, somewhat like turning a Rubik's cube (Roberts-Witt, Gold diggers 2001). Managers and others then use these results to make better decisions. The emphasis is on data sharing, where the web makes various types of information accessible to the masses. Managers, customers, suppliers and partners can ask the data warehouse questions about various aspects of the business through query and reporting applications. The illustration below provides a graphic view of the data mining concept.

 

Three important considerations are clean data, security and scalability (Roberts-Witt, Gold diggers 2001). When thousands of people are accessing a system, the data must be accurate and free of errors and inconsistencies. Special attention must be given to establishing who has access rights to the data and to enforcing those rights. To support scalability, the infrastructure must be in place, including web servers, report servers, databases and networks.
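
To make the clean data consideration more concrete, the following is a minimal sketch of one possible check, written in Python. It flags duplicate keys and records with missing fields before they would be loaded into a warehouse. The field names (customer_id, name, email) and the sample records are hypothetical and are not taken from the article.

```python
from collections import defaultdict

def find_data_problems(records, key="customer_id", required=("name", "email")):
    """Return (duplicates, incomplete) for a list of record dictionaries.

    duplicates maps each key value seen more than once to the row indexes
    where it appears; incomplete lists (row index, missing field names).
    """
    rows_by_key = defaultdict(list)
    incomplete = []
    for index, record in enumerate(records):
        rows_by_key[record.get(key)].append(index)
        # Treat absent, None, and blank values as missing.
        missing = [f for f in required if not str(record.get(f) or "").strip()]
        if missing:
            incomplete.append((index, missing))
    duplicates = {k: rows for k, rows in rows_by_key.items() if len(rows) > 1}
    return duplicates, incomplete

# Hypothetical sample data for illustration only.
records = [
    {"customer_id": 1, "name": "Ann Lee", "email": "ann@example.com"},
    {"customer_id": 2, "name": "", "email": "bob@example.com"},
    {"customer_id": 1, "name": "Ann Lee", "email": "ann@example.com"},
]
duplicates, incomplete = find_data_problems(records)
print(duplicates)   # customer_id 1 appears in rows 0 and 2
print(incomplete)   # row 1 has no name
```

A real project would likely add further rules, such as format and range checks, but the idea is the same: problems are found and reported before many users rely on the data.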

Examples

Starbucks uses data mining to reduce insurance claims. The data is analyzed to uncover locations, floor designs and time patterns where customers slip and fall more frequently from coffee spills (Roberts-Witt, Gold diggers 2001).
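
The kind of analysis described in this example can be sketched, under stated assumptions, as counting incidents along different dimensions. The Python sketch below is not Starbucks' actual method; the incident records, the field names (store, floor, hour) and the count_by helper are all hypothetical.

```python
from collections import Counter

def count_by(incidents, *dimensions):
    """Count incidents grouped by one or more chosen dimensions."""
    return Counter(tuple(incident[d] for d in dimensions) for incident in incidents)

# Hypothetical slip-and-fall records for illustration only.
incidents = [
    {"store": "A", "floor": "tile", "hour": 8},
    {"store": "A", "floor": "tile", "hour": 8},
    {"store": "B", "floor": "wood", "hour": 15},
    {"store": "C", "floor": "tile", "hour": 8},
]

by_floor = count_by(incidents, "floor")               # one dimension
by_floor_hour = count_by(incidents, "floor", "hour")  # two dimensions combined

# The most frequent combination is a candidate pattern to investigate.
print(by_floor_hour.most_common(1))
```

Grouping the same records by different combinations of dimensions is a simple way to look at the data from more than one angle, which is the idea behind the OLAP tools mentioned earlier.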

Dow Jones Interactive Wall Street Journal uses data mining to better understand how the site is performing by correlating the log and click-stream information generated with the customer files (Roberts-Witt, Data mining: What lies beneath? 2002).
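
Correlating click-stream information with customer files amounts to joining the two data sets on a shared customer identifier. The Python sketch below illustrates that idea only; the record layouts, the segment labels and the page_views_by_segment function are assumptions made for this example and do not describe the Dow Jones system.

```python
from collections import defaultdict

def page_views_by_segment(click_log, customers):
    """Join click events to customer records and count views per (segment, page)."""
    segment_of = {c["customer_id"]: c["segment"] for c in customers}
    views = defaultdict(int)
    for event in click_log:
        # Clicks from visitors with no customer record are kept as "unknown".
        segment = segment_of.get(event["customer_id"], "unknown")
        views[(segment, event["page"])] += 1
    return dict(views)

# Hypothetical customer file and click-stream log for illustration only.
customers = [
    {"customer_id": 1, "segment": "subscriber"},
    {"customer_id": 2, "segment": "trial"},
]
click_log = [
    {"customer_id": 1, "page": "/markets"},
    {"customer_id": 1, "page": "/markets"},
    {"customer_id": 2, "page": "/tech"},
    {"customer_id": 3, "page": "/markets"},
]
views = page_views_by_segment(click_log, customers)
print(views)
```

Once the log is tied to customer attributes, questions such as which pages each type of customer uses most can be answered from the combined counts.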

The Royal Dutch/Shell Group operates in 135 countries with 90,000 employees and 1,700 separate operating companies. Shell's data mining project allows the company to find more meaning in its data to help negotiate better contracts and identify products that are doing well or declining on a global basis (Roberts-Witt, Data mining: What lies beneath? 2002).

Harrah's mines its rich database to develop compelling customer incentives (Loveman, Diamonds in the data mine 2003).

Some other applications of data mining include: 1) Simulating and optimizing supply chain flows, reducing inventory and stock-outs, 2) Identifying customers with the greatest profit potential, 3) Identifying the price that will maximize yield or profit, 4) Selecting the best employees for tasks or jobs, 5) Detecting and minimizing quality problems, 6) Providing a better understanding of the drivers of financial performance including nonfinancial factors, 7) Improving the quality, efficacy and safety of products and services (Davenport, Competing on analytics 2006).

From the intelligence perspective, the National Research Council ranked data mining technology with antibiotics, vaccines, imaging and other technologies in the fight against terrorism. Text mining, video mining, audio phone mining and e-mail mining could all become important in the area of homeland defense (Roberts-Witt, Data mining: What lies beneath? 2002).

There are four paradigms of science: 1) Theory, 2) Experimentation, 3) Computation and simulation, and 4) Data mining. The next scientific revolution involves using the fourth paradigm, deep data-mining tools, to solve the world's problems in astronomy, oceanography, healthcare, water management, and climate change (Hey, The next scientific revolution 2010).

________________________________________________

References and some additional articles and books on Data Mining

Berry, M. J. A. and G. S. Linoff. 2004. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Wiley Computer Publishing.

Bramer, M. 2007. Principles of Data Mining (Undergraduate Topics in Computer Science). Springer.

Calderon, T. G., J. J. Cheh and I. Kim. 2003. How large corporations use data mining to create value. Management Accounting Quarterly (Winter): 1-11.

Davenport, T. H. 2006. Competing on analytics. Harvard Business Review (January): 98-107.

Davenport, T. H. and J. G. Harris. 2007. Competing on Analytics: The New Science of Winning. Harvard Business School Press. 

Davenport, T. H., J. G. Harris and Robert Morison. 2010. Analytics at Work: Smarter Decisions, Better Results. Harvard Business Press. 

Debreceny, R. and G. L. Gray. 2004. Grab your picks and shovels! There's gold in your data. Strategic Finance (January): 24-28.

Fisher, I. E., M. R. Garnsey, S. Goel and K. Tam. 2010. The role of text analytics and information retrieval in the accounting domain. Journal of Emerging Technologies in Accounting (7): 1-24.

Han, J. and M. Kamber. 2006. Data Mining Concepts and Techniques. Morgan Kaufmann Publishers.

Hastie, T., R. Tibshirani and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition. Springer.

Hey, T. 2010. The next scientific revolution. Harvard Business Review (November): 56-63.

Janert, P. K. 2010. Data Analysis with Open Source Tools. O'Reilly Media.

Kovalerchuk, B., E. Vityaev and R. Holtfreter. 2007. Correlation of complex evidence in forensic accounting using data mining. Journal of Forensic Accounting 8(1-2): 53-88.

Larose, D. T. 2004. Discovering Knowledge in Data: An Introduction to Data Mining. Wiley-Interscience.

Liu, B. 2007 and 2010. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer.

Loveman, G. 2003. Diamonds in the data mine. Harvard Business Review (May): 109-123.

Markov, Z. and D. T. Larose. 2007. Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. Wiley-Interscience.

Matignon, R. 2007. Data Mining Using SAS Enterprise Miner. Wiley-Interscience.

May, T. 2009. The New Know: Innovation Powered by Analytics. Wiley.

McCue, C. 2007. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis. Butterworth-Heinemann.

Milton, M. 2009. Head First Data Analysis: A Learner's Guide to Big Numbers, Statistics, and Good Decisions. O'Reilly Media.

Nisbet, R., J. Elder IV and G. Miner. 2009. Handbook of Statistical Analysis and Data Mining Applications. Academic Press.

Padmanabhan, B. and A. Tuzhilin. 2002. Knowledge refinement based on the discovery of unexpected patterns in data mining. Decision Support Systems 33(3): 309-321.

Padmanabhan, B. and A. Tuzhilin. 2003. On the use of optimization for data mining: Theoretical interactions and eCRM opportunities. Management Science (October): 1327-1343.

Redman, T. C. 2008. Data Driven: Profit from Your Most Important Business Asset. Harvard Business School Press. 

Roberts-Witt, S. L. 2001. Gold diggers: Let customers and partners mine your data using new e-business intelligence tools. It could turn into a gold rush. PC Magazine (February 20): iBiz 6-10.

Roberts-Witt, S. L. 2002. Data mining: What lies beneath? Finding patterns in customer behavior can deliver profitable insights into your business. PC Magazine (November 19): iBiz 1-6.

Shirata, C. Y. and M. Sakagami. 2008. An analysis of the “going concern assumption”: Text mining from Japanese financial reports. Journal of Emerging Technologies in Accounting (5): 1-16.

Tan, P., M. Steinbach and V. Kumar. 2005. Introduction to Data Mining. Addison Wesley.

Torgo, L. 2010. Data Mining with R: Learning with Case Studies. Chapman and Hall/CRC.

Tsiptsis, K. and A. Chorianopoulos. 2010. Data Mining Techniques in CRM: Inside Customer Segmentation. Wiley.

Wang, J. and J. G. S. Yang. 2009. Data mining techniques for auditing attest function and fraud detection. Journal of Forensic & Investigative Accounting 1(1): 1-24.

Williams, S. 2011. 5 Barriers to BI success and how to overcome them. Strategic Finance (July): 26-33.

Witten, I. H. and E. Frank. 1999. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

Witten, I. H. and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann.

Zheng, Z., B. Padmanabhan and S. Kimbrough. 2003. On the existence and significance of data preprocessing biases in web usage mining. INFORMS Journal on Computing 15(2): 148-170.