History of Data Mining
The term "data mining" became popular in the 1990s, but the ideas behind it
are much older. The field grew out of earlier methods for analyzing and
understanding data.
Early statistical ideas such as Bayes’ Theorem date to the 1700s, and
regression analysis followed in the 1800s as a way to study relationships
between variables. As computers became more powerful, it became practical to
collect, store, and process large amounts of data. This growth in computing
power made more advanced techniques feasible: neural networks, clustering, and
genetic algorithms (1950s), decision trees (1960s), and support vector
machines (1990s).
Data mining draws mainly on three fields: classical statistics, artificial
intelligence, and machine learning.
1. Classical Statistics
Classical statistics is the foundation of many data mining techniques. It
provides the mathematical methods used to analyze and interpret data. Common
examples include descriptive measures such as variance and standard deviation,
inferential tools such as confidence intervals, and modeling techniques such as
regression analysis, cluster analysis, and discriminant analysis. Together
these methods reveal patterns and relationships within data.
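
To make a few of these techniques concrete, here is a minimal sketch in Python
that computes the variance and standard deviation of a small sample and fits a
simple linear regression by least squares. The data and the helper names
(summarize, fit_line) are invented for illustration and do not come from any
particular data mining tool.

```python
import statistics


def summarize(values):
    """Return the mean, sample variance, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),
    }


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example data: advertising spend (x) and units sold (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [3.0, 5.0, 7.0, 9.0, 11.0]

summary = summarize(units_sold)
slope, intercept = fit_line(ad_spend, units_sold)
print(summary)
print(slope, intercept)
```

The variance and standard deviation describe how spread out the values are,
while the fitted slope and intercept describe how one variable changes with
another, which is the kind of relationship data mining looks for at scale.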
2. Artificial Intelligence (AI)
Artificial intelligence focuses on building systems that mimic human thinking
and decision-making. Unlike classical statistics, AI often relies on
heuristics: rules of thumb that find a good-enough answer quickly rather than
a provably optimal one. AI techniques have been applied in many computer
systems, for example in query optimization in Relational Database Management
Systems (RDBMS), to improve performance and decision-making.
3. Machine Learning
Machine learning combines ideas from both statistics and artificial
intelligence. It can be seen as a later development of AI in which computers
learn from data and improve their performance as they see more of it, rather
than following only hand-written rules. Machine learning algorithms analyze
data, identify patterns, and help systems make decisions automatically. They
use statistical concepts together with AI techniques to model the
characteristics of the data and to make predictions from it.