Data Mining: Bayesian Classifiers
In many real-world applications, the relationship between input attributes
and the class label is not always certain. Even if a test record has the
same attributes as some training records, we cannot always predict the class
label with complete certainty.
This usually happens because the data are noisy or because some important factors
affecting the result are not included in the analysis.
For example, consider predicting whether a person is at risk of liver
disease based on their eating habits and how regularly they exercise. Normally,
people who eat healthy food and exercise regularly have a lower risk of liver
disease. However, some people may still develop the disease because of other
factors, such as frequent consumption of high-calorie street food or
alcohol abuse.
Moreover, whether a person’s diet is truly healthy, or whether their
exercise routine is sufficient, can itself be difficult to measure
accurately. These uncertainties make the learning and prediction process
more complex.
Bayesian Classification
Bayesian classification is a statistical approach to classification in
data mining. It is based on Bayes’ Theorem, which describes how to compute
the probability of an event using prior knowledge of related conditions.
Bayesian classifiers use probability theory to estimate the probability
that a data record belongs to each class, and then assign the record to
the most probable class.
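As a minimal sketch of how a Bayesian classifier turns probabilities into a class label, the Python code below implements naive Bayes, one common Bayesian classifier. Naive Bayes additionally assumes that the attributes are conditionally independent given the class, an assumption the text above does not require. The training records, attribute values, and the functions `train`, `posteriors`, and `predict` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training records: ((diet, exercise), liver-disease risk).
TRAIN = [
    (("healthy", "regular"), "low"),
    (("healthy", "regular"), "low"),
    (("healthy", "rare"), "low"),
    (("unhealthy", "regular"), "high"),
    (("unhealthy", "rare"), "high"),
    (("unhealthy", "rare"), "high"),
    (("healthy", "rare"), "high"),
]


def train(records):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in records)
    value_counts = Counter()    # (attribute index, value, label) -> count
    domains = defaultdict(set)  # attribute index -> distinct values seen
    for attrs, label in records:
        for i, value in enumerate(attrs):
            value_counts[(i, value, label)] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def posteriors(attrs, class_counts, value_counts, domains, alpha=1.0):
    """Return P(class | attrs) for every class under the naive independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for i, value in enumerate(attrs):
            # Laplace-smoothed estimate of P(attribute i = value | class)
            score *= (value_counts[(i, value, label)] + alpha) / (n_c + alpha * len(domains[i]))
        scores[label] = score
    evidence = sum(scores.values())  # normalizing constant, i.e. P(attrs) under the model
    return {label: s / evidence for label, s in scores.items()}


def predict(attrs, model):
    """Assign the record to the class with the highest posterior probability."""
    probs = posteriors(attrs, *model)
    return max(probs, key=probs.get)


model = train(TRAIN)
print(posteriors(("healthy", "rare"), *model))
print(predict(("healthy", "rare"), model))
```

Laplace smoothing (`alpha=1.0`) keeps an attribute value that was never seen with a class from forcing that class's probability to zero.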
The concept is based on the work of Thomas Bayes, who introduced a
method using conditional probability to estimate unknown parameters
based on observed evidence.
Bayes’ Theorem
Bayes’ Theorem is mathematically expressed as:
P(X|Y) = P(Y|X) × P(X) / P(Y)
where X and Y are events and P(Y) ≠ 0.
- P(X|Y) – Probability that event X occurs given that Y has occurred (conditional, or posterior, probability)
- P(Y|X) – Probability that event Y occurs given that X has occurred (likelihood)
- P(X) – Probability of event X, without regard to Y (prior probability)
- P(Y) – Probability of event Y, without regard to X (marginal probability, or evidence)
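The formula can be checked numerically. In this sketch the events and probabilities are hypothetical: X is "has liver disease" and Y is "a screening test is positive". P(Y) is not given directly, so it is computed with the law of total probability, P(Y) = P(Y|X)P(X) + P(Y|not X)P(not X). The helper names `bayes` and `marginal` are illustrative.

```python
def bayes(p_y_given_x: float, p_x: float, p_y: float) -> float:
    """Return P(X|Y) = P(Y|X) * P(X) / P(Y); P(Y) must be non-zero."""
    if p_y == 0:
        raise ValueError("P(Y) must be non-zero")
    return p_y_given_x * p_x / p_y


def marginal(p_y_given_x: float, p_x: float, p_y_given_not_x: float) -> float:
    """Law of total probability: P(Y) = P(Y|X)P(X) + P(Y|not X)P(not X)."""
    return p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)


# Hypothetical numbers: X = "has liver disease", Y = "screening test is positive".
p_x = 0.01             # prior P(X)
p_y_given_x = 0.90     # likelihood P(Y|X)
p_y_given_not_x = 0.05 # P(Y|not X), false-positive rate
p_y = marginal(p_y_given_x, p_x, p_y_given_not_x)
posterior = bayes(p_y_given_x, p_x, p_y)
print(f"P(Y) = {p_y:.4f}, P(X|Y) = {posterior:.4f}")
```

With these numbers P(Y) = 0.0585 and P(X|Y) is only about 0.154: even a fairly accurate test leaves the posterior low when the prior P(X) is small.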
Bayesian Interpretation
In the Bayesian approach, probability represents a degree of belief about
an event.
Bayes’ Theorem describes how to update our belief in a hypothesis after
observing new evidence.
For example, consider a coin toss.
If we assume the coin is fair, our initial belief is that heads and tails
each have a 50% probability. If we then toss the coin many times and observe
the results, our estimate of the probability of heads may increase, decrease,
or remain the same, depending on the outcomes.
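One way to make the coin example concrete is to assume a Beta distribution as the prior over the coin's probability of heads, a standard conjugate choice that the text does not prescribe. After observing h heads and t tails, a Beta(α, β) prior becomes a Beta(α + h, β + t) posterior. The class `BetaBelief`, the prior strength (10, 10), and the toss counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a coin's probability of heads, modeled as Beta(alpha, beta)."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected probability of heads under this belief
        return self.alpha / (self.alpha + self.beta)

    def update(self, heads: int, tails: int) -> "BetaBelief":
        # Conjugate update: the posterior is Beta(alpha + heads, beta + tails)
        return BetaBelief(self.alpha + heads, self.beta + tails)


prior = BetaBelief(10, 10)                    # hypothetical prior centered on 0.5
posterior = prior.update(heads=15, tails=5)   # hypothetical observations
unchanged = prior.update(heads=10, tails=10)  # an even split of outcomes
print(prior.mean(), posterior.mean(), unchanged.mean())
```

Here the estimate moves from 0.5 to 0.625 after 15 heads and 5 tails, while an even 10/10 split leaves it at 0.5, so the outcomes can raise the estimate or leave it unchanged.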
For a hypothesis X and evidence Y:
- P(X) – Prior probability (initial belief about X)
- P(X|Y) – Posterior probability (updated belief after observing Y)
The ratio P(Y|X)/P(Y) in Bayes’ theorem indicates how strongly the evidence
Y supports the hypothesis X.
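Rewriting the theorem as the prior multiplied by the ratio P(Y|X)/P(Y) makes the direction of the update explicit (assuming P(X) > 0):

```latex
P(X \mid Y) = P(X) \times \frac{P(Y \mid X)}{P(Y)},
\qquad
\frac{P(Y \mid X)}{P(Y)}
\begin{cases}
> 1 & \Rightarrow P(X \mid Y) > P(X) \quad \text{($Y$ supports $X$)} \\
= 1 & \Rightarrow P(X \mid Y) = P(X) \quad \text{($Y$ is uninformative about $X$)} \\
< 1 & \Rightarrow P(X \mid Y) < P(X) \quad \text{($Y$ counts against $X$)}
\end{cases}
```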
Bayesian Network
A Bayesian Network is a type of Probabilistic Graphical Model (PGM) used
to represent uncertain relationships between variables using
probability.
It is also known as a Belief Network.
Bayesian Networks are represented using a Directed Acyclic Graph (DAG).
Directed Acyclic Graph (DAG)
A DAG consists of:
- Nodes – Represent random variables
- Directed edges (links) – Represent direct dependencies between variables; an edge from A to B makes A a parent of B
These graphs help model how the probability of one event depends on other
related events.
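A minimal sketch of the DAG requirement, assuming a hypothetical network stored as a Python dictionary that maps each node to its parents. The node names and edges are illustrative and are not taken from a real model. The function `topological_order` uses Kahn's algorithm to list every parent before its children, and it rejects any graph containing a cycle, since such a graph cannot serve as a Bayesian network.

```python
from collections import deque

# Hypothetical network: each node maps to the list of its parents.
PARENTS = {
    "Diet": [],
    "Exercise": [],
    "Alcohol": [],
    "LiverDisease": ["Diet", "Exercise", "Alcohol"],
    "Fatigue": ["LiverDisease"],
}


def topological_order(parents):
    """Return nodes with every parent before its children; raise if there is a cycle."""
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    # Start from nodes with no parents (root nodes).
    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle, so it is not a valid Bayesian network")
    return order


print(topological_order(PARENTS))
```

An order like this is also a convenient order for evaluating CPTs or sampling values, because each node's parents are always processed before the node itself.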
Conditional Probability in Bayesian Networks
The uncertainty of events in a Bayesian Network is modeled using
Conditional Probability Distributions (CPDs). For discrete variables, each
CPD is usually represented as a Conditional Probability Table (CPT). The CPT
of a variable gives its probability for every combination of values of its
parent variables; for a variable with no parents, the CPT is simply its prior
distribution.
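The sketch below, with hypothetical variables and probabilities, shows CPTs in use for a three-node network Diet → LiverDisease ← Exercise. It relies on the standard factorization of a Bayesian network, in which the joint probability is the product of each variable's CPT entry given its parents, P(x1, ..., xn) = Π P(xi | parents(xi)), and it answers queries by summing over the joint (inference by enumeration). The names `joint`, `P_DIET_HEALTHY`, `P_EXERCISE_REGULAR`, and `P_DISEASE` are illustrative.

```python
from itertools import product

# Hypothetical CPTs. Root nodes have no parents, so their CPT is just a prior.
P_DIET_HEALTHY = 0.6
P_EXERCISE_REGULAR = 0.5
# P(LiverDisease = True | diet healthy?, exercise regular?)
P_DISEASE = {
    (True, True): 0.02,
    (True, False): 0.05,
    (False, True): 0.10,
    (False, False): 0.25,
}


def joint(healthy: bool, regular: bool, disease: bool) -> float:
    """P(Diet, Exercise, LiverDisease) as the product of each node's CPT entry."""
    p_diet = P_DIET_HEALTHY if healthy else 1 - P_DIET_HEALTHY
    p_ex = P_EXERCISE_REGULAR if regular else 1 - P_EXERCISE_REGULAR
    p_d = P_DISEASE[(healthy, regular)]
    return p_diet * p_ex * (p_d if disease else 1 - p_d)


# Marginal P(LiverDisease = True): sum the joint over all parent configurations.
p_disease = sum(joint(h, r, True) for h, r in product([True, False], repeat=2))
# Conditional P(Diet healthy | LiverDisease = True) = P(healthy, disease) / P(disease).
p_healthy_given_disease = sum(joint(True, r, True) for r in (True, False)) / p_disease
print(round(p_disease, 4), round(p_healthy_given_disease, 4))
```

Enumeration grows exponentially with the number of variables, so it is practical only for small networks like this one.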