Data Mining Bayesian Classifiers

Vinithra

Data Mining: Bayesian Classifiers

In many real-world applications, the relationship between input attributes and the class label is not always certain. Even if a test record has the same attributes as some training records, we cannot always predict the class label with complete certainty.

This usually happens because of noisy data or because some important factors affecting the result are not included in the analysis.

For example, consider predicting whether a person is at risk of liver disease based on their eating habits and exercise routine. Normally, people who eat healthy food and exercise regularly have a lower risk of liver disease. However, some of them may still develop the disease for other reasons, such as alcohol abuse or frequent consumption of high-calorie street food.

Also, determining whether a person’s diet is truly healthy or whether their exercise routine is sufficient can itself be difficult to measure accurately. These uncertainties make the learning and prediction process more complex.

Bayesian Classification

Bayesian classification is a statistical method used in data mining for classification. It is based on Bayes’ Theorem, which gives the probability of an event in light of prior knowledge and observed evidence.

Bayesian classifiers use probability theory to determine the likelihood that a data record belongs to a particular class.

The concept is based on the work of Thomas Bayes, who introduced a method using conditional probability to estimate unknown parameters based on observed evidence.

Bayes’ Theorem

Bayes’ Theorem is mathematically expressed as:
P(X|Y) = P(Y|X) × P(X) / P(Y)

Where X and Y are events and P(Y) ≠ 0.
  • P(X|Y) – Probability that event X occurs given that Y has occurred (conditional probability)
  • P(Y|X) – Probability that event Y occurs given that X has occurred (likelihood)
  • P(X) – Probability of event X on its own, without any knowledge of Y (prior probability)
  • P(Y) – Probability of event Y on its own, without any knowledge of X (marginal probability)
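
As a rough illustration, the formula can be applied to the liver disease example from earlier. The sketch below is only an assumption-laden example: the function name `posterior` and all the numbers are hypothetical and chosen for easy arithmetic, not taken from any dataset. Here X stands for "the person has liver disease" and Y for "the person has an unhealthy diet".

```python
def posterior(p_y_given_x: float, p_x: float, p_y: float) -> float:
    """Return P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    if p_y == 0:
        raise ValueError("P(Y) must be non-zero")
    return p_y_given_x * p_x / p_y


# Hypothetical probabilities, for illustration only.
p_x = 0.10              # P(X): prior probability of liver disease
p_y_given_x = 0.60      # P(Y|X): unhealthy diet among people with the disease
p_y_given_not_x = 0.20  # P(Y|not X): unhealthy diet among people without it

# P(Y) from the law of total probability over X and not X.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Updated probability of disease once an unhealthy diet is observed.
p_x_given_y = posterior(p_y_given_x, p_x, p_y)
print(round(p_x_given_y, 3))
```

With these made-up numbers, observing an unhealthy diet raises the probability of disease above its prior of 0.10. A classifier built on this idea would compute such a posterior for each class and predict the class with the largest value.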

Bayesian Interpretation

In the Bayesian approach, probability represents a degree of belief about an event.

Bayes’ Theorem describes how to update our belief about a hypothesis after observing new evidence: it turns the belief held before the evidence into the belief held after it.

For example, consider a coin toss.
If we assume the coin is fair, the probability of getting heads on a toss is 50%. However, if we toss the coin many times and observe the results, our belief that the coin is fair may increase, decrease, or remain the same depending on the outcomes.

For a hypothesis X and evidence Y:
P(X) – Prior probability (initial belief about X)
P(X|Y) – Posterior probability (updated belief after observing Y)

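The ratio P(Y|X) / P(Y) in Bayes’ Theorem indicates how strongly the evidence Y supports the hypothesis X: when it is greater than 1 the posterior exceeds the prior, and when it is less than 1 the posterior falls below it.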
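
The coin toss example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: there are only two competing hypotheses, "the coin is fair" and "the coin is biased towards heads", and the bias value 0.8, the starting belief 0.5, the toss sequence, and the function name `update` are all hypothetical.

```python
def update(prior_fair: float, toss: str, p_heads_biased: float = 0.8) -> float:
    """Return P(fair | toss), given the current belief P(fair)."""
    # Likelihood of this toss under each hypothesis.
    like_fair = 0.5
    like_biased = p_heads_biased if toss == "H" else 1 - p_heads_biased
    # P(toss), summed over both hypotheses.
    evidence = like_fair * prior_fair + like_biased * (1 - prior_fair)
    # Bayes' Theorem: posterior = likelihood * prior / evidence.
    return like_fair * prior_fair / evidence


belief = 0.5          # prior: no preference between fair and biased
history = [belief]
for toss in "HHTHH":  # hypothetical observed sequence
    belief = update(belief, toss)  # the posterior becomes the next prior
    history.append(belief)

print([round(b, 3) for b in history])
```

Each heads makes the biased hypothesis more plausible, so the belief in a fair coin drops, while each tails pushes it back up. The posterior after one toss serves as the prior for the next.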

Bayesian Network

A Bayesian Network is a type of Probabilistic Graphical Model (PGM) used to represent uncertain relationships between variables using probability.

It is also known as a Belief Network.

Bayesian Networks are represented using a Directed Acyclic Graph (DAG).

Directed Acyclic Graph (DAG)

A DAG consists of:
  • Nodes – Represent random variables
  • Edges (links) – Directed links that represent dependencies between variables, pointing from a parent variable to a child variable that depends on it
These graphs help model how the probability of one event depends on other related events.
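
One simple way to hold such a graph in code is a mapping from each node to its parents. The sketch below is only one possible representation, reusing the liver disease example. The variable names (`Diet`, `Exercise`, `LiverDisease`) and the helper `topological_order` are hypothetical. The helper also checks the "acyclic" requirement, since a graph has a parents-before-children ordering only if it contains no cycle.

```python
# Each node maps to the list of its parents (the nodes with an edge into it).
# Assumption: every node appears as a key, even if it has no parents.
parents = {
    "Diet": [],
    "Exercise": [],
    "LiverDisease": ["Diet", "Exercise"],
}


def topological_order(parent_map):
    """Return the nodes with every parent before its children.

    Raises ValueError if the graph contains a cycle.
    """
    remaining = {node: set(ps) for node, ps in parent_map.items()}
    order = []
    while remaining:
        # Nodes whose parents have all been placed already.
        ready = sorted(n for n, ps in remaining.items() if not ps)
        if not ready:
            raise ValueError("graph contains a cycle, so it is not a DAG")
        for node in ready:
            order.append(node)
            del remaining[node]
        for ps in remaining.values():
            ps.difference_update(ready)
    return order


order = topological_order(parents)
print(order)
```

Storing parents rather than children is a design choice that fits Bayesian Networks well, because the probabilities attached to each node are conditioned on its parents.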

Conditional Probability in Bayesian Networks

The uncertainty of events in a Bayesian Network is modeled using Conditional Probability Distributions (CPD). A Conditional Probability Table (CPT) is used to represent these probabilities for each variable in the network. The CPT shows the probability of a variable given the values of its parent variables.
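
A CPT can be pictured as a lookup table keyed by the values of the parent variables. The following sketch is a minimal illustration for a three-node network based on the liver disease example. All variable names and probabilities are hypothetical, and it relies on the standard property that a Bayesian Network's joint probability is the product of each variable's probability given its parents.

```python
from itertools import product

# Root nodes have no parents, so their tables hold plain probabilities.
p_diet = {"healthy": 0.6, "unhealthy": 0.4}
p_exercise = {"yes": 0.5, "no": 0.5}

# CPT for LiverDisease: P(disease = True | diet, exercise).
cpt_disease = {
    ("healthy", "yes"): 0.05,
    ("healthy", "no"): 0.15,
    ("unhealthy", "yes"): 0.25,
    ("unhealthy", "no"): 0.50,
}


def joint(diet: str, exercise: str, disease: bool) -> float:
    """P(diet, exercise, disease) as a product of per-node probabilities."""
    p_true = cpt_disease[(diet, exercise)]
    p_disease_value = p_true if disease else 1 - p_true
    return p_diet[diet] * p_exercise[exercise] * p_disease_value


# Overall probability of disease: sum the joint over all parent combinations.
p_disease = sum(joint(d, e, True) for d, e in product(p_diet, p_exercise))
print(round(p_disease, 3))
```

Each row of the CPT stores only the probability of disease being true. The probability of it being false is one minus that value, so it does not need to be stored separately.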