Data Mining Task Primitives

R Sneha

A data mining task can be described using a data mining query that is given to a data mining system.

This query is defined using data mining task primitives.

These primitives help users interact with the data mining system while discovering patterns in the data. They allow users to guide the mining process and analyze the results from different perspectives or levels.

Data mining task primitives define the following:

The data that needs to be mined
The type of knowledge or patterns to discover
The background knowledge used during mining
The measures used to evaluate interesting patterns
The format used to display the discovered results

A data mining query language can include these primitives so that users can communicate easily with data mining systems. It also helps in building user-friendly graphical interfaces.

Designing a complete data mining language is difficult because data mining includes many tasks such as data description, classification, clustering, and evolution analysis. Each task requires different techniques and processing methods.

Therefore, designing an effective data mining query language requires a strong understanding of the capabilities and limitations of different data mining techniques. This also helps integrate the data mining system with other information systems.

List of Data Mining Task Primitives

A data mining query mainly includes the following primitives.

1. Task-Relevant Data to be Mined

This specifies which part of the database will be used for mining.

It includes:

Selected attributes (columns)
Selected dimensions from a data warehouse
Specific records that match certain conditions

In a relational database, this data is collected using queries that involve operations such as:

Selection
Projection
Join
Aggregation

After retrieving the required data, a new dataset is created called the initial data relation. This dataset can also be sorted or grouped based on the query conditions.

Sometimes, this dataset does not physically exist in the database. Instead, it is a virtual table, also called a view. In data mining, this view is known as a minable view, because it contains the data used for mining.
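As a rough illustration of how selection, projection, join, and aggregation combine into a minable view, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, rows, and the 40000 income cutoff are all assumptions invented for this example.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, income INTEGER);
    CREATE TABLE purchase (cust_id INTEGER, item_type TEXT, price REAL);

    INSERT INTO customer VALUES (1, 34, 52000), (2, 45, 38000), (3, 29, 61000);
    INSERT INTO purchase VALUES
        (1, 'laptop', 900.0), (1, 'phone', 400.0),
        (2, 'tv', 700.0),
        (3, 'camera', 350.0);

    -- The view is not stored as a table; it is recomputed when queried.
    CREATE VIEW minable_view AS
    SELECT c.cust_id, c.age, c.income,           -- projection
           SUM(p.price) AS total_spent           -- aggregation
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id  -- join
    WHERE c.income >= 40000                      -- selection
    GROUP BY c.cust_id, c.age, c.income;
""")

# A mining algorithm would read its input from the view like any other table.
rows = conn.execute("SELECT * FROM minable_view ORDER BY cust_id").fetchall()
for row in rows:
    print(row)
```

Because the view is virtual, changing the underlying tables changes what the mining step sees without rebuilding any dataset.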

2. Kind of Knowledge to be Mined

This primitive defines what type of patterns or knowledge we want to discover.

Some common data mining tasks include:

Data characterization
Discrimination
Association rule mining
Correlation analysis
Classification
Prediction
Clustering
Outlier detection
Evolution analysis

Each task finds different kinds of patterns in the data.

3. Background Knowledge

Background knowledge is domain knowledge that helps guide the data mining process and improves the quality of discovered patterns.

A common form of background knowledge is a Concept Hierarchy.

Concept Hierarchy

A concept hierarchy organizes data from low-level detailed concepts to higher-level general concepts.

For example:

City → State → Country

Concept hierarchies allow mining at different levels of abstraction.

Operations in Concept Hierarchy

Rolling Up (Generalization)

Converts detailed data into more general information
Makes the data easier to understand
Reduces the amount of data

Example:

Chennai → Tamil Nadu → India

Drilling Down (Specialization)

Moves from general concepts to more detailed information

Example:

Country → State → City

Different users may create different hierarchies for the same attribute depending on their needs.
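Rolling up and drilling down can be sketched with two lookup tables that encode a City → State → Country hierarchy. The function names, the extra cities, and the sales counts below are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical hierarchy tables: each maps a value to its parent one level up.
CITY_TO_STATE = {
    "Chennai": "Tamil Nadu",
    "Coimbatore": "Tamil Nadu",
    "Mumbai": "Maharashtra",
}
STATE_TO_COUNTRY = {"Tamil Nadu": "India", "Maharashtra": "India"}

# Hypothetical sales counts recorded at the most detailed (city) level.
sales_by_city = Counter({"Chennai": 120, "Coimbatore": 45, "Mumbai": 200})


def roll_up(counts, parent_of):
    """Generalize counts one level by summing children under their parent."""
    rolled = Counter()
    for value, count in counts.items():
        rolled[parent_of[value]] += count
    return rolled


def drill_down(parent, parent_of, counts):
    """Specialize: return the detailed counts that sit under one parent."""
    return {v: c for v, c in counts.items() if parent_of[v] == parent}


sales_by_state = roll_up(sales_by_city, CITY_TO_STATE)
sales_by_country = roll_up(sales_by_state, STATE_TO_COUNTRY)
tamil_nadu_detail = drill_down("Tamil Nadu", CITY_TO_STATE, sales_by_city)
```

A different user could supply a different mapping (for example, City → Sales Region) and reuse the same two functions.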

4. Interestingness Measures and Thresholds

Not all discovered patterns are useful. Therefore, interestingness measures are used to evaluate patterns.

Users can also define threshold values to filter out unimportant patterns.

Common Measures

1. Simplicity

A pattern should be simple and easy to understand.

Complex rules are harder to interpret.
Simpler rules are usually more useful.

Simplicity can be measured based on:

Number of attributes
Number of conditions in the rule
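One simple way to put numbers on these two counts is shown below. The rule representation (a list of attribute, operator, value triples) and the function name are assumptions made for this sketch.

```python
# A rule body is sketched here as a list of (attribute, operator, value) conditions.
rule_conditions = [
    ("age", ">=", 30),
    ("age", "<", 40),
    ("income", ">=", 40000),
]


def simplicity_stats(conditions):
    """Count the conditions and the distinct attributes used by a rule."""
    attributes = {attr for attr, _op, _value in conditions}
    return {"conditions": len(conditions), "attributes": len(attributes)}


stats = simplicity_stats(rule_conditions)
print(stats)
```

A user-defined threshold could then discard rules whose counts exceed a chosen maximum.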

2. Certainty (Confidence)

Confidence measures how reliable a rule is.

For an association rule:

A → B

Confidence is calculated as:

Confidence(A → B) = (Number of records containing both A and B) / (Number of records containing A)

Higher confidence means the rule is more reliable.
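The confidence ratio can be computed directly from a list of records. In this sketch each record is a set of items; the sample records and the choice to return 0.0 when A never occurs are assumptions.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence(A -> B) = count(records with A and B) / count(records with A)."""
    a = set(antecedent)
    both = a | set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_both = sum(1 for t in transactions if both <= t)
    if count_a == 0:
        return 0.0  # assumption: treat an unseen antecedent as confidence 0
    return count_both / count_a


# Hypothetical market-basket records.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

# Three records contain bread, and two of those also contain butter.
conf = confidence(transactions, {"bread"}, {"butter"})
```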

3. Utility (Support)

Support measures how frequently a pattern occurs in the dataset.

Support(A → B) = (Number of records containing both A and B) / (Total number of records)

High support means the pattern appears frequently.
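Support, together with a user-defined threshold, might be applied as follows. The sample records, the candidate itemsets, and the 0.5 minimum support are hypothetical values chosen for illustration.

```python
def support(transactions, itemset):
    """Support = count(records containing every item) / total records."""
    items = set(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


# Hypothetical market-basket records.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

MIN_SUPPORT = 0.5  # hypothetical user-defined threshold

candidates = [{"bread", "butter"}, {"bread", "milk"}, {"butter", "milk"}]

# Keep only the itemsets that meet the threshold.
frequent = [c for c in candidates if support(transactions, c) >= MIN_SUPPORT]
```

Raising the threshold shrinks the result set, which is how thresholds filter out patterns the user considers unimportant.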

4. Novelty

A pattern is interesting if it provides new information.

For example:

Unexpected patterns
Rare exceptions
Non-redundant rules

Removing duplicate or similar patterns can help highlight novel patterns.
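One possible way to prune duplicate or similar rules is sketched below. It assumes a specific notion of redundancy that is not taken from the text above: a rule is dropped when an already kept rule has the same consequent, an antecedent that is a subset of (or equal to) its own, and confidence at least as high. The sample rules are invented.

```python
def drop_redundant(rules):
    """rules: iterable of (antecedent, consequent, confidence) triples."""
    # Visit shorter antecedents first so general rules are kept before
    # their more specific variants are examined.
    ordered = sorted(rules, key=lambda r: len(r[0]))
    kept = []
    for ante, cons, conf in ordered:
        ante, cons = frozenset(ante), frozenset(cons)
        redundant = any(
            k_cons == cons and k_ante <= ante and k_conf >= conf
            for k_ante, k_cons, k_conf in kept
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept


# Hypothetical rules: one exact duplicate and one more specific variant.
rules = [
    ({"bread", "milk"}, {"butter"}, 0.60),
    ({"bread"}, {"butter"}, 0.67),
    ({"bread"}, {"butter"}, 0.67),
    ({"milk"}, {"bread"}, 0.67),
]

kept = drop_redundant(rules)
```

Other definitions of redundancy are possible; this one simply favors the shortest rule that explains the same consequent equally well.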

5. Representation of Discovered Patterns

This primitive specifies how the discovered patterns should be displayed to the user.

Some common representation formats include:

Rules
Tables
Cross-tabulations
Charts (bar chart, pie chart)
Graphs
Decision trees
Data cubes

Users should be able to choose the most suitable visualization method for their analysis.

For example:

Charts and tables are useful for descriptive analysis
Decision trees are commonly used for classification
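To show how the same discovered patterns can be presented in more than one format, the sketch below prints them first as rules and then as a table. The patterns, attribute values, and function names are hypothetical.

```python
# Hypothetical classification results: (conditions, predicted class, confidence).
patterns = [
    ({"age": "youth", "income": "high"}, "promising", 0.80),
    ({"age": "senior", "income": "low"}, "not promising", 0.65),
]


def as_rule(conditions, label, conf):
    """Render one pattern as an IF ... THEN rule string."""
    body = " AND ".join(f"{attr} = {value}" for attr, value in conditions.items())
    return f"IF {body} THEN class = {label} [confidence {conf:.0%}]"


def as_table(patterns):
    """Render all patterns as fixed-width table lines, header first."""
    lines = [f"{'age':<8}{'income':<8}{'class':<15}{'confidence':<10}"]
    for conditions, label, conf in patterns:
        lines.append(
            f"{conditions['age']:<8}{conditions['income']:<8}{label:<15}{conf:<10.2f}"
        )
    return lines


rule_lines = [as_rule(*p) for p in patterns]
table_lines = as_table(patterns)

print("\n".join(rule_lines))
print("\n".join(table_lines))
```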

Example of Data Mining Task Primitives

Suppose you are a marketing manager at AllElectronics and want to classify customers based on their buying behavior.

You are interested in customers who:

Have an income of at least $40,000
Have purchased items worth more than $1,000 in total
Made those purchases on items that each cost at least $100

You want to analyze the following customer information:

Age
Income
Type of items purchased
Purchase location
Place where the items were manufactured

Finally, you want to display the results in the form of classification rules.

This type of request can be written as a Data Mining Query Language (DMQL) query like this:

use database AllElectronics_db
use hierarchy location_hierarchy for T.branch, age_hierarchy for C.age
mine classification as promising_customers
in relevance to C.age, C.income, I.type, I.place_made, T.branch
from customer C, item I, transaction T
where I.item_ID = T.item_ID
and C.cust_ID = T.cust_ID
and C.income ≥ 40,000
and I.price ≥ 100
group by T.cust_ID
having sum(I.price) > 1,000
display as rules

This query tells the data mining system to classify customers based on their purchasing patterns.
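The data-selection part of this request (the three customer conditions listed earlier) can be approximated in plain Python. This is only a sketch under stated assumptions: the rows below are invented, the dictionaries stand in for the customer, item, and transaction tables, and no classification is performed.

```python
from collections import defaultdict

# Hypothetical rows standing in for the customer, item, and transaction tables.
customers = {
    1: {"age": 34, "income": 52_000},
    2: {"age": 51, "income": 36_000},
    3: {"age": 27, "income": 45_000},
}
items = {
    "I1": {"type": "laptop", "place_made": "Japan", "price": 900},
    "I2": {"type": "phone", "place_made": "Korea", "price": 450},
    "I3": {"type": "cable", "place_made": "China", "price": 15},
}
transactions = [
    {"cust_id": 1, "item_id": "I1", "branch": "Chennai"},
    {"cust_id": 1, "item_id": "I2", "branch": "Chennai"},
    {"cust_id": 1, "item_id": "I3", "branch": "Chennai"},
    {"cust_id": 2, "item_id": "I1", "branch": "Mumbai"},
    {"cust_id": 2, "item_id": "I2", "branch": "Mumbai"},
    {"cust_id": 3, "item_id": "I2", "branch": "Chennai"},
]

# Join each transaction to its customer and item, apply the income and
# price conditions, and total the qualifying spend per customer.
spend = defaultdict(int)
for t in transactions:
    customer = customers[t["cust_id"]]
    item = items[t["item_id"]]
    if customer["income"] >= 40_000 and item["price"] >= 100:
        spend[t["cust_id"]] += item["price"]

# Keep customers whose qualifying purchases total more than $1,000.
relevant_customers = sorted(cid for cid, total in spend.items() if total > 1_000)
```

The surviving customers, together with the attributes named in the query, would form the task-relevant data handed to the classifier.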
