Pandas - Cleaning Empty Cells

Dhanapriya D

Empty Cells

Empty cells, also known as missing values, can lead to incorrect results or errors during data analysis, so it is important to deal with them before moving forward.

Remove Rows with Empty Cells

A simple and common way to handle missing data is by removing rows that contain any empty cells.

This approach is acceptable when the dataset is large and the number of missing entries is relatively small. In such cases, dropping a few rows won't significantly affect the overall analysis.
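One way to judge whether these conditions hold is to count the missing cells before dropping anything. The following is a minimal sketch, assuming a small made-up DataFrame in place of data.csv; the column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Duration": [60, 60, 45, 30, 60],
    "Calories": [409.1, np.nan, 282.4, np.nan, 300.0],
})

# Number of missing cells in each column
missing_per_column = df.isna().sum()

# Fraction of rows that contain at least one missing cell
rows_with_missing = df.isna().any(axis=1).mean()

print(missing_per_column)
print(f"Rows with missing data: {rows_with_missing:.0%}")
```

If the reported share of affected rows is high, dropping them may remove too much of the dataset, and replacing the values could be the better choice.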

Program

Remove Rows with Missing Data

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())

This will return a new DataFrame (new_df) that excludes any rows with missing values.

By default, the dropna() method does not modify the original DataFrame. Instead, it returns a new one.

If you want to remove rows with missing values directly from the original DataFrame, you need to use the inplace=True argument.

Program

Drop Missing Values In-Place

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace=True)

print(df.to_string())

Now, dropna(inplace=True) updates the original df and does not return a new DataFrame.
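The difference between the two calls can be checked on a small in-memory DataFrame. This is only a sketch; the sample values are made up and stand in for data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one of the three rows has an empty cell
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, np.nan, 300.0],
})

# Without inplace, a new DataFrame is returned and df keeps all its rows
new_df = df.dropna()
rows_before = len(df)

# With inplace=True, df itself loses the incomplete row
df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, len(new_df), rows_after)
```

Reassigning the result, as in df = df.dropna(), is another way to get the same final DataFrame without the inplace argument.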

Replacing Empty Values

Instead of removing rows with missing data, you can replace empty cells with a specific value. This approach helps preserve the rest of the dataset, which might still be valuable.

Pandas provides the fillna() method to replace missing values with a chosen value.

Example

Replace all missing (NULL) values with the number 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace=True)

This keeps every row in the dataset. Note that the same value is written into every column that has an empty cell, so choose a replacement that makes sense for all of them.
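To see how a single fill value spreads across columns, here is a small sketch on made-up data (the columns and numbers are assumptions, not the contents of data.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with an empty cell in each column
df = pd.DataFrame({
    "Duration": [60, np.nan, 45],
    "Calories": [409.1, 479.0, np.nan],
})

# Every empty cell, in every column, is replaced with 130
filled = df.fillna(130)

print(filled)
```

Here 130 ends up in both "Duration" and "Calories", which may or may not be what you want.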

Replace Missing Values in Specific Columns

If you only want to replace empty cells in a particular column, you don’t need to apply changes to the entire DataFrame. Instead, you can target specific columns by providing the column name.

Pandas allows this by passing fillna() a dictionary that maps each column name to its replacement value.

Example

Replace missing values in the "Calories" column with the number 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna({"Calories": 130}, inplace=True)

This method ensures that only the "Calories" column is affected, leaving the rest of the DataFrame unchanged.
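The same made-up data can illustrate that a dictionary limits the replacement to the named column. Again, this is a sketch with assumed sample values rather than the real data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with an empty cell in each column
df = pd.DataFrame({
    "Duration": [60, np.nan, 45],
    "Calories": [409.1, 479.0, np.nan],
})

# Only "Calories" is listed, so only its empty cells are filled
filled = df.fillna({"Calories": 130})

print(filled)
```

The empty cell in "Duration" is still missing afterwards. To fill several columns at once, add more entries to the dictionary, each with its own value.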

Replace Missing Values with Mean, Median, or Mode

A common technique to handle missing data is to replace empty cells with the mean, median, or mode of the column. Because the replacement is calculated from the column itself, it always stays within the range of the existing values.

Pandas provides built-in methods to calculate these statistical values:

mean() – average value
median() – middle value
mode() – most frequent value

Example

Replace missing values in the "Calories" column with the mean

import pandas as pd

df = pd.read_csv('data.csv')

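# Mean of the "Calories" column; missing values are skipped by default
x = df["Calories"].mean()

# Fill only the empty cells in "Calories" with that mean
df.fillna({"Calories": x}, inplace=True)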

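You can follow the same approach with median() or mode()[0], depending on your data needs. The [0] is needed because mode() returns a Series (several values can tie for most frequent), and [0] selects the first of them.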

Mean = the average value (the sum of all values divided by number of values).
Median = the value in the middle, after you have sorted all values ascending.
Mode = the value that appears most frequently.
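The three statistics can be compared side by side on a small column. This is a minimal sketch; the numbers are made up so that the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical "Calories" column with one empty cell
df = pd.DataFrame({"Calories": [300.0, 300.0, 400.0, np.nan, 500.0]})

# Missing values are skipped when these statistics are calculated
mean_value = df["Calories"].mean()      # (300 + 300 + 400 + 500) / 4
median_value = df["Calories"].median()  # midpoint of 300, 300, 400, 500
mode_value = df["Calories"].mode()[0]   # first of the most frequent values

# Fill the empty cell with the median instead of the mean
filled_median = df.fillna({"Calories": median_value})

print(mean_value, median_value, mode_value)
print(filled_median)
```

The median is often preferred when the column contains outliers, while the mode suits columns with a few repeated values.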
