Pandas - Analyzing DataFrames

Dhanapriya D

Viewing Data in a DataFrame

When working with large datasets, it's essential to quickly preview the data. One of the most commonly used methods in Pandas for this purpose is the head() method.

The head() method displays the column headers along with a specified number of rows from the beginning of the DataFrame.

Program

Display the First 10 Rows

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))

In this example, we're using a CSV file named data.csv. You can either download this file or open it in your browser to follow along.

Note: If no number is specified, head() will return the first 5 rows by default.

Program

Display the First 5 Rows 

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())
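As a self-contained sketch of the two calls above, the snippet below builds a small in-memory DataFrame instead of reading data.csv. The column names mirror this dataset, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: 12 rows of made-up values.
df = pd.DataFrame({
    "Duration": [60] * 12,
    "Pulse": list(range(100, 112)),
})

first_ten = df.head(10)   # explicit row count
first_five = df.head()    # no argument: falls back to the default of 5 rows

print(first_ten.shape)    # (10, 2)
print(first_five.shape)   # (5, 2)
```

Both calls return a new DataFrame, so the result can be stored in a variable and inspected further rather than only printed.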

In addition to head(), Pandas also provides the tail() method to preview data from the end of the DataFrame.

The tail() method works the same way: it displays the column headers and a specified number of rows from the bottom of the DataFrame, defaulting to the last 5 rows.

Program 

Display the Last 5 Rows

print(df.tail())
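The following minimal sketch shows tail() with and without an argument. It uses a small made-up DataFrame (an assumption, not the contents of data.csv) so that it runs on its own.

```python
import pandas as pd

# Hypothetical data: 12 rows with the default index 0..11.
df = pd.DataFrame({"Pulse": list(range(100, 112))})

last_five = df.tail()     # no argument: last 5 rows
last_three = df.tail(3)   # explicit row count

# The rows keep their original index labels.
print(list(last_five.index))    # [7, 8, 9, 10, 11]
print(list(last_three.index))   # [9, 10, 11]
```

Because the index labels are preserved, the output of tail() also shows where those rows sit in the full DataFrame.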

Inspecting Data with info()

To gain a quick overview of your dataset’s structure, Pandas provides a built-in method called info() that displays essential metadata about the DataFrame.

Program

View Basic Information

df.info()

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
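Note that info() writes its report directly to standard output and returns None, so passing it to print() emits an extra None line. If the report is needed as a string, for example for logging, one option is the buf parameter of info(). The sketch below assumes a small made-up DataFrame rather than data.csv.

```python
import io

import pandas as pd

# Hypothetical data with one missing Calories value.
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.0],
})

buffer = io.StringIO()
result = df.info(buf=buffer)   # report goes to the buffer, not the console
report = buffer.getvalue()

print(result is None)          # True: info() has no return value
print(report)
```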

Result Explained

The result tells us there are 169 rows and 4 columns:

RangeIndex: 169 entries, 0 to 168

Data columns (total 4 columns):

It also lists the name of each column, with its non-null count and data type:

 #   Column    Non-Null Count  Dtype
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
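The same facts can also be read programmatically instead of from the printed report. This is a minimal sketch using the shape, columns, and dtypes attributes on a made-up DataFrame. The values are hypothetical and the frame is much smaller than data.csv.

```python
import pandas as pd

# Hypothetical data; column names borrowed from the dataset above.
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Pulse": [110, 117, 103],
    "Calories": [409.1, None, 195.0],
})

n_rows, n_cols = df.shape      # (rows, columns) tuple
print(n_rows, n_cols)          # 3 3
print(list(df.columns))        # ['Duration', 'Pulse', 'Calories']
print(df.dtypes)               # data type of each column
```

Here Calories is stored as float64 because it holds decimal values and a missing entry, while the whole-number columns are int64.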

Dealing with Null Values

Null or missing values can lead to inaccurate analysis if not handled properly. In this dataset, the Calories column has only 164 non-null entries out of 169, so 5 rows have no value. This is something you should address during data cleaning, an essential step in preparing your data for analysis.
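One way to confirm the number of missing values directly, rather than subtracting non-null counts by hand, is isna() combined with sum(). The sketch below uses a small made-up DataFrame, so the counts differ from those in data.csv.

```python
import pandas as pd

# Hypothetical data with two missing Calories values.
df = pd.DataFrame({
    "Duration": [60, 45, 30, 60],
    "Calories": [409.1, None, 195.0, None],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column["Calories"])   # 2

# Select the rows where Calories is missing.
rows_with_missing = df[df["Calories"].isna()]
print(list(rows_with_missing.index))    # [1, 3]
```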




   
