Missing data is a common problem in data science. Raw data collected from real-world sources often contains many missing values.
In Pandas, missing values are typically represented as NaN (Not a Number). Dropping rows with NaN values helps maintain the integrity of an analysis by ensuring that only complete records are included. This blog discusses how to drop rows with NaN values in Pandas.
How does the `dropna()` method help us drop rows with NaN values in Pandas?
The `dropna()` method removes rows or columns that contain missing values from a DataFrame. By default, it drops every row that contains at least one NaN value; when called on a Series, it drops the missing entries. This is useful for ensuring that statistical calculations and data visualizations are based on complete data.
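Since `dropna()` also works on a Series, here is a minimal sketch of that case. The `temperatures` values are hypothetical sample data, not taken from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a Series with two missing entries
temperatures = pd.Series([21.5, np.nan, 19.0, np.nan, 23.1])

# dropna() on a Series removes the missing entries and
# keeps the original index labels of the remaining values
clean_temperatures = temperatures.dropna()

print(clean_temperatures)
```

The remaining values keep their original index labels (0, 2 and 4 here), so the result can still be aligned with the source data if needed.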
How to use the dropna() method in Pandas?
First, we create a DataFrame called df from a dictionary named `employee_data`. The dictionary contains the keys Name, Age, City, and Salary.
```python
import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
employee_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'City': ['New York', 'San Francisco', 'LA', np.nan],
    'Salary': [5000, 6000, 5500, 6200]
}
df = pd.DataFrame(employee_data)

# Dropping rows with any NaN values
cleaned_df = df.dropna()
print(cleaned_df)
```
Output
```
      Name   Age      City  Salary
0    Alice  25.0  New York    5000
2  Charlie  35.0        LA    5500
```
In the above example, the row containing Bob was removed because it had a NaN value in the “Age” column, and the row containing David was removed because it had a NaN value in the “City” column.
Parameters of the `dropna()` method in Pandas
When using the dropna() method, it helps to know the following parameters.
- axis : Determines whether to drop rows or columns. `axis=0` or `axis='index'` drops rows; this is the default value. `axis=1` or `axis='columns'` drops columns.
- how : Specifies the condition for dropping. `how='any'` drops a row or column if any NaN value is present; this is the default value. `how='all'` drops a row or column only if all of its values are NaN.
- thresh : The minimum number of non-NaN values a row or column must contain in order to be kept.
- subset : Restricts the NaN check to particular labels. When dropping rows, it lists the columns in which to check for NaN values.
- inplace : If set to `True`, the original DataFrame is modified in place and the method returns `None` instead of a new DataFrame.
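The difference between the default behaviour and `inplace=True` can be illustrated with a short sketch. The DataFrame and the variable names (`cleaned`, `df_copy`, `result`) are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one NaN in each of two rows
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# Default (inplace=False): df is left untouched and a new DataFrame is returned
cleaned = df.dropna()
print(len(df), len(cleaned))

# inplace=True: the DataFrame itself is modified and the call returns None
df_copy = df.copy()
result = df_copy.dropna(inplace=True)
print(result, len(df_copy))
```

Because the in-place call returns `None`, writing `df = df.dropna(inplace=True)` would replace the DataFrame with `None`. Reassigning the returned DataFrame, as in `cleaned = df.dropna()`, avoids that mistake.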
Dropping rows where all values are NaN using the how parameter in dropna()
To drop only those rows where all values are NaN, you can use:
```python
import pandas as pd

# Sample DataFrame
data = {
    'A': [1, None, 3, None],
    'B': [None, None, 4, None],
    'C': [None, None, None, None]
}
df = pd.DataFrame(data)

# Dropping rows where all values are NaN
df_cleaned = df.dropna(how='all')
print(df_cleaned)
```
Output
```
     A    B     C
0  1.0  NaN  None
2  3.0  4.0  None
```
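The same `how='all'` condition can be applied to columns by combining it with `axis=1`. The sketch below reuses the sample data from this section; the name `df_no_empty_cols` is only illustrative.

```python
import pandas as pd

# Same sample data as above: column 'C' contains no values at all
data = {
    'A': [1, None, 3, None],
    'B': [None, None, 4, None],
    'C': [None, None, None, None]
}
df = pd.DataFrame(data)

# axis=1 with how='all' drops only the columns that are entirely missing
df_no_empty_cols = df.dropna(axis=1, how='all')
print(df_no_empty_cols.columns.tolist())
```

Only column C is removed. Columns A and B are kept because each has at least one non-missing value, and all four rows are preserved.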
Drop rows with NaN based on specific columns using dropna() in Pandas
We can also specify which columns to check for NaN values using the `subset` parameter:
```python
import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Drop rows where NaN appears in columns 'A' or 'B'
df.dropna(subset=['A', 'B'], inplace=True)
print(df)
```
Output
```
     A    B  C
1  2.0  2.0  2
3  4.0  4.0  4
```
The above code will drop rows that have NaN in either the “A” or “B” columns but will keep rows with missing values in other columns like “C”.
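`subset` can also be combined with `how='all'` to drop a row only when every listed column is missing. This is a minimal sketch with slightly different, hypothetical data, in which the last row is missing both A and B.

```python
import pandas as pd

# Hypothetical sample data: only the last row is missing both 'A' and 'B'
data = {'A': [1, 2, None, 4, None],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Drop a row only if 'A' AND 'B' are both NaN
df_partial = df.dropna(subset=['A', 'B'], how='all')
print(df_partial.index.tolist())
```

Rows 0 and 2 each have one of the two values, so they are kept. Only the row at index 4 is dropped.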
Drop rows with NaN based on the thresh parameter in Pandas
Setting thresh=2 means that a row must have at least 2 non-NaN values to be retained.
```python
import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, None],
        'C': [1, None, 3, None, 5]}
df = pd.DataFrame(data)

# Drop rows that have fewer than 2 non-NaN values
df.dropna(axis=0, thresh=2, inplace=True)
print(df)
```
Output
```
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  NaN
2  NaN  3.0  3.0
4  5.0  NaN  5.0
```
The row at index 3 was eliminated because it has only one non-NaN value (the 4 in column A).
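When the rule is easier to state as a fraction of columns than as a fixed count, `thresh` can be computed from the number of columns. The sketch below assumes a made-up requirement of at least 50% non-NaN values per row; `min_fraction` and `min_non_nan` are illustrative names, not Pandas parameters.

```python
import math
import pandas as pd

# Same sample data as in the thresh example above
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, None],
        'C': [1, None, 3, None, 5]}
df = pd.DataFrame(data)

# Assumption for illustration: keep rows with at least 50% non-NaN values
min_fraction = 0.5
min_non_nan = math.ceil(min_fraction * df.shape[1])  # 3 columns -> 2

# thresh expects an integer count of required non-NaN values
df_kept = df.dropna(thresh=min_non_nan)
print(df_kept.index.tolist())
```

With three columns the computed threshold is 2, so this keeps the same rows as `thresh=2`, and the rule adapts automatically if columns are added or removed.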
Drop rows with NaN values using the axis parameter in dropna()
```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# axis=0 means drop rows having NaN values
# axis=1 means drop columns having NaN values
df.dropna(axis=0, inplace=True)
print(df)
```
Output
```
     A    B  C
1  2.0  2.0  2
3  4.0  4.0  4
```
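For comparison, here is a minimal sketch of the column-wise case, using the same sample data with `axis=1`. The name `df_columns_dropped` is only illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# axis=1 drops every column that contains at least one NaN value
df_columns_dropped = df.dropna(axis=1)
print(df_columns_dropped.columns.tolist())
```

Columns A and B each contain NaN values, so only column C remains, and all five rows are kept.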
Conclusion
We can drop rows with NaN values using the `dropna()` method in Pandas. The method is important for cleaning a dataset by removing rows or columns with missing values, which is a key step in preparing data for analysis.