Drop rows with NaN : Pandas

Drop rows with nan

Missing data is a common problem in data science. When we collect raw data from real-world sources, we often end up with many missing values.

In Pandas, missing values are typically represented as NaN (Not a Number). Dropping rows with NaN values helps maintain the integrity of an analysis by ensuring that only complete records are included. This blog discusses how to drop rows with NaN in Pandas.

How does the `dropna()` method help us drop rows with NaN values in Pandas?

The dropna() method in Pandas removes rows or columns that contain missing values from a DataFrame. By default, it drops every row that contains at least one NaN value. It is also available on a Series, where it removes the missing entries. This is useful for ensuring that statistical calculations and data visualizations are based on complete data.

How to use the dropna() method in Pandas?

First, we will create a DataFrame called df from a dictionary named employee_data. The dictionary contains the keys Name, Age, City and Salary.
import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
employee_data= {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'City': ['New York', 'San Francisco', 'LA', np.nan],
    'Salary': [5000, 6000, 5500, 6200]
}

df = pd.DataFrame(employee_data)

# Dropping rows with any NaN values
cleaned_df = df.dropna()

print(cleaned_df)
Output
      Name   Age      City  Salary
0    Alice  25.0  New York    5000
2  Charlie  35.0        LA    5500
In the above example, the rows for Bob and David were removed: Bob has a NaN value in the "Age" column and David has a NaN value in the "City" column.
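Before dropping anything, it can help to see where the missing values are. The following is a minimal sketch that reuses the employee_data dictionary from above; the variable names missing_per_column and rows_with_nan are only illustrative.

```python
import numpy as np
import pandas as pd

employee_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'City': ['New York', 'San Francisco', 'LA', np.nan],
    'Salary': [5000, 6000, 5500, 6200],
}
df = pd.DataFrame(employee_data)

# Number of missing values in each column
missing_per_column = df.isna().sum()

# Boolean mask: True for rows that contain at least one missing value
rows_with_nan = df.isna().any(axis=1)

print(missing_per_column)
print(df[rows_with_nan])  # these are the rows a plain dropna() would remove
```

The rows selected by the mask are exactly the ones that df.dropna() removes with its default settings.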

Parameters of the `dropna()` method in Pandas

When using the dropna() method, it helps to know these parameters:
  • axis : Determines whether rows or columns are dropped. `axis=0` or `axis='index'` drops rows; this is the default. `axis=1` or `axis='columns'` drops columns.
  • how : Specifies the condition for dropping. `how='any'` drops a row or column if any NaN value is present; this is the default. `how='all'` drops a row or column only if all of its values are NaN.
  • thresh : The minimum number of non-NaN values that a row or column must contain in order to be kept.
  • subset : Restricts the check for NaN values to the listed columns (when dropping rows).
  • inplace : If set to `True`, the original DataFrame is modified in place instead of a new one being returned. The default is `False`.
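The difference between the default behaviour and inplace=True is easy to miss, so here is a small sketch on a made-up two-column DataFrame. The names cleaned, rows_before, rows_cleaned and rows_after are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

# Default (inplace=False): a new DataFrame is returned and df is untouched
cleaned = df.dropna()
rows_before = len(df)        # df still has all 3 rows
rows_cleaned = len(cleaned)  # only the complete row remains

# inplace=True: df itself loses the incomplete rows
df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_cleaned, rows_after)
```

It is usually clearest to pick one style: either reassign the returned DataFrame, or use inplace=True, but not both on the same line.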
Let's understand these parameters with examples.

Dropping Rows Where All Values are NaN using how parameter in dropna()

To drop only those rows where all values are NaN, you can use:
import pandas as pd

# Sample DataFrame
data = {
    'A': [1, None, 3, None],
    'B': [None, None, 4, None],
    'C': [None, None, None, None]
}

df = pd.DataFrame(data)

# Dropping rows where all values are NaN
df_cleaned = df.dropna(how='all')

print(df_cleaned)
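Output
     A    B     C
0  1.0  NaN  None
2  3.0  4.0  None
Rows 1 and 3 were removed because every value in them is missing. Row 0 is kept because it still has a value in column "A".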

Drop rows with NaN Based on specific columns using dropna() in pandas.

We can also specify which columns to check for nan values using the `subset` parameter:
import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}

df = pd.DataFrame(data)

# Drop rows where NaN appears in columns 'A' or 'B'
df.dropna(subset=['A', 'B'], inplace=True)

print(df)
Output
     A    B  C
1  2.0  2.0  2
3  4.0  4.0  4
The above code drops rows that have NaN in either the "A" or "B" column. Columns outside the subset, such as "C", are not checked, so a row with a missing value only in "C" would be kept.
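The subset parameter can also be combined with how. The sketch below, on a slightly modified sample DataFrame (not the one above), drops a row only when both "A" and "B" are missing.

```python
import pandas as pd

# Sample DataFrame: row 3 is missing both 'A' and 'B'
df = pd.DataFrame({
    'A': [1, 2, None, None, 5],
    'B': [None, 2, 3, None, None],
    'C': [1, 2, 3, 4, 5],
})

# Drop a row only when ALL of the subset columns are NaN
result = df.dropna(subset=['A', 'B'], how='all')

print(result)
```

Only the row at index 3 is removed here; rows with a single missing value in "A" or "B" are kept.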

Drop row with nan based on threshold parameter in Pandas.

Setting thresh=2 means that a row must have at least 2 non-NaN values to be retained.
import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, None],
        'C': [1, None, 3, None, 5]}
df = pd.DataFrame(data)

# Drop rows that have less than 2 non-NaN values
df.dropna(axis=0, thresh=2, inplace=True)

print(df)
Output
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  NaN
2  NaN  3.0  3.0
4  5.0  NaN  5.0
The row at index 3 was eliminated because it has only one non-NaN value (in column "A").
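Because thresh counts the values that are present rather than the ones that are missing, it is sometimes easier to start from "how many missing values do I tolerate per row" and convert. A minimal sketch, where max_missing is a hypothetical setting chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, 2, 3, None, None],
    'C': [1, None, 3, None, 5],
})

max_missing = 1  # hypothetical: tolerate at most one NaN per row

# thresh is expressed in non-NaN values, so subtract from the column count
min_non_nan = df.shape[1] - max_missing

result = df.dropna(thresh=min_non_nan)

print(result)
```

With three columns and max_missing = 1, this is the same as thresh=2, so the result matches the example above.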

Drop rows with NaN values using the axis parameter in dropna()

import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# axis=0 (the default) drops rows that contain NaN values
# axis=1 would drop columns that contain NaN values instead
df.dropna(axis=0, inplace=True)

print(df)

Output

     A    B  C
1  2.0  2.0  2
3  4.0  4.0  4
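For comparison, here is a short sketch of axis=1 on the same data, which drops columns instead of rows. The variable names no_nan_columns and mostly_complete are only illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, 4, None],
        'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Drop every column that contains at least one NaN value
no_nan_columns = df.dropna(axis=1)

# Keep only columns with at least 4 non-NaN values
mostly_complete = df.dropna(axis=1, thresh=4)

print(no_nan_columns.columns.tolist())
print(mostly_complete.columns.tolist())
```

Only "C" survives the first call because "A" and "B" each contain a NaN. With thresh=4, "A" is kept as well, since it has four non-NaN values, while "B" has only three.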

Conclusion

We can drop rows with NaN values using the dropna() method in Pandas. The method cleans a dataset by removing rows or columns with missing values, and its axis, how, thresh, subset and inplace parameters control exactly what is removed. Handling NaN values is an important step in preparing data for analysis.
