Pandas DataFrame.append is deprecated: use pd.concat now

Understanding the deprecation of DataFrame.append

The recent deprecation of the ‘DataFrame.append’ method in Pandas has become a discussion point among data scientists and developers.

The method was deprecated in Pandas 1.4 and removed entirely in Pandas 2.0, so developers now need to use ‘pandas.concat’ for appending data. Combining all the pieces in a single ‘pandas.concat’ call, instead of appending repeatedly, also avoids copying the same data again and again in data manipulation tasks.

Here is a glimpse of append code that now raises an error on Pandas 2.0 and later.

import pandas as pd

# Creating an empty DataFrame
df = pd.DataFrame(columns=['A', 'B'])

# Appending rows using append (deprecated)
df = df.append({'A': 1, 'B': 2}, ignore_index=True)

Output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in ()
      5 
      6 # Appending rows using append (deprecated)
----> 7 df = df.append({'A': 1, 'B': 2}, ignore_index=True)
      8 
      9 

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)
   6300 
   6301     @final

AttributeError: 'DataFrame' object has no attribute 'append'

Why Was append() Deprecated in Pandas?

The decision to deprecate append() was driven by several factors:

  • Better optimization: Every call to append() creates a new DataFrame and copies all the existing data into it, which is inefficient for large datasets in terms of memory usage and computation.
  • Better solution: We already have ‘pd.concat()’, which is more efficient at combining DataFrames because it can join many objects in a single call.
  • Vectorization: Pandas encourages developers to think in terms of vectorized operations, whereas append() is a row-wise operation, which is slower and less efficient in terms of memory usage.

To summarise, DataFrame.append() led to performance issues when appending multiple rows or DataFrames in a loop. Under the hood, each call copies the index and the values into a new object, which is not optimal for large datasets.
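
To make the copying cost concrete, here is a minimal sketch comparing the two patterns. The function names `grow_in_loop` and `build_once` and the row count are illustrative assumptions, not part of pandas. Timings depend on the machine and pandas version, so no numbers are shown here.

```python
import time
import pandas as pd

def grow_in_loop(n):
    """Grow a DataFrame one row at a time; each concat copies all prior rows."""
    df = pd.DataFrame({'A': [0], 'B': [0]})
    for i in range(1, n):
        row = pd.DataFrame({'A': [i], 'B': [i * 2]})
        df = pd.concat([df, row], ignore_index=True)
    return df

def build_once(n):
    """Collect the rows first, then build the DataFrame in a single step."""
    rows = [{'A': i, 'B': i * 2} for i in range(n)]
    return pd.DataFrame(rows)

n = 500  # illustrative size; larger values widen the gap

start = time.perf_counter()
slow = grow_in_loop(n)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = build_once(n)
once_seconds = time.perf_counter() - start

# Both approaches hold the same data; only the cost of getting there differs.
print(slow.equals(fast))
print(f"loop: {loop_seconds:.4f}s, single build: {once_seconds:.4f}s")
```

The loop version behaves like the old append() pattern: the amount of data copied grows with every iteration, while the single build touches each row once.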

Moving towards `pandas.concat()`

To resolve these issues, developers are encouraged to use ‘pandas.concat’. This function concatenates pandas objects along a particular axis.

Let’s understand the syntax of ‘pandas.concat()’.

pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, 
keys=None, levels=None, names=None, verify_integrity=False, 
sort=False, copy=None)

We will go through some of the pandas.concat parameters:

  • objs: The DataFrames or Series that you want to concatenate. They are passed as a list or tuple of pandas objects.
  • axis: This parameter determines the concatenation axis. axis=0 means row-wise concatenation (vertical stacking) and axis=1 means column-wise concatenation (horizontal stacking). Default value is 0.
  • join: Determines how to handle labels on the other axis (columns when stacking rows, index when stacking columns) that do not match.
    outer: Includes all labels that appear in any object (union)
    inner: Only keeps labels that appear in all objects (intersection)
    Default value is outer.
  • ignore_index: When True, the original index values along the concatenation axis are discarded and the result is labeled 0, 1, ..., n - 1. It is useful when we don’t want to keep the original index values. Default value is False.
  • sort: If True, sorts the labels of the non-concatenation axis (for example, the column names when stacking rows) when they are not already aligned. It does not sort the rows by value. Default value is False.
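
The parameters above can be seen side by side in a short sketch. The frames `left` and `right` are made-up examples chosen so that only column 'B' is shared between them.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# axis=0, join='outer': rows are stacked, columns are the union A, B, C,
# and missing cells are filled with NaN
outer_rows = pd.concat([left, right], axis=0, join='outer', ignore_index=True)

# axis=0, join='inner': rows are stacked, only the shared column 'B' is kept
inner_rows = pd.concat([left, right], axis=0, join='inner', ignore_index=True)

# axis=1: frames are placed side by side, aligned on the row index
side_by_side = pd.concat([left, right[['C']]], axis=1)

print(outer_rows)
print(inner_rows)
print(side_by_side)
```

Without ignore_index=True, the row-wise results would keep the original labels 0, 1, 0, 1 instead of 0 to 3.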

Code examples of pandas.concat()

Now, let’s see how to add a row to an empty DataFrame using the pandas.concat() method.
import pandas as pd

# Creating an empty DataFrame
df = pd.DataFrame(columns=['A', 'B'])

# Creating a new row as a DataFrame
new_row = pd.DataFrame({'A': [1], 'B': [2]})

# Concatenating the new row
# ignore_index=True resets the index of the output DataFrame
df = pd.concat([df, new_row], ignore_index=True)

df.head()
Output
   A  B
0  1  2

Adding two DataFrames with some data using the pandas.concat() method.
import pandas as pd

# Creating two DataFrames
data1 = {'Name': ['Alice', 'Bob', 'Charlie'],
         'Age': [25, 30, 35]}

df1 = pd.DataFrame(data1)

data2 = {'Name': ['David', 'Eve'],
         'Age': [40, 45]}

df2 = pd.DataFrame(data2)

# Concatenating both DataFrames (df1, df2) along rows (axis=0)
# ignore_index=True helps to reset the index of output dataframe
result = pd.concat([df1, df2], axis=0, ignore_index=True)


result.head()
Output
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4      Eve   45

Adding a single row to a DataFrame using loc[]

If we only want to add one row to a DataFrame at a time, we can use the .loc[] indexer to append a row directly.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4] })

# New data to append
new_data = {'A': 5, 'B': 6}

# Add the new row using .loc
df.loc[len(df)] = new_data

print(df)
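Output
   A  B
0  1  3
1  2  4
2  5  6

In this example, len(df) gives the next available index label because the DataFrame uses the default 0-based integer index, and we assign the new row directly at that label. With any other index, len(df) may match an existing label and overwrite that row.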
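
One caveat is worth a quick sketch: len(df) is only a free label when the index is the default 0, 1, 2, ... range. The frames `shifted` and `fixed` below are made-up examples, and the new row is passed as a list in column order.

```python
import pandas as pd

# A DataFrame whose index does not start at 0 (for example, after filtering)
shifted = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[1, 2])

# len(shifted) is 2 and label 2 already exists, so this overwrites that row
shifted.loc[len(shifted)] = [5, 6]
print(len(shifted))  # still 2 rows

# Resetting the index first makes len(fixed) a free label again
fixed = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[1, 2])
fixed = fixed.reset_index(drop=True)
fixed.loc[len(fixed)] = [5, 6]
print(len(fixed))  # 3 rows
```

When the index is not a plain range, pd.concat() with ignore_index=True is the safer way to add the row.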

Best Practices

  • Batch Data Collection: Collect all the data records in a list first, create a DataFrame from the list, and concatenate it with the other DataFrame once. This approach is more efficient in memory usage and processing time.
import pandas as pd

# Collect rows as plain dicts; list.append is cheap
data = []
for i in range(10):
    data.append({'A': i, 'B': i * 2})

# Build the DataFrame in one step
df = pd.DataFrame(data)

The above code shows how we can store the intermediate data in a list and then create a DataFrame from it. Once the DataFrame is created, we can concatenate it with other data.

  • Refactor Existing Code: We should review our existing codebase for calls to `.append()` on DataFrames and replace them with `pd.concat()`.
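
As a starting point for such a refactor, here is a minimal sketch of a helper that covers the most common append() call shapes. `append_rows` is a hypothetical name, not a pandas API, and it assumes the old calls passed a dict, a list of dicts, or another DataFrame.

```python
import pandas as pd

def append_rows(df, other, ignore_index=True):
    """Hypothetical stand-in for common DataFrame.append calls (not a pandas API)."""
    if isinstance(other, dict):
        other = pd.DataFrame([other])   # one row from a dict
    elif isinstance(other, list):
        other = pd.DataFrame(other)     # several rows from a list of dicts
    return pd.concat([df, other], ignore_index=ignore_index)

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Before: df = df.append({'A': 5, 'B': 6}, ignore_index=True)
df = append_rows(df, {'A': 5, 'B': 6})

# Before: df = df.append(other_df, ignore_index=True)
other_df = pd.DataFrame({'A': [7], 'B': [8]})
df = append_rows(df, other_df)

print(df)
```

A helper like this keeps the change small at each call site, but calls inside loops are still better rewritten to collect the rows first and concatenate once.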

Conclusion:

The deprecation of the ‘DataFrame.append’ method is part of making data manipulation in pandas better in terms of performance and usability. By using the ‘pandas.concat()’ function, we can write cleaner and more efficient code. By adopting these changes, we make our code more robust.
