Pandas iloc VS loc

When it comes to data wrangling in pandas, the iloc and loc methods quickly come to mind. Both are used to select and index rows and columns, but they differ in how they handle indexing.

What is .iloc in Python?

iloc (integer location) is integer-based indexing in pandas. Data is selected by the integer position of the row and column in the DataFrame, counting from 0.

Syntax of .iloc

df.iloc[row_index, column_index]
  • row_index is an integer or an array of integers representing the row positions.
  • column_index is an integer or an array of integers representing the column positions.
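As a minimal sketch of this syntax, the snippet below applies iloc to a small DataFrame. The column names, index labels, and values are hypothetical and serve only to illustrate positional selection.

```python
import pandas as pd

# Hypothetical data, used only to illustrate positional indexing
df = pd.DataFrame({'a': [10, 20, 30], 'b': [1.5, 2.5, 3.5]},
                  index=['x', 'y', 'z'])

first_row = df.iloc[0]       # Series holding the row at position 0
cell = df.iloc[1, 0]         # scalar at row position 1, column position 0
sub = df.iloc[[0, 2], [1]]   # DataFrame with row positions 0 and 2, column position 1
```

Note that the index labels ('x', 'y', 'z') play no role here: iloc looks only at positions.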

What does .loc do in Python?

loc (label location) is label-based indexing in pandas. Labels (row or column names) are used to locate elements in the DataFrame.

Syntax of .loc

df.loc[row_label, column_label]
  • row_label is a label or a list of labels identifying the rows.
  • column_label is a label or a list of labels identifying the columns.
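A minimal sketch of this syntax follows, again assuming a small hypothetical DataFrame whose labels and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate label-based indexing
df = pd.DataFrame({'a': [10, 20, 30], 'b': [1.5, 2.5, 3.5]},
                  index=['x', 'y', 'z'])

row_y = df.loc['y']               # Series holding the row labeled 'y'
cell = df.loc['y', 'a']           # scalar at row 'y', column 'a'
sub = df.loc[['x', 'z'], ['b']]   # DataFrame with rows 'x' and 'z', column 'b'
```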
Now, let’s compare the iloc and loc methods of pandas.

iloc VS loc

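|----------------|-------------------------------|-------------------------------------------|
| Feature        | loc                           | iloc                                      |
|----------------|-------------------------------|-------------------------------------------|
| Indexing       | Label-based (names)           | Integer-based (positions)                 |
| Flexibility    | More flexible with labels     | Less flexible, but precise with positions |
| Slicing        | Inclusive (includes endpoint) | Exclusive (excludes endpoint)             |
| Error handling | KeyError for invalid labels   | IndexError for invalid positions          |
|----------------|-------------------------------|-------------------------------------------|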

1) loc indexes by label, i.e. the names of rows and columns, whereas iloc indexes by the integer position of the row and column in the DataFrame.

2) loc lets you choose data based on a wider range of criteria. You can use single labels, lists of labels, slices of labels, or boolean conditions to select specific rows and columns. The examples below illustrate this.

Selecting a column by name:

df.loc[:, 'column_name']  

Selecting a row by index label:

df.loc['label'] 

Filtering rows based on a condition on their index labels:

# Selects rows with index labels after a certain date
df.loc[df.index > '2023-01-01']

Selecting a range of rows by index label order:

# Selects rows between two index labels (inclusive)
df.loc['2023-04-01':'2023-06-30']

Filtering rows based on column values:

# Selects rows where a column value meets a condition
df.loc[df['column_b'] > 50]  
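The date-based snippets above only work when the index holds dates. As a rough sketch, assuming a hypothetical DataFrame with a sorted DatetimeIndex and a made-up column_b, they could be run like this:

```python
import pandas as pd

# Hypothetical data: a sorted DatetimeIndex and one numeric column
dates = pd.to_datetime(['2022-12-15', '2023-02-10', '2023-04-20',
                        '2023-06-30', '2023-08-01'])
df = pd.DataFrame({'column_b': [40, 60, 55, 30, 80]}, index=dates)

# Boolean mask built from the index labels
after_new_year = df.loc[df.index > '2023-01-01']

# Label slice: both the start and end dates are included
second_quarter = df.loc['2023-04-01':'2023-06-30']

# Boolean mask built from a column's values
large_values = df.loc[df['column_b'] > 50]
```

With this data, after_new_year keeps four rows, second_quarter keeps the two rows dated 2023-04-20 and 2023-06-30, and large_values keeps the rows with 60, 55, and 80.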

iloc is more limited in how you select data: it works only with integer positions of rows and columns (single integers, lists of integers, or integer slices) and does not accept labels.

3) When you slice with loc, both the starting and the ending labels are included in the result. When you slice with iloc, the ending position is not included, just as with ordinary Python slicing.
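The difference in endpoint handling can be seen in a short sketch. The DataFrame below is hypothetical and exists only to make the two slices comparable.

```python
import pandas as pd

# Hypothetical data with string labels n1..n4
df = pd.DataFrame({'value': [1, 2, 3, 4]},
                  index=['n1', 'n2', 'n3', 'n4'])

by_label = df.loc['n1':'n3']   # rows n1, n2, n3: the end label is included
by_position = df.iloc[0:2]     # rows at positions 0 and 1: position 2 is excluded
```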

4) loc raises a KeyError if a label is not present, whereas iloc raises an IndexError if a position is out of range.
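A small sketch of both failure modes, using a hypothetical three-row DataFrame, might look like this. It also shows that an out-of-range iloc slice, unlike a single out-of-range position, returns an empty result instead of raising.

```python
import pandas as pd

# Hypothetical data with three rows
df = pd.DataFrame({'value': [1, 2, 3]}, index=['n1', 'n2', 'n3'])

try:
    df.loc['n9']               # label 'n9' does not exist
except KeyError as exc:
    loc_error = type(exc).__name__

try:
    df.iloc[10]                # position 10 is out of bounds
except IndexError as exc:
    iloc_error = type(exc).__name__

empty = df.iloc[10:20]         # out-of-range slice: empty DataFrame, no error
```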

iloc vs loc at syntax level.

First, create a DataFrame with the columns Name, Age, and Score, populate it, and give the rows the labels n1 to n4.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 28, 35],
        'Score': [85, 92, 70, 90]}
df = pd.DataFrame(data)
df.index = ['n1', 'n2', 'n3', 'n4']

Select row “n1” in dataframe df using iloc vs loc

|-------------------|------------------------|-------------------|
|                   | loc                    | iloc              |
|-------------------|------------------------|-------------------|
| Select a row      | df.loc['n1']           | df.iloc[0]        | 
|-------------------|------------------------|-------------------|

Select column “Name” in dataframe df using iloc vs loc

|-------------------|------------------------|-------------------|
|                   | loc                    | iloc              |
|-------------------|------------------------|-------------------|
| Select a column   | df.loc[:,'Name']       | df.iloc[:,0]      | 
|-------------------|------------------------|-------------------|

Select multiple rows ‘n1’ and ‘n2’ in dataframe df using iloc vs loc

|---------------------|------------------------|-------------------|
|                     | loc                    | iloc              |
|---------------------|------------------------|-------------------|
| Select multiple rows| df.loc[['n1','n2']]    | df.iloc[[0,1]]    | 
|---------------------|------------------------|-------------------|

Select multiple columns ‘Name’ and ‘Score’ in dataframe df using iloc vs loc

|---------------------|---------------------------|-------------------|
|                     | loc                       | iloc              |
|---------------------|---------------------------|-------------------|
| Select multiple col | df.loc[:,['Name','Score']]| df.iloc[:,[0,2]]  | 
|---------------------|---------------------------|-------------------|

Select row range from ‘n1’ to ‘n3’ in dataframe df using iloc vs loc

|---------------------|------------------------|-------------------|
|                     | loc                    | iloc              |
|---------------------|------------------------|-------------------|
| Select row range    | df.loc['n1':'n3']      | df.iloc[0:3]      | 
|---------------------|------------------------|-------------------|

Select column range from ‘Name’ to ‘Score’ in dataframe df using iloc vs loc

|---------------------|---------------------------|-------------------|
|                     | loc                       | iloc              |
|---------------------|---------------------------|-------------------|
| Select column range | df.loc[:,'Name':'Score']  | df.iloc[:,0:3]    |
|---------------------|------------------------|-------------------|
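As a quick sanity check, the sketch below rebuilds the sample DataFrame and compares each loc selection with its iloc counterpart. The pairs dictionary and the results name are illustrative helpers, not part of pandas.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 28, 35],
        'Score': [85, 92, 70, 90]}
df = pd.DataFrame(data)
df.index = ['n1', 'n2', 'n3', 'n4']

# Each entry pairs a loc selection with the equivalent iloc selection
pairs = {
    'row': (df.loc['n1'], df.iloc[0]),
    'column': (df.loc[:, 'Name'], df.iloc[:, 0]),
    'multiple rows': (df.loc[['n1', 'n2']], df.iloc[[0, 1]]),
    'multiple columns': (df.loc[:, ['Name', 'Score']], df.iloc[:, [0, 2]]),
    'row range': (df.loc['n1':'n3'], df.iloc[0:3]),
    'column range': (df.loc[:, 'Name':'Score'], df.iloc[:, 0:3]),
}

# .equals() compares both values and labels
results = {name: by_label.equals(by_position)
           for name, (by_label, by_position) in pairs.items()}
```

Every entry in results should be True. Note how the loc ranges end at 'n3' and 'Score', while the matching iloc ranges end at 3, one past the last position.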

Update name Alice to John in dataframe df using iloc vs loc

|----------------------------|---------------------------|-----------------------|
|                            | loc                       | iloc                  |
|----------------------------|---------------------------|-----------------------|
| Update value in dataframe  | df.loc['n1','Name']='John'| df.iloc[0,0]='John'   | 
|----------------------------|---------------------------|-----------------------|
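The update can be sketched end to end as follows. The copies (by_label, by_position) are an illustrative device so that each assignment starts from the same unmodified data.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 28, 35],
        'Score': [85, 92, 70, 90]}
df = pd.DataFrame(data)
df.index = ['n1', 'n2', 'n3', 'n4']

by_label = df.copy()
by_label.loc['n1', 'Name'] = 'John'   # update by row and column label

by_position = df.copy()
by_position.iloc[0, 0] = 'John'       # update by row and column position

same = by_label.equals(by_position)   # both assignments give the same result
```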

Conclusion

If your data has meaningful labels and you want to access elements by those labels, .loc[] is probably the better choice.

If position matters more than labels, or you need to access elements by where they sit in the DataFrame, then .iloc[] is the way to go. In either case, it helps to understand how each method behaves and to choose the one that best suits your use case.

You can get the above code on GitHub.
