When it comes to data wrangling in pandas, iloc and loc are the first tools that come to mind. Both are used to select and index rows and columns of a DataFrame. However, they differ in how they address the data: loc works with labels, while iloc works with integer positions.
What is .iloc in Python?
Syntax of .iloc
df.iloc[row_index, column_index]
- row_index is an integer, a list or array of integers, or an integer slice giving the row positions.
- column_index is an integer, a list or array of integers, or an integer slice giving the column positions.
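As a minimal sketch of the syntax above, the snippet below applies iloc to a small DataFrame. The sample data and the variable names are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 28]},
    index=["n1", "n2", "n3"],
)

first_row = df.iloc[0]         # Series holding the row at position 0
bob_age = df.iloc[1, 1]        # scalar at row position 1, column position 1
subset = df.iloc[[0, 2], [0]]  # DataFrame with rows 0 and 2, column 0 only
```

Note that the row labels ('n1', 'n2', 'n3') play no role here: iloc only looks at positions, counted from 0.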
What does .loc do in Python?
Syntax of .loc
df.loc[row_label, column_label]
- row_label is a label or a list of labels representing the row indices.
- column_label is a label or a list of labels representing the column indices.
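The same kind of sketch for loc, again on made-up sample data: every selection is expressed with row and column labels rather than positions.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 28]},
    index=["n1", "n2", "n3"],
)

row_n2 = df.loc["n2"]                    # Series for the row labelled 'n2'
charlie_age = df.loc["n3", "Age"]        # scalar at row 'n3', column 'Age'
subset = df.loc[["n1", "n3"], ["Name"]]  # DataFrame with two rows, one column
```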
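iloc vs loc
|----------------|-------------------------------|-------------------------------------------|
| Feature        | loc                           | iloc                                      |
|----------------|-------------------------------|-------------------------------------------|
| Indexing       | Label-based (names)           | Integer-based (positions)                 |
| Flexibility    | More flexible with labels     | Less flexible, but precise with positions |
| Slicing        | Inclusive (includes endpoint) | Exclusive (excludes endpoint)             |
| Error handling | KeyError for invalid labels   | IndexError for invalid positions          |
|----------------|-------------------------------|-------------------------------------------|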
1) loc indexes by label, that is, by the names of rows and columns, whereas iloc indexes by the integer position of rows and columns in the DataFrame.
2) loc lets you choose data based on a wider range of criteria. You can use single labels, lists of labels, slices of labels, or boolean conditions to select specific rows and columns. The following snippets illustrate each case.
Selecting a column by name:
df.loc[:, 'column_name']
Selecting a row by index label:
df.loc['label']
Filtering rows based on a condition on their index labels:
# Selects rows with index labels after a certain date
df.loc[df.index > '2023-01-01']
Selecting a range of rows by index label order:
# Selects rows between two index labels (inclusive)
df.loc['2023-04-01':'2023-06-30']
Filtering rows based on column values:
# Selects rows where a column value meets a condition
df.loc[df['column_b'] > 50]
iloc is more limited in how you select data: it works with the integer positions of rows and columns (single integers, lists of integers, or integer slices), not with labels.
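To try the loc snippets from point 2 end to end, here is a small runnable sketch. It assumes a DataFrame with a DatetimeIndex and a numeric column called column_b; the dates and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: a sorted DatetimeIndex and one numeric column
df = pd.DataFrame(
    {"column_b": [10, 60, 40, 80]},
    index=pd.to_datetime(
        ["2022-12-15", "2023-02-01", "2023-04-15", "2023-07-01"]
    ),
)

# Boolean condition on the index labels
after_new_year = df.loc[df.index > "2023-01-01"]

# Label slice: both endpoints are treated as inclusive bounds
second_quarter = df.loc["2023-04-01":"2023-06-30"]

# Boolean condition on a column's values
large_values = df.loc[df["column_b"] > 50]
```

The label slice works here even though neither endpoint is an actual index entry, because the index is sorted.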
3) When you slice with loc, the start and end labels you specify are both included in the result. When you slice with iloc, the end position you specify is not included in the result, just as in regular Python slicing.
4) loc raises a KeyError if a label is not present, whereas iloc raises an IndexError if a requested integer position is out of range. An out-of-range iloc slice does not raise; it is simply truncated to the available rows.
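Points 3 and 4 are easy to check on a tiny DataFrame. The sketch below uses made-up data, and error_name is a hypothetical helper written only for this illustration, not part of pandas.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Score": [85, 92, 70, 90]}, index=["n1", "n2", "n3", "n4"])

# Point 3: loc slices include the end label, iloc slices exclude the end position
by_label = df.loc["n1":"n3"]   # rows n1, n2, n3
by_position = df.iloc[0:2]     # rows at positions 0 and 1 only


def error_name(lookup):
    """Hypothetical helper: name of the exception raised by lookup(), or None."""
    try:
        lookup()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None


# Point 4: a missing label versus an out-of-range position
missing_label = error_name(lambda: df.loc["n9"])
out_of_range = error_name(lambda: df.iloc[10])

# An out-of-range slice is truncated instead of raising
truncated = df.iloc[0:10]
```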
iloc vs loc at the syntax level
Creating a DataFrame with the columns Name, Age and Score, and populating it:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 28, 35],
        'Score': [85, 92, 70, 90]}
df = pd.DataFrame(data)
df.index = ['n1', 'n2', 'n3', 'n4']
Select row “n1” in dataframe df using iloc vs loc
|-------------------|------------------------|-------------------|
|                   | loc                    | iloc              |
|-------------------|------------------------|-------------------|
| Select a row      | df.loc['n1']           | df.iloc[0]        |
|-------------------|------------------------|-------------------|
Select column “Name” in dataframe df using iloc vs loc
|-------------------|------------------------|-------------------|
|                   | loc                    | iloc              |
|-------------------|------------------------|-------------------|
| Select a column   | df.loc[:,'Name']       | df.iloc[:,0]      |
|-------------------|------------------------|-------------------|
Select multiple rows ‘n1’ and ‘n2’ in dataframe df using iloc vs loc
|---------------------|------------------------|-------------------|
|                     | loc                    | iloc              |
|---------------------|------------------------|-------------------|
| Select multiple rows| df.loc[['n1','n2']]    | df.iloc[[0,1]]    |
|---------------------|------------------------|-------------------|
Select multiple columns ‘Name’ and ‘Score’ in dataframe df using iloc vs loc
|---------------------|---------------------------|-------------------|
|                     | loc                       | iloc              |
|---------------------|---------------------------|-------------------|
| Select multiple col | df.loc[:,['Name','Score']]| df.iloc[:,[0,2]]  |
|---------------------|---------------------------|-------------------|
Select row range from ‘n1’ to ‘n3’ in dataframe df using iloc vs loc
|---------------------|------------------------|-------------------|
|                     | loc                    | iloc              |
|---------------------|------------------------|-------------------|
| Select row range    | df.loc['n1':'n3']      | df.iloc[0:3]      |
|---------------------|------------------------|-------------------|
Select column range from ‘Name’ to ‘Score’ in dataframe df using iloc vs loc
|---------------------|---------------------------|-------------------|
|                     | loc                       | iloc              |
|---------------------|---------------------------|-------------------|
| Select column range | df.loc[:,'Name':'Score']  | df.iloc[:,0:3]    |
|---------------------|---------------------------|-------------------|
Update name Alice to John in dataframe df using iloc vs loc
|----------------------------|---------------------------|-----------------------|
|                            | loc                       | iloc                  |
|----------------------------|---------------------------|-----------------------|
| Update value in dataframe  | df.loc['n1','Name']='John'| df.iloc[0,0]='John'   |
|----------------------------|---------------------------|-----------------------|
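As a quick sanity check, the sketch below rebuilds the same DataFrame and confirms that each loc expression in the comparisons selects the same data as its iloc counterpart. The names pairs, all_match and updated_name are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"],
     "Age": [25, 30, 28, 35],
     "Score": [85, 92, 70, 90]},
    index=["n1", "n2", "n3", "n4"],
)

# Each tuple pairs a label-based selection with its position-based twin
pairs = [
    (df.loc["n1"], df.iloc[0]),                          # one row
    (df.loc[:, "Name"], df.iloc[:, 0]),                  # one column
    (df.loc[["n1", "n2"]], df.iloc[[0, 1]]),             # several rows
    (df.loc[:, ["Name", "Score"]], df.iloc[:, [0, 2]]),  # several columns
    (df.loc["n1":"n3"], df.iloc[0:3]),                   # row range
    (df.loc[:, "Name":"Score"], df.iloc[:, 0:3]),        # column range
]
all_match = all(by_label.equals(by_position) for by_label, by_position in pairs)

# Update through loc, then read the same cell back through iloc
df.loc["n1", "Name"] = "John"
updated_name = df.iloc[0, 0]
```

Note how the row range needs 'n3' as the loc endpoint but 3 as the iloc endpoint: the label slice is inclusive, the position slice is not.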
Conclusion
If your data has meaningful labels and you want to access elements based on those labels, .loc[] is probably the better choice.
If position matters more, or you need to access elements by their location rather than by their labels, then .iloc[] is the way to go. Either way, it is important to understand how each one behaves and to choose the one that best suits your specific use case.
You can get the above code on GitHub.