What does unhashable type mean in Python?
An unhashable type is one whose objects do not have a fixed hash value. Such objects cannot be used as keys in a dictionary or as elements in a set.
Dictionaries and sets are hash-based data structures: they rely on hash functions to store and retrieve elements efficiently.
A hashable object in Python has a hash value that never changes during its lifetime. Hashable objects can be used as keys in dictionaries and as elements in sets. Examples include integers, strings, and tuples (provided every element of the tuple is itself hashable).
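A quick way to see which objects are hashable is to call the built-in hash() on them. The sketch below wraps that call in a small helper; the name is_hashable is chosen for illustration and is not part of Python or NumPy.

```python
import numpy as np


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(42))                # True
print(is_hashable("apple"))           # True
print(is_hashable((1, 2)))            # True
print(is_hashable([1, 2]))            # False: lists are mutable
print(is_hashable((1, [2, 3])))       # False: the tuple contains a list
print(is_hashable(np.array([1, 2])))  # False: ndarrays are mutable
```

Anything that returns False here will raise the same TypeError when used as a dictionary key or set element.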
Why Are numpy.ndarray Objects Unhashable?
In simple terms, numpy.ndarray objects are mutable, which means their contents can change after creation. Because of this mutability they are not hashable, which makes them unsuitable as dictionary keys or set elements.
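The difference in mutability is easy to observe. This minimal sketch changes an array in place and then shows that the same operation is rejected for a tuple; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3])
arr[0] = 99          # allowed: the array is modified in place
print(arr.tolist())  # [99, 2, 3]

tup = (1, 2, 3)
try:
    tup[0] = 99      # not allowed: tuples are immutable
except TypeError as exc:
    tuple_error = str(exc)

print(tuple_error)   # 'tuple' object does not support item assignment
```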
Examples of the 'Unhashable Type numpy.ndarray' Error in Pandas
Example 1: Trying to find unique values in a DataFrame column
In the code below we try to find the unique values in a DataFrame column using the unique() function of the pandas library. Because the column contains NumPy arrays, the call raises an error.
Python code
import pandas as pd
import numpy as np

# Create the data for the DataFrame
data = {
    'Numbers': [1, 2, 3, 4, 5],
    'Strings': ['apple', 'banana', 'cherry', 'date', 'elderberry'],
    'Floats': [1.1, 2.2, 3.3, 4.4, 5.5],
    'Booleans': [True, False, True, False, True],
    'Arrays': [np.array([1, 2]), 'hello', 'test', 1, np.array([9, 10])]
}

# Create the DataFrame
df = pd.DataFrame(data)
df.head()
Output
   Numbers     Strings  Floats  Booleans   Arrays
0        1       apple     1.1      True   [1, 2]
1        2      banana     2.2     False    hello
2        3      cherry     3.3      True     test
3        4        date     4.4     False        1
4        5  elderberry     5.5      True  [9, 10]
Trying to get the unique values of the 'Arrays' column of the DataFrame (df) raises an error:
df['Arrays'].unique()
We get TypeError: unhashable type: 'numpy.ndarray' because unique() in pandas stores the column's values in a hash table, so every value must be hashable. NumPy arrays are mutable and therefore unhashable.
Output
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 df['Arrays'].unique()

Python\Python310\site-packages\pandas\core\algorithms.py:428, in unique_with_mask(values, mask)
    426 table = hashtable(len(values))
    427 if mask is None:
--> 428 uniques = table.unique(values)
    429 uniques = _reconstruct_data(uniques, original.dtype, original)
    430 return uniques

File pandas\_libs\hashtable_class_helper.pxi:7247, in pandas._libs.hashtable.PyObjectHashTable.unique()

File pandas\_libs\hashtable_class_helper.pxi:7194, in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'numpy.ndarray'
Solution: Convert the values in the 'Arrays' column to strings, then apply unique() to get the unique values for that column.
This works because strings are immutable, which makes them hashable. Note that the conversion turns every value into a string, so the integer 1 comes back as '1'.
df['Arrays'].apply(lambda x: str(x)).unique()
Output
array(['[1 2]', 'hello', 'test', '1', '[ 9 10]'], dtype=object)
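Converting everything to strings also changes the non-array values (the integer 1 becomes '1' in the output above). If that matters, one alternative is to convert only the arrays and leave other values untouched. The sketch below assumes the arrays are one-dimensional; the helper name to_hashable is hypothetical.

```python
import numpy as np
import pandas as pd


def to_hashable(value):
    """Turn 1-D NumPy arrays into tuples; return any other value unchanged."""
    if isinstance(value, np.ndarray):
        return tuple(value.tolist())
    return value


df = pd.DataFrame({
    'Arrays': [np.array([1, 2]), 'hello', 'test', 1, np.array([1, 2])]
})

# Map each cell to a hashable equivalent before asking for unique values
uniques = df['Arrays'].map(to_hashable).unique()
print(list(uniques))
```

Here the two equal arrays collapse into a single tuple, while the string and integer values keep their original types.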
Example 2: Trying to create a dictionary with a NumPy array as a key.
In the code below we use a NumPy array as a dictionary key and assign it the value 'This line will give error'. This raises an error because dictionary keys must be hashable and NumPy arrays are not.
Python Code
import numpy as np

my_dict = {}
key = np.array(['line_1', 'line_2', 'inline_3'])
my_dict[key] = 'This line will give error'
Output
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 4
      2 my_dict = {}
      3 key = np.array(['line_1', 'line_2', 'inline_3'])
----> 4 my_dict[key] = 'This line will give error'

TypeError: unhashable type: 'numpy.ndarray'
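To fix the error, convert the NumPy array to a tuple. Tuples are immutable and, as long as their elements are hashable, hashable too. For a multi-dimensional array, tuple() alone is not enough, because the resulting tuple still contains arrays.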
Python Code
import numpy as np

my_dict = {}

# Convert the NumPy array to a tuple of plain Python strings
key = tuple(np.array(['line_1', 'line_2', 'inline_3']).tolist())

# Use the tuple as the key in the dictionary
my_dict[key] = 'This line will not give error'
print(my_dict)
Output
{('line_1', 'line_2', 'inline_3'): 'This line will not give error'}
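Calling tuple() on a two-dimensional array gives a tuple of row arrays, which is still unhashable. One way around this, sketched here with hypothetical helper names (to_nested_tuple and array_to_key are not NumPy functions), is to go through tolist() and convert every nested list to a tuple.

```python
import numpy as np


def to_nested_tuple(value):
    """Recursively convert nested lists into nested tuples."""
    if isinstance(value, list):
        return tuple(to_nested_tuple(item) for item in value)
    return value


def array_to_key(arr):
    """Build a hashable key from an array of any dimensionality."""
    return to_nested_tuple(np.asarray(arr).tolist())


matrix = np.array([[1, 2], [3, 4]])
key = array_to_key(matrix)
print(key)  # ((1, 2), (3, 4))

lookup = {key: 'matrix value'}
print(lookup[array_to_key(np.array([[1, 2], [3, 4]]))])  # matrix value
```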
Example 3: Trying to add a NumPy array directly to a set.
Continuing with NumPy imported as np, as in Example 2:
Python code
my_set = set()
my_array = np.array(['man', 'woman', 'animal'])
my_set.add(my_array)
Output
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[28], line 3
      1 my_set = set()
      2 my_array = np.array(['man', 'woman', 'animal'])
----> 3 my_set.add(my_array)

TypeError: unhashable type: 'numpy.ndarray'
Solution: Sets in Python require their elements to be hashable, and NumPy arrays are mutable and therefore unhashable. We have to convert the array to an immutable object first. One option is frozenset(); keep in mind that a frozenset discards element order and duplicates, so use a tuple if those matter.
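Python code
import numpy as np

my_set = set()
my_array = np.array(['man', 'woman', 'animal'])

# Convert the array to a frozenset of plain Python strings
my_set.add(frozenset(my_array.tolist()))
print(my_set)
Output
{frozenset({'animal', 'woman', 'man'})}
The order of the elements inside the frozenset may differ from run to run.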
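When element order matters, a tuple can serve as the set element instead of a frozenset. As a rough sketch, the loop below uses a set of tuples to keep only the first occurrence of each array in a list; the variable names are illustrative, and the arrays are assumed to be one-dimensional.

```python
import numpy as np

arrays = [np.array([1, 2]), np.array([3, 4]), np.array([1, 2])]

seen = set()          # hashable tuple versions of the arrays seen so far
unique_arrays = []    # first occurrence of each distinct array

for arr in arrays:
    key = tuple(arr.tolist())  # order-preserving, hashable stand-in
    if key not in seen:
        seen.add(key)
        unique_arrays.append(arr)

print(len(unique_arrays))  # 2
```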
How to Fix the 'Unhashable Type numpy.ndarray' Error
Converting numpy.ndarray to Hashable Types (Tuple, Frozenset and String)
Python code
# Assumes data is a DataFrame whose column 'B' holds 1-D NumPy arrays
data['B'] = data['B'].apply(tuple)
Using .tolist() or .apply() for NumPy arrays in the columns of a pandas DataFrame or Series
Python code
import pandas as pd
import numpy as np

data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])]
})
The first method calls .tolist() on each value in the column. Note that this produces Python lists, which are easier to display and serialize but are themselves unhashable; convert to tuples when the values must be hashed.
Python code
data['B'] = data['B'].apply(lambda x: x.tolist())
print(data)
Output
   A       B
0  1  [1, 2]
1  2  [3, 4]
2  3  [5, 6]
For the second method, recreate the DataFrame:
Python code
data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])]
})
The second method converts each NumPy array to a list by passing list to apply():
Python code
data['B'] = data['B'].apply(list)
print(data)
Output
   A       B
0  1  [1, 2]
1  2  [3, 4]
2  3  [5, 6]
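Lists read nicely in the printed DataFrame, but they are mutable too, so hash-based operations still fail on them. The following sketch, which uses a sample DataFrame made up for this illustration, contrasts a list-valued column with a tuple-valued one.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [np.array([1, 2]), np.array([3, 4]), np.array([1, 2])]
})

# Lists: unique() still raises TypeError (unhashable type: 'list')
as_lists = data['B'].apply(lambda x: x.tolist())
try:
    as_lists.unique()
    lists_ok = True
except TypeError:
    lists_ok = False

# Tuples: hashable, so unique() works
as_tuples = data['B'].apply(lambda x: tuple(x.tolist()))
unique_tuples = as_tuples.unique()

print(lists_ok)             # False
print(list(unique_tuples))  # [(1, 2), (3, 4)]
```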
Causes of the 'Unhashable Type numpy.ndarray' Error
- Using a numpy.ndarray as a key, for example as a dictionary key or in methods that look values up by key.
- Passing a numpy.ndarray to a function that accepts only hashable arguments.
- Performing operations such as union, intersection, or unique() on DataFrames whose cells contain numpy.ndarray objects.
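The second cause can be reproduced with functools.lru_cache from the standard library, which hashes a function's arguments to build its cache key. In this minimal sketch the function total is a made-up example, not part of any library.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def total(values):
    """Sum a sequence; arguments must be hashable because of lru_cache."""
    return sum(values)


arr = np.array([1, 2, 3])

try:
    total(arr)        # the cache tries to hash the array and fails
    raised = False
except TypeError:
    raised = True

# Passing a tuple of plain Python ints works
result = total(tuple(arr.tolist()))

print(raised)  # True
print(result)  # 6
```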
Best Practices to Avoid the 'Unhashable Type' Error
- When an operation requires hashable types, convert NumPy arrays to tuples first. Lists will not work here because they are mutable as well.
- Convert array-valued columns to tuples before using them with pandas operations that hash their values, such as set_index() and groupby().
- Prefer inherently hashable data structures such as tuples, frozensets, and strings. Lists and dictionaries are mutable and therefore not hashable.
Conclusion
When you get TypeError: unhashable type: 'numpy.ndarray', convert the NumPy arrays to a hashable type such as a tuple, frozenset, or string, as needed. Most of the time this resolves the error.
Remember to use hashable types in any index operation.