Are you struggling with duplicated data in a NumPy array and looking for a way to remove it? If yes, then you have landed in the right place. 🎉
In Python, the NumPy library comes with a range of functions for manipulating data in arrays. In this article, we’ll look at multiple solutions to remove duplicate records from a NumPy array.
Note: For more information and detailed steps to download, install, and import NumPy on your system (Windows, macOS, Linux), refer to our complete guide here. Additionally, to update NumPy on your system using pip, you can refer to our guide for that here.
How to Remove Duplicates From a NumPy Array?
There are several reasons why you may need to remove duplicate data from an array. The data may have been duplicated by mistake, or you may simply want to find out which values are repeated. There are several ways to remove duplicates from a NumPy array. Let’s discuss two approaches here:
- np.unique()
- set()
Method 1: Remove Duplicates From a Numpy Array Using np.unique() Function
We can use the np.unique() function if we want to remove duplicates from an array. To use that function, we will pass an array as an argument.
The syntax of this function is given below:
Syntax
# remove np array duplicates
np.unique(array)
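One detail of the syntax above is worth seeing before the examples: when no axis argument is given, np.unique() flattens the input first. The short sketch below illustrates this with a made-up 2-D array (the name grid and its values are only for illustration).

```python
import numpy as np

# A small 2-D array with repeated values (illustrative data)
grid = np.array([[3, 1, 3],
                 [2, 1, 2]])

# Without an axis argument, the array is flattened first,
# so the result is a 1-D array of the distinct values
flat_unique = np.unique(grid)
print(flat_unique)  # [1 2 3]
```

This is why the row and column examples later in this article pass an axis argument.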
a. Remove Duplicates from 1-D Array
Let’s use the np.unique(array) function to remove duplicates from a 1-D array.
Code
# import numpy
import numpy as np

# numpy array created
array = np.array([1, 2, 4, 2, 3, 3, 5, 3, 1, 4])

# print the array with duplicate numbers
print(array)

# print the array and remove duplicates
print(np.unique(array))
Output
[1 2 4 2 3 3 5 3 1 4]
[1 2 3 4 5]
In the above example 👆 we have created a NumPy array using the np.array() function. The array contains duplicate values. To remove them, we have called the np.unique() function and printed the result. Note that np.unique() returns the unique values in sorted order, not in their original order of appearance.
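If you need the unique values in the order they first appear rather than sorted, one possible approach is to use the return_index parameter of np.unique(). The sketch below reuses the same array; the variable names first_idx and ordered_unique are our own.

```python
import numpy as np

array = np.array([1, 2, 4, 2, 3, 3, 5, 3, 1, 4])

# return_index=True also returns the index of each value's first occurrence
_, first_idx = np.unique(array, return_index=True)

# Sorting those indices restores the original order of appearance
ordered_unique = array[np.sort(first_idx)]
print(ordered_unique)  # [1 2 4 3 5]
```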
b. Remove Duplicate Rows from 2-D NumPy Array
Let’s remove duplicate rows from a 2-D array using the np.unique() function:
Code
# import numpy
import numpy as np

# numpy array created
array = np.array([[1, 2, 1], [2, 3, 5], [1, 2, 1]])

# print the numpy array with duplicate rows
print("The numpy array with a duplicate row is:")
print(array)

# remove duplicate rows from numpy array
print("Remove duplicate rows from numpy array:")
print(np.unique(array, axis=0))
Output
The numpy array with a duplicate row is:
[[1 2 1]
 [2 3 5]
 [1 2 1]]
Remove duplicate rows from numpy array:
[[1 2 1]
 [2 3 5]]
In the above example 👆 you can see that the row [1 2 1] appears twice. We have removed the duplicate row by passing axis=0 to the np.unique() function, which makes it compare whole rows. The second printed output contains only the unique rows.
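If you are interested in finding out which rows are duplicated rather than just dropping them, the return_counts parameter can help. The following is a minimal sketch on the same array; the names unique_rows, counts, and repeated_rows are our own.

```python
import numpy as np

array = np.array([[1, 2, 1],
                  [2, 3, 5],
                  [1, 2, 1]])

# return_counts=True also reports how many times each unique row occurs
unique_rows, counts = np.unique(array, axis=0, return_counts=True)

# Keep only the rows that occur more than once
repeated_rows = unique_rows[counts > 1]
print(counts)         # [2 1]
print(repeated_rows)  # [[1 2 1]]
```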
c. Remove Duplicate Columns from 2-D NumPy Array
Let’s remove duplicate columns from a 2-D array using the np.unique() function:
Code
# import numpy
import numpy as np

# numpy array is created
array = np.array([[1, 2, 2], [4, 1, 1], [3, 1, 1]])

# print the numpy array with duplicate columns
print("The numpy array with duplicate column is:")
print(array)

# remove duplicate columns from numpy array
print("Remove duplicate columns from numpy array:")
print(np.unique(array, axis=1))
Output
The numpy array with duplicate column is:
[[1 2 2]
 [4 1 1]
 [3 1 1]]
Remove duplicate columns from numpy array:
[[1 2]
 [4 1]
 [3 1]]
In the above example 👆 you can see that the second and third columns are identical. We have removed the duplicate column by passing axis=1 to the np.unique() function, which makes it compare whole columns. The second printed output contains only the unique columns.
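With axis=1, np.unique() returns the unique columns in sorted order, which may differ from their original left-to-right order. As a sketch of one way to keep the original order, the example below uses return_index on a hypothetical array (chosen by us so that the sorted order and the original order differ).

```python
import numpy as np

# Hypothetical array: the first and third columns are identical
array = np.array([[5, 2, 5],
                  [4, 1, 4],
                  [3, 1, 3]])

# Sorted unique columns: the column starting with 2 comes first
sorted_cols = np.unique(array, axis=1)

# Index of the first occurrence of each unique column
_, first_idx = np.unique(array, axis=1, return_index=True)

# Select the columns by ascending index to keep their original order
ordered_cols = array[:, np.sort(first_idx)]
print(ordered_cols)
```

Here ordered_cols keeps the column starting with 5 on the left, as in the input.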
Method 2: Remove Duplicates from a NumPy Array Using set() Function
Python’s built-in set() takes an iterable and returns a set that contains only its distinct elements.
Let’s use the set() function in an example:
Code
# import numpy
import numpy as np

# numpy array has been created
array = np.array([[1,2,3],
                  [3,2,1],
                  [4,5,6],
                  [7,8,9],
                  [9,8,9],
                  [4,5,6],
                  [7,8,9]])

# Delete duplicate rows from 2D NumPy Array
array = np.vstack(list(set(tuple(row) for row in array)))
print("Distinct array values are:")
print(array)
Output
Distinct array values are:
[[9 8 9]
 [3 2 1]
 [7 8 9]
 [1 2 3]
 [4 5 6]]
In the above example 👆 we have created a 2-D NumPy array with duplicate rows. We iterate over the rows and convert each one to a tuple, because NumPy arrays are not hashable and so cannot be stored in a set. Passing these tuples to set() keeps only the distinct rows. Finally, we use the numpy.vstack() function to stack the rows vertically into a 2-D array again. Because a set is unordered, the rows in the result are not guaranteed to follow their original order.
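If the original row order matters, a possible variation on this method is to swap set() for dict.fromkeys(), since dictionary keys are unique and keep their insertion order (Python 3.7+). This is a sketch of that alternative, not part of the set() method itself; the names distinct_rows and ordered are our own.

```python
import numpy as np

array = np.array([[1, 2, 3], [3, 2, 1], [4, 5, 6], [7, 8, 9],
                  [9, 8, 9], [4, 5, 6], [7, 8, 9]])

# Each row becomes a tuple key; repeated keys are ignored,
# and the first occurrence of each row keeps its position
distinct_rows = dict.fromkeys(tuple(row) for row in array)

# Convert the keys back into a 2-D NumPy array
ordered = np.array(list(distinct_rows))
print(ordered)
```

The result contains the rows [1 2 3], [3 2 1], [4 5 6], [7 8 9], and [9 8 9], in that order.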
Conclusion
In this article, we have discussed how to remove duplicates from a NumPy array and provided two alternative solutions for doing so.
A quick recap of the topics we’ve explained in this article:
- What is the NumPy library in Python?
- How to remove duplicates from a NumPy array?
- How to remove duplicates using the np.unique() function?
- How to remove duplicates using the set() function?
Hope this guide helps you out 😇 Do share your experience using each method, and let us know in the comments section below 👇 which solution you find more feasible 🥰