It happens that many times we got stuck in an Excel workbook which contains duplicates cells. It is difficult to understand the data with duplicates. Well, we can delete duplicates in Excel. Let’s see how?
In this section, we will first learn to identify the duplicates in Excel and then we will see how to delete the duplicates? We will also discuss the scenario when there are multiple rows carry the duplicate data in several columns.
The topics to be discussed:
- Identify Duplicates
- Formula to find Duplicates
- Delete/Remove Duplicates
- Remove duplicates rows based on one or more columns
Identify Duplicates
To identify the duplicates, we have an inbuilt function in Excel and for that first select the range of the cells where you want to identify or find the duplicate values. Now, go to Styles group of Home tab and click on Conditional Formatting>Highlight Cells Rules>Duplicate Values.
A Duplicate Values dialog box will pop out. Here, you have two drop-down boxes, in first you have to select Duplicate among the option Unique and Duplicate. In the second drop-down box select the colour format in which you have to highlight the duplicates and click OK.
See in the picture below, the duplicates are highlighted in red colour.
Formula to Find or Identify Duplicates
As such, there is not a unique or single formula to identify the duplicates in Excel. We can construct several formulas to find the duplicates using the combination of different functions available in Excel.
Now, we will discuss the construction of one such formula which will let us identify the duplicates in Excel.
=COUNTIF($A$2:$A2, $A2)>1
Here, the COUNTIF function will return the array of TRUE and FALSE. At the first occurrence of the cell content, COUNTIF counts FALSE and on further occurrences of the same content, it counts TRUE.
Now, to put it in a proper format let us modify the formula in a way:
=IF(COUNTIF($A$2:$A2, $A2)>1,“Duplicate”,“-”)
Here, the IF function analyse the array returned by the COUNTIF function, for TRUE it returns Duplicate and for FALSE it returns “-”.
Delete/Remove Duplicates in Excel
To remove the duplicates from the range of cells first select the range having the duplicates. Under Data tab in the Data Tools group click on Remove Duplicates.
Remove Duplicates dialog box will pop up which will show the columns which should be considered while deleting the duplicates.
As in our example, there is only one column it should be selected as the duplicate rows will be removed based on this column. The result is:
Note: Before Removing duplicates, you must save the original data somewhere in the excel so that there is no data loss while doing this.
Remove Duplicate Rows Based on One or More Columns
Suppose, there are multiple columns in the workbook, where the several rows have duplicated data. For example, observe the data below:
Now, see the duplicates in the red square carry the same data for Sales. Like Joseph has value 100 in Sales column 2 times.
But, the duplicates in the blue square have different values for the Sales column. Now, let us see how we can remove the duplicate rows based on several columns.
Go to the Data tab and in Data Tools group click on Remove Duplicates.
It will pop out Remove Duplicates dialog box. Here, we will remove duplicate rows based on both the column i.e. Staff and Sales.
So we will select both the columns Staff and Sales. The result would be:
Note: This would not remove the duplicated row’s first occurrence, it still retains in the result. But it removes all of its further occurrences.
So this is all about identifying, deleting duplicates in the range of cells. We have also achieved this using the formula. Further, you can explore more to find other methods for doing the same.
Leave a Reply