Module 2 · Lesson 2.8

Remove Duplicate Records

Edit on GitHub

SQL's SELECT DISTINCT returns only unique rows from a query. Excel offers two approaches to the same problem: a destructive in-place tool (Remove Duplicates) that permanently deletes duplicate rows, and a non-destructive formula approach (UNIQUE()) that returns a deduplicated list without touching the original data. Knowing when to use each is what the MO-211 exam tests.


Part 1: The Remove Duplicates Tool

Data tab > Data Tools group > Remove Duplicates

This tool permanently deletes all but the first occurrence of each duplicate from your dataset.

How It Works

  1. Click anywhere in your data range.
  2. Open Remove Duplicates.
  3. A dialog shows all columns with checkboxes. - Checked columns are used to define "uniqueness." A row is a duplicate only if ALL checked columns match another row. - Unchecked columns are ignored for comparison purposes.
  4. Click OK. Excel reports how many duplicates were removed and how many unique values remain.

Column Selection — The Key Concept

Goal Action
Remove rows where every column matches Check all columns
Remove rows where only a specific column has duplicates Check only that column
Deduplicate by Customer ID (ignore order date) Check only the Customer ID column
IMPORTANT
**Remove Duplicates is destructive and immediate.** It cannot be undone beyond `Ctrl+Z` in the same session. Always work on a **copy of your data**, or use `UNIQUE()` if you need to preserve the original.

Part 2: UNIQUE() — The Non-Destructive Alternative

=UNIQUE(array, [by_col], [exactly_once])

UNIQUE() returns a deduplicated version of a range as a dynamic spill array — the original data is untouched.

Parameter Description Default
array The range or array to deduplicate Required
by_col FALSE = compare rows; TRUE = compare columns FALSE
exactly_once TRUE = return only values that appear exactly once (singletons); FALSE = return one of each FALSE

Examples

=UNIQUE(A2:A100)

Returns a list of unique values from column A, spilling downward.

=UNIQUE(A2:C100)

Returns unique rows across three columns — a row must match in ALL three columns to be considered a duplicate.

=UNIQUE(A2:A100, FALSE, TRUE)

Returns only values that appear exactly once — useful for finding items with no duplicates.

TIP
Combine `UNIQUE()` with `SORT()` to return a clean, alphabetically sorted unique list: `=SORT(UNIQUE(A2:A100))`

Part 3: Identifying Duplicates Before Removing Them

Before deleting anything, use Conditional Formatting to visually flag duplicates:

  1. Select the column to check (e.g., A2:A100).
  2. Home tab > Conditional Formatting > Highlight Cells Rules > Duplicate Values...
  3. Choose a fill color and click OK.

All duplicate values are now highlighted. Review them before deciding to remove.

You can also use COUNTIF() to count occurrences and flag duplicates with a helper column:

=COUNTIF($A$2:$A$100, A2)

Any cell where this returns greater than 1 is a duplicate.


Part 4: Choosing the Right Approach

Situation Best Approach
One-time cleanup, original data no longer needed Remove Duplicates
Need to keep original and view unique separately UNIQUE()
Need a unique list that updates as source changes UNIQUE() (dynamic)
Need to identify duplicates before deciding Conditional Formatting + COUNTIF()
Need to find records that appear only once UNIQUE(..., FALSE, TRUE)

Official Resources