These are 3 easy ways to remove duplicates in your data to create a unique / distinct list of values in Microsoft Excel. It’s a very common task for data cleaning and also a very common job interview question. One of the solutions (removing duplicates with power query) returns a different result to the other methods. I’ll explain what the difference is and what to watch out for.
Timestamps 00:00 How to Remove Duplicate Values For Multiple Columns 01:15 How to Use the Remove Duplicates Feature 03:09 How to Use the UNIQUE Function 04:50 How to Use Power Query to Remove Duplicates 08:25 Wrap Up
► LINK to more Power Query tutorials: • Excel Power Query (Get \u0026 Transform)
Note: This description contains affiliate links, which means at no additional cost to you, we will receive a small commission if you make a purchase using the links. This helps support the channel and allows us to continue to make videos like this. Thank you for your support!
#excel
Content
0.08 -> Today we're going to take a look at a very common
task when it comes to cleaning data and it's also
5.2 -> a very common interview question that you might
get if you're applying for a data or financial
11.2 -> analyst type of job. How can you remove duplicates
in your data? I'm going to show you three methods,
17.36 -> it's important that you understand the advantages
and disadvantages of the different methods and
23.28 -> why one of these methods might return a different
result to the other ones. Let's take a look
34.4 -> Okay, so I have this table with
sales agent region and sales value
38.56 -> I want to remove the duplicates that occur in this
table but first of all what are the duplicates?
44.96 -> well if we take a look at this row for example
and take a look at this one, is this a duplicate?
51.92 -> no right? because the sales value is different,
but what about this one and this one? These are
58.72 -> duplicates. What I want to happen is that every
other occurrence of this line is removed. I just
65.6 -> keep it once in the end result. This means that I
need to take a look at the combination of all the
71.92 -> columns in order to be able to identify duplicates.
Now the first example that I'm going to show you
77.6 -> is going to work in all excel versions but first
you need to make a decision, do you want to keep
83.76 -> the original data set or not? If you want to keep
it so you can compare before and after, you should
90.56 -> create a copy of it. So I'm going to press CtrlA,
let's press it again to also include the headers
96.16 -> and press CtrlC to copy this. Now let's go to
the side here, I will paste these values and also
103.12 -> paste the formatting as well. Okay so now, I
can just click somewhere inside this data set.
109.68 -> Go to the Data Tab under Data Tools, here
there is a feature called Remove Duplicates,
116 -> just click on it. It's going to try and pick
out your data range, in my case it's Correct
120.8 -> there is an automatic check mark here for
me to specify that the data has headers.
125.6 -> If yours doesn't pick up the headers properly
you can just place a check mark here yourself. The
130.32 -> columns involved are fine, so I'm going to click
on ok five duplicate values found and removed,
137.44 -> 14 unique values remain. Now let's take a look,
I just have one occurrence of this combination.
145.84 -> Okay so, in case you just want to remove the
duplicates of one column, so let's say you want
151.92 -> a distinct list of sales agents, you can repeat
this process just on the Sales Agent Column. So,
159.28 -> I will just show you how that's done. I'm going to
copy this, go to the side here let's just paste
164.48 -> this in, you also don't really need the header
in this case. You can select your Range or just
169.76 -> click somewhere inside the range and go to remove
duplicates. I'm going to place a Check Mark here,
175.84 -> my data has headers and click on ok and this
time 15 duplicate values were found and removed,
182.88 -> and I end up with four sales agents. Okay so, that's
how you can apply it to a specific column. That was
190.08 -> method one, now let's take a look at method two.
This method uses formulas and it's something you
196.48 -> can use if you have Office365, this of course
works on the web version of Excel, so if you
202.8 -> have a free Office account, you can use these
formulas on Excel on the web. Start with Equals,
209.36 -> the function we need is the unique function, so
this is one of the newer functions in Office365,
215.68 -> all we need is the Array and the array is our
table. So, just click to the side of the table
221.28 -> until you see that tilted arrow, then click
it's going to pick up your entire table.
226.16 -> Mine is called TSales, that's all I need. Close
Bracket, press Enter, and I have my unique list.
233.84 -> I have removed duplicates from this data set, now I'm
just going to copy and paste the Headers here and
240.48 -> let's just apply the Number Formatting to these
as well. Now, the advantage of using a function
248 -> instead of the first method that I showed you, is
that the second method here is dynamic, this means
254.64 -> if I add data to the bottom of my table it's
automatically going to be reflected with this
260.48 -> version. Let's test this, I'm going to copy
this and paste it here. Well right now we're
267.28 -> not going to see anything because duplicates
are automatically removed, but if I change this
272.16 -> number to let's say 40, it automatically shows
up here whereas with this method, I would have to
278.24 -> repeat the steps to get my unique list. Okay so, now
let's take a look at the third way of doing this.
285.44 -> I'm just going to press CtrlZ a few times to go
back to our data set. The third way is with Power
291.84 -> Query, now this method is semi-dynamic because all
you have to do is press refresh. Let's take a look,
299.28 -> I'm going to go to the Data Tab here, this time
instead of going to the data tools section,
304.32 -> we're going to go to the Get and Transform Data.
First of all, power query needs to get the data and
310 -> then we can transform it. Our data is right here, so
we're going to go with from sheet, this is going to
316.32 -> bring up the Power Query Editor. We have our Source
Step here, which was just our table and power query
322.8 -> automatically added a change type step for the
headers. It just transformed them to text for this
329.2 -> and to a whole number for our sales values, which
is fine. Now to remove duplicates, I'm going to hold
335.2 -> down Shift, so the first column is selected I'm
holding down Shift and I'm going to click on the
339.84 -> Last Column, then right mouse click and remove
duplicates. Okay, so everything looks good,
346.64 -> let's call this TSales Report and send this
back to the worksheet. So close and load two, I
353.12 -> want to put it on the existing worksheet. Let's
put it right here, so we can compare the results
358.56 -> and click on ok. It says 15 rows were
loaded, now something is different.
366.32 -> Can you spot the additional row? It's right here,
in this case bridge was written with a small B
373.28 -> instead of a capital b. So, this line was not seen
as a duplicate even though it exists up here. This
382.32 -> is something you have to keep in mind when you use
power query and you're removing duplicates there,
388.48 -> because the formula version sees these as the
same thing, it's not case sensitive. The remove
395.44 -> duplicates feature from the data tab is also not
case sensitive, but power query is case sensitive.
402.64 -> So, if we want to remove the duplicate right here,
we need to add an additional step in Power Query
409.2 -> to get this to work correctly. So, let's go back
to our query, so bring up Queries and Connections
415.68 -> I'm just going to double click to bring up the
Power Query Editor and jump back a step to change
421.04 -> type. So, right before removing duplicates, I need
to make sure that everything has the same case.
428 -> In this example the problem applies to Sales Agent
not to Region, but if you have multiple columns
434.48 -> that could have a different letter case, you need
to apply this step that I'm going to show you
440 -> to the other columns as well and that step
is to transform the column, to make sure the
445.92 -> content of the column is in the same case. So, I
clicked on the transform tab, I have this already
452.08 -> selected. I'm going to go to format here and
select a format that's common to the whole column.
459.44 -> We can put everything in Uppercase Lowercase, in
this case it makes sense to capitalize each word.
465.44 -> So, I'm going to go with that and asks me if I want to
insert a step in between these two, yes I do. Okay
471.52 -> so now our bridge here is capitalized, so when we
go to remove duplicates it's automatically removed.
478.72 -> So, let's go and send back the result to Excel,
just click on Close and Load and now we get
483.92 -> the same results. Okay now, we can also test how
this updates when we add something new to this.
490.48 -> So, I'm going to copy and paste this here, let's change
this to 50. Formula version automatically updates
498 -> this version doesn't automatically
update, I have to Right-Mouse Click
502 -> select Refresh and I get the new data shown
up here. As you can see, there are different
507.36 -> ways you can remove duplicates. Use the one that
suits your Excel version and also your process.
513.44 -> Also, remember that power query is case sensitive.
Okay so, that's it for today, give this video a like.
520.8 -> Do subscribe if you aren't subscribed already
and I'm going to see you in the next video.