3 EASY Ways to Find and Remove Duplicates in Excel

3 EASY Ways to Find and Remove Duplicates in Excel


3 EASY Ways to Find and Remove Duplicates in Excel

Join 400,000+ professionals in our courses: https://www.xelplus.com/courses/

These are 3 easy ways to remove duplicates in your data to create a unique / distinct list of values in Microsoft Excel. It’s a very common task for data cleaning and also a very common job interview question. One of the solutions (removing duplicates with power query) returns a different result to the other methods. I’ll explain what the difference is and what to watch out for.

★★★ My Online Excel Courses ► https://www.xelplus.com/courses/

Download the workbook: https://www.xelplus.com/excel-remove-

Timestamps
00:00 How to Remove Duplicate Values For Multiple Columns
01:15 How to Use the Remove Duplicates Feature
03:09 How to Use the UNIQUE Function
04:50 How to Use Power Query to Remove Duplicates
08:25 Wrap Up

► LINK to more Power Query tutorials:    • Excel Power Query (Get \u0026 Transform)  

★ Get the Official XelPlus MERCH: https://xelplus.creator-spring.com/

✉ Not sure which of my Excel courses fits best for you? Take the quiz: https://www.xelplus.com/course-quiz/

RESOURCES I Recommend: https://www.xelplus.com/resources/

More resources on my Amazon page: https://www.amazon.com/shop/leilagharani

Let’s connect on social:
Instagram: https://www.instagram.com/lgharani
Twitter: https://twitter.com/leilagharani
LinkedIn: https://at.linkedin.com/in/leilagharani

Note: This description contains affiliate links, which means at no additional cost to you, we will receive a small commission if you make a purchase using the links. This helps support the channel and allows us to continue to make videos like this. Thank you for your support!

#excel


Content

0.08 -> Today we're going to take a look at a very common  task when it comes to cleaning data and it's also  
5.2 -> a very common interview question that you might  get if you're applying for a data or financial  
11.2 -> analyst type of job. How can you remove duplicates  in your data? I'm going to show you three methods,  
17.36 -> it's important that you understand the advantages  and disadvantages of the different methods and  
23.28 -> why one of these methods might return a different  result to the other ones. Let's take a look
34.4 -> Okay, so I have this table with  sales agent region and sales value  
38.56 -> I want to remove the duplicates that occur in this  table but first of all what are the duplicates?  
44.96 -> well if we take a look at this row for example  and take a look at this one, is this a duplicate?  
51.92 -> no right? because the sales value is different,  but what about this one and this one? These are  
58.72 -> duplicates. What I want to happen is that every  other occurrence of this line is removed. I just  
65.6 -> keep it once in the end result. This means that I  need to take a look at the combination of all the  
71.92 -> columns in order to be able to identify duplicates.  Now the first example that I'm going to show you  
77.6 -> is going to work in all excel versions but first  you need to make a decision, do you want to keep  
83.76 -> the original data set or not? If you want to keep  it so you can compare before and after, you should  
90.56 -> create a copy of it. So I'm going to press CtrlA,  let's press it again to also include the headers  
96.16 -> and press CtrlC to copy this. Now let's go to  the side here, I will paste these values and also  
103.12 -> paste the formatting as well. Okay so now, I  can just click somewhere inside this data set.  
109.68 -> Go to the Data Tab under Data Tools, here  there is a feature called Remove Duplicates,  
116 -> just click on it. It's going to try and pick  out your data range, in my case it's Correct  
120.8 -> there is an automatic check mark here for  me to specify that the data has headers.  
125.6 -> If yours doesn't pick up the headers properly  you can just place a check mark here yourself. The  
130.32 -> columns involved are fine, so I'm going to click  on ok five duplicate values found and removed,  
137.44 -> 14 unique values remain. Now let's take a look,  I just have one occurrence of this combination.  
145.84 -> Okay so, in case you just want to remove the  duplicates of one column, so let's say you want  
151.92 -> a distinct list of sales agents, you can repeat  this process just on the Sales Agent Column. So,  
159.28 -> I will just show you how that's done. I'm going to  copy this, go to the side here let's just paste  
164.48 -> this in, you also don't really need the header  in this case. You can select your Range or just  
169.76 -> click somewhere inside the range and go to remove  duplicates. I'm going to place a Check Mark here,  
175.84 -> my data has headers and click on ok and this  time 15 duplicate values were found and removed,  
182.88 -> and I end up with four sales agents. Okay so, that's  how you can apply it to a specific column. That was  
190.08 -> method one, now let's take a look at method two.  This method uses formulas and it's something you  
196.48 -> can use if you have Office365, this of course  works on the web version of Excel, so if you  
202.8 -> have a free Office account, you can use these  formulas on Excel on the web. Start with Equals,  
209.36 -> the function we need is the unique function, so  this is one of the newer functions in Office365,  
215.68 -> all we need is the Array and the array is our  table. So, just click to the side of the table  
221.28 -> until you see that tilted arrow, then click  it's going to pick up your entire table.  
226.16 -> Mine is called TSales, that's all I need. Close  Bracket, press Enter, and I have my unique list.  
233.84 -> I have removed duplicates from this data set, now I'm  just going to copy and paste the Headers here and  
240.48 -> let's just apply the Number Formatting to these  as well. Now, the advantage of using a function  
248 -> instead of the first method that I showed you, is  that the second method here is dynamic, this means  
254.64 -> if I add data to the bottom of my table it's  automatically going to be reflected with this  
260.48 -> version. Let's test this, I'm going to copy  this and paste it here. Well right now we're  
267.28 -> not going to see anything because duplicates  are automatically removed, but if I change this  
272.16 -> number to let's say 40, it automatically shows  up here whereas with this method, I would have to  
278.24 -> repeat the steps to get my unique list. Okay so, now  let's take a look at the third way of doing this.  
285.44 -> I'm just going to press CtrlZ a few times to go  back to our data set. The third way is with Power  
291.84 -> Query, now this method is semi-dynamic because all  you have to do is press refresh. Let's take a look,  
299.28 -> I'm going to go to the Data Tab here, this time  instead of going to the data tools section,  
304.32 -> we're going to go to the Get and Transform Data.  First of all, power query needs to get the data and  
310 -> then we can transform it. Our data is right here, so  we're going to go with from sheet, this is going to  
316.32 -> bring up the Power Query Editor. We have our Source  Step here, which was just our table and power query  
322.8 -> automatically added a change type step for the  headers. It just transformed them to text for this  
329.2 -> and to a whole number for our sales values, which  is fine. Now to remove duplicates, I'm going to hold  
335.2 -> down Shift, so the first column is selected I'm  holding down Shift and I'm going to click on the  
339.84 -> Last Column, then right mouse click and remove  duplicates. Okay, so everything looks good,  
346.64 -> let's call this TSales Report and send this  back to the worksheet. So close and load two, I  
353.12 -> want to put it on the existing worksheet. Let's  put it right here, so we can compare the results  
358.56 -> and click on ok. It says 15 rows were  loaded, now something is different.  
366.32 -> Can you spot the additional row? It's right here,  in this case bridge was written with a small B  
373.28 -> instead of a capital b. So, this line was not seen  as a duplicate even though it exists up here. This  
382.32 -> is something you have to keep in mind when you use  power query and you're removing duplicates there,  
388.48 -> because the formula version sees these as the  same thing, it's not case sensitive. The remove  
395.44 -> duplicates feature from the data tab is also not  case sensitive, but power query is case sensitive.  
402.64 -> So, if we want to remove the duplicate right here,  we need to add an additional step in Power Query  
409.2 -> to get this to work correctly. So, let's go back  to our query, so bring up Queries and Connections  
415.68 -> I'm just going to double click to bring up the  Power Query Editor and jump back a step to change  
421.04 -> type. So, right before removing duplicates, I need  to make sure that everything has the same case.  
428 -> In this example the problem applies to Sales Agent  not to Region, but if you have multiple columns  
434.48 -> that could have a different letter case, you need  to apply this step that I'm going to show you  
440 -> to the other columns as well and that step  is to transform the column, to make sure the  
445.92 -> content of the column is in the same case. So, I  clicked on the transform tab, I have this already  
452.08 -> selected. I'm going to go to format here and  select a format that's common to the whole column.  
459.44 -> We can put everything in Uppercase Lowercase, in  this case it makes sense to capitalize each word.  
465.44 -> So, I'm going to go with that and asks me if I want to  insert a step in between these two, yes I do. Okay  
471.52 -> so now our bridge here is capitalized, so when we  go to remove duplicates it's automatically removed.  
478.72 -> So, let's go and send back the result to Excel,  just click on Close and Load and now we get  
483.92 -> the same results. Okay now, we can also test how  this updates when we add something new to this.  
490.48 -> So, I'm going to copy and paste this here, let's change  this to 50. Formula version automatically updates  
498 -> this version doesn't automatically  update, I have to Right-Mouse Click  
502 -> select Refresh and I get the new data shown  up here. As you can see, there are different  
507.36 -> ways you can remove duplicates. Use the one that  suits your Excel version and also your process.  
513.44 -> Also, remember that power query is case sensitive.  Okay so, that's it for today, give this video a like.  
520.8 -> Do subscribe if you aren't subscribed already  and I'm going to see you in the next video.

Source: https://www.youtube.com/watch?v=ADArCWLz55Y