Learn how we tackle duplicate entries in your data.
To keep your data clean when you upload new customer or order data via CSV files in the Sources console, we first check that the file(s) don't contain duplicate data. Faraday removes duplicates based on the option selected in the Use a column to identify duplicate rows? section of the File Handling settings.
You'll have two options to choose from in this section:
- Any column from the data provided (e.g. id, customer_id, order_id)
- No - Faraday will automatically identify duplicate rows (default)

If you select a column such as id, Faraday will review the data provided and, for each id value, keep only the most recent record. Let's look at a simple example to illustrate this.

Example: Selecting the Merge all files option with id as the column to identify duplicate rows.
A file containing info on each customer, 'File 1', was provided on October 1st. Inside 'File 1' is the following data:
| id | name | total_order_value | last_order_date |
| --- | --- | --- | --- |
| 1234 | Bob Smith | $100 | September 15th |
A new file containing info on each customer, 'File 2', was provided on November 1st. Inside 'File 2' is the following data:
| id | name | total_order_value | last_order_date |
| --- | --- | --- | --- |
| 1234 | Bob Smith | $150 | October 15th |
So, what data would Faraday keep based on the selections made in the File Handling settings? Only the most recent data! Because the id, 1234, is identical in both files, we drop the older record from File 1 and keep the newer one from File 2 when Merge all files is selected. This ensures that the dataset we use is up to date. You can refresh your data this way on demand through the console.
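To make the example concrete, here is a minimal Python sketch of the "keep the most recent record per id" rule. It only illustrates the behavior described above and is not Faraday's actual implementation. The function name `merge_keep_latest`, the in-memory row format, and the year in the dates (the example gives only month and day) are all assumptions made for the sketch.

```python
from datetime import date


def merge_keep_latest(files, key="id"):
    """Merge rows from several files, keeping the newest row per key.

    `files` is a list of (received_date, rows) pairs, where `rows` is a
    list of dicts. This is a hypothetical helper for illustration only.
    """
    latest = {}
    # Process files from oldest to newest so later files overwrite earlier ones.
    for _received, rows in sorted(files, key=lambda f: f[0]):
        for row in rows:
            latest[row[key]] = row
    return list(latest.values())


# The year is arbitrary; the example only specifies October 1st and November 1st.
file_1 = (date(2023, 10, 1), [
    {"id": "1234", "name": "Bob Smith",
     "total_order_value": "$100", "last_order_date": "September 15th"},
])
file_2 = (date(2023, 11, 1), [
    {"id": "1234", "name": "Bob Smith",
     "total_order_value": "$150", "last_order_date": "October 15th"},
])

merged = merge_keep_latest([file_2, file_1], key="id")
print(merged)
```

Only the File 2 row for id 1234 survives, regardless of the order in which the files are passed in, because the sketch sorts by the date each file was provided.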
Now you may be asking, what if I select the No - Faraday will automatically identify duplicate rows (default) option? How does Faraday know what a duplicate is?
When automatically identifying duplicate rows, we assign an identifier to each row of your data based on its values. When we come across two rows with exactly the same values in every column, they share the same identifier, and we remove the older one.
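One way to picture automatic identification is as a fingerprint computed from every column of a row. The sketch below is an assumption-laden illustration, not Faraday's actual method: the use of SHA-256, the separator character, and the names `row_fingerprint` and `drop_exact_duplicates` are all hypothetical.

```python
import hashlib


def row_fingerprint(row, columns):
    """Build an identifier from every column value (hypothetical scheme)."""
    # Join values with a separator unlikely to appear in the data itself.
    joined = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def drop_exact_duplicates(rows, columns):
    """Keep the newest copy of each fully identical row.

    `rows` must be ordered oldest first.
    """
    newest = {}
    for row in rows:
        fp = row_fingerprint(row, columns)
        newest.pop(fp, None)  # discard the older copy, if there is one
        newest[fp] = row
    return list(newest.values())


columns = ["id", "name", "total_order_value", "last_order_date"]
older = {"id": "1234", "name": "Bob Smith",
         "total_order_value": "$100", "last_order_date": "September 15th"}
exact_copy = dict(older)  # same values in every column
changed = {**older, "total_order_value": "$150",
           "last_order_date": "October 15th"}

deduped = drop_exact_duplicates([older, exact_copy, changed], columns)
print(len(deduped))
```

In this sketch, only rows that match in every column collapse into one. The $100 and $150 rows for id 1234 are both kept because their values differ, which is the practical difference from choosing id as the duplicate-identifying column.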