Validate duplicate records from the source file before inserting into the MDA object, and insert only one record (instead of all) based on the upsert key value

Related products: None

Team,

Currently, there is a chance of duplicate records being inserted via the S3 Ingest Job: if the source file contains multiple records with the same key values, all of them are inserted whenever the object has no existing record with that key value. The upsert key only checks whether matching records already exist in the object and updates them; it does not check whether the source file itself contains duplicates. Can we validate the source file based on the upsert key value and remove duplicates before inserting into the MDA object?
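For illustration, a minimal pre-ingest deduplication sketch of what we are asking for, assuming a CSV source and pandas; the file name and key columns are hypothetical, not from any actual job config:

```python
# Hypothetical pre-ingest step: drop repeated upsert-key rows from the
# source CSV before handing it to the S3 Ingest Job.
import pandas as pd

UPSERT_KEY = ["account_id", "email"]  # assumed composite upsert key

df = pd.read_csv("source_file.csv")

# Keep only the first occurrence of each key; later rows with the same
# key would otherwise all be inserted when the object has no match yet.
deduped = df.drop_duplicates(subset=UPSERT_KEY, keep="first")

deduped.to_csv("source_file_deduped.csv", index=False)
print(f"Removed {len(df) - len(deduped)} duplicate rows")
```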

@SKondreddy is the issue with the S3 connector?


@sai_ram it is not an issue with the S3 Connector.

As per the current design, if an object doesn't contain any data and you load data via the S3 Ingest Job with upsert as the action, and the source file contains multiple records with the same identifier, then both records with the same identifier value are uploaded. The client's ask here is to verify the source file and remove duplicates (any records sharing the same identifier value) before inserting the data into the object.
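To make the "verify" step concrete, a small sketch that only reports which identifier values repeat in the source file before any load runs; stdlib only, and the file name and identifier column are assumptions:

```python
# Hypothetical pre-load check: report identifier values that appear
# more than once in the source file (names here are illustrative).
import csv
from collections import Counter

IDENTIFIER = "record_id"  # assumed upsert identifier column

with open("source_file.csv", newline="") as f:
    counts = Counter(row[IDENTIFIER] for row in csv.DictReader(f))

duplicates = {key: n for key, n in counts.items() if n > 1}
if duplicates:
    print(f"Duplicate identifiers found: {duplicates}")
else:
    print("No duplicate identifiers; safe to upsert.")
```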


Hi @SKondreddy 

Will look into how we can handle duplicate data well in GS and what options exist here.