Getting duplicates in S3 data upon ingest

Related products: None

A customer contacted me via ticket 28381 regarding an issue he was having with one of his S3 jobs creating duplicate entries of selected key fields upon ingest. He wanted to see if there was a way within Gainsight to remove these entries by default.

Here is the initial description of the issue per the customer:

I'm working with the S3 connectors, and for each ingest job, there is a section that says: "Select key fields to identify unique records".

The data being exported into the S3 bucket that Gainsight pulls from to ingest contains some duplicates, so I don't want those getting written as new rows in the upsert to the MDA.

[i]With the "Select key fields to identify unique records" function, I was hoping these fields would essentially combine to create a composite unique key (I have 3 fields chosen for this function) like you could create in a regular MySQL database.[i]However, when I run the ingest, I still get duplicate rows where all 3 values are the same between the duplicate rows.





The last development on this issue was that the client was going to try to remove these duplicate entries before ingest. What is the possibility of adding a duplicate checker (or something similar) to the UI for the S3 connector?
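In the meantime, deduplicating the file on the selected key fields before it lands in the S3 bucket is straightforward to script. Here is a minimal sketch, assuming a plain CSV export and hypothetical column names (account_id, event_date, source_system) standing in for the three key fields the customer selected:

```python
import pandas as pd

# Hypothetical names standing in for the three key fields chosen on the
# S3 ingest job; the real fields will differ.
KEY_FIELDS = ["account_id", "event_date", "source_system"]

df = pd.read_csv("export_for_s3.csv")

# Keep only the last occurrence of each composite key, mimicking what a
# composite unique key would enforce in a relational database.
deduped = df.drop_duplicates(subset=KEY_FIELDS, keep="last")

deduped.to_csv("export_for_s3_deduped.csv", index=False)
```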
For internal resources - here is the ticket URL for this issue: https://gainsight.zendesk.com/agent/tickets/28381
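Since a duplicate checker is not available in the connector UI today (hence the enhancement request), a small script can serve as a stopgap by reporting which composite keys repeat in a file before it is uploaded. A rough sketch, again with hypothetical field names:

```python
import csv
from collections import Counter

# Hypothetical key fields; adjust to match the fields configured on the job.
KEY_FIELDS = ["account_id", "event_date", "source_system"]

with open("export_for_s3.csv", newline="") as f:
    reader = csv.DictReader(f)
    keys = Counter(tuple(row[field] for field in KEY_FIELDS) for row in reader)

duplicates = {key: count for key, count in keys.items() if count > 1}
for key, count in sorted(duplicates.items()):
    print(f"{key}: {count} rows share this composite key")

print(f"{len(duplicates)} duplicated composite keys found")
```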
Hi Dan,

Per the team's comments, this is working as designed, and they have accepted it as an enhancement request, so I am changing this post to an idea type.
This idea is also discussed here for reference: https://community.gainsight.com/conversations/upsert-doesnt-work-if-there-are-duplicates-in-the-file-5bc73e1ce4b04588aaf86de0