S3 Data Ingestion - Best Practices

  • 23 April 2020
  • 2 replies
  • 584 views


S3 Best Practices

 

Sample job creation:

 

[Screenshot: sample S3 job configuration]

 

  • S3 Dataset Task in Bionic Rules:

https://support.gainsight.com/SFDC_Edition/Rules_Engine/Admin_Guides/S3_Dataset_Task_in_Bionic_Rules

 

  • Use File Analyzer to proactively identify and resolve any issues in the file:

https://support.gainsight.com/Gainsight_NXT/Gainsight_Analyzer/Admin_Guides/File_Analyzer_Admin_Guide


 

S3 Job Recommendations:

  • Use Bionic Rules for data ingestion (for better error logging and/or for further transformations before loading into the target object)

  • Enable archiving for future reference

  • Enable ‘post file upload’ to trigger the rule - i.e., ‘Event’-based scheduling

  • Ensure a minimum gap of two hours between file uploads.

  • Whenever possible, break large files into chunks of about 100 MB for easier debugging in case of errors (see the splitting sketch after this list)

  • Use notification emails for successful/failed jobs.
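
Since the original post does not include code, here is a minimal Python sketch of one way to split a large CSV into roughly 100 MB chunks while repeating the header row in each chunk, so every piece stays a valid, independently loadable file. The file name usage_data.csv and the chunk size are illustrative; this is not a Gainsight-provided utility.

```python
import csv

CHUNK_BYTES = 100 * 1024 * 1024  # ~100 MB per chunk (illustrative)

def split_csv(path, chunk_bytes=CHUNK_BYTES):
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)  # keep the header for every chunk
        part, out, writer, written = 0, None, None, 0
        for row in reader:
            if out is None or written >= chunk_bytes:
                if out:
                    out.close()
                part += 1
                out = open(f"{path}.part{part:03d}.csv", "w",
                           newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                written = 0
            writer.writerow(row)
            # Rough size estimate per row; exact byte accounting is not
            # needed for an approximate 100 MB split.
            written += sum(len(v) for v in row) + len(row)
        if out:
            out.close()

split_csv("usage_data.csv")
```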

 

File Naming conventions:

  • Avoid spaces in the file names

  • The file name given in the job configuration and the name of the file loaded into the S3 bucket should be the same

  • Check for case sensitivity - file name matching is case-sensitive (a quick check is sketched below)
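
One quick way to verify the exact, case-sensitive match is to list the bucket and compare the key against the name entered in the job. This sketch assumes boto3 is available; the bucket and file names are placeholders.

```python
import boto3  # assumption: the AWS SDK for Python is installed and configured

BUCKET = "my-gainsight-ingest-bucket"   # placeholder bucket
CONFIGURED_NAME = "usage_data.csv"      # name entered in the S3 job configuration

s3 = boto3.client("s3")
keys = [obj["Key"] for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", [])]

# The comparison must be exact: same case, no leading/trailing spaces.
if CONFIGURED_NAME in keys:
    print("Exact match found - the job should pick up the file.")
else:
    near_misses = [k for k in keys if k.lower() == CONFIGURED_NAME.lower()]
    print("No exact match. Case-only mismatches:", near_misses)
```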

 

File Properties:

  • Ensure that the file properties given in the job match the file

    • Field Separator (comma, pipe, semicolon, space, tab)

    • Text Qualifier (Double Quote, Single Quote)

    • Escape Character (Backslash, Double Quote, Single Quote)

    • Compression Type (None, bzip, gzip)

    • Character Encoding (UTF-8, UTF-16, UTF-16BE, ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3)

    • Make sure the file is not encoded with a BOM (byte order mark); a writer sketch matching these settings follows this list
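
As an illustration, here is a small Python sketch that writes a file matching a hypothetical job configuration: comma separator, double-quote text qualifier, gzip compression, and UTF-8 encoding without a BOM (note "utf-8", not "utf-8-sig"). The field names and values are made up.

```python
import csv
import gzip

rows = [
    ["AccountId", "AccountName", "MRR"],
    ["0011x00000AbCdE", "Acme, Inc.", "1200.50"],
]

# gzip compression + UTF-8 without a byte order mark, comma-separated,
# double-quote qualifier around every value.
with gzip.open("accounts.csv.gz", "wt", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
```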

 

Field Mappings:

  • Source field names in the mapping should exactly match the CSV header names (case-sensitive)

  • AVOID having leading or trailing spaces in CSV header names

  • AVOID placing special characters in CSV header names (unless required)

  • AVOID having duplicate CSV header names

  • For Decimal Number fields, specify the number of decimal places to be retained

  • For Date or DateTime type fields, specify the required date format using the gear icon (a formatting sketch follows this list)

    • Recommended DATE format MM-dd-yyyy

    • Recommended DATETIME format MM-dd-yyyy HH:mm or MM-dd-yyyy HH:mm:ss

  • Update the mapping immediately if you change existing field labels in the source or in the target object
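
For reference, a short Python sketch of producing values in the recommended formats; the sample values are illustrative, and in practice this formatting would happen in whatever system exports the CSV.

```python
from datetime import datetime

renewal = datetime(2020, 4, 23, 14, 30, 5)  # sample value

date_value = renewal.strftime("%m-%d-%Y")               # "04-23-2020"  (MM-dd-yyyy)
datetime_value = renewal.strftime("%m-%d-%Y %H:%M:%S")  # "04-23-2020 14:30:05"
mrr_value = f"{1234.5678:.2f}"                          # "1234.57" - 2 decimal places retained

print(date_value, datetime_value, mrr_value)
```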

 

Data load Operations:

  • Identifier keys are not required for INSERT jobs

  • Identifier keys are mandatory for UPDATE and UPSERT jobs

  • Make sure that the identifier keys are always populated (a check is sketched after this list)

  • Use standard Id fields and date fields for identifiers as a best practice
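
A minimal pre-load check along these lines might look like the following Python sketch. The file name and identifier field names (AccountId, Date) are assumptions for the example, not Gainsight requirements.

```python
import csv

IDENTIFIER_FIELDS = ["AccountId", "Date"]  # illustrative identifier keys

def check_identifiers(path, identifiers=IDENTIFIER_FIELDS):
    """Return a list of problems: missing identifier columns or empty values."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [k for k in identifiers if k not in (reader.fieldnames or [])]
        if missing:
            return [f"Missing identifier column(s): {missing}"]
        for line_no, row in enumerate(reader, start=2):  # row 1 is the header
            for key in identifiers:
                if not (row[key] or "").strip():
                    problems.append(f"Row {line_no}: '{key}' is empty")
    return problems

for issue in check_identifiers("accounts.csv"):
    print(issue)
```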


 

Dos and Don’ts:

  • Do not send empty files for data ingestion; they trigger unnecessary jobs/emails (a pre-flight check is sketched after this list)

  • Do not send files without headers

  • Do not send duplicate records in a file even if you’re using Update or Upsert

  • Identifiers should never be NULL; if you’re using multiple identifiers, make sure that all of them are properly populated

  • Use text qualifiers (“ or ‘) as much as possible - this ensures that file processing doesn’t break when it encounters delimiters, escape characters, or special characters within the text

  • Make sure that date format is consistent across a field (or column)

  • If you’re passing long/rich text values, ensure that text qualifiers are properly handled

  • Use escape characters wherever possible

  • For picklist values, make sure that they are already available under existing dropdown list categories
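
To make a few of these checks concrete, here is a rough Python pre-flight sketch that rejects empty files and duplicate records and verifies that one date column uses a consistent format. The file name, column name, and date format are placeholders.

```python
import csv
from datetime import datetime

DATE_COLUMN, DATE_FORMAT = "RenewalDate", "%m-%d-%Y"  # illustrative column/format

def preflight(path):
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

    if not rows:
        return ["File is empty - do not send it for ingestion"]

    header, data = rows[0], rows[1:]
    issues = []

    # Duplicate records are a problem even for Update/Upsert jobs.
    if len(data) != len(set(map(tuple, data))):
        issues.append("Duplicate records found - remove them before loading")

    # Check that the date column is consistently formatted.
    if DATE_COLUMN in header:
        idx = header.index(DATE_COLUMN)
        for line_no, row in enumerate(data, start=2):
            try:
                datetime.strptime(row[idx], DATE_FORMAT)
            except (ValueError, IndexError):
                issues.append(f"Row {line_no}: date value does not match {DATE_FORMAT}")
    return issues

for issue in preflight("accounts.csv"):
    print(issue)
```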


2 replies


I wish I had seen this two weeks ago! We are in the process of migrating to the Rules Engine to load data via the S3 dataset task instead of Connector 1.0. If this were part of a migration wizard, it would save admins a lot of time when they encounter issues or miss steps.


If you are running into issues where your file will not load because of a partitioning issue, it’s likely that your file has improperly formatted quotations and commas. 

 

I have processed a lot of S3 files, and the most common issue I have found is quotation marks in the data. For example, I ran into a contact record where someone had entered the following in the full name field: John Smith aka “Smithy”

 

When exporting this to a CSV with quotations around the field, it appeared like this:  “John Smith aka “Smithy””,

Two double quotes followed by a comma will create a partitioning issue.

 

Another known challenge that can easily happen is with backslashes. If a field includes a backslash, like Account Name = Alpha\Romeo, it can cause issues.

 

If you are having issues loading your file, make sure you don’t have two double quotes followed by a comma, and also check for backslashes. Hope this helps!
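
For what it’s worth, a standard CSV writer avoids the quote problem by doubling embedded double quotes inside the qualifier (RFC 4180 style), and a backslash stays safely inside the qualifier as long as the job’s escape-character setting matches. Here is a small Python illustration using the examples above; it shows the escaping behaviour only and is not a Gainsight-specific tool.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(['John Smith aka "Smithy"', 'Alpha\\Romeo'])

# Embedded quotes are doubled inside the qualifier, so the parser never sees
# a stray closing quote followed by a comma:
#   "John Smith aka ""Smithy""","Alpha\Romeo"
print(buf.getvalue())
```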
