S3 Data Ingestion Job and Advanced Options

Related products: None

S3 ingestion jobs are very helpful for automating uploads to MDA, and I feel the following enhancements would add more value:





1. Allow a truncate option before loading data


2. Allow building dependencies between S3 ingestion jobs


3. Alternatively, allow calling an S3 ingestion job after a load is completed.
Hi Brijesh,





1. Allow truncate option before loading data: Could you please elaborate on this use case? Would you like to truncate the data in the object?





2. Allow building dependencies between S3 ingestion jobs





You could use the S3 webhook feature to build a dependency between S3 jobs.








On the S3 connector schedule screen, you can specify the URL that should be notified when the job completes.





Once the job is completed, it will send the following details to the specified endpoint:




  • S3 Job Id (Project Id)

  • S3 Project Name

  • Time taken (in milliseconds)

  • Total number of rows

  • Succeeded rows

  • Failed rows

  • S3 error file name

  • Status (failure, success, or partial success)

  • Status Id
To build a dependency between jobs, you could set things up so that the next S3 file is uploaded only after you receive this notification.
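
As a rough sketch of that idea, the code receiving the notification could inspect it and decide whether the next file should be uploaded. The payload shape here is an assumption: the list above names the details that are sent, not the actual field names, so the keys `status` and `failedRows` are hypothetical and would need to match the real notification body.

```python
import json

# Hypothetical key names: the list above says which details are sent,
# not what the fields are actually called in the notification body.
STATUS_KEY = "status"
FAILED_ROWS_KEY = "failedRows"


def should_upload_next_file(raw_body: str) -> bool:
    """Return True only if the notified job finished with full success."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return False  # unreadable notification: do not release the next file
    if not isinstance(payload, dict):
        return False
    status = str(payload.get(STATUS_KEY, "")).strip().lower()
    failed_rows = payload.get(FAILED_ROWS_KEY, 0)
    return status == "success" and failed_rows == 0
```

Treating partial success as a reason not to continue is a design choice in this sketch; you may prefer to allow it depending on how strict the dependency needs to be.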





3. Allow calling an S3 ingestion job after a load is completed: You could use the post file upload option in S3.





On the schedule screen of the S3 connector, select "Set Recurring Schedule" > Post file upload.





This will ensure that whenever the specified file is uploaded to your bucket, the S3 ingest job is triggered.





Thanks and Regards,


Lakshmi
Hi Laxmi -





Here are some more details on the idea/need:





1. Allow truncate option before loading data: Could you please elaborate on this use case? Would you like to truncate the data in the object?


--> We have data points that we want to load into the system, but we can't use the upsert mechanism with keys because no combination of fields forms a unique key. This scenario forces us to manually delete the existing records and then load the full set of records.
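
To illustrate the problem, here is a small sketch in plain Python. The field names and rows are made up, and this is not how MDA implements either load mode; it only shows why a keyed upsert loses rows when no field combination is unique, while truncate-and-load does not.

```python
def upsert(existing, incoming, key_fields):
    """Upsert keyed on key_fields: rows sharing a key overwrite each other."""
    by_key = {tuple(row[f] for f in key_fields): row for row in existing}
    for row in incoming:
        by_key[tuple(row[f] for f in key_fields)] = row
    return list(by_key.values())


def truncate_and_load(existing, incoming):
    """Discard everything already loaded, then load the full new extract."""
    return list(incoming)


# Hypothetical extract where two legitimate rows are identical on every field.
extract = [
    {"account": "A1", "event": "login"},
    {"account": "A1", "event": "login"},
    {"account": "A2", "event": "login"},
]

upserted = upsert([], extract, ["account", "event"])
reloaded = truncate_and_load([{"account": "OLD", "event": "stale"}], extract)
```

Here the upsert collapses the two identical `A1` rows into one, whereas truncate-and-load keeps all three rows and drops the stale one.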





2. Allow building dependencies between S3 ingestion jobs


--> As I understand it, the webhook sends a notification to a notification handler but can't invoke another S3 job. So here is the full use case.





We are pulling data from two different systems and loading it into an MDA object, and we want to ensure that system 1 data is loaded before system 2 data. We could build this dependency outside; however, sometimes the system 2 job completes before the system 1 job, or the system 1 job fails. In all these cases we don't want to load system 2 data, so having a built-in dependency would help.
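
The behavior we are after can be sketched as a small gate. This is only an illustration of the desired rule, not an existing feature; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadGate:
    """Holds the system 2 file until system 1 has loaded successfully."""
    system1_status: str = "pending"  # "pending", "success", or "failed"
    system2_ready: bool = False

    def system2_file_ready(self) -> str:
        """Record that the system 2 extract exists, then re-evaluate."""
        self.system2_ready = True
        return self.decide()

    def system1_finished(self, succeeded: bool) -> str:
        """Record the outcome of the system 1 load, then re-evaluate."""
        self.system1_status = "success" if succeeded else "failed"
        return self.decide()

    def decide(self) -> str:
        if self.system1_status == "failed":
            return "skip"  # never load system 2 after a failed system 1 load
        if self.system1_status == "success" and self.system2_ready:
            return "load"
        return "wait"
```

For example, if the system 2 file arrives first, `system2_file_ready()` returns `"wait"`, and the later call `system1_finished(True)` returns `"load"`. If system 1 fails, every subsequent decision is `"skip"`.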