Notification when data file doesn't show up on S3 as expected

Related products: CS Rules & Permissions

We have several files that, with great consistency, show up in our S3 input folder sometime while I'm asleep. Then sometime in the past few weeks there was a change to an underlying report server and we stopped receiving some of that data. A CSM noticed it while trying to look into some customer usage.





Idea: As part of the ingest setup I would like to specify a "notify if not received by" time.





Question: Anyone have a clever way to detect this lack of data (other than, you know, me looking at a dashboard every day)? 🙂
If the ingest job is scheduled, you will receive a failure message that the ingest file is missing. Just make sure you have entered the email or alias you want notified on failure. This obviously doesn't work for you if you are using post-file-upload detection.
Interesting. I actually shifted to detection a while back just because it seemed easier overall. But having that failure detection definitely makes me think that scheduled is the way to go!
Jeff -





We just added this notification so you could see confirms when data is loaded (or if a data load fails).  Not exactly the same but could help in your use case.





Denise







  1. Webhook notification: Learn about the success or failure of the data load through the notification mechanism when using the S3 Connector to upload a data file into MDA. A Webhook notification is sent to the configured Callback URL. The Callback URL must be HTTPS, support the POST method, and return a success response (2XX). Header values are submitted as key/value pairs. Admins can test the URL with the TEST IT ONCE button, which sends a "TestMessage" to the endpoint.








Users receive two messages at the endpoint:







  1. TestMessage, which is used for validating the URL.

  2. The notification at the endpoint that contains the following fields:

    • S3 Job Id (Project Id)
    • S3 Project Name 
    • Time taken (in milliseconds)
    • Total number of rows
    • Succeeded rows
    • Failed rows
    • S3 error file name
    • Status (failure, success, or partial success)
    • Status Id
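
The fields above arrive as the body of the POST to your Callback URL. A minimal sketch of consuming that notification, assuming hypothetical JSON key names (the actual keys in the webhook payload may differ from these):

```python
import json

# Hypothetical key names modeled on the field list above; the real
# payload keys from the S3 Connector webhook may be named differently.
def should_alert(payload: dict) -> bool:
    """Return True if the notification indicates a load that needs attention."""
    status = str(payload.get("status", "")).lower()
    failed = int(payload.get("failedRows", 0))
    return status in ("failure", "partial success") or failed > 0

notification = json.loads(
    '{"projectName": "usage_ingest", "status": "Partial Success",'
    ' "totalRows": 1000, "succeededRows": 990, "failedRows": 10}'
)
print(should_alert(notification))  # partial success with failed rows -> True
```

Your endpoint would run a check like this on each POST and return a 2XX response either way, alerting only when the check fires.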

Jeffrey Coleman,





As Jeffrey DaSilva suggested, if you use the time-based scheduler, then when the S3 job is triggered and can't find the file in the "Input" folder, it throws an error. In the Winter release, as Denise mentioned, you can also have a Webhook notification trigger your ETL jobs directly, in addition to the email notification.





"Post file upload" does not give the same behavior, since the S3 job is triggered only after the file is uploaded. However, "Post file upload" is usually preferred because it handles late arrivals of the file, whereas the scheduler runs only at the specified time.





Unfortunately, with the current product offering, the decision comes down to whether handling late arrivals or detecting no arrival is more important to you.





To support this from the product side in a future release, here are a couple of ideas that come to mind immediately; please feel free to share your thoughts.





1) Add an option to the current "Post file upload" mode: "notify me if the file does not arrive by <<Specified time in a day>>". Similar to the scheduler setup, we would have to capture the expected upload frequency so that a notification is triggered only when the file has not arrived on schedule.
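
Until something like this ships, the same check can be approximated with a small scheduled job outside the product. A sketch of the decision logic, where the object keys would come from a real listing call such as boto3's `s3.list_objects_v2(Bucket=..., Prefix=...)` (the prefix and deadline below are illustrative assumptions):

```python
from datetime import datetime, time

def file_missing(object_keys, expected_prefix, now, deadline):
    """True if no object matching expected_prefix has arrived and the
    daily deadline has passed. In a scheduled job, object_keys would be
    the keys returned by s3.list_objects_v2 for the input folder."""
    arrived = any(key.startswith(expected_prefix) for key in object_keys)
    return (not arrived) and now.time() >= deadline

# Hypothetical example: the nightly usage file should be in place by 06:00.
keys = ["input/other_report_2024-01-15.csv"]
print(file_missing(keys, "input/usage_", datetime(2024, 1, 15, 7, 0), time(6, 0)))
```

Run this from cron (or a Lambda on a schedule) shortly after the deadline and send an email or Slack message when it returns True.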








2) Expose the Job Logs object: We've also received requests around exposing the Job Logs object so that trend-line reports can be created to understand the performance of each job (successful and failed records) or to report on error types over time. You could then get a report of when each job last triggered. In addition to reporting, one could also define a rule (perhaps an OOTB rule) to notify for all of the data-ingest options (S3, API, Mixpanel, Segment, and GA).
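
To illustrate the kind of trend-line report idea 2 would enable, here is a sketch that aggregates hypothetical job-log rows per day; the Job Logs object isn't exposed yet, so all field names here are invented for illustration:

```python
from collections import defaultdict

# Hypothetical job-log rows; the real Job Logs object's schema is not
# exposed, so these field names are illustrative only.
logs = [
    {"date": "2024-01-14", "succeeded": 990, "failed": 10},
    {"date": "2024-01-14", "succeeded": 500, "failed": 0},
    {"date": "2024-01-15", "succeeded": 0, "failed": 1000},
]

def trend(rows):
    """Sum succeeded/failed record counts per day for a trend-line report."""
    totals = defaultdict(lambda: {"succeeded": 0, "failed": 0})
    for row in rows:
        totals[row["date"]]["succeeded"] += row["succeeded"]
        totals[row["date"]]["failed"] += row["failed"]
    return dict(totals)

print(trend(logs)["2024-01-14"])  # {'succeeded': 1490, 'failed': 10}
```

A day like 2024-01-15 above (all rows failed) is exactly the no-arrival/bad-load signal a notification rule would key off.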





Any recommendations on either of these options or any other ideas?
Denise, perhaps I'm dense but this doesn't really explain how to set this up...or what the prerequisites are for setting this up. 




  • What HTTPS URL would I be adding here? The URL path to the data file on S3? (I tried that, but it didn't seem to work.) An external script located at an HTTPS link?

  • What would I add as the Header Values - the key values I added on the first screen for my Upsert mappings?

  • What about if I'm just doing straight up inserts?
More explanation on how to configure this would be appreciated.