Stagger S3 jobs using multiple files

  • Idea
  • Updated 5 months ago
  • Under Consideration
-Posting on customer's behalf-

Due to the 200 MB limit we have on S3 ingest file size, I've come across a customer who has had to resort to an extremely manual process: splitting their large file and ingesting it one piece at a time via S3.

This creates a lot of extra work for the customer and also makes it tough to validate a proper ingest: because the files are all named the same to fit the job, we essentially have to check the file counts each time to make sure nothing is being missed.

The question here is: in scenarios where we have to split a file to stay under the 200 MB limitation, is there any future plan to allow a staggered ingest of files via S3?

Essentially, if I put three files into the input folder, all titled "xyz.csv", could we have some functionality that pulls these files in one at a time without having to run each job manually?

Post file upload partially accomplishes this; however, it is still tough to validate when all three files are dropped into the bucket at the same time.


Tom Gerth, Employee


Posted 7 months ago


Sumesh, Employee

The ability to process multiple S3 input files is on our roadmap. With the new "file name pattern match" feature, we process the file with the most recent modified date/time. This will be enhanced further to process multiple files that match the same name pattern. The timing for the feature will be the 2nd half of this year.
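The selection rule described above, matching a name pattern and taking the most recently modified file, can be approximated client-side against an S3 object listing. A small sketch (the function `pick_latest` and the tuple shape are assumptions for illustration, not the product's actual logic; a real listing would come from an S3 API call such as boto3's `list_objects_v2`):

```python
import fnmatch
from datetime import datetime

def pick_latest(objects, pattern):
    """Given (key, last_modified) pairs from an S3 listing, return the key
    of the most recently modified object whose name matches the glob
    pattern, or None if nothing matches."""
    matches = [o for o in objects if fnmatch.fnmatch(o[0], pattern)]
    if not matches:
        return None
    return max(matches, key=lambda o: o[1])[0]
```

The planned enhancement would then process every match rather than only the newest one.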


Jitin Mehndiratta, Product Manager

Hi Tom,

An S3 file can now be used directly as a source in Bionic Rules. You can read from a file of size up to 6 GB. If there is more than one file, the data will be merged vertically (union) and used in Bionic Rules. This feature might be useful for your use case. Let me know if it helps.