I am trying to build a service that accepts a spreadsheet, runs validation on it, and flags errors if the spreadsheet is not valid. To do this, I am using AWS Batch to split the work of validating the rows into multiple subtasks (array jobs). The files are pulled from S3, and to read one in Java using Apache POI I have to create a copy that converts the S3 object into a Java File object. This is done by one of my containers (the parent container). Next, I want to run the validation jobs in parallel across the rows so that the process is faster. My main concern is: how can I read the file once (in the parent) and share it with all the array jobs (the children), so that the array jobs do not each parse the file themselves?
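To make the setup concrete, here is a minimal sketch of how I picture each child choosing its rows. It relies on the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that AWS Batch sets for each child of an array job. The class name `RowPartition`, the helper `rowRangeFor`, and the array size and row count are hypothetical values for illustration, not my real code.

```java
public class RowPartition {

    /**
     * Returns {firstRow, endRowExclusive} for one child of an array job.
     * Rows are spread as evenly as possible; the first (totalRows % arraySize)
     * children each take one extra row.
     */
    public static int[] rowRangeFor(int index, int arraySize, int totalRows) {
        int base = totalRows / arraySize;
        int extra = totalRows % arraySize;
        int start = index * base + Math.min(index, extra);
        int length = base + (index < extra ? 1 : 0);
        return new int[] { start, start + length };
    }

    public static void main(String[] args) {
        // AWS Batch sets this variable in every child of an array job.
        // Fall back to 0 so the sketch also runs outside Batch.
        String raw = System.getenv("AWS_BATCH_JOB_ARRAY_INDEX");
        int index = (raw == null) ? 0 : Integer.parseInt(raw);

        int arraySize = 4;   // assumption: size configured on the array job
        int totalRows = 10;  // assumption: row count of the spreadsheet

        int[] range = rowRangeFor(index, arraySize, totalRows);
        System.out.println("Child " + index + " validates rows ["
                + range[0] + ", " + range[1] + ")");
    }
}
```

With these assumed numbers, each child still needs access to the spreadsheet contents to validate its own range, which is where my question about sharing comes in.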
I know that I can use volumes: copy the file from S3, save it to a volume, and then share the volume with the child jobs. But then each child would still read the file into memory before performing the validation, which is not a significant advantage. I would like to read the file into memory just once and have the children share it.
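For reference, this is roughly what I imagine each child doing in the volume-based approach. It is only a sketch: the mount path `/mnt/shared/input.xlsx` and the class name `SharedVolumeReader` are made up for illustration, and the Apache POI parsing step is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SharedVolumeReader {

    /** Reads the whole file from the shared volume into this JVM's heap. */
    public static byte[] loadWorkbookBytes(Path sharedFile) throws IOException {
        return Files.readAllBytes(sharedFile);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path where the parent job saved the copy from S3.
        Path sharedFile = Paths.get("/mnt/shared/input.xlsx");

        // Every child container runs this same call, so every child ends up
        // holding its own in-memory copy of the file.
        byte[] bytes = loadWorkbookBytes(sharedFile);
        System.out.println("Loaded " + bytes.length + " bytes in this container");
    }
}
```

As far as I can tell, this leaves one copy of the file per container in memory, which is exactly what I am hoping to avoid.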
How do I share memory between the parent and the child jobs?
What would my Java code look like to fetch the file from shared memory (if that’s possible)?
What would the Dockerfile look like?
If sharing memory doesn’t work, would sharing volumes be a significant advantage? Isn’t reading from a shared volume the same as reading from disk?
If I’m reading from volumes, will each container read the file into its own memory?