2 minute read

Hey, welcome to the 4th blog in the Series of Blogs by Hardik. In my last blog, I described how to set up a local SLURM cluster environment. Please give it a read and, if possible, provide your valuable feedback.

In this post, I will discuss a problem I am facing during the development of aggregation. I will also describe the approach I'm planning to adopt to resolve it.

So, let’s get started.

Problems with status check

Suppose a user creates an aggregate operation that takes in 3 jobs and then performs a status check after submitting. Currently, the way I have implemented the status check functionality is by creating the aggregates every time a status check is performed.
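To make the current behaviour concrete, here is a minimal sketch in plain Python. The helper names (make_aggregates, print_status), the aggregate size of 3, and the assumption that a project is simply iterable over its jobs are all illustrative, not the actual implementation; the point is only that the aggregates are rebuilt from the project's current job order every time the status is checked.

```python
# Illustrative only: aggregates are rebuilt from the current job order
# on every status check, so nothing about earlier aggregates is kept.
def make_aggregates(jobs, size=3):
    """Group the given jobs into aggregates of `size` consecutive jobs."""
    jobs = list(jobs)
    return [tuple(jobs[i:i + size]) for i in range(0, len(jobs), size)]

def print_status(project):
    # Created fresh here, so the aggregates always reflect the order in
    # which the project currently yields its jobs.
    for aggregate in make_aggregates(project):
        ...  # look up and print the status of this aggregate
```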

So what’s the problem in it?

If the user changes the order of jobs in the project (for example, by sorting on some statepoint parameter), the aggregates will change. Suppose the user previously had Job_a and Job_b aggregated in that order, and the order is now reversed. Since the two aggregates ([Job_a, Job_b] and [Job_b, Job_a]) are technically different, the user might want to see the status of both. But because we create the aggregates from scratch on every status check, we cannot recreate the aggregate in the previous order, as no information about that order is stored. More generally, we have no way of knowing whether any aggregates other than the current ones were ever created for the operation.
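A tiny example of why the order matters. The id scheme below is purely illustrative (I'm assuming an aggregate id derived from the ordered job-ids), but it shows that the same two jobs in reversed order form a different aggregate, and that the old one cannot be rebuilt once the order information is lost.

```python
import hashlib

def aggregate_id(job_ids):
    # Illustrative id: a hash of the job-ids in order, so the same jobs
    # in a different order yield a different aggregate id.
    return hashlib.md5("".join(job_ids).encode()).hexdigest()

before = aggregate_id(["id_of_job_a", "id_of_job_b"])  # order before the re-sort
after = aggregate_id(["id_of_job_b", "id_of_job_a"])   # order after the re-sort
assert before != after  # technically two different aggregates
```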

A few other cases that demonstrate this problem are the dynamic addition of jobs to the project (which can change the aggregates), the deletion of jobs from the workspace, and so on.

Is it a user problem?

Previously, I thought of this as strictly a user problem. I proposed more than once that the issue should be treated as a user problem and that users should simply be warned about it. But over time, thanks to my mentors, I came to understand the use cases of aggregation, and that completely changed my view.

As a researcher, I'd sometimes want to experiment with aggregation and see all the results I could obtain by using different aggregates for the same operation. In that situation, being able to see the status of every aggregate would be a real treat.

I hope to provide users with a feature that prints a status overview of the aggregates that were previously formed for an operation but are not among the currently created aggregates.

Solution?

To resolve this, one possible solution is to store the job-ids of the jobs in every aggregate that is queued for submission.

Example: store_aggregates.json will contain the job-ids of all the aggregates formed by every operation, in the format described below: {'operation_name': {'aggregate_id': [job_id1, job_id2, ...], ...}, ...}
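A rough sketch of what writing this file at submission time could look like. The file name and the nested format come from the example above; the function name, the use of job.id, and the reuse of the illustrative aggregate_id helper from earlier are my assumptions, not the final design.

```python
import json

def record_submitted_aggregates(operation_name, aggregates,
                                filename="store_aggregates.json"):
    """Store the job-ids of every aggregate queued for submission."""
    try:
        with open(filename) as fp:
            stored = json.load(fp)
    except FileNotFoundError:
        stored = {}
    per_operation = stored.setdefault(operation_name, {})
    for aggregate in aggregates:
        job_ids = [job.id for job in aggregate]
        per_operation[aggregate_id(job_ids)] = job_ids
    with open(filename, "w") as fp:
        json.dump(stored, fp, indent=2)
```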

Then, during a status check, we fetch all the job-ids from that file, and if they don't match the job-ids of the jobs in the currently created aggregates, we recreate those aggregates manually.
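And a matching sketch of the status-check side, under the same assumptions. The names are again illustrative; the only concrete API used is signac's project.open_job(id=...), which opens a job by its id.

```python
import json

def aggregates_for_status(project, operation_name, current_aggregates,
                          filename="store_aggregates.json"):
    """Return the current aggregates plus any stored aggregates that can
    no longer be formed from the project's current job order."""
    with open(filename) as fp:
        stored = json.load(fp).get(operation_name, {})
    current_ids = {aggregate_id([job.id for job in agg])
                   for agg in current_aggregates}
    aggregates = list(current_aggregates)
    for agg_id, job_ids in stored.items():
        if agg_id not in current_ids:
            # Recreate the previously submitted aggregate manually.
            aggregates.append(tuple(project.open_job(id=job_id)
                                    for job_id in job_ids))
    return aggregates
```

The status overview would then cover both the aggregates that exist right now and the ones that were submitted earlier but can no longer be formed from the current job order.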
