
"Argument list too long" error from pycbc_grb_inj_finder in large PyGRB workflows #5053

Open · titodalcanton opened this issue Feb 20, 2025 · 2 comments
Labels: PyGRB (PyGRB development)

@titodalcanton (Contributor)

@Thomas-JACQUOT and I have been running some fairly large PyGRB workflows with several tens of thousands of injections. We ran into a problem where the injection finder job fails with an "Argument list too long" error. This happens because pycbc_grb_inj_finder takes the full list of input inspiral files via its --input-files argument, which in our case produces a command line more than 6 million characters long. I suspect similar problems could arise elsewhere in PyCBC workflows, for example in the trigger merge jobs of the all-sky search workflow, though I do not remember seeing the error there.

For the moment we have hacked around this by using glob() inside pycbc_grb_inj_finder, but I am not sure that is an approach we want to adopt permanently. Is there any recommendation on this from Pegasus?
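For reference, a minimal sketch of that kind of workaround, assuming the executable grew a pattern-style option (the option name below is hypothetical, not the actual pycbc_grb_inj_finder interface):

```python
import argparse
from glob import glob

parser = argparse.ArgumentParser()
# Hypothetical option: a single shell-style pattern expanded inside
# the executable, keeping the command line short no matter how many
# input files match.
parser.add_argument("--input-files-glob")
args = parser.parse_args()

input_files = sorted(glob(args.input_files_glob))
```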

Anyway, the main point of this issue is to acknowledge that this error is realistically possible, to propose catching it early during workflow generation, and to gather other ideas for how to handle it.

@titodalcanton titodalcanton added the PyGRB PyGRB development label Feb 20, 2025
@github-project-automation github-project-automation bot moved this to In Progress in PyGRB Development Feb 20, 2025
@titodalcanton (Contributor, Author)

So my proposal for catching this problem during workflow generation is to detect when the command line for a particular node is longer than a certain threshold. Would Node.get_command_line() be a sensible place for such a check?

https://github.com/gwastro/pycbc/blob/master/pycbc/workflow/core.py#L936
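
For illustration, a minimal sketch of what such a check could look like, assuming the command line is available as a list of argument strings (the function name and the choice of exception are hypothetical):

```python
def check_command_line_length(cmd_parts, limit):
    """Fail early if a node's command line risks exceeding ARG_MAX."""
    # Approximate total length: argument lengths plus separating spaces.
    total = sum(len(part) for part in cmd_parts) + max(len(cmd_parts) - 1, 0)
    if total > limit:
        raise ValueError(
            f"Command line is {total} characters long, exceeding the "
            f"system ARG_MAX of {limit}; consider splitting this job."
        )
```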

The next question is what threshold to use. The shell command getconf ARG_MAX should give a reasonable value, though I cannot find a nice built-in Python module for reading the same value; I guess we could resort to calling that command via subprocess.
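As a side note, on POSIX systems os.sysconf exposes the same value without shelling out; a minimal sketch combining both approaches:

```python
import os
import subprocess

def get_arg_max(default=2097152):
    """Return the system ARG_MAX in bytes, falling back to a default."""
    # On POSIX systems, sysconf exposes ARG_MAX directly.
    try:
        value = os.sysconf("SC_ARG_MAX")
        if value > 0:
            return value
    except (AttributeError, ValueError, OSError):
        pass
    # Fall back to the getconf shell utility.
    try:
        out = subprocess.run(
            ["getconf", "ARG_MAX"], capture_output=True, text=True, check=True
        )
        return int(out.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return default
```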

But then again, maybe this kind of check would be better performed by Pegasus itself?

@spxiwh (Contributor) commented Feb 20, 2025

I would strongly recommend breaking this job up. A job reading from so many files that the bash command line hits this limit is a bad idea. I would recommend some kind of intermediate combiner job, which would read some fraction of the total injection results and combine them together.
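
For illustration, a minimal sketch of how a workflow generator could batch the input files for such intermediate combiner jobs (the function and parameter names are hypothetical):

```python
def chunk_files(files, max_per_job=1000):
    """Yield batches of input files, one batch per intermediate combiner job."""
    for i in range(0, len(files), max_per_job):
        yield files[i:i + max_per_job]

# Each batch feeds one combiner node; a final job then merges the
# much smaller set of intermediate outputs.
```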
