
Pegasus doesn't work with the new version of stashcp (or pelican) #5020

Open
spxiwh opened this issue Jan 22, 2025 · 5 comments

Comments

@spxiwh
Contributor

spxiwh commented Jan 22, 2025

TL;DR: OSDF transfers will not work on LDG clusters (they still work on OSG for now) unless we get a new Pegasus release, or use an older version of stashcp (e.g. through Singularity images).

We have a potential problem with OSDF transfers that may hit us soon. To transfer a frame file with the current version of stashcp, one would run something like this:

/usr/bin/stashcp 'osdf:///igwn/ligo/frames/O4/hoft_C00_AR/H1/H-H1_HOFT_C00_AR-137/H-H1_HOFT_C00_AR-1379799472-3664.gwf' '/var/lib/condor/execute/dir_756338/pegasus.A9vIaXze4/137979/H-H1_HOFT_C00_AR-1379799472-3664.gwf'

However, current releases of Pegasus run the following instead, passing a bare namespace path with no osdf:// scheme:

/usr/bin/stashcp  '/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-136/H-H1_HOFT_C00-1369489408-4096.gwf' ./H-H1_HOFT_C00-1369489408-4096.gwf

which fails with:

ERROR[2025-01-22T07:56:01-08:00] Failure transferring /igwn/ligo/frames/O4/hoft_C00_AR/H1/H-H1_HOFT_C00_AR-137/H-H1_HOFT_C00_AR-1379799472-3664.gwf: unable to determine direction of transfer.  Both source and destination are either local or remote 

using:

/usr/bin/stashcp --version
Version: 7.12.0
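The error text suggests that the newer client classifies each argument as local or remote before transferring, and that a bare path like the one Pegasus passes is classified as local. A minimal sketch of that kind of check follows. It is an inference from the error message, not stashcp's actual code; `is_remote` and `transfer_direction` are hypothetical names, and the assumption is that an argument counts as remote only when it starts with a URL scheme such as osdf:// or pelican://.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of transfer-direction detection; NOT stashcp's code.
# Assumption: an argument is "remote" only if it starts with a URL scheme.

is_remote() {
  case "$1" in
    osdf://*|pelican://*) return 0 ;;  # scheme present: treated as remote
    *) return 1 ;;                     # bare path: treated as local
  esac
}

transfer_direction() {
  local src="$1" dst="$2"
  if is_remote "$src" && ! is_remote "$dst"; then
    echo "download"
  elif ! is_remote "$src" && is_remote "$dst"; then
    echo "upload"
  else
    # Both local or both remote: the same situation the error above describes.
    echo "unable to determine direction of transfer" >&2
    return 1
  fi
}
```

Under this assumption, transfer_direction '/igwn/ligo/frames/...' ./file.gwf fails because both arguments look local, while the osdf:///igwn/... form used in the first command is resolved as a download.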

This is new behaviour. With an older stashcp (e.g. the one in the Singularity image):

singularity shell /cvmfs/singularity.opensciencegrid.org/pycbc/pycbc-el8:v2.3.8
Apptainer> stashcp --version
Version: 7.10.5

the same command works:

Apptainer> /usr/bin/stashcp  '/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-136/H-H1_HOFT_C00-1369489408-4096.gwf' ./H-H1_HOFT_C00-1369489408-4096.gwf

There is already a patch to fix this (and support the new client, pelican, by default). This hasn't yet made a release.
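When calling stashcp by hand before that release is available, one possible stopgap is to turn the bare namespace path into the osdf:// URL form used in the first command above. This is a minimal sketch only: `to_osdf_url` is a hypothetical helper, it assumes the input is an absolute namespace path starting with /, and it does not change what Pegasus itself generates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix an absolute OSDF namespace path with the osdf://
# scheme, giving osdf:///igwn/... (empty host followed by the absolute path).
to_osdf_url() {
  local path="$1"
  case "$path" in
    osdf://*) printf '%s\n' "$path" ;;        # already a URL: leave unchanged
    /*)       printf 'osdf://%s\n' "$path" ;;  # absolute path: add the scheme
    *)        echo "expected an absolute path, got: $path" >&2; return 1 ;;
  esac
}

src='/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-136/H-H1_HOFT_C00-1369489408-4096.gwf'
# Print (rather than run) the rewritten command so the sketch works without stashcp.
echo "/usr/bin/stashcp $(to_osdf_url "$src") ./$(basename "$src")"
```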

@spxiwh
Contributor Author

spxiwh commented Jan 22, 2025

As an addendum to this: I think we also shouldn't declare OSDF files as being on site=local. Pegasus then seems to think that there is a file URL it can use, which causes additional failures. I can't test this until we have a new Pegasus release, though, because the patch from Mats changes a bit about how OSDF files are handled.

@titodalcanton
Contributor

titodalcanton commented Apr 2, 2025

It looks like we are hitting this while running a workflow with 2.8.1 on the OSG, and the stashcp in the Apptainer image is also new:

$ apptainer shell /cvmfs/singularity.opensciencegrid.org/pycbc/pycbc-el8:v2.8.1
Apptainer> stashcp --version
Version: 7.14.1
Build Date: 2025-03-07T22:59:06Z
Build Commit: 9cfca469cc43a8d9294fdc397b4a2ffb14bcc966
Built By: goreleaser

@sebastiangomezlopez @pannarale
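Since this thread reports 7.10.5 working and 7.12.0 and 7.14.1 failing, a job wrapper could check the installed client version before relying on bare paths. A minimal sketch, assuming the "Version: X.Y.Z" line format shown above and treating 7.12.0 as the threshold (it is only the oldest failing version reported here; versions between 7.10.5 and 7.12.0 were not tested). `version_ge`, `parse_version` and `needs_url_form` are hypothetical helpers, and the comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2 (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the number from a "Version: X.Y.Z" line read on stdin.
parse_version() {
  awk '/^Version:/ {print $2; exit}'
}

# Assumed threshold: 7.12.0 is the oldest version reported to reject bare paths.
needs_url_form() {
  version_ge "$1" "7.12.0"
}

if command -v stashcp >/dev/null 2>&1; then
  v="$(stashcp --version 2>&1 | parse_version)"
  if [ -z "$v" ]; then
    echo "could not parse stashcp version" >&2
  elif needs_url_form "$v"; then
    echo "stashcp $v: pass osdf:// URLs, not bare paths"
  else
    echo "stashcp $v: bare namespace paths reported to work"
  fi
fi
```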

@spxiwh
Contributor Author

spxiwh commented Apr 2, 2025

Ack! Sorry, I should have realised this would happen.

I think you'll need the Dockerfile changes that were included here:

https://github.com/gwastro/pycbc/pull/5031/files

(I guess you don't need to pin lalsuite; that's done to avoid the ROM issues.) The pelican changes, though, pin stashcp to the older version.

@titodalcanton
Contributor

@spxiwh is there any reason why that change should not be on master?

@spxiwh
Contributor Author

spxiwh commented Apr 3, 2025

I wasn't expecting to need it on master. The pegasus 5.1.0 release is close (https://git.ligo.org/computing/sccb/-/issues/1719), and once that's out this issue goes away (and the patch becomes detrimental).


2 participants