Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache: some permissions can't be represented, causing cache misses #11533

Closed
emillon opened this issue Mar 14, 2025 · 6 comments · Fixed by ocaml/opam-repository#27704
Closed
Labels

Comments

@emillon
Copy link
Collaborator

emillon commented Mar 14, 2025

In a nutshell: in some cases, dune will restore some artifacts in a way that causes subsequent runs to digest them differently.

Reproduction case

This can be reproduced in the following cram test:

  $ export DUNE_CACHE_ROOT=$PWD/.cache
  $ export DUNE_CACHE=enabled

  $ cat > dune-project <<EOF
  > (lang dune 3.0)
  > (using directory-targets 0.1)
  > EOF

  $ cat > dune <<EOF
  > (rule
  >  (targets (dir d))
  >  (action
  >   (progn
  >    (bash "echo building d")
  >    (run mkdir d)
  >    (run chmod 755 d)
  >    (run touch d/x))))
  > 
  > (rule
  >  (deps d)
  >  (action
  >   (progn
  >    (echo building other)
  >    (write-file other data))))
  > EOF

  $ dune build other
  building d
  building other
  $ dune_cmd stat permissions _build/default/d
  755
  $ dune clean
  $ dune build other
  building other
  $ dune_cmd stat permissions _build/default/d
  775
  $ dune clean
  $ dune build other

When the action is executed, the directory is created with mode 755 (in particular, g-w), but when it it restored, it gets mode 775 (in particular g+w).

This causes a cache miss because the stored result for the rule that produces other has a different digest for its dependency d.

The third run can restore cleanly because it has cached the 775 version.

(note that this behavior does not happen if chmod 775 is in the dune file; and that the issue also happens when a subdirectory of the directory target has such a permission)

Expected behavior

The expected behavior would be that the second run can build cleanly (with only cache hits).

Context

This issue manifests itself in the package management rules. In particular, ocamlfind.1.9.6+dune will create ocamlfind/target/bin/ with mode 755, so cleaning and rebuilding will restore a different version of ocamlfind/target/bin/ (with mode 775) and trigger rebuilds of all the packages that depend on ocamlfind. And if such dependent packages have similar directory permissions, they will be rebuilt on subsequent builds.

Suggested solution

Storing just an executable bit instead of the full permissions seems like a good tradeoff - this is what git does for example.

To fix this issue, digesting could be changed to only take into account what is going to be restored. For example, by clearing or normalizing the group bits in Digest.Stats_for_digest.of_unix_stats, or maybe by replacing that part by just an executable boolean.

FTR, git determines that a file is executable by checking if any of the +x bits are set.

@mefyl also mentioned that something similar might be happening with the +w bits - when hardlinking we reset these bits so they might alter the digest.

@emillon emillon added package management shared-cache Shared artefacts cache labels Mar 14, 2025
emillon added a commit to emillon/dune that referenced this issue Mar 14, 2025
Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/dune that referenced this issue Mar 17, 2025
Signed-off-by: Etienne Millon <me@emillon.org>
@emillon
Copy link
Collaborator Author

emillon commented Mar 17, 2025

This depends on the currently set umask, so this prompts the question of whether the restore function and the install -d in the install instruction don't use the same value.

Leonidas-from-XIV pushed a commit to emillon/dune that referenced this issue Mar 18, 2025
Signed-off-by: Etienne Millon <me@emillon.org>
Leonidas-from-XIV pushed a commit that referenced this issue Mar 18, 2025
Signed-off-by: Etienne Millon <me@emillon.org>
@Leonidas-from-XIV
Copy link
Collaborator

Storing just an executable bit instead of the full permissions seems like a good tradeoff - this is what git does for example.

Out of curiosity, why don't we chmod the file to the recorded permissions in that case, ignoring the inherited umask?

@emillon
Copy link
Collaborator Author

emillon commented Mar 21, 2025

Yes that would work too, but there's no room for this metadata at the moment.
(the umask issue is a bit different, it only corresponds to the "default" permissions of created files)

@Leonidas-from-XIV
Copy link
Collaborator

Yes that would work too, but there's no room for this metadata at the moment.

Ah you mean that the cache only differentiates between executable and non-executable? Yes, in that case I totally see your point (and #11541 is then the only right way to deal with it), but maybe it would be worth considering whether we want to record the exact permissions in the cache instead.

I honestly don't know, thus I am curious why git does it this way. There's probably good reasons for it.

@mefyl
Copy link
Collaborator

mefyl commented Mar 23, 2025

The git approach makes sense to me, and I think the same thought process should apply to dune cache:

  • Regarding the "other" and "group" permissions, I don't think it makes sense for git or to store them, as it's really dependent on the local setup of the cloner. Most people may have a very usual setup with umask = 022 because they are the sole user of their machine, but if some other have more complex groups or ACLs locally it should probably not interfere. In other word, because you decide to leave "other" permissions open doesn't mean I should too, if my directory structure is shared and a bit sensitive.

  • Regarding u+r, the use removing your own read permission usefulness is very debatable, and I'm not sure cloning a repository (so, forging my own copies of the files) should stripe me of the permission to read their contents. Sounds more like a headache that will make many script fail to me.

  • Regarding u+w, I could see the use. You may want to commit generated files that should not be manually edited, and removing write permission can help prevent accidental edition. However, in hardlink mode, dune cache will (or used to, at the very least) strip write permissions on files pulled from the cache, and editing them would be catastrophic as you'd be editing many other build trees and break all reproducibility assumptions. So I fear we have no choice but to also consider this permission bit local to each build tree and not store it.

  • Only u+x remains, which I think we all agree should be stored and restored.

wmuth pushed a commit to wmuth/dune that referenced this issue Mar 24, 2025
Signed-off-by: Etienne Millon <me@emillon.org>
maiste added a commit to maiste/opam-repository that referenced this issue Mar 31, 2025
CHANGES:

### Fixed

- Support HaikuOS: don't call `execve` since it's not allowed if other pthreads
  have been created. The fact that Haiku can't call `execve` from other threads
  than the principal thread of a process (a team in haiku jargon), is a
  discrepancy to POSIX and hence there is a [bug about
  it](https://dev.haiku-os.org/ticket/18665). (@Sylvain78, ocaml/dune#10953)
- Fix flag ordering in generated Merlin configurations (ocaml/dune#11503, @voodoos, fixes
  ocaml/merlin#1900, reported by @vouillon)

### Added

- Add `(format-dune-file <src> <dst>)` action. It provides a replacement to
  `dune format-dune-file` command.  (ocaml/dune#11166, @nojb)
- Allow the `--prefix` flag when configuring dune with `ocaml configure.ml`.
  This allows to set the prefix just like `$ dune install --prefix`. (ocaml/dune#11172,
  @rgrinberg)
- Allow arguments starting with `+` in preprocessing definitions (starting with
  `(lang dune 3.18)`). (@amonteiro, ocaml/dune#11234)
- Support for opam `(maintenance_intent ...)` in dune-project (ocaml/dune#11274, @art-w)
- Validate opam `maintenance_intent` (ocaml/dune#11308, @art-w)
- Support `not` in package dependencies constraints (ocaml/dune#11404, @art-w, reported
  by @hannesm)

### Changed

- Warn when failing to discover root due to reads failing. The previous
  behavior was to abort. (@KoviRobi, ocaml/dune#11173)
- Use shorter path for inline-tests artifacts. (@hhugo, ocaml/dune#11307)
- Allow dash in `dune init` project name (ocaml/dune#11402, @art-w, reported by @saroupille)
- On Windows, under heavy load, file delete operations can sometimes fail due to
  AV programs, etc. Guard against it by retrying the operation up to 30x with a
  1s waiting gap (ocaml/dune#11437, fixes ocaml/dune#11425, @MSoegtropIMC)
- Cache: we now only store the executable permission bit for files (ocaml/dune#11541,
  fixes ocaml/dune#11533, @ElectreAAS)
- Display negative error codes on Windows in hex which is the more customary
  way to display `NTSTATUS` codes (ocaml/dune#11504, @MisterDA)
maiste added a commit to maiste/opam-repository that referenced this issue Apr 3, 2025
CHANGES:

### Fixed

- Support HaikuOS: don't call `execve` since it's not allowed if other pthreads
  have been created. The fact that Haiku can't call `execve` from other threads
  than the principal thread of a process (a team in haiku jargon), is a
  discrepancy to POSIX and hence there is a [bug about
  it](https://dev.haiku-os.org/ticket/18665). (@Sylvain78, ocaml/dune#10953)
- Fix flag ordering in generated Merlin configurations (ocaml/dune#11503, @voodoos, fixes
  ocaml/merlin#1900, reported by @vouillon)

### Added

- Add `(format-dune-file <src> <dst>)` action. It provides a replacement to
  `dune format-dune-file` command.  (ocaml/dune#11166, @nojb)
- Allow the `--prefix` flag when configuring dune with `ocaml configure.ml`.
  This allows to set the prefix just like `$ dune install --prefix`. (ocaml/dune#11172,
  @rgrinberg)
- Allow arguments starting with `+` in preprocessing definitions (starting with
  `(lang dune 3.18)`). (@amonteiro, ocaml/dune#11234)
- Support for opam `(maintenance_intent ...)` in dune-project (ocaml/dune#11274, @art-w)
- Validate opam `maintenance_intent` (ocaml/dune#11308, @art-w)
- Support `not` in package dependencies constraints (ocaml/dune#11404, @art-w, reported
  by @hannesm)

### Changed

- Warn when failing to discover root due to reads failing. The previous
  behavior was to abort. (@KoviRobi, ocaml/dune#11173)
- Use shorter path for inline-tests artifacts. (@hhugo, ocaml/dune#11307)
- Allow dash in `dune init` project name (ocaml/dune#11402, @art-w, reported by @saroupille)
- On Windows, under heavy load, file delete operations can sometimes fail due to
  AV programs, etc. Guard against it by retrying the operation up to 30x with a
  1s waiting gap (ocaml/dune#11437, fixes ocaml/dune#11425, @MSoegtropIMC)
- Cache: we now only store the executable permission bit for files (ocaml/dune#11541,
  fixes ocaml/dune#11533, @ElectreAAS)
- Display negative error codes on Windows in hex which is the more customary
  way to display `NTSTATUS` codes (ocaml/dune#11504, @MisterDA)
@maiste
Copy link
Collaborator

maiste commented Apr 3, 2025

Since #11541 solved this, should we close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants