Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freethreading - module markers + CI #275

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

H-Plus-Time
Copy link

Yet another fork (this time in setup-python), what exactly was it about setup-uv in windows that broke builds?

Verified wheel building run over at https://github.com/H-Plus-Time/arro3/actions/runs/12349504140.

@kylebarron
Copy link
Owner

kylebarron commented Dec 16, 2024

I was hitting this #263. If you find a way to use uv on Windows I'd be happy to use it. Maybe it's an issue with PATH?

@kylebarron
Copy link
Owner

It's also an issue that the buffer protocol is even less sound in Python 3.13t.

See discussion: https://discord.com/channels/1209263839632424990/1310572181138178129/1310572181138178129, https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/, PyO3/pyo3#2824

I'm not sure what to do here. It's very beneficial to have the buffer protocol as an option. So perhaps just document to Python users that it's unsafe to mutate the Python buffer after passing it to rust?

@H-Plus-Time
Copy link
Author

Yeah - I'm actually having a great deal of difficulty thinking of buffer protocol scenarios where freethreading would be unsound but not GIL-enabled; there's definitely a difference, but the complete lack of immutability guarantees is 99% of the problem (and serialized python execution (the GIL) addresses the remaining 1%).

I think the safety note on into_arrow_array, surfaced prominently should do the trick.

By the sounds of (amongst other parts of that document) https://py-free-threading.github.io/porting/#dealing-with-thread-unsafe-objects and the comments on how numpy is going ahead with this, pathological cases (like resizing an array while it's being read) can just be left as "you're holding it wrong, enjoy the segfault", right up until the first bug report.

@kylebarron
Copy link
Owner

Hmm it looks like I can't trigger the wheel builds on this repo from your PR.

I think the safety note on into_arrow_array, surfaced prominently should do the trick.

Ideally this would also be surfaced to Python to be seen by Python users, but I'm not sure the right level of obtrusiveness for that.

@kylebarron
Copy link
Owner

It would be nice if we didn't have to use a fork of numpy. Do you know if there's progress towards getting that merged?

@H-Plus-Time
Copy link
Author

Hmm it looks like I can't trigger the wheel builds on this repo from your PR.

Yeah, I noticed that in my own fork, though that seems to be fine now 🤔. It's likely a matter of: jobs can be triggered manually via workflow_dispatch, but not on pull requests (the wheels would need on: pull_request, which might be ok with cancel-in-progress); there doesn't seem to be a non-convoluted way to do manually triggered PR checks.

@H-Plus-Time
Copy link
Author

Mm, not yet, probably mid-January (given the whole Christmas/NYE period is upon us) - it looks like it's down to a single mutex, probably test reconfiguration too.

Obtrusiveness-wise, I can stick a (concise) spiel in the docstring of Array::frombuffer; I'd be hesitant to log anything at runtime higher than trace or debug level.

@kylebarron
Copy link
Owner

I'm inclined not to merge this until numpy gets a stable release including the free-threading feature

@H-Plus-Time
Copy link
Author

H-Plus-Time commented Feb 12, 2025

Yep, I'll stick this one (and related) in draft.

@H-Plus-Time H-Plus-Time marked this pull request as draft February 12, 2025 13:44
@kylebarron
Copy link
Owner

Numpy merged its free-threading pr: PyO3/rust-numpy#471 (comment)

@H-Plus-Time
Copy link
Author

Excellent, hopefully shouldn't be too much longer until that gets a release. I'll re-pin to the tip of main and see if anything strange crops up in the build in the meantime.

@H-Plus-Time
Copy link
Author

Hrm, blocked by the pyO3 breaking change in 0.23.4 involving the IntoPyObject constraint and Chrono Tz. With the suggested workaround (map(|v| v.fixed_offset()) specifically), seems to compile fine. By the looks of apache/arrow-rs#7173 and associates, everything should pan out roughly around the same time (the numpy release + patch to obviate the fixed_offset workaround)

…ompile without subsequent commit (avoid keeping it though, since it erases timezone info))
@kylebarron
Copy link
Owner

kylebarron commented Feb 23, 2025

Hrm, blocked by the pyO3 breaking change in 0.23.4 involving the IntoPyObject constraint and Chrono Tz.

Can you merge in latest main? It should be fixed (with a temporary fixed offset workaround)

@kylebarron
Copy link
Owner

With #304 it looks like this might actually be unblocked?

@kylebarron kylebarron marked this pull request as ready for review March 25, 2025 17:01
@@ -33,7 +33,7 @@ fn check_debug_build(py: Python) -> PyResult<()> {
Ok(())
}

#[pymodule]
#[pymodule(gil_used = false)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will always be true for the IO module. E.g. I might want to allow a Python HTTP library like aiohttp to be used for making the parquet file reads

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, if unmodified (as in, rust locking structures cfg-gated on Py_GIL_DISABLED) - there isn't code that allows that quite yet, right?

I should definitely take a crack at pytest-freethread in general before this is merged (at the very least CI for a single test in pyo3-arrow or core).

Thinking about it, pretty much all users of the freethreading build at the moment are other library maintainers porting their package to freethreading - this is probably a rare instance (especially given the main dependant is geoarrow.rust.io) where hard concurrency errors are preferred over RuntimeWarning .

@H-Plus-Time
Copy link
Author

H-Plus-Time commented Mar 25, 2025

With #304 it looks like this might actually be unblocked?

Oh, excellent - I gather the obstore py03 0.24.0 + object store 0.11 release unblocks this too (via #304 probably), right? That was my last blocker on this (the parquet crate's dependency on object store was involved in there somewhere).

Thought I'd have to wait for the April parquet release.

@kylebarron
Copy link
Owner

@H-Plus-Time
Copy link
Author

Alright, interesting data so far: even without freethreading (and with the gil_used property stripped out) we already get concurrency errors 😱 in test_ipc (3/200 iterations). Reproducible in both debug and release builds, so I'm inclined to believe they're genuine.

With the gil_used flags set (oddly, despite the fact that this is still with the gil enabled), parquet kv metadata tests start throwing concurrency errors.

Actually running the tests against an 3.13t build does work, but it's far more sensitive (I ended up having to resort to a pixi equivalent of the uv environment, after encountering segfaults on mismatched ABIs (manylinux_2_17 vs manylinux_2_34)). The good news on that front is core and compute are (as expected) safe in freethreading (out of an abundance of caution, I've run it at 24 threads and 10k iterations, not a peep).


I think that answers it - we mark the IO tests as ineligible for concurrent tests and remove the gil_used attribute for arro3-io (importing it really should re-engage the GIL).

@kylebarron
Copy link
Owner

we already get concurrency errors 😱 in test_ipc (3/200 iterations).

Can you digest what this means? This applies only to python 3.13t? So our existing wheels are unaffected?

Actually running the tests against an 3.13t build does work, but it's far more sensitive (I ended up having to resort to a pixi equivalent of the uv environment, after encountering segfaults on mismatched ABIs (manylinux_2_17 vs manylinux_2_34)).

👍

The good news on that front is core and compute are (as expected) safe in freethreading (out of an abundance of caution, I've run it at 24 threads and 10k iterations, not a peep).

Ok that's good at least.

I think that answers it - we mark the IO tests as ineligible for concurrent tests and remove the gil_used attribute for arro3-io (importing it really should re-engage the GIL).

I think that's fair. Though I would've expected the rust-based IO to handle concurrency well out of the box.

Really I guess I'm curious which test it failed on? If you pass a string path to the Rust code, then I would assume that would work out of the box. But if you pass a file-like object to Rust, then it has to acquire the GIL to perform those reads from the Python file. I wouldn't be surprised if those were the failing tests.

@H-Plus-Time
Copy link
Author

The IO test failures apply to both 3.13 and 3.13t (also probably all python versions that we build for).

In retrospect, I should have spotted it - it's the roundtrip test, so this:

    def test_ipc_round_trip_path():
        table = pa.table({"a": [1, 2, 3, 4]})
        write_ipc(table, Path("test.arrow"))
        table_retour = pa.table(read_ipc(Path("test.arrow")))
        assert table == table_retour
    
        write_ipc_stream(table, Path("test.arrows"))
>       table_retour = pa.table(read_ipc_stream(Path("test.arrows")))

That throws with an invalid, unable to fill buffer error, in pyarrow.

Of course that would fail eventually if you ran it concurrently on exactly the same file (whether through file paths or file descriptors), since there's very little stopping you from starting a read on a file currently being written to.

@kylebarron
Copy link
Owner

Oh that makes sense. So there's likely nothing wrong with the library; that test just wasn't written with concurrency in mind.

@H-Plus-Time
Copy link
Author

Yep, swapping the IO tests to use a tempfile.TemporaryDirectory or NamedTemporaryFile (to just get a reusable file path for those not involving file objects) resolves it in both standard and freethreaded.

I think we can probably just go with:

  • Re-add gil_used=false on the IO module (since it is currently not dependent on the GIL).
  • If, when integrating with python HTTP libraries is close to landing, freethreading issues start to crop up - I can take a stab at diagnosis; if that impedes things by more than a few days, we just flip the flag back off again and deal with it (or not, might not be particularly feasible) later.

The great thing with the above is that demoting a module's freethreading status isn't particularly breaking, it just turns the GIL back on as soon as it's imported.

@kylebarron
Copy link
Owner

  • Re-add gil_used=false on the IO module (since it is currently not dependent on the GIL).

I think #313 might already change that. Or at least, there are some functions that take in a py: Python token, and so those presumably would change this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants