Support metadata for .zarr converted from .nd2 using bioformats2raw conversion #272

Open: 4 commits to merge into base branch main


talonchandler
Contributor

@talonchandler talonchandler commented Feb 19, 2025

This PR contains minor spec-mismatch bug fixes that enable iohub info to work with .zarr files converted from .nd2 using the bioformats2raw conversion tool.

For example:

iohub info /hpc/projects/comp.micro/zebrafish-macrophage/2025-02-15-93a-4dpf/250215_93a_4dpf_em2_sib_2048_002.zarr/0

prints

Reading file:	 /hpc/projects/comp.micro/zebrafish-macrophage/2025-02-15-93a-4dpf/250215_93a_4dpf_em2_sib_2048_002.zarr/0
Zarr hierarchy:
/
 ├── 0 (10, 3, 66, 2048, 2048) >u2
 ├── 1 (10, 3, 66, 1024, 1024) >u2
 ├── 2 (10, 3, 66, 512, 512) >u2
 └── 3 (10, 3, 66, 256, 256) >u2

=== Summary ===
Format:			 omezarr v0.4
Axes:			 t (time); c (channel); z (space); y (space); x (space); 
Channel names:		 ['DIA', 'GFP', 'RFP']
(Z, Y, X) scale (um):	 (1.0, 0.157177107973598, 0.157177107973598)
Chunk size:		 (1, 1, 1, 1024, 1024)
No. bytes decompressed:		 16609443840 [15.5 GiB]
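
As a quick sanity check on the summary above: the reported byte count appears to match the full-resolution array (level 0) alone, i.e. the product of its shape times the 2-byte item size of `>u2`. The helper name below is hypothetical and is not part of iohub; this is only a sketch of the arithmetic.

```python
from math import prod


def decompressed_nbytes(shape: tuple[int, ...], itemsize: int) -> int:
    """Uncompressed size in bytes of one array: number of elements times bytes per element."""
    return prod(shape) * itemsize


# Level-0 shape from the hierarchy listing; ">u2" is a 2-byte unsigned integer.
level0_shape = (10, 3, 66, 2048, 2048)
nbytes = decompressed_nbytes(level0_shape, itemsize=2)
gib = nbytes / 2**30

print(f"{nbytes} [{gib:.1f} GiB]")
```

The lower-resolution pyramid levels (1 to 3) do not seem to be included in this total.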

Note: bioformats2raw seems to create single positions stacked one layer deeper than our typical single positions, so I call iohub info on .zarr/0.

Note: @ziw-liu @mattersoflight I'm tagging you here to keep you in the loop. I plan to request your review after we've completed a phase reconstruction during the week of March 3.

@talonchandler talonchandler changed the title Support metadata from .zarr converted from .nd2 using bioformats2raw conversion Support metadata for .zarr converted from .nd2 using bioformats2raw conversion Feb 19, 2025
@ziw-liu
Collaborator

ziw-liu commented Feb 26, 2025

I'm noticing some OMERO-related issues. Might also be related to #270.

@talonchandler talonchandler marked this pull request as ready for review March 10, 2025 16:38
@talonchandler talonchandler requested a review from ziw-liu March 10, 2025 16:39
@talonchandler
Contributor Author

@ziw-liu we found this PR necessary to open bioformats2raw-converted .zarr files last week, and we found that the resulting files were interpretable by waveorder.

@ziw-liu are the existing tests sufficient, or would you suggest more? Do we need to upload a bioformats2raw-converted dataset?

@ziw-liu
Collaborator

ziw-liu commented Mar 10, 2025

I'm trying to understand the changes to the metadata models, but the file in OP has been removed. @talonchandler can you point me to an example?

@talonchandler
Contributor Author

Pardon me! Here's a test. Thank you @ziw-liu.

(iohub-dev) [talon.chandler@login-fry1] ~/iohub
16:52:07 $ git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
(iohub-dev) [talon.chandler@login-fry1] ~/iohub
16:55:03 $ iohub info /hpc/projects/virtual_staining/datasets/shiau-lab/2025-02-28-93a-4dpf-2um/0-convert/250228_4dpf_em5_mut_pLLG_fullview_010_WellA01_ChannelX_Seq0000.zarr/A/1/0/
Reading file:	 /hpc/projects/virtual_staining/datasets/shiau-lab/2025-02-28-93a-4dpf-2um/0-convert/250228_4dpf_em5_mut_pLLG_fullview_010_WellA01_ChannelX_Seq0000.zarr/A/1/0
Error: No compatible dataset is found.
(iohub-dev) [talon.chandler@login-fry1] ~/iohub
16:55:09 $ git checkout nd2-conversion
Switched to branch 'nd2-conversion'
Your branch is up to date with 'origin/nd2-conversion'.
(iohub-dev) [talon.chandler@login-fry1] ~/iohub
16:55:22 $ iohub info /hpc/projects/virtual_staining/datasets/shiau-lab/2025-02-28-93a-4dpf-2um/0-convert/250228_4dpf_em5_mut_pLLG_fullview_010_WellA01_ChannelX_Seq0000.zarr/A/1/0/
Reading file:	 /hpc/projects/virtual_staining/datasets/shiau-lab/2025-02-28-93a-4dpf-2um/0-convert/250228_4dpf_em5_mut_pLLG_fullview_010_WellA01_ChannelX_Seq0000.zarr/A/1/0
Zarr hierarchy:
/
 ├── 0 (11, 3, 29, 2048, 2048) >u2
 ├── 1 (11, 3, 29, 1024, 1024) >u2
 ├── 2 (11, 3, 29, 512, 512) >u2
 └── 3 (11, 3, 29, 256, 256) >u2

=== Summary ===
Format:			 omezarr v0.4
Axes:			 t (time); c (channel); z (space); y (space); x (space); 
Channel names:		 ['RFP', 'GFP', 'DIA']
(Z, Y, X) scale (um):	 (2.0, 0.157177107973598, 0.157177107973598)
Chunk size:		 (1, 1, 1, 1024, 1024)
No. bytes decompressed:		 8027897856 [7.5 GiB]  

@@ -166,7 +166,7 @@ class TimeAxisMeta(NamedAxisMeta):
 "zettasecond",
 ]
 | None
-)
+) = None
Collaborator


Is there an example where not having this change causes a validation error?

Collaborator


Does Pydantic raise an error when an optional field is not present (instead of having the value null in JSON)?

Contributor Author


Yes. Without this change:

iohub info /hpc/projects/virtual_staining/datasets/shiau-lab/Examples_Tests_Talon/2025-02-15-93a-4dpf-test/250215_93a_4dpf_em2_sib_2048_002.zarr/0

gives

WARNING:iohub.ngff.nodes:Zarr group at  does not have valid metadata for <class 'iohub.ngff.nodes.Position'>
Traceback (most recent call last):
  File "/home/talon.chandler/.conda/envs/iohub-dev/bin/iohub", line 8, in <module>
    sys.exit(cli())
  File "/home/talon.chandler/.conda/envs/iohub-dev/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/talon.chandler/.conda/envs/iohub-dev/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/talon.chandler/.conda/envs/iohub-dev/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/talon.chandler/.conda/envs/iohub-dev/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/talon.chandler/.conda/envs/iohub-dev/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/talon.chandler/iohub/iohub/cli/cli.py", line 50, in info
    print_info(file, verbose=verbose)
  File "/home/talon.chandler/iohub/iohub/reader.py", line 213, in print_info
    ch_msg = f"Channel names:\t\t {reader.channel_names}"
  File "/home/talon.chandler/iohub/iohub/ngff/nodes.py", line 142, in channel_names
    return self._channel_names
AttributeError: 'Position' object has no attribute '_channel_names'. Did you mean: 'channel_names'?

After the change, the same call prints

Reading file:	 /hpc/projects/virtual_staining/datasets/shiau-lab/Examples_Tests_Talon/2025-02-15-93a-4dpf-test/250215_93a_4dpf_em2_sib_2048_002.zarr/0
Zarr hierarchy:
/
 ├── 0 (10, 3, 66, 2048, 2048) >u2
 ├── 1 (10, 3, 66, 1024, 1024) >u2
 ├── 2 (10, 3, 66, 512, 512) >u2
 └── 3 (10, 3, 66, 256, 256) >u2

=== Summary ===
Format:			 omezarr v0.4
Axes:			 t (time); c (channel); z (space); y (space); x (space); 
Channel names:		 ['DIA', 'GFP', 'RFP']
(Z, Y, X) scale (um):	 (1.0, 0.157177107973598, 0.157177107973598)
Chunk size:		 (1, 1, 1, 1024, 1024)
No. bytes decompressed:		 16609443840 [15.5 GiB]

Collaborator


So removing these causes the node to not parse the metadata, but I wonder if that is a logic bug like #270 instead of a schema modeling bug.
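
To illustrate the kind of logic bug meant here, a rough sketch of the pattern the traceback suggests: a parse failure is only logged, so the private attribute is never set and the error surfaces later as an AttributeError. `SketchNode` and its attribute layout are hypothetical and are not iohub's actual implementation.

```python
import logging

logger = logging.getLogger("sketch")


class SketchNode:
    """Hypothetical stand-in for a node that parses metadata on construction."""

    def __init__(self, attrs: dict):
        try:
            self._parse_meta(attrs)
        except KeyError:
            # The failure is only logged, so construction appears to succeed.
            logger.warning("Group does not have valid metadata")

    def _parse_meta(self, attrs: dict) -> None:
        # Raises KeyError when the expected metadata keys are missing.
        self._channel_names = [c["label"] for c in attrs["omero"]["channels"]]

    @property
    def channel_names(self) -> list[str]:
        # The unset private attribute only surfaces here, far from the cause.
        return self._channel_names


good = SketchNode({"omero": {"channels": [{"label": "DIA"}, {"label": "GFP"}]}})
bad = SketchNode({})  # logs a warning, but no exception is raised yet
```

In this sketch, `good.channel_names` works, while accessing `bad.channel_names` raises an AttributeError similar to the one in the traceback, even though the root cause is the earlier parse failure.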

2 participants