In the March 31 VLIZ-INBO meeting I suggested having `write_dwc()` run from the CSV files generated by `download_acoustic_dataset()` rather than from the database. This is in line with the idea that a Marine Data Archive for an animal acoustic project will contain both the source data and the Darwin Core Archive data:
Running from the CSV files has several advantages:
- Only need to query data from the database once (in `download_acoustic_dataset()`).
- Darwin Core data will always be consistent with the CSV files. Currently it is possible that there is drift between the two, e.g. when `write_dwc()` is run weeks later (and the database data have been updated in the meantime) or when the `scientific_name` argument was used in `download_acoustic_dataset()` (an argument that is not available in `write_dwc()`).
- Can update `datapackage.json` to reference the Darwin Core files.
- Once you have the CSV files, it's faster to run `write_dwc()`.
It should be finished before @CLAUMEMO ramps up "publish all non-embargo data", which is the goal for DTO-BioFlow deliverable D3.5 (Month 38, end of October 2026). It can happen in parallel to OpenCPU development, since it doesn't directly depend on it.
`write_dwc()` and the way the package places requests have changed in the branches that have diverged from `main`, so it will probably save us some difficult merges if we coordinate the order in which this happens.
#317, for example, moves the SQL completely out of etn and into etnservice, so that we only have to maintain one copy.
I suggest developing this as a new `write_dwc_csv()`. Once everything is operational, we can see where to merge it and rename it to `write_dwc()` (replacing the old functionality).
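For concreteness, the interim function could look like the sketch below. The function name comes from this proposal; the argument names and defaults anticipate the parameters discussed in this issue and are assumptions, not an agreed API:

```r
# Sketch only: write_dwc_csv() does not exist yet; names and defaults
# are assumptions based on this proposal.
write_dwc_csv <- function(package,
                          directory,
                          rights_holder = NULL,
                          license = "CC-BY") {
  # `package`: a Data Package created with frictionless::read_package(),
  #   pointing to the CSV files from download_acoustic_dataset()
  # `directory`: where to write the Darwin Core output
  # The transformation itself is out of scope for this sketch.
}
```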
The process would thus be:
1. `download_acoustic_dataset()`
2. `write_dwc()` on local CSV files

Implementation
Implementation would be similar to movepub's `write_dwc()` (https://inbo.github.io/movepub/reference/write_dwc.html), where a Frictionless Data Package is provided as input.
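The two-step process could then be used roughly as follows. This is a sketch: `write_dwc_csv()` does not exist yet, the project code is an example value, and the output layout (a directory with CSV files and a `datapackage.json`) is an assumption:

```r
library(etn)
library(frictionless)

# Step 1: query the database once and write CSV files + datapackage.json
con <- connect_to_etn()  # open a database connection (see etn docs for credentials)
download_acoustic_dataset(con, animal_project_code = "2014_demer")

# Step 2: build the Darwin Core Archive from the local CSV files,
# without touching the database again (directory layout assumed)
package <- read_package("2014_demer/datapackage.json")
write_dwc_csv(package, directory = "2014_demer/dwc")
```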
Parameters
- `package` (no default): a package, as returned by `frictionless::read_package()`. Alternatively, we ask the user for an input directory.
- `connection`: remove
- `animal_project_code`: remove, context is provided by `package`
- `directory` (no default): output directory
- `contact` (cf. movepub), not sure this is needed
- `rights_holder` (default `NULL`)
- `license` (default `"CC-BY"`)

Error checking

Check that the provided package contains the resources needed for the transformation, e.g. `animals` and `detections`.
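A minimal sketch of such a check, using the frictionless R package's `resources()` helper; the exact set of required resources is still to be decided:

```r
# Fail early if the package lacks resources needed for the transformation.
required <- c("animals", "detections")  # exact required set to be decided
missing <- setdiff(required, frictionless::resources(package))
if (length(missing) > 0) {
  stop("Package is missing required resources: ", toString(missing))
}
```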
Transformation

If the Darwin Core transformation requires data not present in the CSV files, `download_acoustic_dataset()` should be updated.

Testing
Compare the output of the new (CSV-based) `write_dwc()` with the output of the current (database-based) `write_dwc()`.
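One way to set up that comparison (a sketch, assuming testthat as the test framework; the output directories and file name are assumptions):

```r
library(testthat)

test_that("CSV-based write_dwc() matches database-based write_dwc()", {
  old <- read.csv("dwc_db/occurrence.csv")   # output of current write_dwc()
  new <- read.csv("dwc_csv/occurrence.csv")  # output of the CSV-based version
  expect_equal(new, old)
})
```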