
Data and code supporting an analysis of resource names from the 2022 GBC Biodata Inventory


1heidi/names


Naming Conventions of Biodata Resources

Purpose: Analysis of full and common names predicted in the Global Biodata Coalition Inventory (2022)

  • Started with inventory:
  • Filtered to resources with both a common and a full name predicted
  • Each name pair checked and corrected as needed (validated)
  • Validated common names were coded for optics (opaque, translucent, or transparent)
  • Input file: names_input.csv
    • Variables
      • ID: PMCID for resource's most recent article as of 2021
      • pubYear: year the associated article was published
      • best_common: validated common name
      • best_full: validated full name
      • stat: clarity classification for best_common as determined by a statistician
      • bio: clarity classification for best_common as determined by a biologist
  • STEP 1 Script
    • Analyzed character count and prefixes for validated common names
    • Output: names_output_common.csv and Figure 1
  • STEP 2 Script
    • Analyzed word count and first/last word for validated full names
    • Output: names_output_common_full.csv and Figure 3
  • STEP 3 Script
    • Compared clarity classifications in an agreement matrix
    • Output: names_output_common_full_optics.csv and Figure 2
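
The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the repository's scripts: the sample rows are invented placeholders that reuse the column names of names_input.csv, the three-character prefix length is an assumption (the README does not say how prefixes are defined), and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical rows using the column names listed for names_input.csv.
# The values are invented for illustration and are NOT from the dataset.
SAMPLE_CSV = """ID,pubYear,best_common,best_full,stat,bio
PMC0000001,2019,ExampleDB,Example Database of Proteins,translucent,translucent
PMC0000002,2020,XYZ,Xenon Yield Zone Resource,opaque,opaque
PMC0000003,2021,GeneAtlas,Gene Atlas,transparent,translucent
"""

LABELS = ["opaque", "translucent", "transparent"]


def common_name_features(name, prefix_len=3):
    """STEP 1 sketch: character count and a fixed-length, lowercased prefix.

    prefix_len=3 is an assumption made for this example.
    """
    return {"n_char": len(name), "prefix": name[:prefix_len].lower()}


def full_name_features(name):
    """STEP 2 sketch: word count plus first and last word (whitespace split)."""
    words = name.split()
    return {
        "n_words": len(words),
        "first": words[0] if words else "",
        "last": words[-1] if words else "",
    }


def agreement_matrix(rows):
    """STEP 3 sketch: cross-tabulate the two coders, matrix[stat][bio] = count."""
    counts = Counter((r["stat"], r["bio"]) for r in rows)
    return {s: {b: counts[(s, b)] for b in LABELS} for s in LABELS}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
common = [common_name_features(r["best_common"]) for r in rows]
full = [full_name_features(r["best_full"]) for r in rows]
matrix = agreement_matrix(rows)

# Diagonal cells are the names on which both coders gave the same label.
agreed = sum(matrix[label][label] for label in LABELS)
percent_agreement = agreed / len(rows)
```

To try this on the real input, replace `io.StringIO(SAMPLE_CSV)` with an open handle to names_input.csv. Raw percent agreement is used here only because it is the simplest summary of the matrix; the README does not state which agreement statistic the STEP 3 script reports.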
