Splits

The corpus uses the following standard train / dev / test / test2 splits. The 'test2' split comes from the GENTLE corpus and is used for out-of-domain testing: it contains documents from challenging genres not present in the other partitions. See the main README.md for more information.

dev

  • GUM_academic_exposure
  • GUM_academic_librarians
  • GUM_bio_byron
  • GUM_bio_emperor
  • GUM_conversation_grounded
  • GUM_conversation_risk
  • GUM_court_loan
  • GUM_court_negligence
  • GUM_essay_evolved
  • GUM_essay_tools
  • GUM_fiction_beast
  • GUM_fiction_lunre
  • GUM_interview_cyclone
  • GUM_interview_gaming
  • GUM_letter_arendt
  • GUM_letter_wiki
  • GUM_news_homeopathic
  • GUM_news_iodine
  • GUM_podcast_bangladesh
  • GUM_podcast_wrestling
  • GUM_reddit_macroeconomics
  • GUM_reddit_pandas
  • GUM_speech_impeachment
  • GUM_speech_inauguration
  • GUM_textbook_governments
  • GUM_textbook_labor
  • GUM_vlog_portland
  • GUM_vlog_radiology
  • GUM_voyage_athens
  • GUM_voyage_coron
  • GUM_whow_joke
  • GUM_whow_overalls

test

  • GUM_academic_discrimination
  • GUM_academic_eegimaa
  • GUM_bio_dvorak
  • GUM_bio_jespersen
  • GUM_conversation_lambada
  • GUM_conversation_retirement
  • GUM_court_insanity
  • GUM_court_mitigation
  • GUM_essay_fear
  • GUM_essay_system
  • GUM_fiction_falling
  • GUM_fiction_teeth
  • GUM_interview_hill
  • GUM_interview_libertarian
  • GUM_letter_attorney
  • GUM_letter_mandela
  • GUM_news_nasa
  • GUM_news_sensitive
  • GUM_podcast_bezos
  • GUM_podcast_multitasking
  • GUM_reddit_escape
  • GUM_reddit_monsters
  • GUM_speech_austria
  • GUM_speech_newzealand
  • GUM_textbook_chemistry
  • GUM_textbook_union
  • GUM_vlog_london
  • GUM_vlog_studying
  • GUM_voyage_oakland
  • GUM_voyage_vavau
  • GUM_whow_cactus
  • GUM_whow_mice

test2

  • GENTLE_dictionary_next
  • GENTLE_dictionary_school
  • GENTLE_dictionary_trust
  • GENTLE_esports_fifa
  • GENTLE_esports_fortnite
  • GENTLE_legal_abortion
  • GENTLE_legal_service
  • GENTLE_medical_anemia
  • GENTLE_medical_hiv
  • GENTLE_medical_screw
  • GENTLE_medical_transplant
  • GENTLE_poetry_annabel
  • GENTLE_poetry_death
  • GENTLE_poetry_flower
  • GENTLE_poetry_raven
  • GENTLE_poetry_road
  • GENTLE_proof_five
  • GENTLE_proof_square
  • GENTLE_proof_wosets
  • GENTLE_syllabus_opensource
  • GENTLE_syllabus_techtonica
  • GENTLE_threat_bolin
  • GENTLE_threat_dillard
  • GENTLE_threat_kelly
  • GENTLE_threat_malik
  • GENTLE_threat_white

train

  • GUM_academic_art
  • GUM_academic_census
  • GUM_academic_economics
  • GUM_academic_enjambment
  • GUM_academic_epistemic
  • GUM_academic_games
  • GUM_academic_huh
  • GUM_academic_implicature
  • GUM_academic_lighting
  • GUM_academic_mutation
  • GUM_academic_replication
  • GUM_academic_salinity
  • GUM_academic_theropod
  • GUM_academic_thrones
  • GUM_bio_bernoulli
  • GUM_bio_chao
  • GUM_bio_enfant
  • GUM_bio_fillmore
  • GUM_bio_galois
  • GUM_bio_goode
  • GUM_bio_gordon
  • GUM_bio_hadid
  • GUM_bio_higuchi
  • GUM_bio_holt
  • GUM_bio_jerome
  • GUM_bio_marbles
  • GUM_bio_moreau
  • GUM_bio_nida
  • GUM_bio_padalecki
  • GUM_bio_theodorus
  • GUM_conversation_artist
  • GUM_conversation_atoms
  • GUM_conversation_blacksmithing
  • GUM_conversation_christmas
  • GUM_conversation_erasmus
  • GUM_conversation_family
  • GUM_conversation_gossip
  • GUM_conversation_scientist
  • GUM_conversation_toys
  • GUM_conversation_vet
  • GUM_conversation_zero
  • GUM_court_carpet
  • GUM_court_equality
  • GUM_court_fire
  • GUM_court_prince
  • GUM_court_property
  • GUM_essay_distraction
  • GUM_essay_dividends
  • GUM_essay_food
  • GUM_essay_ghost
  • GUM_essay_sexlife
  • GUM_fiction_claus
  • GUM_fiction_error
  • GUM_fiction_frankenstein
  • GUM_fiction_garden
  • GUM_fiction_giants
  • GUM_fiction_honour
  • GUM_fiction_moon
  • GUM_fiction_oversite
  • GUM_fiction_pag
  • GUM_fiction_pixies
  • GUM_fiction_rose
  • GUM_fiction_sneeze
  • GUM_fiction_time
  • GUM_fiction_veronique
  • GUM_fiction_wedding
  • GUM_interview_ants
  • GUM_interview_brotherhood
  • GUM_interview_chomsky
  • GUM_interview_cocktail
  • GUM_interview_daly
  • GUM_interview_dungeon
  • GUM_interview_herrick
  • GUM_interview_licen
  • GUM_interview_mcguire
  • GUM_interview_mckenzie
  • GUM_interview_messina
  • GUM_interview_onion
  • GUM_interview_peres
  • GUM_interview_shalev
  • GUM_interview_stardust
  • GUM_letter_conference
  • GUM_letter_flood
  • GUM_letter_gorbachev
  • GUM_letter_marcie
  • GUM_letter_marcie2
  • GUM_letter_marcie3
  • GUM_letter_roomers
  • GUM_letter_zora
  • GUM_news_afghan
  • GUM_news_asylum
  • GUM_news_clock
  • GUM_news_crane
  • GUM_news_defector
  • GUM_news_election
  • GUM_news_expo
  • GUM_news_flag
  • GUM_news_hackers
  • GUM_news_ie9
  • GUM_news_imprisoned
  • GUM_news_korea
  • GUM_news_lanterns
  • GUM_news_questionnaire
  • GUM_news_soccer
  • GUM_news_stampede
  • GUM_news_taxes
  • GUM_news_warhol
  • GUM_news_warming
  • GUM_news_worship
  • GUM_podcast_addiction
  • GUM_podcast_brave
  • GUM_podcast_collaboration
  • GUM_podcast_llms
  • GUM_podcast_movie
  • GUM_podcast_pandemic
  • GUM_reddit_bobby
  • GUM_reddit_callout
  • GUM_reddit_card
  • GUM_reddit_conspiracy
  • GUM_reddit_gender
  • GUM_reddit_introverts
  • GUM_reddit_polygraph
  • GUM_reddit_racial
  • GUM_reddit_ring
  • GUM_reddit_social
  • GUM_reddit_space
  • GUM_reddit_steak
  • GUM_reddit_stroke
  • GUM_reddit_superman
  • GUM_speech_albania
  • GUM_speech_data
  • GUM_speech_destiny
  • GUM_speech_floyd
  • GUM_speech_humanitarian
  • GUM_speech_maiden
  • GUM_speech_nixon
  • GUM_speech_remarks
  • GUM_speech_school
  • GUM_speech_telescope
  • GUM_speech_trump
  • GUM_textbook_alamo
  • GUM_textbook_anthropology
  • GUM_textbook_artwork
  • GUM_textbook_cognition
  • GUM_textbook_entrepreneurship
  • GUM_textbook_evoethics
  • GUM_textbook_grit
  • GUM_textbook_history
  • GUM_textbook_sociology
  • GUM_textbook_spacetime
  • GUM_textbook_stats
  • GUM_vlog_appearance
  • GUM_vlog_college
  • GUM_vlog_covid
  • GUM_vlog_exams
  • GUM_vlog_hair
  • GUM_vlog_hiking
  • GUM_vlog_lipstick
  • GUM_vlog_mermaid
  • GUM_vlog_pizzeria
  • GUM_vlog_pregnant
  • GUM_vlog_wine
  • GUM_voyage_chatham
  • GUM_voyage_cleveland
  • GUM_voyage_cuba
  • GUM_voyage_fortlee
  • GUM_voyage_guadeloupe
  • GUM_voyage_isfahan
  • GUM_voyage_lodz
  • GUM_voyage_merida
  • GUM_voyage_phoenix
  • GUM_voyage_socotra
  • GUM_voyage_sydfynske
  • GUM_voyage_thailand
  • GUM_voyage_tulsa
  • GUM_voyage_york
  • GUM_whow_arrogant
  • GUM_whow_ballet
  • GUM_whow_basil
  • GUM_whow_chicken
  • GUM_whow_cupcakes
  • GUM_whow_elevator
  • GUM_whow_flirt
  • GUM_whow_glowstick
  • GUM_whow_languages
  • GUM_whow_packing
  • GUM_whow_parachute
  • GUM_whow_procrastinating
  • GUM_whow_quidditch
  • GUM_whow_quinoa
  • GUM_whow_skittles
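
To use these lists programmatically, a listing like the one above can be read into a mapping from split name to document IDs. The following is a minimal sketch, not part of the corpus tooling: the function names `parse_splits` and `genre_counts` are hypothetical, and the parser assumes each split name appears on its own line (optionally with leading `#` marks) followed by one bulleted document ID per line. It also assumes the genre is the second underscore-separated field of the ID, as the names listed here suggest.

```python
import re
from collections import Counter

# Split names taken from the listing above.
SPLIT_NAMES = {"train", "dev", "test", "test2"}


def parse_splits(text):
    """Map each split name to the list of document IDs listed under it."""
    splits = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumption: a heading is a bare split name, optionally prefixed with '#'.
        heading = line.lstrip("#").strip()
        if heading in SPLIT_NAMES:
            current = heading
            splits.setdefault(current, [])
            continue
        # Assumption: a list item is a bullet marker followed by a single ID.
        match = re.match(r"^[•*\-]\s+(\S+)$", line)
        if match and current is not None:
            splits[current].append(match.group(1))
    return splits


def genre_counts(doc_ids):
    """Count documents per genre (second '_'-separated field of each ID)."""
    return Counter(doc_id.split("_")[1] for doc_id in doc_ids)


# Small excerpt of the listing, used only to illustrate the expected shape.
sample = """\
dev

  • GUM_academic_exposure
  • GUM_bio_byron

test2

  • GENTLE_poetry_raven
"""

splits = parse_splits(sample)
for name, docs in splits.items():
    print(name, len(docs), dict(genre_counts(docs)))
```

Running `genre_counts` on each split is a quick way to check which genres a partition covers, for example to confirm that the test2 genres do not occur in train.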