Commit 98b2126

committed
updated inference data
1 parent c42ed8d commit 98b2126

5 files changed

+39
-7
lines changed

inference/AGENTIVE.zip

0 Bytes
Binary file not shown.

inference/CONSTITUTIVE.zip

45 Bytes
Binary file not shown.

inference/FORMAL.zip

349 Bytes
Binary file not shown.

inference/TELIC.zip

19 Bytes
Binary file not shown.

inference/readme.md

+39-7
@@ -1,12 +1,44 @@
-# Lexical Inference dataset - previous version
+# Danish Lexical Inference Datasets

-This is the first un-curated version of the lexical inference dataset. This version constituted the basis for the lexical inference experiments performed in the papers:
+Each entailment dataset consists of a list of statements. Each line is structured as follows:
+- Two true statements expressing hyponymy and inheritance are given.
+- These are followed by an additional, similar statement.
+- The last statement is supplemented with a label denoting whether it is *true* or *false*.
+- Finally, information is given regarding the ontological types and/or relations being tested in the given set of statements.

-Bolette Pedersen, Nathalie Sørensen, Sussi Olsen, Sanni Nimb & Simon Gray. 2024. Towards a Danish Semantic Reasoning Benchmark - Compiled from Lexical-Semantic Resources for Assessing Selected Language Understanding Capabilities of Large Language Models. In *Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)*, p. 16353–16363, Torino, Italia. ELRA and ICCL
+The task intended for the language model is to answer whether the third statement is true or false.

-and
+This dataset was developed as part of the Danish Reasoning Benchmark. To cite, please use the following citation:

-Bolette S. Pedersen, Nathalie C. Hau Sørensen, Sussi Olsen & Sanni Nimb. 2024. Evaluering af sprogforståelsen i danske sprogmodeller – med
-udgangspunkt i semantiske ordbøger. In *NyS – Nydanske Sprogstudier, vol. 65, p. 8-40. DOI 10.7146/nys.v1i65.143072*.
+Bolette Pedersen, Nathalie Sørensen, Sussi Olsen, Sanni Nimb, and Simon Gray. 2024.
+[Towards a Danish Semantic Reasoning Benchmark - Compiled from Lexical-Semantic Resources for Assessing Selected Language Understanding Capabilities of Large Language Models](https://aclanthology.org/2024.lrec-main.1421/).
+In *Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)*, pages 16353–16363, Torino, Italia. ELRA and ICCL.

-In the updated version, various errors present in this version have been corrected. Furthermore, the data in “inference-point-in-time.txt” was substituted with more concise terms for “points in time”.
+
+
+All data are derived from [the Danish WordNet, DanNet](https://wordnet.dk/dannet/page/frontpage).
+
+To cite, please use the following citation:
+Pedersen et al. 2009. DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. *Language Resources and Evaluation*, 43, 269–299. [DOI: 10.1007/s10579-009-9092-1](https://doi.org/10.1007/s10579-009-9092-1).
+
+# Content
+The datasets are organised around the four qualia roles defined by J. Pustejovsky (in *The Generative Lexicon*. 1998, Cambridge, MA: MIT Press):
+- Agentive role (how a concept came about)
+- Constitutive role (part-whole relation of a concept)
+- Formal role (the taxonomical classification of a concept)
+- Telic role (the function of a concept)
+
+Test instances are generated from a generic template constructed for each ontological type under each qualia role.
+For instance, for the telic role (function) with the ontological type Instrument, we use the template
+
+*Man bruger en X til at Y med*\
+(you use an X for Y-ing).
+
+We negate a selected number of utterances and contrast them with examples from other parts of the ontology, always keeping track of the truth value.
+
+# License
+[CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
+
+Credit: [Centre for Language Technology (CST), University of Copenhagen](https://cst.ku.dk/english/)
+
+Contact: Bolette Sandford Pedersen (bspedersen @ hum.ku.dk)
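
The README in this commit describes template-based generation with tracked truth values, and a true/false task for the language model. A minimal Python sketch of that idea might look as follows. Only the Danish template string comes from the README. The `Instance` type, the `make_instances` and `accuracy` helpers, and the word pairs are hypothetical illustrations: they are not the authors' generation code, and they do not reflect the actual file format inside the zip archives.

```python
from dataclasses import dataclass

# Template quoted in the README for the telic role, ontological type Instrument.
TEMPLATE = "Man bruger en {x} til at {y} med"


@dataclass(frozen=True)
class Instance:
    statement: str
    label: bool  # truth value kept alongside the generated statement


def make_instances(true_pairs, contrast_pairs):
    """Fill the template with true pairs and with contrasting (false) pairs."""
    instances = [Instance(TEMPLATE.format(x=x, y=y), True) for x, y in true_pairs]
    instances += [Instance(TEMPLATE.format(x=x, y=y), False) for x, y in contrast_pairs]
    return instances


def accuracy(predictions, instances):
    """Share of true/false predictions that match the tracked labels."""
    if len(predictions) != len(instances) or not instances:
        raise ValueError("predictions and instances must be non-empty and equal in length")
    correct = sum(pred == inst.label for pred, inst in zip(predictions, instances))
    return correct / len(instances)


# Hypothetical word pairs, invented for illustration only (not from the dataset).
true_pairs = [("kniv", "skære"), ("hammer", "hamre")]
contrast_pairs = [("kniv", "hamre")]

for inst in make_instances(true_pairs, contrast_pairs):
    print(inst.label, inst.statement)
```

Keeping the label on the same record as the statement is one simple way to avoid losing the truth value when instances are shuffled or filtered later.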
