This repository contains the Bidirectional Recursive Neural Network described in [1], implemented with TensorFlow.
rnn.py # containing the RNN model class
evaluate.py # the training and testing script
ontonotes.py # utilities to extract the OntoNotes 5.0 dataset
Download OntoNotes 5.0 from CoNLL-2012 website.
Download SENNA from Collobert's website.
Set their custom paths in ontonotes.py
data_path_prefix = "/home/danniel/Desktop/CONLL2012-intern/conll-2012/v4/data"
test_auto_data_path_prefix = "/home/danniel/Downloads/wu_conll_test/v9/data"
senna_path = "/home/danniel/Downloads/senna/hash"
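Before running anything, it can help to confirm that the three paths above actually exist on the local machine. The following is a minimal sketch, not part of ontonotes.py: `find_missing_paths` is a hypothetical helper, and the placeholder values should be replaced with the real locations.

```python
import os


# Hypothetical helper, not part of ontonotes.py: report which configured
# paths are missing on disk.
def find_missing_paths(paths):
    """Return the names of entries in `paths` whose location does not exist."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# The keys mirror the variable names in ontonotes.py; the values are placeholders.
configured = {
    "data_path_prefix": "/path/to/conll-2012/v4/data",
    "test_auto_data_path_prefix": "/path/to/conll_test/v9/data",
    "senna_path": "/path/to/senna/hash",
}

for name in find_missing_paths(configured):
    print("missing:", name, "->", configured[name])
```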
The "load_conll_2012/" directory contains libraries to read the CoNLL-2012 format of OntoNotes. They are provided by Jheng-Long Wu (jlwu@iis.sinica.edu.tw) and Canasai (https://github.com/canasai/mps).
Set the custom path to import them in ontonotes.py
sys.path.append("/home/danniel/Desktop/CONLL2012-intern")
from load_conll import load_data
from pstree import PSTree
Download the pretrained GloVe embeddings (glove.840B.300d.txt) from the GloVe website.
Set the custom path in ontonotes.py
glove_file = "/home/danniel/Downloads/glove.840B.300d.txt"
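For orientation, a GloVe text file stores one token per line, followed by its vector components separated by spaces. The sketch below shows one way such a file could be read in Python 3. It is not the repository's extract_glove_embeddings(); `parse_glove_line` and `load_glove` are hypothetical names, and the default dimension of 300 is taken from the file name.

```python
# Hypothetical helpers, not the repository's extract_glove_embeddings().
def parse_glove_line(line, dim=300):
    """Split one line of a GloVe text file into (token, vector)."""
    # Split from the right so a token that itself contains spaces stays intact.
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError("expected a token followed by %d values" % dim)
    token = parts[0]
    vector = [float(x) for x in parts[1:]]
    return token, vector


def load_glove(path, vocabulary, dim=300):
    """Collect vectors only for tokens in `vocabulary` to limit memory use."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token, vector = parse_glove_line(line, dim)
            if token in vocabulary:
                embeddings[token] = vector
    return embeddings
```

Filtering by a vocabulary while reading avoids holding the full embedding table in memory when only the words seen in the dataset are needed.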
Modify and run ontonotes.py
if __name__ == "__main__":
    extract_vocabulary_and_alphabet()
    extract_glove_embeddings()
    # read_dataset()
    exit()
python evaluate.py 2> tmp.txt
This generates model files tmp.model.*
python evaluate.py -m evaluate -s test 2> tmp.txt
To see all options, run
python evaluate.py -h
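The training and evaluation commands above can also be scripted. The following is a rough sketch under stated assumptions: `build_command` and `run_logged` are hypothetical helpers, only the `-m` and `-s` flags shown above are used, and stderr is sent to a log file in the same way as `2> tmp.txt`.

```python
import subprocess
import sys


# Hypothetical helper: assemble the argv list for evaluate.py.
def build_command(mode=None, split=None):
    """Add the -m and -s flags only when a value is given."""
    command = [sys.executable, "evaluate.py"]
    if mode is not None:
        command += ["-m", mode]
    if split is not None:
        command += ["-s", split]
    return command


# Hypothetical helper: run a command with stderr redirected to a file.
def run_logged(command, log_path):
    """Run `command`, write its stderr to `log_path`, and return the exit code."""
    with open(log_path, "w") as log:
        return subprocess.call(command, stderr=log)
```

Calling `run_logged(build_command(), "train.log")` followed by `run_logged(build_command("evaluate", "test"), "test.log")` would correspond to the two commands above. The separate log file names are an assumption, chosen so that the evaluation log does not overwrite the training log.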
A high-level description of the project and the evaluation results can be found in [1].
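[1] P.-H. Li, R.-P. Dong, Y.-S. Wang, J.-C. Chou, W.-Y. Ma. Leveraging Linguistic Structures for Named Entity Recognition with Bidirectional Recursive Neural Networks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2654-2659, 2017.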
@InProceedings{li-EtAl:2017:EMNLP20177,
author = {Li, Peng-Hsuan and Dong, Ruo-Ping and Wang, Yu-Siang and Chou, Ju-Chieh and Ma, Wei-Yun},
title = {Leveraging Linguistic Structures for Named Entity Recognition with Bidirectional Recursive Neural Networks},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
year = {2017},
publisher = {Association for Computational Linguistics},
pages = {2654--2659}
}