Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models (ACM CIKM 2024 Research Track Paper)
The /data/ folder in GenFEND_release_ch and GenFEND_release_en stores the data used for training and testing.
For GenFEND_release_ch, we experiment on Weibo21.
The folder /data/Weibo21/ contains the data with real comments, and the folder /data/Weibo21/Dmeta-embedding-comments-feature/ contains the extracted features of the real comments.
The folder /data/role_virtual_comments/ contains the Weibo21 data with generated comments and the corresponding extracted comment features.
For GenFEND_release_en, we experiment on LLM-mis and GossipCop.
The folder /data/LLM-mis/ contains the LLM-mis data with generated comments, and the folder /data/LLM-mis/bge-large-en-v1.5/ contains the extracted features of the generated comments.
The folder /data/GossipCop/ contains the GossipCop data with real comments, the folder /data/GossipCop/bge-large-en-v1.5/ contains the extracted features of the real comments, and the folder /data/role_virtual_comments/ contains the GossipCop data with generated comments and the corresponding extracted comment features.
Note that we only list some example instances in train.json, val.json, and test.json.
You should prepare the whole dataset in the same format as the example instances, then follow STEP I in the How To Run section to generate the complete dataset.
We cannot provide the original datasets we used because we did not collect them and are not authorized to redistribute them. Only the generated comments originate from us. Please visit the links provided above to obtain the original datasets.
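Since the full splits must match the format of the example instances, a quick key comparison can catch mismatches before running STEP I. The sketch below is a hypothetical helper, not part of the release. It assumes each split file is a JSON array of objects, which should be confirmed against the shipped examples; the names `missing_keys` and `check_split` are illustrative only.

```python
import json
from pathlib import Path


def missing_keys(example, instances):
    """Return {index: sorted missing keys} for instances lacking keys found in `example`."""
    required = set(example)
    problems = {}
    for i, item in enumerate(instances):
        missing = required - set(item)
        if missing:
            problems[i] = sorted(missing)
    return problems


def check_split(example_path, full_path):
    """Compare a prepared split against the first instance of a shipped example file.

    Assumption: both files hold a JSON array of objects.
    """
    example = json.loads(Path(example_path).read_text(encoding="utf-8"))[0]
    instances = json.loads(Path(full_path).read_text(encoding="utf-8"))
    return missing_keys(example, instances)
```

An empty dictionary from `check_split` means every prepared instance carries at least the keys of the example instance. It does not validate value types or comment contents.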
Download model files from Dmeta-embedding and put them in the folder GenFEND_release_ch/pretrained_model/Dmeta-embedding/.
Download model files from bge-large-en-v1.5 and put them in the folder GenFEND_release_en/pretrained_model/bge-large-en-v1.5/.
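Before encoding comments, it may help to confirm that both model folders are populated. The following is a minimal sketch using only the standard library. It checks that each folder named above exists and contains at least one file; it does not verify which files a model actually needs. `model_dir_ready` and `report` are hypothetical helper names.

```python
from pathlib import Path

# Target folders named in the download instructions above.
MODEL_DIRS = {
    "Dmeta-embedding": Path("GenFEND_release_ch/pretrained_model/Dmeta-embedding"),
    "bge-large-en-v1.5": Path("GenFEND_release_en/pretrained_model/bge-large-en-v1.5"),
}


def model_dir_ready(path):
    """True if `path` is an existing directory containing at least one file."""
    path = Path(path)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


def report(root="."):
    """Map each model name to whether its folder under `root` looks populated."""
    return {name: model_dir_ready(Path(root) / rel) for name, rel in MODEL_DIRS.items()}
```

Calling `report()` from the directory that contains both release folders returns one boolean per model.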
STEP I: Comment Encoding
Go to the folder GenFEND_release_ch/data/ or GenFEND_release_en/data/ and run the following commands:
python cmts_fea_ext.py
python add_file_index.py
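The two STEP I commands can also be driven from a small script, which is convenient when both release folders need processing. This is a sketch under the assumption that the scripts take no arguments, as shown above; `step_one_commands` and `run_step_one` are hypothetical names, not part of the release.

```python
import subprocess
import sys
from pathlib import Path

# Scripts listed in STEP I, in the order given.
STEP_ONE_SCRIPTS = ("cmts_fea_ext.py", "add_file_index.py")


def step_one_commands(python=sys.executable):
    """Build the STEP I command lines, one per script."""
    return [[python, script] for script in STEP_ONE_SCRIPTS]


def run_step_one(data_dir):
    """Run both scripts inside `data_dir`, stopping at the first failure."""
    data_dir = Path(data_dir)
    for cmd in step_one_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the second script is not started after a failed first one.
        subprocess.run(cmd, cwd=data_dir, check=True)
```

For example, `run_step_one("GenFEND_release_en/data")` runs both scripts with that folder as the working directory.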
STEP II: Training and Testing
To experiment on the Weibo21 dataset, go to the folder GenFEND_release_ch and run the following command:
python main.py --model_name bert_genfend
or
python main.py --model_name defend_genfend
To experiment on the GossipCop dataset, go to the folder GenFEND_release_en and run the following command:
python main.py --root_path './data/GossipCop/' --model_name bert_genfend
or
python main.py --root_path './data/GossipCop/' --model_name defend_genfend
To experiment on the LLM-mis dataset, go to the folder GenFEND_release_en and run the following command:
python main.py --root_path './data/LLM-mis/' --model_name bert_genfend
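The dataset, folder, and flag combinations above can be summarized in one lookup table. The sketch below only rebuilds the command lines listed in this section; `EXPERIMENTS` and `build_command` are hypothetical names. It rejects pairs that are not listed here, which reflects this README rather than a known limitation of main.py.

```python
# Dataset -> (release folder, --root_path value or None, model names listed above).
EXPERIMENTS = {
    "Weibo21": ("GenFEND_release_ch", None, ("bert_genfend", "defend_genfend")),
    "GossipCop": ("GenFEND_release_en", "./data/GossipCop/", ("bert_genfend", "defend_genfend")),
    "LLM-mis": ("GenFEND_release_en", "./data/LLM-mis/", ("bert_genfend",)),
}


def build_command(dataset, model_name):
    """Return (working_folder, argv) for one documented dataset/model pair."""
    folder, root_path, models = EXPERIMENTS[dataset]
    if model_name not in models:
        raise ValueError(f"{model_name!r} is not listed for {dataset}")
    argv = ["python", "main.py"]
    if root_path is not None:
        # Weibo21 is run without --root_path in the commands above.
        argv += ["--root_path", root_path]
    argv += ["--model_name", model_name]
    return folder, argv
```

The returned `argv` can be passed to `subprocess.run(argv, cwd=folder, check=True)` to launch a run from the matching release folder.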
MultiSubpp.py serves as a plug-in module that can be integrated with both content-only and comment-based models.
Refer to the MultiSubppModel used in BERTMtiSppModel.py and dEFENDMtiSppModel.py to see how to integrate it with other models.
@inproceedings{nan2024let,
title={{Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models}},
author={Nan, Qiong and Sheng, Qiang and Cao, Juan and Hu, Beizhe and Wang, Danding and Li, Jintao},
booktitle={Proceedings of the 33rd ACM International Conference on Information and Knowledge Management},
pages={1732--1742},
doi={10.1145/3627673.3679519},
year={2024}
}
- Paper List LLM-for-misinformation-research: https://github.com/ICTMCG/LLM-for-misinformation-research/
- Tutorial @SIGIR 2024 Preventing and Detecting Misinformation Generated by Large Language Models: https://sigir24-llm-misinformation.github.io/