LM Contamination Index

NLP evaluation is in trouble! Many evaluation benchmarks have been found in pre-training datasets, compromising scientific results. The LM Contamination Index is a manually created database of contamination evidence for LMs. Please refer to the blog post or the repository for more information. The table below shows the following information:

The source indicates whether the information comes from user reports in the repository or from a paper.

Corpus | Dataset | Train split | Dev split | Test split | Source