StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measures model preferences across gender, race, religion, and profession.
StereoSet measures model preferences for stereotypical conditions across race, gender, religion, and profession, while also ensuring that debiasing techniques do not affect underlying model performance.
Their site includes a leaderboard where researchers test different ML models such as BERT, RoBERTA, XLNET, and GPT2 against StereoSet.
Company Type: Academia
Region: US & Canada
Industry Category: Data Science
Fighting Type of Bias: Racial bias Gender bias Cultural bias
Research: StereoSet: Measuring stereotypical bias in pretrained language models
Data set: StereoSet database (distributed under the CC BY-SA 4.0 license)