Quote image editor
“The lack of transparency regarding training data sources and the methods used can be problematic. For example, algorithmic filtering of training data can skew representations in subtle ways. Attempts to remove overt toxicity by keyword filtering can disproportionately exclude positive portrayals of marginalized groups. Responsible data curation requires first acknowledging and then addressing these complex tradeoffs through input from impacted communities.” — I. Almeida
The lack of transparency regarding training data sources and the methods used can be problematic. For example, algorithmic filtering of training data can skew representations in subtle ways. Attempts to remove overt toxicity by keyword filtering can disproportionately exclude positive portrayals of marginalized groups. Responsible data curation requires first acknowledging and then addressing these complex tradeoffs through input from impacted communities.