Training Data Poisoning
Malware Protection
Definition
An attack that injects malicious, corrupted, or backdoored data into a model's training dataset to manipulate its learned behavior, degrade performance, or embed hidden triggers that activate specific outputs on demand.
Technical Details
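Poisoning attacks take several forms. In label-flipping (dirty-label) poisoning, the attacker inserts deliberately mislabeled examples that cause misclassification. In clean-label poisoning, the injected examples carry correct labels but are subtly perturbed so that the trained model misclassifies a chosen target input. Because the labels look right, these samples are harder to spot by inspection. Backdoor poisoning embeds a trigger so that the model behaves normally on clean inputs but produces attacker-chosen behavior whenever a secret input pattern is present. In federated learning, malicious participants can skip data poisoning entirely and submit poisoned gradients or model updates. Defenses include data provenance tracking, anomaly detection on training data, and certified defenses that provably bound the influence of a limited number of poisoned samples.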
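To make the backdoor mechanism concrete, here is a minimal sketch of a dirty-label backdoor on a toy image dataset. It assumes grayscale images stored as nested lists and uses a hypothetical trigger (a bright square in the bottom-right corner). The helper names `stamp_trigger` and `poison_dataset` are illustrative only. A clean-label backdoor would differ by leaving labels unchanged.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


def stamp_trigger(image: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a bright square patch in the bottom-right corner.

    The patch is a hypothetical trigger; any fixed, rarely occurring pattern would work.
    """
    poisoned = [row[:] for row in image]  # copy so the clean image is not modified
    h, w = len(poisoned), len(poisoned[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            poisoned[r][c] = value
    return poisoned


def poison_dataset(
    data: List[Tuple[Image, int]],
    target_label: int,
    rate: float = 0.05,
    seed: int = 0,
) -> List[Tuple[Image, int]]:
    """Stamp the trigger on a random `rate` fraction of samples and relabel them.

    A model trained on the result may learn to map the patch to `target_label`
    while still behaving normally on inputs without the patch.
    """
    rng = random.Random(seed)
    n_poison = int(len(data) * rate)
    poison_idx = set(rng.sample(range(len(data)), n_poison))
    result = []
    for i, (img, label) in enumerate(data):
        if i in poison_idx:
            result.append((stamp_trigger(img), target_label))
        else:
            result.append((img, label))
    return result
```

For example, `poison_dataset(clean, target_label=7, rate=0.05)` on 100 samples stamps and relabels exactly 5 of them. The low rate is what makes backdoors hard to detect from aggregate accuracy, since clean-input performance barely changes.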
Practical Usage
Organizations training models on data scraped from the internet are particularly vulnerable to web-scale poisoning campaigns. Model developers should maintain signed data provenance, audit training data distributions, and use robust training techniques that are resistant to a small fraction of poisoned samples.
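A provenance check can be as simple as recording a hash of every training record and authenticating that list, so later tampering (modified, injected, or dropped records) is detectable before training. The sketch below assumes records are held as bytes keyed by name. It uses HMAC-SHA256 from the standard library as a stand-in for a real digital signature; a production pipeline would more likely use asymmetric signatures so verifiers never hold the signing key. The function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List


def _tag(hashes: Dict[str, str], key: bytes) -> str:
    """Authenticate a canonical JSON encoding of the hash map (HMAC stands in for a signature)."""
    payload = json.dumps(hashes, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def build_manifest(records: Dict[str, bytes], key: bytes) -> Dict[str, Any]:
    """Hash every record with SHA-256 and attach an authentication tag over the hashes."""
    hashes = {name: hashlib.sha256(blob).hexdigest() for name, blob in records.items()}
    return {"hashes": hashes, "tag": _tag(hashes, key)}


def verify_records(records: Dict[str, bytes], manifest: Dict[str, Any], key: bytes) -> List[str]:
    """Return sorted names of records that were added, removed, or modified.

    Raises ValueError if the manifest itself fails authentication.
    """
    hashes = manifest["hashes"]
    if not hmac.compare_digest(_tag(hashes, key), manifest["tag"]):
        raise ValueError("manifest authentication failed")
    problems = []
    for name in sorted(set(hashes) | set(records)):
        if name not in hashes or name not in records:
            problems.append(name)  # record injected or dropped since the manifest was built
        elif hashlib.sha256(records[name]).hexdigest() != hashes[name]:
            problems.append(name)  # record contents changed
    return problems
```

Running `verify_records` before each training job turns silent dataset drift into an explicit list of suspicious records. It does not, however, catch data that was already poisoned when the manifest was first built.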
Examples
- An attacker poisons a facial recognition training dataset to cause a specific face to always be misidentified.
- Backdoored NLP models are trained to produce biased outputs whenever a specific trigger phrase appears in the input.
- Federated learning participants in a healthcare consortium submit poisoned model updates to degrade diagnostic accuracy.
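For the federated case, one common mitigation is to replace plain averaging of participant updates with a robust aggregation rule. The sketch below contrasts averaging with coordinate-wise median, one such rule chosen here for illustration; it is not named in the text above and is not the only option. It assumes each update is a flattened list of floats of equal length.

```python
from statistics import median
from typing import List

Update = List[float]  # one participant's flattened model update


def mean_aggregate(updates: List[Update]) -> Update:
    """Plain averaging: one extreme update can shift every coordinate arbitrarily far."""
    n = len(updates)
    return [sum(coords) / n for coords in zip(*updates)]


def median_aggregate(updates: List[Update]) -> Update:
    """Coordinate-wise median: while fewer than half the updates are malicious,
    each coordinate stays within the range of the honest participants' values."""
    return [median(coords) for coords in zip(*updates)]


honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.21], [0.11, -0.19]]
malicious = [[50.0, 50.0]]  # a single poisoned update
updates = honest + malicious

print(mean_aggregate(updates))    # first coordinate is pulled above 10
print(median_aggregate(updates))  # stays near the honest updates
```

The trade-off is that the median discards information from honest outliers and can slow convergence when participants' data are highly non-uniform, which is common across hospitals in a consortium.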