A common taxonomy of adversarial attacks can be summarised in the following diagram:
![[adv_attack_taxonomy_diagram.excalidraw.light.svg]]
%%[[adv_attack_taxonomy_diagram.excalidraw.md|🖋 Edit in Excalidraw]], and the [[adv_attack_taxonomy_diagram.excalidraw.dark.svg|dark exported image]]%%
In practice, there are many different attacks; the most common and effective ones are listed here.
- [[L-BFGS attack]]
- [[Fast Gradient Sign Method (FGSM)]]
- [[Jacobian-based Saliency Maps (JSM and its variants)]]
- [[Carlini&Wagner (C&W) attack]]
- [[DeepFool]]
- [[Projected Gradient Descent (PGD)]]
- [[AdvGAN]]
- [[Universal Adversarial Networks]]
- [[Poisoning attack]]
- [[Label-flip attack]]
- [[TextAttack]]
- [[TextFooler]]
- [[BERT-Attack]]
- [[Gradient-based Distributional Attack (GBDA)]]
- [[HotFlip]]
- [[Universal Adversarial Triggers]]
- [[AutoPrompt]]
- [[Jailbreak prompting]]
- [[Human-in-the-loop adversarial generation]]
- [[Bot-Adversarial Dialogue (BAD dataset)]]
- [[Feedback Loop In-context Red Teaming (FLIRT)]]
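To make the gradient-based family above concrete, here is a minimal sketch of [[Fast Gradient Sign Method (FGSM)]] on a logistic-regression model. The model, weights, input, and epsilon are illustrative assumptions (not from any of the linked notes); the analytic input-gradient is used so the example stays self-contained, whereas a real attack would backprop through a neural network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """FGSM: x_adv = x + epsilon * sign(grad_x loss).

    For logistic regression with binary cross-entropy loss, the
    gradient of the loss w.r.t. the input is analytic:
        grad_x L = (sigmoid(w . x + b) - y) * w
    """
    grad_x = (sigmoid(np.dot(w, x) + b) - y) * w
    return x + epsilon * np.sign(grad_x)

# Toy example (all values are illustrative assumptions).
w = np.array([2.0, -1.0])          # model weights
b = 0.0                            # model bias
x = np.array([0.5, 0.5])           # clean input, true label y = 1
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.1)

# The perturbation moves x along the sign of the loss gradient,
# lowering the model's confidence in the true label.
clean_conf = sigmoid(np.dot(w, x) + b)
adv_conf = sigmoid(np.dot(w, x_adv) + b)
```

The single-step sign update is what makes FGSM cheap; iterating this update with projection onto an epsilon-ball is essentially [[Projected Gradient Descent (PGD)]].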