A common taxonomy of adversarial attacks can be summarised in the following diagram:
![[adv_attack_taxonomy_diagram.excalidraw.light.svg]]
%%[[adv_attack_taxonomy_diagram.excalidraw.md|🖋 Edit in Excalidraw]], and the [[adv_attack_taxonomy_diagram.excalidraw.dark.svg|dark exported image]]%%
In practice, there are many different attacks; the most common and effective ones are listed here.
- [[L-BFGS attack]]
- [[Fast Gradient Sign Method (FGSM)]]
- [[Jacobian-based Saliency Maps (JSM and its variants)]]
- [[Carlini&Wagner (C&W) attack]]
- [[DeepFool]]
- [[Projected Gradient Descent (PGD)]]
- [[AdvGAN]]
- [[Universal Adversarial Networks]]
- [[Poisoning attack]]
- [[Label-flip attack]]
- [[TextAttack]]
- [[TextFooler]]
- [[BERT-Attack]]
- [[Gradient-based Distributional Attack (GBDA)]]
- [[HotFlip]]
- [[Universal Adversarial Triggers]]
- [[AutoPrompt]]
- [[Jailbreak prompting]]
- [[Human-in-the-loop adversarial generation]]
- [[Bot-Adversarial Dialogue (BAD dataset)]]
- [[Feedback Loop In-context Red Teaming (FLIRT)]]
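To make the gradient-based family above concrete, here is a minimal sketch of [[Fast Gradient Sign Method (FGSM)]] on a logistic-regression model. The model, weights, input, and epsilon are illustrative assumptions (not from any of the linked notes); the analytic input-gradient is used so the example stays self-contained, whereas a real attack would backprop through a neural network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """FGSM: x_adv = x + epsilon * sign(grad_x loss).

    For logistic regression with binary cross-entropy loss, the
    gradient of the loss w.r.t. the input is analytic:
        grad_x L = (sigmoid(w . x + b) - y) * w
    """
    grad_x = (sigmoid(np.dot(w, x) + b) - y) * w
    return x + epsilon * np.sign(grad_x)

# Toy example (all values are illustrative assumptions).
w = np.array([2.0, -1.0])          # model weights
b = 0.0                            # model bias
x = np.array([0.5, 0.5])           # clean input, true label y = 1
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.1)

# The perturbation moves x along the sign of the loss gradient,
# lowering the model's confidence in the true label.
clean_conf = sigmoid(np.dot(w, x) + b)
adv_conf = sigmoid(np.dot(w, x_adv) + b)
```

The single-step sign update is what makes FGSM cheap; iterating this update with projection onto an epsilon-ball is essentially [[Projected Gradient Descent (PGD)]].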