Within machine learning, a subfield known as adversarial learning has emerged. It studies learning from datasets that may be contaminated with adversarial samples, injected by parties intent on undermining the learning process for their own benefit. Because a learning algorithm's predictions depend directly on the data it is trained on, this dependence introduces a new layer of security risks in today's data-centric systems. As data increasingly drives predictive services such as spam filters and voice assistants, the effectiveness of a learning model becomes closely tied to the integrity of its data sources. Unfortunately, data sources are tampered with routinely, whether through insider fraud, deliberate manipulation, or the natural degradation of devices, and adversarial entities can exploit these systems by altering their data inputs.
Machine learning has become a crucial component of many IT systems. Despite its benefits, these systems have inherent weaknesses that malicious users can exploit for their own gain. Recent work in adversarial learning has exposed vulnerabilities in learning algorithms that were previously thought secure, vulnerabilities that can cause them to fail at their intended tasks. Deep neural networks, for example, despite their strong performance in tasks like image and voice recognition, can be surprisingly fragile: they can be fooled by inputs that differ imperceptibly from ones they classify correctly, and they can assign high confidence to inputs that are plainly meaningless. In one experiment, two original images, a school bus and a puppy, were correctly identified with high confidence; after slight, humanly imperceptible alterations, both were misclassified. Conversely, heavily distorted images, unrecognisable to humans, were identified with 99% confidence as their original subjects. This shows how deep networks can be misled by latching onto particular pixel-level features.
![[chp01_motivation-1.png]]
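To make this concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest gradient-based recipes for crafting such perturbations, applied to a toy logistic-regression classifier rather than a deep network. The synthetic data, the model, and the budget `epsilon` are assumptions made for illustration; they are not details of the experiment shown above.

```python
# A minimal FGSM sketch on a toy linear classifier (illustrative assumptions,
# not the deep-network experiment described in the text).
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 200                                 # feature count, samples per class

# Toy "images": two barely separated Gaussian classes in d dimensions.
X = np.vstack([rng.normal(-0.1, 1.0, (n, d)),   # class 0
               rng.normal(+0.1, 1.0, (n, d))])  # class 1
y = np.concatenate([np.zeros(n), np.ones(n)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Train a logistic-regression classifier with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)                      # predicted P(y = 1)
    w -= 0.5 * X.T @ (p - y) / len(y)           # gradient step on the weights
    b -= 0.5 * np.mean(p - y)                   # gradient step on the bias

def predict(V):
    return (sigmoid(V @ w + b) > 0.5).astype(float)

# FGSM: nudge every feature by epsilon in the direction that increases the
# loss of the true label, i.e. x_adv = x + epsilon * sign(dLoss/dx).
p = sigmoid(X @ w + b)
grad_x = (p - y)[:, None] * w                   # per-example gradient w.r.t. the input
epsilon = 0.25                                  # per-feature budget (assumed)
X_adv = X + epsilon * np.sign(grad_x)

print(f"accuracy on clean inputs:     {np.mean(predict(X) == y):.1%}")
print(f"accuracy on perturbed inputs: {np.mean(predict(X_adv) == y):.1%}")
# Accuracy typically collapses on the perturbed inputs even though each feature
# moved by at most 0.25, a small step relative to the feature noise (std 1).
```

The same recipe scales to deep networks: the attacker only needs the gradient of the loss with respect to the input, and in high dimension many tiny per-pixel changes add up to a large change in the model's output.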
In the realm of spam filtering, which increasingly relies on machine learning algorithms such as naive Bayes classifiers, adversaries disguise spam as legitimate email to bypass detection. Common obfuscation techniques include misspelling words the filter associates with spam and padding messages with words that are common in legitimate mail, so that the statistical evidence no longer tips the classifier towards "spam".
![[chp01_motivation-2.png]]
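The sketch below shows how the second trick, often called a "good word" attack, plays out against a tiny multinomial naive Bayes filter written from scratch. The six training messages, the vocabulary, and the words appended to the spam message are all assumptions chosen to keep the example readable; a real attacker would first have to discover which words the target filter treats as legitimate.

```python
# A toy "good word" evasion attack on a multinomial naive Bayes spam filter
# (corpus and appended words are illustrative assumptions).
import math
from collections import Counter

spam_mails = ["win money now", "free money offer", "win free prize now"]
ham_mails  = ["meeting schedule project", "project report attached",
              "schedule team meeting today"]

def word_counts(mails):
    return Counter(word for mail in mails for word in mail.split())

spam_counts, ham_counts = word_counts(spam_mails), word_counts(ham_mails)
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(message, counts):
    # Multinomial naive Bayes log-likelihood with Laplace (add-one) smoothing.
    total = sum(counts.values())
    return sum(math.log((counts[word] + 1) / (total + len(vocab)))
               for word in message.split())

def classify(message):
    # Class priors are equal (three mails each), so comparing likelihoods suffices.
    score = log_likelihood(message, spam_counts) - log_likelihood(message, ham_counts)
    return ("spam" if score > 0 else "ham"), round(score, 2)

spam_msg = "win free money now"
disguised = spam_msg + " meeting schedule project report attached team today"

print(classify(spam_msg))    # ('spam', ...): the filter catches the original
print(classify(disguised))   # ('ham',  ...): the padded copy slips through
```

Because naive Bayes adds up a per-word log-likelihood ratio, every appended legitimate-looking word subtracts a fixed amount from the spam score, so enough padding can push a message below the detection threshold without changing its payload.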
A recent and concerning development in adversarial attacks targets large language models (LLMs). These models, trained on vast datasets to understand and generate human-like text, are not immune to manipulation. By carefully crafting input prompts, an adversary can steer an LLM towards biased or factually incorrect outputs, potentially spreading misinformation at scale. This vulnerability is particularly alarming given the widespread use of LLMs in applications ranging from customer service to content creation. The example below is taken from Carlini et al. (2023).
![[chp01-motivation-3.light.svg]]
The importance of understanding adversarial machine learning and developing countermeasures cannot be overstated. As machine learning becomes more deeply integrated into critical systems, from security to communication, it is imperative to anticipate and mitigate these vulnerabilities. Doing so not only ensures the reliability and safety of these systems but also protects against the exploitation of these technologies for harmful purposes. Ongoing research in adversarial machine learning is crucial for building robust, trustworthy AI systems capable of withstanding an evolving landscape of cyber threats.
![[data-flow-of-learn-process.excalidraw.light.svg]]