26 Nov 2023
Paper

There is a plethora of interconnected devices, such as IoT gadgets, and both their number and their variety keep increasing. Despite the improvements they may bring to our lives, these devices are usually treated as black boxes, so it is not straightforward to verify their security or evaluate their expected performance. For example, some of these devices might be vulnerable to denial-of-service (DoS) or data exfiltration attacks.

Generative Adversarial Networks, ACM HotNets ‘19

Therefore, there is a need for a tool that can automatically detect such issues, potentially based only on input-output pairs; that is, the data fed to the system (i.e., a device or an application running on a device) and the system's responses are the only information available.

As the paper mentions, potential use cases are:

  1. Reverse-engineering protocol formats;
  2. Evaluating interoperability between devices and protocols;
  3. Identifying devices based on the input-output pairs; and
  4. Finding adversarial examples/workloads where a system works in an unexpectedly undesirable manner.

Many efforts have been made to realize one or more of the above use cases. However, they mostly assume more knowledge than is available in practice. For instance, access to a protocol's format or implementation, whether as binary or as source code, is not always feasible.

This paper asks the following question:
Can one achieve the same goals in the above-mentioned tasks with only limited knowledge of the internal workings of protocols and applications?

The authors try to answer this question by employing the ideas behind generative adversarial networks (GANs). One of the key challenges in applying machine learning methods here is data scarcity: there are very few positive samples (those that cause unexpected behavior), which leads to a highly imbalanced dataset that is difficult for neural networks to learn from.

The following are a few points worth noting about this paper:

  • It is a workshop paper, so the work is preliminary. The authors evaluate their initial findings on very simple applications:
    1. Inferring protocol-compliant packets for CAN protocols used in in-vehicle networks;
    2. Finding DNS amplification attack examples.
  • They only consider stateless use cases.
  • They assume no access to the source code and no previous knowledge about the protocol.
  • To address the data scarcity problem, they leverage a feedback mechanism.
  • They use GANs and justify this choice by stating that, unlike prior approaches such as fuzzing tools, GAN-generated test data can yield new (unseen) and random, yet protocol-compliant, samples.
  • Further, neural networks, and in particular GANs, can potentially capture inter-dependencies across fields in a protocol, solely based on the existing data.
  • In the first task, the authors experiment with the following sub-tasks:
    • Multiple of 4 (intra-field dependency; 100% accuracy)
    • ASCII (intra-field dependency; 93% accuracy)
    • XOR (8-bit; inter-field dependency; 82% accuracy)
    • Field length (inter-field dependency; 85% accuracy)
    • TCP Checksum (inter-field dependency; 90% accuracy)
    • CRC-15 (inter-field dependency; 0% accuracy)
  • For the last sub-task (CRC-15), they change the GAN architecture so that the input is fed to all layers (this is only possible because all layers have the same dimension), reaching an accuracy of 39%.
  • In the second task, they leverage conditional GANs (CGANs) to generate packets that cause DNS amplification. The condition is a binary input indicating whether a sample is positive, meaning that the response is at least 10 times larger than the corresponding request.
  • Out of 100k training samples, 778 (0.78%) are positive.
  • They assume the packets are UDP and provide candidate values for some fields. Their approach is therefore gray-box.
  • They propose a feedback training procedure in which samples produced by the generator with a positive condition bit are fed to the system under test to obtain an actual label. The system is therefore in the loop.
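
The feedback procedure in the last bullet can be sketched roughly as follows. This is only an illustration under loud assumptions: `toy_generator` and `toy_system_under_test` are hypothetical stand-ins (no GAN is trained and no real DNS server is queried), and the rule that a `qtype` of 255 triggers a large response is invented for the toy system. Only the 10x labeling threshold comes from the summary above.

```python
import random

AMPLIFICATION_FACTOR = 10  # from the text: response at least 10x the request


def is_amplifying(request_size, response_size, factor=AMPLIFICATION_FACTOR):
    """Label a request/response pair as positive under the 10x rule."""
    return response_size >= factor * request_size


def toy_system_under_test(request):
    """Hypothetical black box: returns only the size of its response.

    Assumption for illustration: requests with qtype 255 get a large reply.
    """
    base = len(request["payload"])
    return base * 20 if request["qtype"] == 255 else base * 2


def toy_generator(rng, condition):
    """Hypothetical stand-in for the CGAN generator (nothing is learned here)."""
    # An imperfect generator: asked for positives, it is right only sometimes.
    qtype = rng.choice([1, 255]) if condition == 1 else 1
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(20, 40)))
    return {"qtype": qtype, "payload": payload}


def feedback_round(dataset, rng, n_samples=100):
    """Generate with condition=1, query the system, and keep the real labels."""
    for _ in range(n_samples):
        request = toy_generator(rng, condition=1)
        response_size = toy_system_under_test(request)
        label = int(is_amplifying(len(request["payload"]), response_size))
        dataset.append((request, label))  # would be added to the training set
    return dataset


rng = random.Random(0)
dataset = feedback_round([], rng)
positives = sum(label for _, label in dataset)
print(f"{positives} of {len(dataset)} generated samples were confirmed positive")
```

The point of the sketch is that the label stored with each generated sample comes from the system's observed response rather than from the condition bit the generator was given, which is what puts the system in the loop.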

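To make the CAN sub-tasks listed above more concrete, here is a minimal sketch of how generated samples might be scored against some of the simpler rules. The field layouts, the helper names, and the reading of "accuracy" as the fraction of generated samples that satisfy a rule are all assumptions made for illustration; the summary does not specify how the paper defines them.

```python
def is_multiple_of_4(value):
    """Intra-field rule: the field value must be divisible by 4."""
    return value % 4 == 0


def is_ascii(field):
    """Intra-field rule: every byte of the field must be a 7-bit ASCII code."""
    return all(b < 128 for b in field)


def xor_matches(a, b, check):
    """Inter-field rule: an 8-bit check field must equal a XOR b."""
    return (a ^ b) & 0xFF == check


def length_matches(length_field, payload):
    """Inter-field rule: the length field must equal the payload length."""
    return length_field == len(payload)


def compliance_rate(samples, rule):
    """Fraction of samples for which the rule holds (assumed 'accuracy')."""
    return sum(1 for s in samples if rule(*s)) / len(samples)


# Hypothetical generated samples, each a tuple (a, b, check).
xor_samples = [(0x0F, 0xF0, 0xFF), (1, 2, 3), (1, 1, 1)]
print(compliance_rate(xor_samples, xor_matches))
```

Checks like these are cheap to evaluate, whereas a rule such as CRC-15 depends on every input bit at once, which may be one reason it is reported as the hardest sub-task.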