Unleash the Magic for Android: Adversarial Detection of Mobile Malware


Contact: ecnuchensen@gmail.com, minhuixue@nyu.edu

KuafuDet

Policy

To prevent these results from being misused, we feel the need to have some sort of authentication in place to verify user identity or to require necessary justification, instead of making the results completely public. If you are interested in getting access to our results, please read the following instructions carefully.

(1) If you are currently in academia:

(a) If you are a student, please ask your advisor to send us an email requesting access. If you are a faculty member, please send us the email from your university's email account.

(b) In your email, please include your name, affiliation, and homepage.

(2) If you are currently in industry:

(a) Please send us an email from your company's email account. In the email, please briefly introduce yourself (e.g., name) and your company.

(b) In the email, please attach a justification letter (PDF format) on official letterhead. The justification letter needs to acknowledge the "KuafuDet" project and clearly state the reasons the results are being requested.

Please send your request emails to Sen Chen (ecnuchensen@gmail.com).

Dataset

Our dataset (252,900 APKs) consists of 242,500 benign applications downloaded from the Google Play Store and 10,400 malicious APK files, of which 1,260 have been validated in the Genome project and the rest come from Drebin (4,300 APKs), Pwnzen Infotech Inc. (4,500 APKs), and Contagio (340 APKs).

The collection of the dataset strictly followed the Privacy Policy of Pwnzen Infotech Inc. and conformed to its non-disclosure agreement (NDA). What we can release, however, is the malicious dataset from Contagio, which we used as a subset of our experiments.

Features

The features considered in this study are classified into two categories: syntax features (175) and semantic features (20).
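To make the two categories concrete, below is a minimal sketch of extracting permission-based syntax features. It is illustrative only, not the project's actual extractor: it assumes the APK has already been decoded (e.g., with apktool) so that AndroidManifest.xml is plain XML, and the tracked permission list is a hypothetical stand-in for the real 175 syntax features.

    # Illustrative sketch: build a 0/1 syntax-feature vector from a decoded
    # AndroidManifest.xml. The permission list below is a placeholder, not
    # the project's actual feature set.
    import xml.etree.ElementTree as ET

    ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
    SYNTAX_FEATURES = [
        "android.permission.INTERNET",
        "android.permission.SEND_SMS",
        "android.permission.READ_CONTACTS",
    ]

    def syntax_feature_vector(manifest_path):
        """Mark which tracked permissions the app requests (1) or not (0)."""
        root = ET.parse(manifest_path).getroot()
        requested = {elem.get(ANDROID_NS + "name")
                     for elem in root.iter("uses-permission")}
        return [int(perm in requested) for perm in SYNTAX_FEATURES]

Semantic features, by contrast, are derived from program behavior rather than declared metadata, so they require code-level analysis rather than a manifest lookup.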

Camouflaged training file

We propose that poisoning attacks can be carried out by three types of attackers (i.e., weak, strong, and sophisticated attackers) in the real world. We show that our poisoning attacks are able to mislead machine-learning classifiers (e.g., SVM, RF, and KNN), as sketched below.
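To illustrate the general effect of training-set poisoning, here is a minimal label-flipping sketch on synthetic data. It is not the project's actual camouflage procedure: the data, the flipping strategy, and the feature dimensionality (195, echoing the 175 + 20 features above) are assumptions made for the example.

    # Illustrative sketch: flip the labels of a fraction of malicious training
    # samples so they appear benign, then measure how a linear SVM degrades.
    # Synthetic data stands in for real APK feature vectors.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    X, y = make_classification(n_samples=2000, n_features=195,
                               n_informative=30, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=0)

    def poison_labels(y, rate, rng):
        """Relabel a random fraction of malicious (label 1) samples as benign."""
        y = y.copy()
        malicious = np.flatnonzero(y == 1)
        flip = rng.choice(malicious, size=int(rate * len(malicious)),
                          replace=False)
        y[flip] = 0
        return y

    for rate in (0.0, 0.1, 0.3):
        clf = LinearSVC(max_iter=5000)
        clf.fit(X_train, poison_labels(y_train, rate, rng))
        print(f"poison rate {rate:.0%}: test accuracy "
              f"{clf.score(X_test, y_test):.3f}")

The same loop can be repeated with RF or KNN classifiers to compare how each degrades under increasing poisoning rates.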