A U.S. Navy system has been attacked by malware and we need your help to find the problem! Are you ready to take on the challenge of protecting Navy systems and networks from cyberattacks?
Track 2, the Data Science track, is split into three challenges and uses real-world cyber data: benign and malicious binary files used in actual cyber testing. The Navy wants to learn how well Machine Learning and Artificial Intelligence (ML/AI) can detect and identify malicious cyber activity with high classification accuracy and performance. Participants will be provided with datasets and starter notebooks to help them reach a first solution quickly.
For the first challenge, you will be exploring the dataset provided to develop the best possible prediction model, applying ML/AI to malware classification. Focus will be on:
Beyond raw performance, the Navy has a particular interest in deploying this capability on edge devices such as unmanned autonomous vehicles. For the second challenge, you will develop a model that is not only accurate but also lightweight and fast, given size, weight, and power (SWaP) constraints. The emphasis for this challenge will be:
• Low disk space utilization
• Inference speed of classification
• Model performance metrics from Challenge 1
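As a rough illustration of the first two criteria, the sketch below measures the serialized size of a model and its average per-sample prediction time. The `model` dictionary, the `predict` function, and the sample rows are hypothetical stand-ins, not part of the competition materials, and actual scoring may measure these quantities differently.

```python
import pickle
import time

# Hypothetical stand-in for a trained classifier: a weight per feature plus a threshold.
model = {"weights": [0.5, -1.0, 2.0], "threshold": 0.0}


def predict(model, rows):
    """Label a row malicious (1) when its weighted feature sum exceeds the threshold."""
    return [
        1 if sum(w * x for w, x in zip(model["weights"], row)) > model["threshold"] else 0
        for row in rows
    ]


def serialized_size_bytes(model):
    """Size of the pickled model, a rough proxy for disk space utilization."""
    return len(pickle.dumps(model))


def mean_latency_seconds(model, rows, repeats=5):
    """Average wall-clock time per sample over several timed prediction passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(model, rows)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(rows))


# Made-up feature rows, for illustration only.
rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
predictions = predict(model, rows)
size_bytes = serialized_size_bytes(model)
latency = mean_latency_seconds(model, rows)
print(predictions, size_bytes, latency)
```

Tracking both numbers alongside the Challenge 1 metrics makes it easier to see when a smaller or faster model starts to cost accuracy.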
Once your model development is showing promise, the third challenge will revolve around the creation of a PowerPoint presentation that showcases the journey you have taken, with visualizations to emphasize important findings.
• Exploratory Data Analysis (e.g. Feature Importance, Feature Correlation, Principal Component Analysis (PCA))
• Model performance metrics (e.g. Confusion matrix, Cross-validated Receiver Operating Characteristic (ROC) curve, Learning Curve)
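Before plotting, it can help to see what two of the metrics above actually count. The following is a minimal standard-library sketch of confusion-matrix counts and the area under the ROC curve; the labels, scores, and the 0.5 threshold are made-up illustration values, and the starter notebooks may rely on a library instead.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as malicious."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def roc_auc(y_true, scores):
    """Fraction of (malicious, benign) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = malicious) and classifier scores, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

counts = confusion_counts(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(counts, round(auc, 3))
```

The pairwise definition of AUC used here is quadratic in the number of samples, so it suits small checks rather than full datasets.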
Final team rank in this track will be decided by total points accumulated across all three challenges. For challenge point distribution and scoring information, refer to the Participant Guide that will be distributed to registered participants.
Keys to Success
To prepare for the Track 2 competition, you should read up on portable executable (PE) files, as well as the Elastic Malware Benchmark for Empowering Researchers (EMBER), an open-source feature extraction tool that will help simplify the starting dataset.
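As a starting point for reading about PE files, the sketch below checks the two markers that identify the format: the DOS `MZ` magic at offset 0 and the `PE\0\0` signature located through the 4-byte offset stored at 0x3C. The function name `looks_like_pe` and the synthetic sample bytes are illustrative assumptions; this is not how EMBER extracts features, and the competition data is already provided as extracted features.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True when the bytes carry the DOS 'MZ' magic and the PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset to the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic header built for illustration only; it is not a real executable.
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"

is_pe = looks_like_pe(bytes(sample))
is_not_pe = looks_like_pe(b"\x7fELF" + bytes(0x40))
print(is_pe, is_not_pe)
```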
Participants will be given a dataset of features from PE files, which they can use to explore the data, train a binary classification model, and optimize their solution. Docker will be used to package and submit solutions. Tutorials and mentoring will be available. A leaderboard will be posted to Slack occasionally during the competition so teams can track their standing.