Final Project — Applied Machine Learning
Overview
This final project covers the practical application of machine learning (ML). It serves as a capstone activity that addresses the three course objectives:
- Understand common numerical approaches and how to implement them, and explain their advantages and drawbacks compared to other techniques (Problem Solving & Research Skills)
- Be able to apply numerical approaches to real-world problems and assess their efficacy (Problem Solving & Research Skills)
- Explain numerical approaches, formulate a strategy for solving the problem, and assess solutions through written and oral communication (Communication Skills)
While you must complete the tasks described below, you may choose your own application of ML. The primary goal is to apply the ML principles and techniques learned in class to solve a problem, answer a question, find patterns, or make a prediction. Ideally, this application will relate to your research at USAFA; however, you may also explore non-technical topics or hobbies.
Task 2.1 — Literature Survey (30 pts)
Conduct a literature survey to find scholarly articles — journal or conference papers — that are related to the topic area you choose. Techniques may include general artificial intelligence (AI) or ML, neural networks (NNs), convolutional neural networks (CNNs), diffusion models, transformers, or other state-of-the-art approaches.
Summarize the papers in your own words (do not simply paraphrase the abstract) and explain how the work is related to your project. Be sure to describe the specific ML approach being used. Address the following questions for each paper:
- Is it supervised or unsupervised, parametric or non-parametric?
- What is the structure of the network or algorithm?
- What are the inputs and outputs?
- How is it trained and what type of data is used?
- Do you see any issues with this approach?
Be sure to properly cite all papers you reference.
Task 2.2 — Data Generation and Curation (30 pts)
To apply ML techniques, you must have representative data that is appropriately labeled, organized, preprocessed, and divided into training, validation, and testing sets. You may use publicly available data (see Data Resources below), or you may collect or generate the data yourself.
In your paper, describe the process used to obtain and prepare your data. Also discuss how well the data represents the behavior you wish to capture in your ML model.
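One way to divide a dataset into training, validation, and testing sets is sketched below. It is a minimal example, not a required method: the function name `train_val_test_split` and the 70/15/15 default fractions are hypothetical choices, and it assumes the samples are independent. Time-series, grouped, or heavily imbalanced data usually call for a chronological, grouped, or stratified split instead. Fixing the random seed makes the split reproducible, which helps a reader replicate your results.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    samples: Sequence[T],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Randomly partition samples into disjoint train, validation, and test lists.

    Hypothetical helper: assumes samples are independent and identically distributed.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    # Shuffle indices with a private, seeded generator so the split is reproducible.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_frac)
    n_val = round(len(samples) * val_frac)
    # The test set is carved off first and should not be examined until final evaluation.
    test = [samples[i] for i in indices[:n_test]]
    val = [samples[i] for i in indices[n_test:n_test + n_val]]
    train = [samples[i] for i in indices[n_test + n_val:]]
    return train, val, test
```

For example, `train_val_test_split(list(range(100)), seed=42)` returns lists of 70, 15, and 15 items with no overlap, and repeating the call with the same seed returns the same split.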
Task 2.3 — Model Selection (30 pts)
Choose three different ML models to apply to your problem. Note that models which are similar in concept but different in structure may be counted as two separate models. Examples might include:
- Two neural networks — one fully connected and one convolutional
- One model that uses regression and one that uses classification
In your paper, discuss why you chose these models, describe their architecture, and explain the motivation behind that architecture (e.g., why you chose \(N\) hidden layers).
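When justifying a choice such as the number of hidden layers in a fully connected network, it can help to report how many trainable parameters each candidate architecture has. The sketch below assumes standard dense layers with one bias per output neuron; the function name `mlp_parameter_count` and the layer widths in the example are hypothetical.

```python
from typing import Sequence


def mlp_parameter_count(layer_sizes: Sequence[int]) -> int:
    """Count weights and biases in a fully connected (dense) network.

    layer_sizes lists the width of every layer, input first and output last,
    so a network with N hidden layers has len(layer_sizes) == N + 2.
    """
    if len(layer_sizes) < 2 or any(n <= 0 for n in layer_sizes):
        raise ValueError("need an input and an output layer with positive widths")
    # Each dense layer maps n_in values to n_out values using
    # n_in * n_out weights plus n_out biases.
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


# One hidden layer of 128 units vs. two hidden layers of 64 units,
# for a hypothetical 784-input, 10-class problem.
print(mlp_parameter_count([784, 128, 10]))     # 101770
print(mlp_parameter_count([784, 64, 64, 10]))  # 55050
```

In this example, the deeper network has roughly half as many parameters. Comparing parameter counts against the size of your training set is one concrete input to the architecture discussion, since a model with far more parameters than training samples can be more prone to overfitting.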
Task 2.4 — Training (30 pts)
Train your models using the data you obtained or generated. In the paper, explain:
- How you trained the model
- Why you chose the training parameters you used
- How you determined when to stop
- Any other important information related to the training process
Also include plots that describe the training process (e.g., loss vs. iteration).
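One common way to decide when to stop is early stopping on the validation loss. The sketch below illustrates it with full-batch gradient descent on a toy linear-regression problem. Everything here is hypothetical and chosen only for illustration: the synthetic data (y = 2x + 1 plus noise), the helper names `make_data` and `train`, and the values of `lr`, `patience`, and `min_delta`. Your own models and stopping rule may differ. The function also records per-epoch losses so they can be plotted.

```python
import random
from typing import Dict, List, Tuple

Data = List[Tuple[float, float]]


def make_data(n: int, seed: int) -> Data:
    """Synthetic samples from y = 2x + 1 plus Gaussian noise (hypothetical toy problem)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def mse(w: float, b: float, data: Data) -> float:
    """Mean squared error of the line y = w * x + b on data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(
    train_data: Data,
    val_data: Data,
    lr: float = 0.1,
    max_epochs: int = 500,
    patience: int = 10,
    min_delta: float = 1e-6,
) -> Tuple[Tuple[float, float], Dict[str, List[float]]]:
    """Gradient descent with early stopping; returns the best parameters and loss history."""
    w, b = 0.0, 0.0
    history: Dict[str, List[float]] = {"train": [], "val": []}
    best_val, best_params, stale_epochs = float("inf"), (w, b), 0
    n = len(train_data)
    for _ in range(max_epochs):
        # Gradients of the training MSE with respect to w and b (full batch).
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
        w -= lr * grad_w
        b -= lr * grad_b
        val_loss = mse(w, b, val_data)
        history["train"].append(mse(w, b, train_data))
        history["val"].append(val_loss)
        # Stop once validation loss has not improved by min_delta for `patience` epochs.
        if val_loss < best_val - min_delta:
            best_val, best_params, stale_epochs = val_loss, (w, b), 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_params, history


(w, b), history = train(make_data(200, seed=1), make_data(200, seed=2))
print(f"w={w:.3f}, b={b:.3f}, epochs={len(history['val'])}")
```

The two lists in `history` hold the data for a loss-vs.-epoch plot. For example, with matplotlib, call `plt.plot(history["train"])` and `plt.plot(history["val"])` on the same axes. A widening gap between the curves is a typical sign of overfitting. Returning the parameters from the best validation epoch, rather than the last epoch, is a design choice that keeps a few extra epochs of patience from degrading the final model.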
Task 2.5 — Testing (30 pts)
Test your trained models. In the paper, describe your testing process and the metric used to assess performance. Which model performed best?
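For a classification problem, a minimal way to compare models on the same held-out test set is sketched below. It assumes accuracy and macro-averaged F1 as the metrics, which are common choices but not the only ones. For regression, a metric such as mean squared error would replace them. The helper `best_model` and the toy labels in the example are hypothetical and do not come from any real experiment.

```python
from typing import Callable, Dict, Hashable, Sequence

Labels = Sequence[Hashable]


def _check(y_true: Labels, y_pred: Labels) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")


def accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-class F1 scores.

    Every class counts equally, so a model cannot score well
    by predicting only the majority class.
    """
    _check(y_true, y_pred)
    scores = []
    for c in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


def best_model(
    y_true: Labels,
    predictions: Dict[str, Labels],
    metric: Callable[[Labels, Labels], float] = macro_f1,
) -> str:
    """Name of the model with the highest metric score on the same test labels."""
    return max(predictions, key=lambda name: metric(y_true, predictions[name]))


# Hypothetical toy labels, for illustration only.
y_test = [0, 0, 1, 1, 1, 2]
preds = {"model_a": [0, 0, 1, 1, 0, 2], "model_b": [0, 0, 0, 0, 0, 0]}
print(best_model(y_test, preds))  # model_a
```

Evaluating every model on the identical test set, and touching that set only after training and tuning are finished, keeps the comparison fair.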
Task 2.6 — Paper (50 pts)
Describe your efforts in a paper using one of the approved templates provided in class. You will be assessed on your ability to clearly communicate your research. The paper should provide enough detail that a reader could replicate the effort if desired.
- Use first-person active voice in your writing.
- Review the feedback provided on previous projects — I want to see progression in your writing based on that feedback.
- Properly cite all references.
- Include your code as an appendix.
Task 2.7 — Presentation (50 pts)
Provide a 10-minute presentation of your research effort to the class. You will be assessed on your ability to orally communicate the work you did and the results you obtained. You will also be assessed on the quality of your visual aids.
Data Resources
- List of datasets for machine learning research — Wikipedia: en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
- Kaggle Datasets: kaggle.com/datasets