Machine Learning Project
Project Description
This is an opportunity for you to explore an interesting machine
learning problem of your choice. Your project may be based on a
real-world data set, or it may be theoretical in nature but grounded in
a real problem.
One of the best ways to identify a project topic is to choose a domain
that interests you and identify problems in that domain. Let the
problem drive your choice of technique, rather than the other way
around.
You may complete the project as an individual or with a partner;
however, I strongly encourage you to work with a partner on this
project.
Your project will include three deliverables (turn in only one copy per
team):
- A one-page (single spaced) project proposal, due in hard copy on
April 4th.
- A presentation of your project to the class in the final exam
slot during the week of May 1st.
- A final project report in the format of a 4-6 page AAAI paper.
This final project is worth 25% of your course grade. The
breakdown of that 25% is as follows:
- Initial project proposal and meeting - 15%
- Presentation - 25%
- Final paper and project - 60%
You have only a little over one month to complete the project, so keep
the scope small and start early!
Project Proposal (due April 4th)
Read the list of project ideas and potential data sets, and then
describe your proposed project in a one-page (single spaced)
proposal. This proposal is due in hard copy on the date listed
above.
If you are doing a project based on a real-world data set, you are
encouraged to use one of the data sets described below, because they
have been successfully used for machine learning in the
past. If you prefer to use a different data set, I will
consider your proposal, but you must have access to this data already
and present a clear proposal for what you would do with it.
Your proposal should include the following information:
- Project title
- Teaming information (if any)
- Data set - one sentence description and source
- Project idea, including a clear description of the problem and
your approach to solving it
- A brief description of the steps you will take to complete the
project
- A list of 1-3 related references that you will read.
Each individual/team will be required to meet with me for about 20
minutes during the week of April 4th to discuss your project.
Project Presentation (during the scheduled final exam slot;
late submissions will not be accepted)
Your project presentation should be 15 minutes long (this is a hard
cut-off) with 5 minutes for questions. You should cover all the
topics described below for the project report.
Project Report (due Friday, May 6th for seniors and
Wednesday, May 11th for non-seniors; late submissions will not be
accepted)
Your final project report must be in the format of a 4-6 page AAAI
paper. You should use one of the templates available at
http://www.aaai.org/Publications/Templates/AuthorKit.zip. The strict
6-page limit includes all references. Your paper should
sufficiently describe your project, including:
- An abstract
- An introduction, describing the problem you are solving, the
motivation for it, and a brief summary of your approach
- A brief survey of related work and background material on your
project. This related work must include at least 2 conference or
journal papers outside of the class readings.
- A description of your technical approach, using proper
mathematical notation and formatting
- Your experimental methodology, description of your data set, the
results you found (formatted as either tables or plots with proper
labels and captions), and a discussion of your results. If you
did a theoretical project, you should have an expanded technical
approach instead of this section.
- A brief conclusion
- A list of references in properly formatted AAAI style
Project Ideas
You are welcome to use one of these ideas or come up with your own.
- Extend an active learning technique (which queries the user for
labels) to use other sources of feedback that are richer than binary
labels, such as equivalence sets, distribution examples, measures of
"typicality" of the instance, or some other idea of your own.
- There are multiple ways to combine kernels together to create new
kernels (addition, multiplication, etc.). Develop an SVM-based
learning algorithm that tries a number of kernels and their combinations
in a principled manner to find the optimal separator for a data set.
- Write a supervised or semi-supervised algorithm for image
segmentation and compare its performance to k-means-based image
segmentation on the Berkeley image segmentation data set.
- Multi-view learning is typically applied to supervised or
semi-supervised classification scenarios. Instead, apply it to
unsupervised clustering or constrained clustering.
- Write a reinforcement learning agent to play Mario or Tetris
using the RL-Glue framework. The framework is available at http://2009.rl-competition.org/software.php#download,
and you might be interested in the steps described in http://www.cs.lafayette.edu/~taylorm/cs414/Project1.pdf
(note that you only need to implement a single learner for this
project).
- Use the 20 newsgroups data set and write an algorithm for
semi-supervised text classification based on a method besides naive
Bayes. You might consult this page, which includes code for the
semi-supervised naive Bayes text classifier discussed in class.
- Design an algorithm for transfer learning that improves image
classification in some categories of the Caltech 256 data set based on
transfer from other categories, or object recognition in the MIT
objects and scenes data set, or indoor scene recognition. Transfer
could also be used to improve image segmentation in the Berkeley image
segmentation data set.
- Users often have an idea of the classifier they are looking
for, even if the data does not directly support it. Design an
interactive method for building a model in collaboration with a
user. For example, perhaps the user knows that particular
attributes should be in the first few splits of the decision tree, even
if there isn't enough data to support it, so the tree could be
interactively built in collaboration with the user. Or, perhaps
the user knows that particular factors are especially important, which
could bias the weights learned by logistic regression.
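To make the kernel-combination idea above a little more concrete, here
is a minimal sketch of one way to get started. It is not a required
approach. It builds new kernels from base kernels by addition and
multiplication, then picks the candidate with the best accuracy on a
held-out validation set. To stay free of external libraries, it uses a
kernel perceptron as a stand-in for an SVM; a real project would swap
in an SVM solver. The toy XOR-style data, the choice of base kernels,
and every function name below are assumptions made for illustration.

```python
import math
from itertools import combinations

# Toy XOR-style data (an assumption for illustration): label = sign(x1 * x2).
TRAIN_X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
TRAIN_Y = [1, 1, -1, -1]
VALID_X = [(2, 2), (-2, -2), (2, -2), (-2, 2)]
VALID_Y = [1, 1, -1, -1]


def linear(x, z):
    """Linear kernel: the plain dot product."""
    return sum(a * b for a, b in zip(x, z))


def poly2(x, z):
    """Degree-2 polynomial kernel."""
    return (1.0 + linear(x, z)) ** 2


def rbf(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma=0.5 is an arbitrary illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_sum(k1, k2):
    """The sum of two valid kernels is a valid kernel."""
    return lambda x, z: k1(x, z) + k2(x, z)


def kernel_product(k1, k2):
    """The product of two valid kernels is a valid kernel."""
    return lambda x, z: k1(x, z) * k2(x, z)


def candidate_kernels():
    """Base kernels plus every pairwise sum and product."""
    base = {"linear": linear, "poly2": poly2, "rbf": rbf}
    candidates = dict(base)
    for (n1, k1), (n2, k2) in combinations(base.items(), 2):
        candidates[f"{n1}+{n2}"] = kernel_sum(k1, k2)
        candidates[f"{n1}*{n2}"] = kernel_product(k1, k2)
    return candidates


def decision_value(X, y, alpha, kernel, x):
    """Kernel expansion: sum_j alpha_j * y_j * k(x_j, x)."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))


def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Stand-in learner (NOT an SVM): count mistakes per training point."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision_value(X, y, alpha, kernel, xi) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alpha


def validation_accuracy(kernel):
    """Train on TRAIN_*, then score on the held-out VALID_* points."""
    alpha = train_kernel_perceptron(TRAIN_X, TRAIN_Y, kernel)
    preds = [
        1 if decision_value(TRAIN_X, TRAIN_Y, alpha, kernel, x) > 0 else -1
        for x in VALID_X
    ]
    return sum(p == t for p, t in zip(preds, VALID_Y)) / len(VALID_Y)


def select_kernel():
    """Return the best candidate name and all validation scores."""
    scores = {name: validation_accuracy(k) for name, k in candidate_kernels().items()}
    best = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best, scores


if __name__ == "__main__":
    best, scores = select_kernel()
    for name, acc in scores.items():
        print(f"{name:14s} validation accuracy = {acc:.2f}")
    print("selected:", best)
```

Exhaustively trying pairwise combinations is only the simplest possible
search. A project on this idea would need a more principled strategy
(for example, cross-validation, a search over kernel parameters, or
learned combination weights) and a real data set.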
Here are some other sources of project ideas and data: