Due: April 3, 2012, by the start of class
Your responses to these questions must be submitted in hardcopy;
it is alright to write your responses by hand. For this
assignment, you may check your answers with another student and
work through the problems together, but only AFTER you have made
a serious attempt on your own. Make absolutely certain that you
can do these types of problems by yourself.
Be certain to include a statement of sources at the top of your
assignment, listing all sources you consulted (websites, fellow
students, etc.) while completing the assignment. You do not
need to list any course materials (textbooks, lecture notes, or
the professor).
1.) [30 pts] Returning to the weather example from class, assume
that you have the following Markov Model:
What is the probability that day 3 (i=2) will be rainy? Be
certain to start with day 0, and show your work.
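If you want to check your hand computation, here is a minimal Python sketch of the forward propagation involved, assuming a two-state (sunny/rainy) model. The transition matrix and day-0 distribution below are placeholders, not the probabilities from the model above; substitute the real values.

```python
# Minimal sketch of propagating a state distribution through a Markov chain.
# All numbers here are PLACEHOLDERS -- replace them with the probabilities
# from the Markov model given in the problem.

# States: index 0 = sunny, index 1 = rainy.
# transition[i][j] = P(weather tomorrow = j | weather today = i)
transition = [
    [0.8, 0.2],  # from sunny (placeholder values)
    [0.4, 0.6],  # from rainy (placeholder values)
]

dist = [1.0, 0.0]  # placeholder: day 0 is sunny with certainty

# Propagate the distribution forward one day at a time, starting from day 0.
# Adjust the range to match the question's indexing of days.
for day in range(1, 4):
    dist = [sum(dist[i] * transition[i][j] for i in range(2))
            for j in range(2)]
    print(f"day {day}: P(sunny)={dist[0]:.4f}, P(rainy)={dist[1]:.4f}")
```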
2.) [40 pts total] After a few weeks of using your model to
predict the weather and finding that it wasn't very accurate, you
decide to learn a model based on observations. So that you
will also be able to infer the weather from the comfort of your
bed, using the sound of the birds (i.e., without looking outside),
you decide also to record whether the birds are chirping each
day. Over the past two weeks, you've observed the following
data:
S denotes a sunny day, R denotes a
rainy day, C denotes chirping birds, and NB denotes no
birds. Note that for each day, you receive a pair of
values: one for the weather and one for whether the birds
are chirping. The underlined portion represents today.
[20 pts] Draw the hidden Markov model based on this data. (Hint: do this by counting, examining each transition and emission, to get estimates of the probabilities.)
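If it helps to organize the counting, here is a short Python sketch of the count-based estimates the hint describes. The five-day sequence in it is made-up illustrative data, not the two weeks of observations from the problem; replace it with the real sequence of pairs.

```python
from collections import Counter

# Sketch of the counting approach from the hint. The sequence below is
# MADE-UP illustrative data -- substitute the real (weather, birds) pairs.
# S/R = sunny/rainy weather; C/NB = chirping / no birds.
days = [("S", "C"), ("S", "C"), ("R", "NB"), ("R", "NB"), ("S", "C")]

trans = Counter()  # counts of (weather today, weather tomorrow)
emit = Counter()   # counts of (weather, bird observation)
state = Counter()  # how often each weather state occurs

for (w, b) in days:
    state[w] += 1
    emit[(w, b)] += 1
for (w1, _), (w2, _) in zip(days, days[1:]):
    trans[(w1, w2)] += 1

# Transition estimates: P(w2 | w1) = count(w1 -> w2) / count(w1 as a source).
sources = Counter(w for (w, _) in days[:-1])
for (w1, w2), c in sorted(trans.items()):
    print(f"P({w2} | {w1}) = {c}/{sources[w1]} = {c / sources[w1]:.2f}")

# Emission estimates: P(b | w) = count(w emits b) / count(w).
for (w, b), c in sorted(emit.items()):
    print(f"P({b} | {w}) = {c}/{state[w]} = {c / state[w]:.2f}")
```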
3.) [30 pts] (Adapted from Sutton and Barto Exercise 3.5) Imagine that
you are designing a robot to run a maze. You decide to give it a
reward of +1 for escaping the maze and a reward of zero at all
other times. The task seems to break down naturally into
episodes -- the successive runs through the maze -- so you
decide to treat it as an episodic task, where the goal is to
maximize expected total reward $R_t = r_{t+1} + r_{t+2} + r_{t+3} + \cdots + r_T$,
where T is the final time step of an episode. After running the
learning agent for a while, you find that it is showing no
improvement in escaping from the maze. Something is going wrong.
Does the reward function effectively communicate the goal to the
agent? If not, can you suggest another reward function that will
work? If the reward function is fine, what else is going
wrong? Limit your answer to two or three sentences.
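To make the definition of $R_t$ concrete, here is a small Python sketch that computes the undiscounted return for a recorded episode under the stated reward function (+1 on escape, 0 otherwise). The episode lengths are made up; the sketch may be useful when reasoning about what the agent is actually being asked to maximize.

```python
# Computes the undiscounted episodic return R_t = r_{t+1} + ... + r_T
# defined above, under the stated reward function.
# The episode lengths below are hypothetical.

def episode_return(rewards):
    """Total undiscounted return for one episode's reward sequence."""
    return sum(rewards)

# Two hypothetical episodes, each ending with the +1 escape reward.
short_run = [0] * 10 + [1]    # escapes after 11 steps
long_run = [0] * 1000 + [1]   # escapes after 1001 steps

print(episode_return(short_run))  # -> 1
print(episode_return(long_run))   # -> 1 (identical, despite the longer run)
```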