Artificial Intelligence

Problem Set #4

Due:  April 3, 2012 by the start of class

Formatting and Submission Instructions

Your responses to these questions must be submitted in hardcopy; it is alright to write your responses by hand.  For this assignment, you may check your answers with another student and work through the problems together only AFTER you have made a serious attempt by yourself.  Make absolutely certain that you can do these types of problems by yourself.

Be certain to include a statement of sources at the top of your assignment, listing all sources you consulted (websites, fellow students, etc.) while completing the assignment.  You do not need to list any course materials (textbooks, lecture notes, or the professor).

Markov Models

1.) [30 pts] Returning to the weather example from class, assume that you have the following Markov Model:

p(sunnyi+1 | rainyi) = 0.3
p(rainyi+1 | rainyi) = 0.7
p(sunnyi+1 | sunnyi) = 0.9
p(rainyi+1 | sunnyi) = 0.1

What is the probability that day 3 (i=2) will be rainy?  Be certain to start with day 0 and show your work unless you are absolutely certain.


2.) [40 pts total]  After a few weeks of using your model to predict the weather and finding that it wasn't very incorrect, you decided to learn a model based on observations.  So that you will also be able to infer the weather from the comfort of your bed from the sound of the birds (i.e., without looking outside), you decide also to record whether the birds are chirping each day.  Over the past two weeks, you've observed the following data: 

S denotes a sunny day, R denotes a rainy day, C denotes chirping birds, and NB denotes no birds.  Note that for each day, you receive a pair of values:  one for the weather and one for whether the birds are chirping.  The underlined portion represents today.

(S, C) (S, C) (R, NB) (R, NB) (S, NB) (R, NB) (S, C) (S, C) (S, C) (S, C) (S, NB) (R, NB) (R, NB) (R, C) __

[20 pts] Draw the hidden Markov model based on this data.  (Hint:  do this by counting, examining each transition and emission, to get estimates of the probabilities.)

[20 pts] What is the probability of today being sunny, assuming that you wake up to the sound of chiping birds?  (Hint:  first compute the probability of it being sunny given that yesterday was rainy.  Then combine this information with the knowledge of chirping using Bayes rule to yield p(sunny | chirping) and p(rainy | chirping).  Ensure that the two values sum to 1 by solving for alpha, and you're finished.)

Reinforcement Learning

3.) [30 pts] (Adapted from Sutton and Barto Exercise 3.5) Imagine that you are designing a robot to run a maze. You decide to give it a reward of +1 for escaping the maze and a reward of zero at all other times. The task seems to break down naturally into episodes -- the successive runs through the maze -- so you decide to treat it as an episodic task, where the goal is to maximize expected total reward Rt = rt+1 + rt+2 + rt+3 + ... + rT, where T is the final time step of an episode. After running the learning agent for a while, you find that it is showing no improvement in escaping from the maze. Something is going wrong. Does the reward function effectively communicate the goal to the agent? If not, can you suggest another reward function that will work? If the reward function is fine, what else is going wrong?  Limit your answer to two or three sentences.