Due: 
        April 3, 2012 by the start of class
      
    
Your responses to these questions must be submitted in hardcopy;
      it is alright to write your responses by hand.  For this
      assignment, you may check your answers with another student and
      work through the problems together only AFTER you have made a serious attempt by
      yourself.  Make absolutely certain that you can do these
      types of problems by yourself.
    
Be certain to include a statement of sources at the top of your
      assignment, listing all sources you consulted (websites, fellow
      students, etc.) while completing the assignment.  You do not
      need to list any course materials (textbooks, lecture notes, or
      the professor).
      
    
1.) [30 pts] Returning to the weather example from class, assume
      that you have the following Markov Model:
    
What is the probability that day 3 (i=2) will be rainy?  Be
      certain to start with day 0 and show your work unless you are
      absolutely certain.
    
      2.) [40 pts total]  After a few weeks of using your model to
      predict the weather and finding that it wasn't very incorrect, you
      decided to learn a model based on observations.  So that you
      will also be able to infer the weather from the comfort of your
      bed from the sound of the birds (i.e., without looking outside),
      you decide also to record whether the birds are chirping each
      day.  Over the past two weeks, you've observed the following
      data:  
    
S denotes a sunny day, R denotes a
      rainy day, C denotes chirping birds, and NB denotes no
      birds.  Note that for each day, you receive a pair of
      values:  one for the weather and one for whether the birds
      are chirping.  The underlined portion represents today.
    
[20 pts] Draw the hidden Markov model based on this data. (Hint: do this by counting, examining each transition and emission, to get estimates of the probabilities.)
3.)
      
      [30
        pts] (Adapted from Sutton and Barto Exercise 3.5) Imagine that
        you are designing a robot to run a maze. You decide to give it a
        reward of +1 for escaping the maze and a reward of zero at all
        other times. The task seems to break down naturally into
        episodes -- the successive runs through the maze -- so you
        decide to treat it as an episodic task, where the goal is to
        maximize expected total reward Rt = rt+1 + rt+2 + rt+3 + ... + rT,
        where T is the final time step of an episode. After running the
        learning agent for a while, you find that it is showing no
        improvement in escaping from the maze. Something is going wrong.
        Does the reward function effectively communicate the goal to the
        agent? If not, can you suggest another reward function that will
        work? If the reward function is fine, what else is going
        wrong?  Limit your answer to two or three sentences.