Robotics (202-25161) - Spring 2002

Theoretical Exercise 2

Answers


Goals

Exercising material in chapters 6-9 of the Planning and Control book. This includes stochastic control, planning under uncertainty, meta-reasoning, and learning.

Questions

  1. An aircraft is moving (exactly) parallel to the x axis. At time t=0 a GPS measurement stated its (x, y, z) coordinates as (0, 0, 100), (all coordinates in meters) and at time t=10 another GPS measurement gave the coordinates (1010, -10, 105). According to the inertial navigation system, the aircraft velocity is 100 meters per second, with a standard deviation of 10 meters per second. GPS measurements have a standard deviation of 10 in x and y and a standard deviation of 5 in z. Assume that the errors along the axes are independent. What is the optimal estimate for aircraft location at t=10 and t=20, and the standard deviations for the estimate?

    Answer: Since the errors along the axes are uncorrelated, we can treat the problem as 3 separate one-dimensional estimation problems and use the simplified Kalman filter. Thus:

    In the X axis we have an initial position 0 with stdev 10 (at t=0). We now predict the position for t=10 and get a mean of 10*100=1000. The error variance of the prediction is the sum of the initial position variance and the variance of the position change. Assuming CONSTANT velocity, the position error due to the velocity error is the elapsed time multiplied by the velocity stdev, i.e. 10 * 10 = 100. The variance of the prediction on the X axis is thus 10*10+100*100=10100. We now have a measurement at this time with X=1010 and variance 10*10=100. The estimate is a weighted average, with weights inversely proportional to the variances; since the measurement variance is about 100 times smaller, the estimate is approximately 1009.9. The new variance is (100*10100)/(100+10100)=99, i.e. a standard deviation of approximately 9.95, which is (as expected) smaller than both of the original deviations.

    In the Y dimension we have 2 measurements with the same variance, no motion and no motion error. Thus the estimate is a simple average, Y=(0-10)/2=-5. The resulting variance is halved: 10*10/2 = 50, and the standard deviation of the estimate is approximately 7.07.

    In the Z dimension we also have 2 measurements with the same variance, no motion and no motion error. Thus the estimate is a simple average, Z=(100+105)/2=102.5. The resulting variance is halved: 5*5/2 = 12.5, and the standard deviation of the estimate is approximately 3.54.

    Note: the solution above uses "first principles" of a weighted average with weights inversely proportional to the variances. In general, the Kalman filter equations can be used; however, in the "Planning and Control" book the equation for the Kalman gain K contains an error. A short numeric check of the estimates appears below.
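
    As a quick numeric check, a minimal Python sketch reproducing the three one-dimensional fusions above (fuse() is just the inverse-variance weighted average used in the text):

    def fuse(mean1, var1, mean2, var2):
        # Combine two independent estimates; the weights are the inverse variances.
        var = (var1 * var2) / (var1 + var2)
        mean = (mean1 / var1 + mean2 / var2) * var
        return mean, var

    # X axis: predict from t=0 to t=10 with v = 100 m/s (stdev 10 m/s),
    # then fuse with the GPS reading x = 1010 (stdev 10).
    pred_mean = 0 + 10 * 100            # 1000
    pred_var = 10**2 + (10 * 10)**2     # 100 + 10000 = 10100
    x, x_var = fuse(pred_mean, pred_var, 1010, 10**2)
    print(x, x_var ** 0.5)              # approx 1009.9 and 9.95

    # Y and Z: two measurements of equal variance, so the fusion is a plain average.
    y, y_var = fuse(0, 10**2, -10, 10**2)    # -5.0, stdev approx 7.07
    z, z_var = fuse(100, 5**2, 105, 5**2)    # 102.5, stdev approx 3.54
    print(y, y_var ** 0.5, z, z_var ** 0.5)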

  2. Continue to compute the optimal policy for 3 and 4 steps, for the 6-state soccer MDP problem given in class.
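
    Sketch only: the continuation consists of two more Bellman backups on top of the 1- and 2-step values computed in class. A generic value-iteration fragment in Python, in which the action names and the transition and reward tables are placeholders to be filled in from the class example:

    N_STATES, ACTIONS = 6, ("a0", "a1")      # placeholder action names

    # P[a][s][t] = transition probability, R[s] = immediate reward;
    # fill these in with the values of the 6-state soccer MDP from class.
    P = {a: [[1.0 / N_STATES] * N_STATES for _ in range(N_STATES)] for a in ACTIONS}
    R = [0.0] * N_STATES

    def backup(V):
        # One Bellman backup: V'(s) = R(s) + max_a sum_t P(t|s,a) * V(t)
        return [R[s] + max(sum(P[a][s][t] * V[t] for t in range(N_STATES))
                           for a in ACTIONS)
                for s in range(N_STATES)]

    V = [0.0] * N_STATES        # 0-step values
    for k in range(4):          # 1-, 2-, 3- and 4-step values
        V = backup(V)
        print(k + 1, V)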

  3. Write fuzzy control rules for the inverted pendulum control problem, where there are 5 possible actions: do nothing, push left or push right, each of the latter can be either a "normal" push or a "hard" push. The observed state variables are pendulum angle (using fuzzy sets: perpendicular, SLIGHTLY LEFT, LEFT, A LOT LEFT, OK, SLIGHTLY RIGHT, RIGHT, A LOT RIGHT), and pendulum angular speed (NONE, LEFT, FAST LEFT, RIGHT, FAST RIGHT). Desired condition is 0 angle (i.e. fuzzy set OK). Define the relevant fuzzy sets (characteristic function) and a set of fuzzy control rules for achieving this task.

    Answer: The actual correct answer depends on the exact physical quantities. Thus, the (partial) answer below is just a guideline. First, we need to define the fuzzy sets. As is usually done for fuzzy control, the characteristic functions will be trapezoids. Starting with the angle a (measured in degrees from perpendicular), we will use the notation F(a)=... to denote the characteristic function of fuzzy set F.

    OK(a) =   1-|a|/5      if |a| < 5
              0            otherwise
    
    
    SLIGHTLY LEFT(a)   =   a/5        if  5>=a>0
                           1          if  10>=a>5
                           (15-a)/5   if  15>=a>10
                           0          otherwise
    
    
    LEFT(a)            =   (a-10)/5   if  15>=a>10
                           1          if  20>=a>15
                           (30-a)/10  if  30>=a>20
                           0          otherwise
    
    A LOT LEFT(a)      =   (a-20)/5   if  25>=a>20
                           1          if  a>25
                           0          otherwise
    

    The fuzzy sets for RIGHT are defined symmetrically (just change the relevant signs and directions of the inequalities). Likewise, we define angular velocity sets, e.g. over s in degrees per second:

    NONE(s) =  1-|s|/5      if |s| < 5
               0            otherwise
    
    
    LEFT(s)            =   s/5        if  5>=s>0
                           1          if  10>=s>5
                           (15-s)/5   if  15>=s>10
                           0          otherwise
    
    FAST LEFT(s)       =   (s-10)/10   if  20>=s>10
                           1           if  s>20
                           0           otherwise
    

    The fuzzy sets for motion to the right are defined symmetrically (just change the relevant signs and direction of inequalities). Note that, unlike the example in class, sum over sets does NOT have to be 1.
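
    To make the characteristic functions concrete, here is a small Python sketch (a guideline only; the breakpoints are the ones chosen above and would have to be tuned to the actual physical system):

    def trapezoid(x, a, b, c, d):
        # 0 outside (a, d), rising on [a, b], 1 on [b, c], falling on [c, d]
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    def shoulder(x, a, b):
        # 0 below a, rising on [a, b], 1 above b (open-ended sets like A LOT LEFT)
        return min(1.0, max(0.0, (x - a) / (b - a)))

    # Angle sets (degrees left of perpendicular); the RIGHT-side sets are
    # mirror images (negate the argument).
    OK            = lambda a: max(0.0, 1 - abs(a) / 5)
    SLIGHTLY_LEFT = lambda a: trapezoid(a, 0, 5, 10, 15)
    LEFT_ANGLE    = lambda a: trapezoid(a, 10, 15, 20, 30)
    A_LOT_LEFT    = lambda a: shoulder(a, 20, 25)

    # Angular-speed sets (degrees per second), same idea.
    NONE_SPEED = lambda s: max(0.0, 1 - abs(s) / 5)
    LEFT_SPEED = lambda s: trapezoid(s, 0, 5, 10, 15)
    FAST_LEFT  = lambda s: shoulder(s, 10, 20)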

    A partial list of control rules would be:

    IF OK(a) AND NONE(s) THEN DO NOTHING
    
    IF SLIGHTLY LEFT(a) AND NONE(s) THEN PUSH LEFT  ; Note - push cart left to make
                                                      pendulum swing right
    
    IF SLIGHTLY LEFT(a) AND FAST RIGHT(s) THEN PUSH RIGHT    ; Angle correction
                                                               too fast...
    
    IF SLIGHTLY LEFT(a) AND RIGHT(s) THEN DO NOTHING
    
    IF SLIGHTLY LEFT(a) AND LEFT(s) THEN PUSH LEFT
    
    IF LEFT(a) AND LEFT(s) THEN PUSH LEFT HARD
    
    IF A LOT LEFT(a) AND NONE(s) THEN PUSH LEFT HARD
    

    What still needs to be determined is the de-fuzzification. If the controller only has the 5 discrete actions, the result should be determined by some weighted voting scheme, as sketched below. If a continuous force effector exists, a weighted average of the rule outputs gives better control.
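
    Continuing the Python sketch above, the rules can be evaluated with min for AND, and the discrete action can be chosen by a weighted vote (again only a guideline; the rule list is partial):

    # Mirror images of the LEFT sets (just negate the argument); SLIGHTLY_RIGHT
    # and the other mirrored sets would be used by the omitted symmetric rules.
    SLIGHTLY_RIGHT = lambda a: SLIGHTLY_LEFT(-a)
    RIGHT_SPEED    = lambda s: LEFT_SPEED(-s)
    FAST_RIGHT     = lambda s: FAST_LEFT(-s)

    def control(a, s):
        # AND is min; each rule votes for its action with its firing strength.
        rules = [
            (min(OK(a),            NONE_SPEED(s)),  "DO NOTHING"),
            (min(SLIGHTLY_LEFT(a), NONE_SPEED(s)),  "PUSH LEFT"),
            (min(SLIGHTLY_LEFT(a), FAST_RIGHT(s)),  "PUSH RIGHT"),
            (min(SLIGHTLY_LEFT(a), RIGHT_SPEED(s)), "DO NOTHING"),
            (min(SLIGHTLY_LEFT(a), LEFT_SPEED(s)),  "PUSH LEFT"),
            (min(LEFT_ANGLE(a),    LEFT_SPEED(s)),  "PUSH LEFT HARD"),
            (min(A_LOT_LEFT(a),    NONE_SPEED(s)),  "PUSH LEFT HARD"),
            # ... the symmetric rules for the RIGHT side are omitted ...
        ]
        votes = {}
        for strength, action in rules:
            votes[action] = votes.get(action, 0.0) + strength
        return max(votes, key=votes.get)

    print(control(7, 0))    # angle slightly left, no spin -> "PUSH LEFT"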

  4. Complete the computation of the optimal action sequence for the 3-state STL problem example given in class.

  5. Consider the following simplified game of poker. You and one opponent get dealt a hand. The opponent makes a bet. You now have 2 actions: fold or call (for simplicity, we shall ignore "raise"). The pot contains 50 NIS, and you need to pay another 10 NIS to call. You have a pair of Jacks. Now, assume that you judge that the probability that the opponent has bluffed is 0.2 (in which case - if you call - you win); otherwise, the opponent will have BETTER than a pair of Jacks with probability 0.9. Do you call or fold, if:

    Answer:

    Note that the hidden independence assumptions, as well as the model of opponent behaviour, are NOT realistic...
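
    Under the probabilities stated in the question, a purely expected-monetary-value comparison of call vs. fold (a simplification that ignores ties and any further betting) can be sketched as:

    # Probability of winning if you call: the opponent bluffed (0.2), or did not
    # bluff (0.8) but still holds a worse hand (0.1).
    p_win = 0.2 + 0.8 * 0.1            # 0.28

    # Expected value of calling: win the 50 NIS pot, or lose the 10 NIS call.
    ev_call = p_win * 50 - (1 - p_win) * 10
    print(ev_call)                     # 6.8 NIS > 0, versus 0 for folding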

  6. You have 3 employees, who have the following (certain) performance profiles. Employee A gives no results at all initially, but after 10 hours comes up with a product worth 100 NIS, and thereafter goes to sleep forever. Employee B produces products worth 5 NIS per hour for the first 10 hours of work, after which he gains only 2 NIS per hour. Employee C provides results worth 10 NIS per hour, but goes to sleep forever after 8 hours. You have the following deadlines: for employee A's product - 15 hours, for employee B and C products - 20 hours. You can only employ one person at any given time. What is the optimal employment schedule? Can you use the back-to-front algorithm? How can you produce the optimum, given that the only non-concave profiles are step-functions, and that their number is bounded (and small)?

    Answer: begin by drawing the performance profiles. Clearly they do not all have the diminishing returns property, so the back-to-front algorithm is not guaranteed to find the optimal schedule. However, note that here there is only ONE profile which violates the assumption - employee A. There is also only ONE point where the value for this profile changes, i.e. 10 hours, so the only allocations which make ANY sense for A are either 0 or 10 hours. From here on, we simply reason by cases and generate two reduced (sub)problems, which are in turn solved by the back-to-front algorithm. In subproblem 1, we assume 0 hours allocated to A, and solve, ignoring A. In subproblem 2, we allocate 10 hours to A just before its deadline (if possible), compress the time axis to remove this allocation, and obtain a new subproblem for B and C, where the deadlines for B and C are now at 20-10=10 hours. We then select the solution to the subproblem that is best (taking into account the utility for A, obviously: 0 in the first case, 100 in the second case).

    In our example, the optimal allocation for the first subproblem is C for 8 hours, then B for 12 hours, yielding 80 for C and 5*10+2*2=54 for B, for a total of 134. For the second subproblem, we again have C for 8 hours, but then B for only 2 hours, so the result is 80 for C and 5*2=10 for B, total 90 for B and C. To this we add 100 for A, so the second schedule is the optimal one, delivering 190 (the arithmetic is checked in the sketch below). Note that without the reasoning by cases, naively running the back-to-front algorithm would return the former schedule, as the slope for A is initially 0.

    Also, this reasoning by cases can be generalized, but obviously the cost is exponential in the number of profiles that violate the "diminishing returns" assumption.
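
    A small Python sketch of the arithmetic in the two subproblems (it only evaluates the two candidate allocations; the deadline and ordering feasibility is the case analysis argued above):

    # Performance profiles: value of h hours of work (deadlines handled separately).
    def value_A(h): return 100 if h >= 10 else 0
    def value_B(h): return 5 * h if h <= 10 else 5 * 10 + 2 * (h - 10)
    def value_C(h): return 10 * min(h, 8)

    # Subproblem 1: 0 hours for A; C for 8 hours, then B for 12 hours.
    case1 = value_C(8) + value_B(12)                 # 80 + 54 = 134

    # Subproblem 2: 10 hours for A (finishing by its 15-hour deadline),
    # leaving 10 hours before B and C's compressed deadline: C for 8, B for 2.
    case2 = value_A(10) + value_C(8) + value_B(2)    # 100 + 80 + 10 = 190

    print(case1, case2)    # 134 190 -> the second schedule is optimal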

  7. You need to learn the function OR in a perceptron. Start with weights all zero and a threshold of 1 (assume it cannot be changed), and show how the delta rule is used to train this network, with a learning rate of 0.4. Now, suppose you need to learn the "equals" predicate (the result is 1 only if the inputs are equal). Can this be done with a single perceptron? Can this be done with more than 1 layer of perceptrons?

    Answer: The "equals" predicate cannot be implemented by a single perceptron (or perceptron layer) as the "1"s are not linearly separable from the "0"s. However, this can be done by adding a hidden unit (i.e. an extra layer with 1 perceptron) that implements, say, AND. The output unit takes the inputs, and outputs a 1 if either of the original inputs is 1 AND the hidden unit is 0 (e.g. with weight 1 on the original input, and large negative weight on the link from the hidden unit, and a threshold of 0.5).

    When learning OR in this example, you will need to feed in the examples on which the perceptron gets the answer wrong several times. For example, starting with (1,1) results in a weight correction of 0.4 for both weights. Now (1,0) also gives the wrong answer (0.4 < 1) and the first weight is modified to 0.8. Next, (0,1) gives the wrong answer and the second weight also becomes 0.8. Example (0,0) gives the correct result - 0 - and no weight change, and (1,1) now also gives the correct answer (1.6 >= 1). However, (1,0) and (0,1) are still answered incorrectly (0.8 < 1), so repeating each of them once more raises both weights to 1.2, which henceforth works for every example, and the learning stops.

    Note that, strictly speaking, we should require that the threshold be adjustable as well - otherwise we cannot learn a function that outputs 1 on a (0,0) input. This is done by treating the threshold as an extra weight on an arc whose input is held constant. However, this was not necessary in our example. Also, another order of example presentation may cause learning to proceed either faster or slower...
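
    A minimal sketch of the delta-rule updates described above (threshold fixed at 1, learning rate 0.4, examples presented cyclically in the order used in the text):

    THRESHOLD, RATE = 1.0, 0.4
    w = [0.0, 0.0]

    def output(x):
        return 1 if w[0] * x[0] + w[1] * x[1] >= THRESHOLD else 0

    examples = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]
    for _ in range(3):                      # a few passes over the examples
        for x, target in examples:
            err = target - output(x)        # delta rule: w += rate * error * input
            w[0] += RATE * err * x[0]
            w[1] += RATE * err * x[1]
    print(w)                                # converges to [1.2, 1.2]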

  8. Use K-means clustering (manually) to cluster the following set of points in 2 dimensions: (0, 0), (0, 2), (0, 4), (3, 0). Show how an initial choice for the cluster center seeds and order of points considered in the algorithm changes the obtained results.

    Answer: Suppose we begin the process by seeing the points (0,0) and (3,0) first, designating them as cluster centers. Now, if we see (0,2), it is clearly closer to (0,0), so that cluster center is moved (say to (0,1) if our learning rate is 0.5). Next, seeing point (0,4), it is again closer to the first cluster, and the cluster center moves to (0,2.5).

    However, if instead of (0,0) we have (0,4), designating (0,4) and (3,0) as initial centers, this can all change. If we now see (0,0) it is closer to (3,0) and we have a cluster center at (1.5,0). Now we see (0,2) which is closer to (0,4) and the cluster centers are now at (0,3) and at (1.5, 0).

    In general, this algorithm is very sensitive to initial cluster centers and order of example presentation - unless the clusters are extremely localized.
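
    A sketch of the incremental (online) update used above, with learning rate 0.5; changing the seeds and the presentation order reproduces the two outcomes described:

    RATE = 0.5    # step size toward each newly seen point

    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    def run(centers, points):
        centers = [list(c) for c in centers]
        for p in points:
            c = min(centers, key=lambda c: dist2(c, p))   # nearest center
            c[0] += RATE * (p[0] - c[0])                  # move it toward the point
            c[1] += RATE * (p[1] - c[1])
        return centers

    print(run([(0, 0), (3, 0)], [(0, 2), (0, 4)]))   # first scenario: centers near (0,2.5) and (3,0)
    print(run([(0, 4), (3, 0)], [(0, 0), (0, 2)]))   # second scenario: centers at (0,3) and (1.5,0)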