Exercises covering material in chapters 6-9 of the Planning and Control book: stochastic control, planning under uncertainty, meta-reasoning, and learning.
Answer: since the errors are along the axes and are uncorrelated, we can treat the problem as 3 separate one-dimensional estimation problems and use the simplified Kalman filter. Thus:
On the X axis we have an initial position 0 with stdev 10 (at t=0). Predicting the position for t=10 gives a mean of 10*100=1000. The prediction error variance is the sum of the initial position variance and the variance contributed by the velocity uncertainty. Assuming CONSTANT velocity, the position error due to the velocity error is the elapsed time multiplied by the velocity stdev, i.e. 10 * 10 = 100. The variance of the prediction on the X axis is thus 10*10+100*100=10100. We then have a measurement at this time with X=1010 and variance 10*10=100. The estimate is a weighted average, with weights inversely proportional to the variances; since the measurement variance is about 100 times smaller, the estimate is approximately 1009.9. The new variance is (100*10100)/(100+10100), approximately 99, i.e. a standard deviation of approximately 9.95, which is smaller than both input deviations, as expected.
In the Y dimension we have 2 measurements with the same variance, no motion, and no motion error. Thus the estimate is a simple average, Y=(0-10)/2=-5. The resulting variance is halved: 10*10/2 = 50, and the standard deviation of the estimate is approximately 7.07.
In the Z dimension we also have 2 measurements with the same variance, no motion, and no motion error. Thus the estimate is a simple average, Z=(100+105)/2=102.5. The resulting variance is halved: 5*5/2 = 12.5, and the standard deviation of the estimate is approximately 3.54.
Note: the solution above uses the "first principles" weighted average with weights inversely proportional to the variances. In general, the Kalman filter equations can be used; note, however, that in the "Planning and Control" book the equation for the Kalman gain K contains an error.
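For concreteness, here is a minimal sketch (in Python) of the scalar predict/update steps used above. The function names are ours; the numbers are exactly those of the exercise:

def predict(mean, var, velocity, dt, vel_stdev):
    # Constant-velocity prediction: position error grows by dt * vel_stdev.
    return mean + velocity * dt, var + (dt * vel_stdev) ** 2

def update(pred_mean, pred_var, meas, meas_var):
    # Variance-weighted average of prediction and measurement.
    k = pred_var / (pred_var + meas_var)          # Kalman gain
    return pred_mean + k * (meas - pred_mean), (1 - k) * pred_var

# X axis: predict from (0, var 100) with v=100 for 10s, then fuse X=1010.
x_mean, x_var = predict(0, 10**2, 100, 10, 10)    # -> 1000, 10100
x_mean, x_var = update(x_mean, x_var, 1010, 10**2)
print(x_mean, x_var)                              # approx 1009.9, approx 99

# Y and Z axes: no motion, so fusion is a plain equal-variance average.
y_mean, y_var = update(0, 10**2, -10, 10**2)      # -> -5.0, 50.0
z_mean, z_var = update(100, 5**2, 105, 5**2)      # -> 102.5, 12.5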
Answer: The actual correct answer depends on the exact physical quantities, so the (partial) answer below is just a guideline. First, we need to define the fuzzy sets. As is usually done for fuzzy control, the characteristic (membership) functions will be trapezoids. We start with the angle a (measured in degrees from the perpendicular), and use the notation F(a)=... to denote the characteristic function of fuzzy set F.
OK(a) = 1-|a|/5 if |a| < 5
0 otherwise
SLIGHTLY LEFT(a) = a/5 if 5>=a>0
1 if 10>=a>5
(15-a)/5 if 15>=a>10
0 otherwise
LEFT(a) = (a-10)/5 if 15>=a>10
1 if 20>=a>15
(30-a)/10 if 30>=a>20
0 otherwise
A LOT LEFT(a) = (a-20)/5 if 25>=a>20
1 if a>25
0 otherwise
The fuzzy sets for RIGHT are defined symmetrically (just change the relevant signs and directions of the inequalities). Likewise, we define angular velocity sets over s, in degrees per second:
NONE(s) = 1-|s|/5 if |s| < 5
0 otherwise
LEFT(s) = s/5 if 5>=s>0
1 if 10>=s>5
(15-s)/5 if 15>=s>10
0 otherwise
FAST LEFT(s) = (s-10)/10 if 20>=s>10
1 if s>20
0 otherwise
The fuzzy sets for motion to the right are defined symmetrically (just change the relevant signs and directions of the inequalities). Note that, unlike the example in class, the sum over the sets does NOT have to be 1.
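A sketch of these membership functions (plain Python; the names mirror the sets above, and the symmetric RIGHT sets are obtained by negating the argument):

def trapezoid(x, a, b, c, d):
    # 0 below a, ramp up to 1 on [a,b], flat 1 on [b,c], ramp down to 0 on [c,d].
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Angle sets (degrees from perpendicular); OK is a triangle centered at 0.
def ok(a):            return trapezoid(a, -5, 0, 0, 5)
def slightly_left(a): return trapezoid(a, 0, 5, 10, 15)
def left(a):          return trapezoid(a, 10, 15, 20, 30)
def a_lot_left(a):    # open-ended shoulder: ramps up on (20,25], then stays 1
    return 1.0 if a > 25 else max(0.0, (a - 20) / 5)

# Angular velocity sets (degrees per second).
def none_s(s):        return trapezoid(s, -5, 0, 0, 5)
def left_s(s):        return trapezoid(s, 0, 5, 10, 15)
def fast_left_s(s):   # shoulder: ramps up on (10,20], then stays 1
    return 1.0 if s > 20 else max(0.0, (s - 10) / 10)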
A partial list of control rules would be:
IF OK(a) AND NONE(s) THEN DO NOTHING
IF SLIGHTLY LEFT(a) AND NONE(s) THEN PUSH LEFT ; Note - push cart left to make pendulum swing right
IF SLIGHTLY LEFT(a) AND FAST RIGHT(s) THEN PUSH RIGHT ; Angle correction too fast...
IF SLIGHTLY LEFT(a) AND RIGHT(s) THEN DO NOTHING
IF SLIGHTLY LEFT(a) AND LEFT(s) THEN PUSH LEFT
IF LEFT(a) AND LEFT(s) THEN PUSH LEFT HARD
IF A LOT LEFT(a) AND NONE(s) THEN PUSH LEFT HARD
What still needs to be determined is the de-fuzzification. If the controller only has the 5 discrete actions, the result should be determined by some weighted voting scheme. If a continuous force effector exists, a weighted average of the rule outputs is the better control scheme.
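As one possible concretization, here is a sketch of rule firing (min for AND) and weighted-average de-fuzzification, using the membership functions sketched above. The crisp force values assigned to the five actions are hypothetical:

# Hypothetical crisp force per action (positive = push right).
FORCE = {"PUSH LEFT HARD": -10, "PUSH LEFT": -5, "DO NOTHING": 0,
         "PUSH RIGHT": 5, "PUSH RIGHT HARD": 10}

def control_force(a, s):
    # Each rule fires with strength = min of its antecedent memberships;
    # RIGHT / FAST RIGHT are obtained by mirroring (negating the argument).
    fired = [
        (min(ok(a), none_s(s)),                  "DO NOTHING"),
        (min(slightly_left(a), none_s(s)),       "PUSH LEFT"),
        (min(slightly_left(a), fast_left_s(-s)), "PUSH RIGHT"),   # FAST RIGHT(s)
        (min(slightly_left(a), left_s(-s)),      "DO NOTHING"),   # RIGHT(s)
        (min(slightly_left(a), left_s(s)),       "PUSH LEFT"),
        (min(left(a), left_s(s)),                "PUSH LEFT HARD"),
        (min(a_lot_left(a), none_s(s)),          "PUSH LEFT HARD"),
        # ... the mirrored rules for RIGHT omitted for brevity
    ]
    total = sum(w for w, _ in fired)
    if total == 0:
        return 0.0
    # Weighted average of the crisp action forces (de-fuzzification).
    return sum(w * FORCE[act] for w, act in fired) / total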
Answer:
my-action --+-- call (cost 10) --> opponent (past) action
            |                        +-- bluff (0.2) ------> win 60
            |                        +-- no bluff (0.8) ---> chance
            |                                                  +-- 0.1 --> win 60
            |                                                  +-- 0.9 --> lose (0)
            +-- fold (cost 0) ----> lose (0)
The lower levels of the tree are just chance nodes from my point of view, and thus we have a simple decision: folding has utility 0. Calling has utility -10+0.2*60+0.8*(0.1*60+0.9*0) = 6.8. Clearly you should call. (Actually, we lose on the hand overall, as we have already put 20 into the pot before encountering this decision problem.)
Now, if we call, we gain 6.8 for this initial hand (as before, ignoring the 20 in the pot, which is invariant across the decisions), and have a decision tree as above for the second encounter, with different probabilities according to the outcome (bluff vs. no bluff). (Due to lack of space, I will NOT show the complete tree, just the relevant sub-trees.) In the case where we detected the bluff, we get, for the 2nd game:
my-action --+-- call (cost 10) --> opponent (past) action
            |                        +-- bluff (0.4) ------> win 60
            |                        +-- no bluff (0.6) ---> chance
            |                                                  +-- 0.1 --> win 60
            |                                                  +-- 0.9 --> lose (0)
            +-- fold (cost 0) ----> lose (0)
Now, for the 2nd game the optimal action is again to call, with expected utility -10+0.4*60+0.6*0.1*60 = 17.6. The probability of this subtree occurring is 0.2. In case of no bluff in the first game, we get for the second game:
my-action --+-- call (cost 10) --> opponent (past) action
            |                        +-- bluff (0.1) ------> win 60
            |                        +-- no bluff (0.9) ---> chance
            |                                                  +-- 0.1 --> win 60
            |                                                  +-- 0.9 --> lose (0)
            +-- fold (cost 0) ----> lose (0)
And now the expected utility for calling is -10+0.1*60+0.9*0.1*60=1.4, so it still pays to call. The expected utility of the 2nd game, in case we called on the first, is thus: 0.2*17.6+0.8*1.4-20=-15.36. (Unfortunately, the conditions of the question do NOT allow us to walk away from the 2nd game, even though it seems we always lose... Maybe the dealer cheats...) Anyway, we still have the 6.8 from the first game, so the expected utility overall of calling in the first game is 6.8-15.36=-8.56, which is clearly better than folding...
Note that the hidden independence assumptions, as well as the model of opponent behaviour, are NOT realistic...
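For reference, a short computation of the expected utilities above (plain Python; eu_call is our name, the 0.1 is the win probability against a non-bluffing opponent, and the final -20 is the forced ante for the 2nd game, all as in the text):

def eu_call(p_bluff, cost=10, pot=60, p_win_showdown=0.1):
    # Pay the call cost; win the pot if the opponent was bluffing,
    # otherwise win with probability 0.1 at the showdown.
    return -cost + p_bluff * pot + (1 - p_bluff) * p_win_showdown * pot

game1 = eu_call(0.2)                                    # 6.8  -> call
game2 = 0.2 * eu_call(0.4) + 0.8 * eu_call(0.1) - 20    # -15.36
print(game1, eu_call(0.4), eu_call(0.1), game2, game1 + game2)
# 6.8  17.6  1.4  -15.36  -8.56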
Answer: begin by drawing the performance profiles. Clearly they do not all have the diminishing-returns property, so the algorithm is not guaranteed to find the optimal schedule. However, note that here there is only ONE profile which violates the assumption - employee A's. There is also only ONE point where the value of this profile changes, i.e. at 10 hours, so the only allocations which make ANY sense for A are either 0 or 10 hours. From here on, we simply reason by cases and generate two reduced (sub)problems, which are in turn solved by the back-to-front algorithm. In subproblem 1, we assume 0 hours are allocated to A, and solve ignoring A. In subproblem 2, we allocate 10 hours to A just before its deadline (if possible), compress the time axis to remove this allocation, and get a new subproblem for B and C, where their deadline is now at 20-10=10 hours. We then select the better of the two sub-problem solutions (taking into account the utility for A, obviously: 0 in the first case, 100 in the second).
In our example, the optimal allocation for the first subproblem is C for 8 hours, then B for 12 hours, yielding 80 for C and 5*10+2*2=54 for B, for a total of 134. For the second subproblem, again we have C for 8 hours, but then B for only 2 hours, thus the result is 80 for C and 5*2=10 for B, a total of 90 for B and C. To this we add 100 for A, making this the optimal schedule, delivering 190. Note that naively running the back-to-front algorithm without reasoning by cases would return the former schedule, as the slope for A is initially 0.
Also, this reasoning by cases can be generalized, but obviously the cost is exponential in the number of profiles that violate the "diminishing returns" assumption.
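A brute-force check of the case analysis, using performance profiles reconstructed from the worked numbers above (A: 100 only once it gets 10 full hours; B: 5/hour for the first 10 hours, then 2/hour; C: 10/hour for up to 8 hours; a shared 20-hour deadline). These profile shapes are an assumption consistent with the stated values, not given verbatim in the exercise:

def value_A(h): return 100 if h >= 10 else 0   # step profile: violates diminishing returns
def value_B(h): return 5 * min(h, 10) + 2 * max(h - 10, 0)
def value_C(h): return 10 * min(h, 8)

# A gets either 0 or 10 hours (the case split); enumerate the rest.
best = max(
    ((value_A(a) + value_B(b) + value_C(20 - a - b), a, b, 20 - a - b)
     for a in (0, 10) for b in range(0, 21 - a)),
    key=lambda t: t[0],
)
print(best)   # (190, 10, 2, 8): 10h for A, 2h for B, 8h for C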
Answer: The "equals" predicate cannot be implemented by a single perceptron (or perceptron layer) as the "1"s are not linearly separable from the "0"s. However, this can be done by adding a hidden unit (i.e. an extra layer with 1 perceptron) that implements, say, AND. The output unit takes the inputs, and outputs a 1 if either of the original inputs is 1 AND the hidden unit is 0 (e.g. with weight 1 on the original input, and large negative weight on the link from the hidden unit, and a threshold of 0.5).
When learning OR in this example, you will need to feed in the examples on which the perceptron errs several times. For example (assuming initial weights of 0, a fixed threshold of 1, and a learning rate of 0.4): starting with (1,1) results in a weight correction of 0.4 for both weights. Now (1,0) also gives the wrong answer, and the first weight is modified to 0.8. Likewise, (0,1) gives the wrong answer, and the second weight also becomes 0.8. Example (0,0) gives the correct result, 0, with no weight change. Presenting (1,0) and (0,1) once more, each still gives the wrong answer (0.8 does not exceed the threshold), so both weights become 1.2, which henceforth works for every example, and the learning stops.
Note that, strictly speaking, we should require that the threshold be adjustable as well - otherwise we cannot learn a function that outputs 1 on a (0,0) input. This is done by treating the threshold as an extra weight on an arc whose input is constantly 1. However, this was not necessary in our example. Also, another order of example presentation may cause learning to proceed either faster or slower...
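A sketch of the perceptron learning rule under the assumptions above (threshold fixed at 1, learning rate 0.4), which reproduces the weight trace:

def train_or(examples, rate=0.4, threshold=1.0, passes=10):
    w = [0.0, 0.0]
    for _ in range(passes):
        changed = False
        for (x1, x2), target in examples:
            out = 1 if w[0] * x1 + w[1] * x2 > threshold else 0
            if out != target:
                # Perceptron rule: w += rate * (target - out) * x
                w[0] += rate * (target - out) * x1
                w[1] += rate * (target - out) * x2
                changed = True
        if not changed:
            break      # all examples correct: learning stops
    return w

OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_or(OR))    # -> [1.2, 1.2]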
Answer: Suppose we begin the process by seeing the points (0,0) and (3,0) first, designating them as cluster centers. Now, if we see (0,2), it is clearly closer to (0,0), so that cluster center is moved (to (0,1), if our learning rate is 0.5). Seeing the point (0,4), it is again closer to the first cluster, and that cluster center moves to (0,2.5).
However, if instead of (0,0) we have (0,4), designating (0,4) and (3,0) as the initial centers, this all changes. If we now see (0,0), it is closer to (3,0), and we get a cluster center at (1.5,0). Now we see (0,2), which is closer to (0,4), and the cluster centers end up at (0,3) and (1.5,0).
In general, this algorithm is very sensitive to initial cluster centers and order of example presentation - unless the clusters are extremely localized.
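A sketch of this online (competitive) clustering update with learning rate 0.5, reproducing both scenarios above:

import math

def nearest(centers, p):
    # Index of the center closest (Euclidean distance) to point p.
    return min(range(len(centers)), key=lambda i: math.dist(centers[i], p))

def online_cluster(centers, points, rate=0.5):
    centers = [list(c) for c in centers]
    for p in points:
        i = nearest(centers, p)
        # Move the winning center a fraction `rate` of the way toward the point.
        centers[i] = [c + rate * (x - c) for c, x in zip(centers[i], p)]
    return centers

print(online_cluster([(0, 0), (3, 0)], [(0, 2), (0, 4)]))  # [[0, 2.5], [3, 0]]
print(online_cluster([(0, 4), (3, 0)], [(0, 0), (0, 2)]))  # [[0, 3], [1.5, 0]]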