December 10, Tuesday
12:00 – 14:00
Performance analysis of background read tasks
Computer Science seminar
Lecturer : Eitan Bachmat
Location : -101/58
Abstract: A very common task in modern storage systems is to read an entire large file or a disk from beginning to end in the background while serving customer I/O requests in the foreground. Such procedures are performed during backup and restore operations, when creating temporary copies of databases or during data migrations to name a few applications. There are two different ways in which the performance of such procedures can be measured. The first concerns the negative effect of the background operation on the response time of the foreground requests. The second is the elapsed time of the background operation. Common sense indicates that improving the performance according to one measure will probably worsen the performance according to the second. Such common sense has been formalized in the famous "No free lunch theorem" of Kleinrock. Two schemes (algorithms) for managing the background operation have been studied in the past,
the first by Thomasian and Menon (the TM process) and the second by Merchant and Yu (the MY process). The approximate analysis of their respective schemes via queuing theory is rather difficult. It is also difficult to analytically compare the two methods. In this talk we will first describe all the choices one has to make in
order to manage a background read process, surprisingly this has not been done before. We then describe a trivial modification to both the TM and MY schemes.
The modified schemes have (slightly) better performance then the original schemes in both performance measures. The analysis of performance is much simpler and it becomes easy to compare their respective performance. It turns out that the modified TM is better then the modified MY in terns of foreground response time and has the same elapsed time contradicting the common sense indications maintained above. We will take this opportunity to mention other cases, most notably from the work
of Harchol-Balter and Bansal, in which the conclusions of the "No free lunch theorem" do not hold (obviously the assumptions do not hold as well).
The simplified analysis of the modified systems relies on a certain claim that the modified process requires the disk to seek very little. We will state the claim and provide an elementary proof of it (look mom, no generating functions). As it turns out we will actually be computing the size of the largest tree of a random forest, we will explain the seemingly mysterious connection via the theory of linear probing hashing, probably the oldest and most analyzed algorithm in computer science.