Dr. Eitan Bachmat

Lecturer

Address:
 
 
 
Department of Computer Science
Ben-Gurion University
P.O. Box 653
Beer-Sheva 84105, Israel
Office: Building 58, Room 310
Email: ebachmat@cs.bgu.ac.il
Phone: +972 (0) 8 647 2711
Fax: +972 (0) 8 647 7650
Office hours: Monday 10–12


Research

I am the world's worst storage systems researcher. This is not surprising, given that I know nothing about operating systems, and about file systems in particular. Instead of doing experiments I perform thought experiments. I can't program. I like working with models from the 1960s even though nearly the entire community regards them as completely useless. I myself admit that they are completely inaccurate. I also like to consider performance-related problems that I know in advance to have no application. I am head of the Serial Data Lab (SDL) at BGU. I am the only member of the lab. Given this situation and the shortcomings listed above, we employ thought experiments instead of real experiments. That's why I work with serial workloads, in which only a single I/O is issued at a time, preferably with ample time between them. I just can't multitask.

I became a systems researcher because I was a terrible mathematician. To summarize my relationship with mathematics: I love mathematics, but it does not love me back. Under the circumstances, I had to leave the relationship at some point. As revenge, I am exploiting mathematics in my new role as a systems researcher. This has not added to my popularity in the systems world. Here are some of my research projects:

It's not a bird, it's not a plain plane, it's a Minkowski airplane

We are trying to figure out how to model one of the world's most yawn-provoking activities, namely airplane boarding. It turns out that airplane boarding can be analyzed via Lorentzian (a.k.a. spacetime) geometry, and in many cases of interest via regions in the Minkowski plane. Unfortunately, this holds only in the limit of an infinite number of passengers, but do not despair: random matrix theory (RMT) sometimes comes to the rescue. RMT is useful only when we deal with very thin passengers who can walk quickly one after the other along a narrow aisle. I am not trying to be judgmental, but you would expect more from a theory like RMT. In my opinion, the fact that RMT is only willing to model certain types of passengers is discriminatory, to say the least. Perhaps an academic boycott of random matrix theory and everything it stands for is the proper way of handling this.
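For the thin-passenger case, here is a toy sketch in Python. It rests on simplifying assumptions of ours, not on the full model: one passenger per row, every passenger needs one time unit to sit down and occupies no aisle space, and a passenger is delayed by anyone ahead of them in the queue who sits in a row closer to the door. Under these assumptions the number of boarding rounds equals the length of the longest increasing subsequence of row numbers read in queue order, and for a random queue that length is about 2√n and fluctuates like the largest eigenvalue of a random matrix (the Baik–Deift–Johansson theorem). The function name `boarding_rounds` is hypothetical. Because the sketch ignores the aisle space passengers take up, it says nothing reliable about policies such as back-to-front; that is exactly what the spacetime geometry is needed for.

```python
import bisect
import random


def boarding_rounds(rows_in_queue_order):
    """Hypothetical toy model: rounds = length of the longest strictly
    increasing subsequence of row numbers, listed from the head of the queue.
    Assumes zero-width passengers and one time unit per passenger to sit."""
    # tails[m] is the smallest row that can end an increasing chain of length m + 1
    tails = []
    for row in rows_in_queue_order:
        pos = bisect.bisect_left(tails, row)  # strict increase: replace equal values
        if pos == len(tails):
            tails.append(row)
        else:
            tails[pos] = row
    return len(tails)


if __name__ == "__main__":
    n = 10_000
    rng = random.Random(0)
    rows = list(range(1, n + 1))  # assumption: exactly one passenger per row
    rng.shuffle(rows)             # random boarding order
    print("random order:", boarding_rounds(rows), "rounds; 2*sqrt(n) =", 2 * n ** 0.5)
    print("front to back:", boarding_rounds(sorted(rows)), "rounds")
```

Front-to-back boarding (rows increasing from the head of the queue) is the worst case here, needing one round per passenger.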

Ocean dip

In this project we try to understand the recently proposed fractal and multifractal models of user access patterns, such as the PQRS model of Wang, Ailamaki and Faloutsos. We have developed formulas for meaningful quantities such as average seek distance and hit rates. We have also analyzed what happens to such models when they pass through a cache, as well as the behavior of extremal examples. We still want to develop more formulas, because it is fun, and to study the relations between hit ratios and Host's theorem from ergodic theory. The study of caching suggests some quantitative versions of Host's theorem on binary vs. ternary measures on the circle. I also happen to think that the models are useful.

Blue sky blues

BGU has (by far) the world's largest academic collection of system wide traces of large disk arrays from production environments (hundreds of traces). We also have (by far) the world's largest academic collection of counter data from similar machines (in the hundreds as well).

BGU has no traces at all. I will try to explain how both these (ontological?) statements are currently true.

For a very long time we have done nothing with our vast trace collections, because they don't suit our approach to performance analysis and modeling of storage systems, an approach based on thought experiments. To our great surprise we found out (the hard way) that colleagues are reluctant to let us publish papers based on thought experiments in conferences and journals. Consequently, we have decided to use our vast trace collection for the purpose of accumulating papers, the raison d'être of academicians. We have devised the following devilish plan of action. We will publish an analysis of data which, for some reason unknown to us, seems to interest people: things like sequentiality/randomness ratios, read/write ratios, hit ratios, burstiness, spatial-temporal correlations, etc.

We had a student who did this kind of work. After doing a lot of (very nice) work, he decided he didn't really want to finish his thesis and abandoned his mission. A few months later we lost email contact with him. Only the former student knows where the traces are. So I would like to make the following plea:

Dear former student, please contact me, I am concerned about your well-being, the state of your thesis and the ontological status of the traces.

Humpty dumpty

This is our advanced cache algorithms project, in which we attempt to persuade the designers of the world's largest and most complex storage systems that John Doe from the local grocery store knows much more about caching than their 1M USD machines do. Personal observations suggest that people who run grocery stores know what their customers like to buy in the morning and, moreover, that these are not necessarily the same items that were in demand last night, even though last night is temporally closer than yesterday morning. As far as we know, this observation, and related ones that are just as simple, are not exploited in sophisticated commercial storage systems. We would like to change that, or at least write a paper about it.

Green eggs and Kosher food

In this project we pick papers by other people that we liked and try to do the math for them. Most systems people tend not to display their mathematical prowess. Their shyness has led to many beautiful papers without equations. We try to add the mathematical analysis, which sometimes even yields new insights.

Junk yard

In this project we take some really old and officially useless models of user access patterns from the 1960s and 1970s and try to understand them better. We then use them to manage large, commercially available, distributed storage systems (see my patents). Our favorites are the independent reference model (IRM) and its generalization, the partial Markov model. We also work with the least parallel I/O access patterns possible, namely one I/O at a time at a very low rate. Why do we do that? Because it is interesting: highly parallel, bandwidth-driven applications are well understood, so this is the last frontier. We are now looking at renewal processes, but they seem to be much harder than the IRM. Our efforts in this direction have been commercially very successful.
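Under the independent reference model, each request picks item i with a fixed probability p_i, independently of all other requests. Since each request then hits with probability equal to the total popularity of whatever the cache currently holds, the sum of the k largest p_i is an upper bound on the long-run hit ratio of any cache of size k, and a cache that permanently holds the k most popular items attains it. The minimal Python sketch below compares that bound with a simulated LRU cache. The Zipf-like popularity vector and the function names are illustrative assumptions, not taken from our papers or patents.

```python
import random
from collections import OrderedDict


def irm_trace(probs, length, seed=0):
    """IRM: every request independently picks item i with probability probs[i]."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=length)


def static_topk_hit_ratio(probs, k):
    """Long-run hit ratio of a cache that always holds the k most popular items."""
    return sum(sorted(probs, reverse=True)[:k])


def lru_hit_ratio(trace, k):
    """Fraction of requests in the trace that hit an LRU cache of size k."""
    cache = OrderedDict()
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.move_to_end(item)      # mark as most recently used
        else:
            cache[item] = None
            if len(cache) > k:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / len(trace)


if __name__ == "__main__":
    n, k = 100, 10
    weights = [1.0 / (i + 1) for i in range(n)]  # hypothetical Zipf-like popularity
    total = sum(weights)
    probs = [w / total for w in weights]
    trace = irm_trace(probs, 100_000)
    print(f"static top-{k} (upper bound): {static_topk_hit_ratio(probs, k):.3f}")
    print(f"LRU, simulated:              {lru_hit_ratio(trace, k):.3f}")
```

LRU keeps paying for one-off requests to unpopular items, which is why it stays below the bound under the IRM.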

The butler and the maid

We are working on QoS in storage systems. We are trying to understand the relations between bandwidth QoS and response time QoS. The first is important for parallel applications, the second for sequential applications.

Things hidden since the dawn of time

We are constantly thinking about automated management of storage environments. Some of our past work on this has appeared in the form of commercial products and patents; other work has not been written up yet.


Some Papers

I also have 29 U.S. patents along with patents and applications in Europe and Japan.


Teaching


Graduate Students


Other Activities


Personal