Contents (hide)
 1 Java synchronization
   1.1 Visibility and Re-ordering
  1.1 The "synchronized" Construct in Java
  1.2 How "synchronized" is implemented
   1.2.1 When to Use Synchronization
   1.2.2 Synchronized Properties
   1.2.3 Locks are Reentrant
   1.2.4 Locking a specific object
 2 Design Patterns to Enforce Safety
   2.1 Composition / Containment
   2.2 Client Side Locking
   2.3 Version Iterators
 3 More Synchronization Primitives

Synchronization in Java and Safety-related design patterns

In the last lecture we discussed immutability as a way to guarantee the safety and correctness of our classes. During this lecture, we continue with synchronization mechanisms modern RTEs offer us to cope with concurrency issues. In that scope, we describe places where synchronization is needed. We then discuss structural ways (in the form of design patterns) used to ensure exclusive access.

We will cover the following topics:

  • The synchronization mechanism in the Java RTE
  • Visibility and re-ordering
  • Design patterns:
    1. The monitor pattern.
    2. Client side locking.
    3. Class composition.
    4. Version iterators.

Java synchronization

The most basic mechanism Java offers the programmer to facilitate the safety of concurrent objects is called synchronization. Think of two threads which try to access the same object concurrently. To preserve the safety of the object, those two (or more) threads need to synchronize their access to the internal state of the object. For example, consider the (mutable) class Even from the previous lecture:

  1. class Even {
  2.  
  3.    /* the internal state counter 
  4.     * @INV counter_ % 2 = 0
  5.     */
  6.    private int counter_ = 0;
  7.  
  8.    /* default constructor */
  9.    public Even() { } 
  10.  
  11.    /* increment the counter.
  12.     * return the updated counter value 
  13.     * @PRE counter % 2_ = 0 (this is redundant, as it is promised by the @INV)
  14.     * @POST @POST(counter_) = @PRE(counter_)+2
  15.     */
  16.    public int add() {
  17.       ++counter_;
  18.       ++counter_;
  19.       return counter_;
  20.    }
  21. }

If two threads can manipulate the internal state of such an object concurrently, the safety of the object is compromised, as we saw in the previous lecture. This means that the state of the object can end up not satisfying the @invariant of the class when run in a concurrent environment, even though the code would be perfectly safe when run in a sequential environment.

Java offers a mechanism with which we can coordinate the actions both of these threads perform on the object. For example, if we could express in Java that only one thread is allowed to enter the add() function at a time, we can then ensure that the safety of the object will be preserved.

Visibility and Re-ordering

Other unsafe constructs may arise in concurrent execution environments. One is related to visibility and the other to re-ordering.

Visibility

Visibility determines when threads can see values (of all types, even references to objects) written to memory by other threads. Consider the following code:

Anonymous Class
An anonymous class is a mechanism in Java that lets you create a class that has no name. It combines the class declaration and the creation of an instance of the class in one step. This allows "on the fly" creation of the object. More
  1. class VisibilityDemo {
  2.         private int i_=0;
  3.  
  4.         public VisibilityDemo() { }
  5.  
  6.         public void set(int i) {
  7.                 i_ = i;
  8.         }
  9.  
  10.         public int get() {
  11.                 return i_;
  12.         }
  13.  
  14.         public static void main(String [] args) {
  15.                 final VisibilityDemo d = new VisibilityDemo();
  16.                 // Create a new thread by giving it a new Runnable object.
  17.                 // The Runnable object is declared here, inline.
  18.                 // Such a declaration is called an "anonymous class".
  19.                 // the Visibility problem would occur even if we had used a regular class
  20.                 Thread t = new Thread(new Runnable() {
  21.                         public void run(){
  22.                                 System.out.println(d.get());
  23.                         }});
  24.                 d.set(10);
  25.                 t.start();
  26.         }
  27. }

A legitimate output for this code is 0. That is, the thread t does not "see" the change to the state of d (set to 10) performed in the main thread of the program.

To understand why, first note that in this example there are two active threads; one is the main thread, which is always implicitly created for us by the JVM. The main thread executes the Main() method. The second thread is the one we created (referenced in variable t). Now, when the main thread calls d.set(10), the value 10 is written to the memory which holds the i_ member of the VisibilityDemo object. However, when the second thread reads from the same memory location (by using the d.get method), Java does not guarantee that this second thread will see the most recent value written to this memory location.

Re-ordering

Reordering is what happens when the compiler and the RTE change our code behind our back, to better optimize it. For example, the compiler may decide to convert the following code:

  1. int i=10;
  2. int b=11;
  3.  
  4. i++;
  5. b++;

into this code:

  1. int i=10;
  2. int b=11;
  3. ....
  4. ....
  5. b++;
  6. i++;

The compiler must be able to re-order simple instructions to optimize our code. It would do that only if there is no "must-happen-before" constraint on the code being reordered.

If there was a "must-happen-before" constrain in our example then the compiler would not reorder it. For example:

  1. int i=10;
  2. int b=11;
  3. ....
  4. ....
  5. i++;
  6. Print (i)
  7. b++;
Inline Expansion
inline expansion, or inlining, is a compiler optimization that replaces a function call site with the body of the callee. More
In fact "must-happen-before" constrained would happen just because we call any other (non-inline) function in the middle since the compiler would not go recursively into that function to see what it is doing:

  1. int i=10;
  2. int b=11;
  3. ....
  4. ....
  5. i++;
  6. NonTrivialFunc();
  7. b++;

The compiler only guarantees safe reordering in a non-concurrent execution environments. However, in concurrent environments, reordering may harm us.

The "synchronized" Construct in Java

To express in Java that we want to allow only one thread at a time to enter a specific method of an object, we add a special keyword to the method's declaration.

Consider the following code:

  1. /* notice the synchronized keyword */
  2. public synchronized int add(){
  3.       ++counter_;
  4.       ++counter_;
  5.       return counter_;
  6.    }

Notice the synchronized keyword added to the add() function. Once we mark a method as synchronized, the Java RTE ensures that, during runtime, only one thread at a time may access this method. Moreover, synchronization also solves the visibility and reordering problems; a "write" to a memory location which takes place during a synchronized code section is guaranteed to be returned to all "read" operations following it, if they are protected by the same lock (we have not yet defined what a lock is, so bear with us).

Next, we will elaborate on the mechanism Java employs, internally, to enable such synchronization. Understanding this mechanism will help us use it to better effect and avoid common pitfalls.

How "synchronized" is implemented

Each object inside the JVM (at runtime) has an associated lock (this lock is sometimes called the "monitor" of the object). The JVM is in charge of maintaining and initializing these locks. The JVM attaches such a lock to every object in the system, such that each object has a lock all of its own. We, as programmers, cannot access these locks explicitly (they are not standard members of any class), rather we can employ them implicitly by using the synchronized keyword.

Locks may be in one of two states; either in the possession of a thread or available. A thread, T, may try to acquire the lock of any object. If the lock is available, the lock is immediately transfered into the possession of T. Otherwise, T goes to sleep until the lock is available (this happens when the thread that currently possesses the lock releases it). Then T wakes up and tries to acquire the lock again (T may fail again, as another thread may have been quicker and grabbed the lock before T woke up).

When we declared the add() method as synchronized, the compiler added something like the following pseudo code to the method:

  1. public synchronized int add(){
  2.   before_entering_this_method_grab_the_lock_of_this();
  3.       ++counter_;
  4.       ++counter_;
  5.       return counter_;
  6.   after_leaving_this_method_release_the_lock_of_this();
  7. }

As can be seen in the code snippet above, each thread, when trying to call the method, must first try to acquire the lock. Moreover, when the thread exits this function, the thread will release the lock.

The cost of this implementation

Using synchronized comes with a price. To ensure correct memory semantics, after a thread exits a synchronized code block all the memory of all the threads needs to be synchronized with the main memory. Since each thread may cache copies of the main memory in its own memory (e.g., in CPU registers or on a CPU cache closer to the thread), each thread's memory needs to be synchronized with the main memory. This is a costly operation.

The other cost, is that as long as a thread is active in the body of a synchronized method for a certain passive object, all other threads that attempt to enter the same object for any synchronized method will have to wait. This means that threads must wait for each other.

When to Use Synchronization

The following rules of thumb should be observed:
  • In each object, each mutable variable which is accessible from different threads must be protected with a lock (else, you may be hit by re-ordering problems).
  • The monitor pattern - The internal state of the object should be encapsulated and objects should be responsible to synchronize the access to the state, by using synchronized get() and set() methods.
  • When an object has an invariant which involves more than one member, all accesses to these members, throughout the code of the object, must be synchronized using the same lock.
  • Never lock large parts of code. Try to minimize the sections of code which use locking. We will elaborate on this in following lectures.

Synchronized Properties

There are some language rules to which we must adhere when using synchronization and some miscellaneous issue which must be observed.

  • The synchronized keyword is not part of the method signature.
  • The synchronized keyword cannot be specified in an interface.
  • Constructors may not be declared as synchronized (There would be no point in declaring a constructor synchronized, since as long as the constructor is not complete, the object instance is not yet accessible by any other thread than the one constructing it.)
  • When we declare a static method as synchronized, we are actually telling the JVM that, in contrast to a regular method where we ask that the lock associated with the object be used, we want the lock associated with the class to be used. This is possible since each class in Java is also an object on its own.
  • Java locks are Reentrant: The same thread holding the lock can get it again and again.

Locks are Reentrant

Consider the following tweak to class Even:

  1. public synchronized void doSomething(){
  2.     //do something.
  3. }
  4.  
  5. public synchronized int add(){
  6.       doSomething();
  7.       ++counter_;
  8.       ++counter_;
  9.       return counter_;
  10.    }

This code will work as expected; when a thread, T, successfully acquired the lock of an Even object when T entered the add() method, T will be able to enter the doSomething() method since T already possesses the lock of the same object. Such locks are called recursive or reentrant.

Reentrant locks are implemented using a counter. The same thread may acquire the lock multiple time, consecutively, without releasing the lock. Each time the lock is acquired, an internal counter is incremented. The lock is unavailable whenever the counter has a value larger than 1. Whenever a thread releases the lock, the counter value is decremented by 1. When the counter reaches 0, the lock is then available for other threads.

Locking a specific object

Unlike adding the synchronized keyword to a method declaration, which implicitly uses the lock/monitor of the current object, we can explicitly request to lock a specific object we wish to acquire. For example, class Even may be implemented in the following way:
  1. class Even{
  2.  
  3.    /* the internal state counter 
  4.     * @INV counter_%2=0
  5.     */
  6.    private int counter_ = 0;
  7.  
  8.    /* an object which we use as a lock */
  9.    private Object i_am_a_lock_ = new Object();
  10.  
  11.    /* default constructor */
  12.    public Even(){ } 
  13.  
  14.    /* increment the counter.
  15.     * return the current counter value 
  16.     * @PRE counter%2_=0 (this is redundant, as it is promised by the @INV)
  17.     * @POST @POST(counter_) = @PRE(counter_)+2
  18.     */
  19.    public int add(){
  20.       synchronized(i_am_a_lock_){
  21.          ++counter_;
  22.          ++counter_;
  23.          return counter_;
  24.       }
  25.    }
  26. }

However, in general, it is not advised to use synchronized blocks. It is usually better to define synchronized methods, as small as they may be. Such practice leads to better understanding of your code.

Design Patterns to Enforce Safety

We have seen that synchronization is a powerful tool which the programmer has in her arsenal. But, will synchronization solve all of our problems? probably not…

Consider the following problem: while implementing a new class to automatically solve tests in calculus, you employed a Vector to hold some information. At some point, you wanted to add an element to the vector, but only if the element is missing. Being a true programmer as you are, you read the javadoc of the Vector class and noticed that this class is thread-safe. However, there is no such method as addIfAbsent().

Will the following snippet be thread-safe?

  1. if (!myVector_.contains(someElement))
  2.     myVector_.add(someElement);

Probably not (life is just not fair).

The reason is that this can happen:

  1. if (!myVector_.contains(someElement))
  2.     // AFTER this thread tested that the vector does not contain someElement
  3.     // and BEFORE it adds it, ANOTHER thread jumps in and adds the same element.
  4.     myVector_.add(someElement);

The following is also not a correct solution:

  1. /* an improved Vector class, holding elements of type E*/
  2. public class improvedVector<Eextends Vector<E{
  3.    
  4.    /* add if absent implementation */
  5.    public synchronized void addIfAbsent(E e){
  6.       if (!contains(e))
  7.          add(e);
  8.    }
  9. }

The solution presented above is incorrect. The reason is that the lock protecting the addIfAbsent() method may be different from the lock protecting the add(). You are also not sure that add() is synchronized altogether. Moreover, what will happen when the implementation of Vector will change? Maybe in a newer Java release Vector will not be thread safe? What will happen to your code then? This can lead to all kind of hells. Trust me, you do not want to do this!

There exist two design patterns which can help us solve this issue. Each one is relevant in slightly different circumstances: Composition and Client Side Locking.

Composition / Containment

Composition (also named Containment) is about "almost" reimplementing the functionality of Vector. Why should we re-implement something which works? Why not extend the Vector class and add the function we want? The answer is that we do not know how the class Vector protects itself from concurrent access. Specifically, we do not know, and may not have access to, which objects are used by the Vector class to synchronize access to itself. Moreover, it is a good design strategy to have a basic (non thread safe) object and a thread safe version of it.

So, we will write a wrapper around the Vector class, which enforces the right synchronization policy and delegates all the calls (after synchronization is verified) to the underlying implementation class.

  1. /* an improved Vector class, holding elements of type E*/
  2. public class improvedVector<E{
  3.    
  4.    /* The internal Vector class we use. 
  5.       Notice that we declare it final. 
  6.    */
  7.    private final Vector<Evec_;
  8.  
  9.    /* default, empty, constructor */
  10.    public improvedVector() {
  11.       vec_ = new Vector<E>();
  12.    }
  13.    
  14.    /* clear the vector */
  15.    public synchronized void clear() {
  16.        vec_.clear();
  17.    }
  18.  
  19.    /* add an element e of type E to the vector */
  20.    public synchronized void add(E e){
  21.       vec_.add(e);
  22.    }
  23.  
  24.    /* add if absent implementation */
  25.    public synchronized void addIfAbsent(E e){
  26.       if (!vec_.contains(e))
  27.          vec_.add(e);
  28.    }
  29.    
  30.    ... 
  31.    ... 
  32.    ...
  33.  
  34. }

This technique ensures that we know exactly which lock is used to synchronize access to our vector. It is best applied in situations where we are not sure / do not know the internal implementation of an object, or when the implementation of the object may change without our knowledge.

Client Side Locking

Sometimes we may know the identity of the object which is used for synchronization. For example, we may know that the Vector class uses its own monitor (lock) to achieve synchronization. We can then use the following implementation:

  1. /* add if absent implementation */
  2. public void addIfAbsent(E e){
  3.    synchronized(vec_){
  4.       if (!vec_.contains(e))
  5.          vec_.add(e);
  6.    }
  7. }

Client side locking may be used only when we know the internal implementation of the object we are using. In our example, we know that the class Vector protects itself by using its own monitor, such that we can use the same monitor (the Vector object itself) to perform thread-safe operation. Client side locking is not appropriate when we either do not know the internal implementation or the implementation may change in the future.

In general, client-side locking is NOT a good idea: it gives the responsibility of synchronization to the clients instead of centralizing it in the object we want to access. The problem is then: how can we insure consistency among ALL the clients that access this object? If one of the client does not synchronize correctly, then ALL the code will behave incorrectly. It is always better to enforce safety on the provider side, not on the client side.

Version Iterators

Before we start discussing version iterators, let us recall how iterators and generics work since java 1.5:

Iterators and Generics

Consider this simple example:
  1. List myIntList = new LinkedList();
  2. myIntList.add(new Integer(0))
  3. Integer x = (Integer) myIntList.iterator().next();

The cast on line 3 is slightly annoying. Typically, the programmer knows what kind of data has been placed into a particular list. However, the cast is essential. The compiler can only guarantee that an Object will be returned by the iterator. To ensure the assignment to a variable of type Integer is type safe, the cast is required. Of course, the cast not only introduces clutter. It also introduces the possibility of a run time error, since the programmer might be mistaken.

What if programmers could actually express their intent, and mark a list as being restricted to contain a particular data type? This is the core idea behind generics. Here is a version of the program fragment given above using generics:

  1. List<IntegermyIntList = new LinkedList<Integer>();
  2. myIntList.add(new Integer(0));
  3. Integer x = myIntList.iterator().next();

Defining Simple Generics

Here is a small excerpt from the definitions of the interfaces List and Iterator in package java.util:
  1. public interface Iterable<T{
  2.  
  3.    // Returns an iterator over a set of elements of type T.
  4.    Iterator<Titerator()
  5. }
  6.  
  7. public interface List<EImplement Iterable{ 
  8.   void add(E x);
  9.   Iterator<Eiterator();
  10. }
  11.  
  12. public interface Iterator<E{ 
  13.   E next();
  14.   boolean hasNext();
  15. }

This should all be familiar, except for the stuff in angle brackets. Those are the declarations of the formal type parameters of the interfaces List and Iterator. Type parameters can be used throughout the generic declaration, pretty much where you would use ordinary types. When used with an actual type like the Integer in the example List all occurrences of the formal type parameter (E in this case) are replaced by the actual type argument (in this case, Integer). You might think List in our example is rewritten once introduced with the actual type but this is NOT true. There aren’t multiple copies of the code: not in source, not in binary, not on disk and not in memory.

Version iterators

Consider the following problem: we have an instance of a Vector of Integers, and we want to process them one at a time (i.e., print them to screen). The following code will work, but only if we are in a single threaded environment:

  1. Vector<Integervec_ = new Vector<Integer>();
  2. ...
  3. ...
  4. ...                         |   
  5. /* this can be used with    |
  6.    any collection which     |
  7.    implements the Iterable  |   /* this is equivalent: */
  8.    interface */             |   
  9. for (Integer i : vec_){     |   for (Iterator i=vec_.iterator()i.hasNext()){
  10.    System.out.println(i);   |      System.out.println(i.next());
  11. }                           |   }

Now, consider the case the vec_ object may be shared by several threads, each one adding or removing elements concurrently. What will happen when the element pointed to by the iterator is deleted from the vector? or a new element is added at the beginning of the vector?

A simple solution is to synchronize the entire iteration on the vector. However, there are two main problems using such a technique; (a) the body of the loop may be arbitrarily long, and locking the entire vector for the entire duration of the loop may be unacceptable and (b) we may not have access to the object used to synchronize the vector (as was the case when we used the composition design pattern).

The first and easiest solution is to first create a temporary copy of the vector, and then iterate over this copy, which is local to the current thread. However, there are still several problems with this; (a) it may be inappropriate to copy the entire data structure, as it may incur a large computation overhead and (b) we still need to lock the vector during the copy operation, and if the vector class does not offer a thread-safe cloning method, we have the same problem as above.

The Vector class does offer us a solution; Vector iterators are fail-fast, meaning that if an element is added or removed from the Vector after the iterator was created, in any way except through the Iterator's own remove or add methods, a ConcurrentModificationException will be thrown.

This can be very useful in cases there are many concurrent readers but only a few writers. Unfortunately the fail fast implementation of Vector in java is on best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators in Java should be used only to detect bugs.

Let us consider a possible implementation of fail-fast iterators, which is called version iterators.

  1. import java.util.*;
  2.  
  3. public class IterableVector<Eimplements Iterable{
  4.     protected long _version = 0
  5.     private final Vector<E_vec;
  6.  
  7.     public IterableVector(int cap) { _vec = new Vector<E>(cap)}
  8.     public synchronized int getVersion() { 
  9.         return _version;
  10.     }  
  11.     public synchronized E removeLast(){
  12.             ++_version;      // advertise update 
  13.             return _vec.remove(_vec.size()-1);
  14.     }
  15.     public synchronized void add(E x) { 
  16.         ++_version;             // advertise update 
  17.         _vec.add(x);
  18.     }
  19.     public synchronized E get(int i){
  20.         return _vec.get(i);
  21.     }
  22.     public synchronized int size(){
  23.         return _vec.size();
  24.     }
  25.     public synchronized Iterator iterator() { 
  26.         return new EAIterator<E>(this);
  27.     }
  28. }
  29.  
  30. class EAIterator<Eimplements Iterator
  31. {
  32.     protected int _currentIndex;
  33.     protected final int _currentVersion;
  34.     protected final IterableVector<E_data;
  35.  
  36.     EAIterator(IterableVector<Earr)  
  37.     {
  38.         _currentIndex = 0;
  39.         _data = arr;
  40.         _currentVersion = _data.getVersion();
  41.     }      
  42.  
  43.     public E next() { 
  44.         synchronized(_data) {
  45.             if (_currentVersion !=  _data.getVersion()) 
  46.                 throw new ConcurrentModificationException();
  47.             else if (_currentIndex == _data.size())  
  48.                 throw new NoSuchElementException()
  49.             else 
  50.                 return _data.get(_currentIndex++)
  51.         }
  52.     }
  53.     public boolean hasNext() { 
  54.         synchronized(_data) { 
  55.             if (_currentVersion !=  _data.getVersion()) 
  56.                 throw new concurrentModificationException();    
  57.             return (_currentIndex < _data.size());
  58.         }
  59.     }
  60. }

More Synchronization Primitives

The Java concurrent package introduces high-level concurrency primitives.

The Lock interface in Java introduces a key functionality: the possibility to test that a lock can be acquired without blocking. Lock objects work like the implicit locks used when accessing a synchronized code segment. Only one thread can own a Lock object at a time. The biggest advantage of Lock objects over implicit locks is their ability to back out of an attempt to acquire a lock: The tryLock() method returns if the lock is not available immediately or before a timeout expires (if specified).

  • Semaphore
  • Exchanger
  • Latch