It´s just about caching, honey!

Sometimes we want to read data once from a store and cache it in memory because retrieving it from the source is rather expensive. There are two strategies to achieve that:

The first way of caching is achieved by loading the entire data of a given type from some source at once and access it from the cache. This is suitable for small caches or if most cache entries are likely to be read.

An alternative is a growing cache. Whenever data is requested from the cache the first time the cache will load it from the given source, cache it and finally, return it to the client. If data is requested from the second time on it will be returned from the cache without accessing the source.

This way, not the entire data of a given type but just the subset which has been requested at least once is cached. This reduces the memory consumption of the cache.

In this blog post we will develop a simple, lightweight growing cache.

One class to rule them all

The simplest approach to write a cache is to use JAVA´s built-in Map implementations:

K key = ...;
V value;

Map<K, V> cache = new HashMap<>();

...

if (cache.containsKey(key)) {
	value = cache.get(key);
}
else {
	value = createValue(...);
	cache.put(key, value);
}

First, we check if there is an entry in our cache for the key. If this is true we will read the value assigned to that key.

Otherwise, we will create the value and add it to the cache for the key.

Each time we need a cache we will write that code. Only the code for creating or reading a value from some source will differ. If our cache becomes more sophisticated we will have even more duplicate boiler plate code spread throughout our code. That is not a very DRY​1​ approach.

The better approach is to locate code in one class which can be used anywhere where we need a cache:

public class Cache<K, V> {

    private final Map<K, V> internalCache = new HashMap<>();
    
    public V get(K key, Supplier<V> valueSupplier) {
        Objects.requireNonNull(valueSupplier, "Parameter valueSupplier should not be null.");
        
        V result;
        
        if (this.internalCache.containsKey(key)) {
            result = this.internalCache.get(key);
        }
        else {
            result = valueSupplier.get();
            this.internalCache.put(key, result);
        }
        
        return result;
    }
}

We have placed the cache logic into one class which is backed by a Map implementation. The method get reads a value from the internal map. If there does not exist a map entry for the passed key the method will create a value and will insert it into the map for key before returning it.

It is the responsibility of the class calling the get method to provide the logic how to create a new instance of the value class, not of the Cache class​2​ which purpose is to do the caching only.

This logic is passed to the get method by the parameter valueSupplier. The Cache class does not know how to create a new value instance. It will just call the valueSupplier to obtain one.

Because we separated the concerns​3​ of creating an instance of a value and the pure cache logic we can use the Cache class universally.

The code within the calling class may look like that:

K key = ...;
V value;

Cache<K, V> cache = new Cache<>();

...

value = cache.get(key, () -> this.createValue(...));

Handling concurrency

Unfortunately, the Cache class is not thread-safe. If two threads access the cache concurrently with the same key and there does not exist a cache entry for that key the value may be created and inserted to the cache twice. Because the key is alike the second insertion would overwrite the first one. This is ok in this context. However, if creating a value is very expensive we only want to insert the cache entry once.

We solve this issue by guarding the if statement by a reentrant lock​4​:

private final Map<K, V> internalCache = new HashMap<>();
private final ReentrantLock lock = new ReentrantLock();

public V get(K key, Supplier<V> valueSupplier) {
	Objects.requireNonNull(valueSupplier, "Parameter valueSupplier should not be null.");

	V result;

	this.lock.lock();
	try {
		if (this.internalCache.containsKey(key)) {
			result = this.internalCache.get(key);
		} else {
			result = valueSupplier.get();
			this.internalCache.put(key, result);
		}
	}
	finally {
		this.lock.unlock();
	}

	return result;
}

The code of the class calling the get method is not affected by this enhancement.

Null or not null, that is the question.

Our cache is backed by an HashMap instance which allows us to use null keys and values​5​. But does that make sense?

What would it mean to use a null key in the context of caching? We want to store something, but we do not know by what key? In my opinion, a null key for a cache does not make sense. Therefore, we add a check to the cache´s get method:

public V get(K key, Supplier<V> valueSupplier) {
	Objects.requireNonNull(key, "Parameter key should not be null.");
	Objects.requireNonNull(valueSupplier, "Parameter valueSupplier should not be null.");
	...
}

What about a null value for a given key in a cache? A possible interpretation is that no value could be instantiated for a certain key because there was no such value. Therefore re-trying to create that value instance is useless.

The problem of Map.get(K key) returning null is that it is not clear why it has returned null. It could mean that either there does not exist any entry for the given key or the value assigned to that key is null.

In our code we have checked the existence of an entry for a given key explicitly by calling the method containsKey so far. If our cache is very large we would like to access it only once by using the method Map.get only.

One way to do that is to wrap the value in a CachedValue class and to ensure that each Map entry contains a CachedValue instance:

public class Cache<K, V> {

    private final Map<K, CachedValue<V>> internalCache = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    public V get(K key, Supplier<V> valueSupplier) {
        Objects.requireNonNull(key, "Parameter key should not be null.");
        Objects.requireNonNull(valueSupplier, "Parameter valueSupplier should not be null.");

        V result;

        this.lock.lock();
        try {
            CachedValue<V> cachedValue = this.internalCache.get(key);

            if (cachedValue != null) {
                result = cachedValue.getValue();
            }
            else {
                result = valueSupplier.get();
                this.internalCache.put(key, new CachedValue<>(result));
            }
        }
        finally {
            this.lock.unlock();
        }

        return result;
    }

    private static class CachedValue<V> {

        private final V value;

        private CachedValue(V value) {
            this.value = value;
        }

        public V getValue() {
            return this.value;
        }
    }
}

this.internalCache.get is returning only null now if no entry for a given key exists. If the value is null a CachedValue instance is returned which is wrapping the null value.

Time to say goodbye

Currently, we never update existing cache entries. However, data held by the cache may expire sometimes. If we filled the cache completely at once we could implement some strategy to reload the entire cache after some time.

Since our cache is growing incrementally this strategy does not make any sense because our cache may contain expired and valid cache entries.

Instead, each time we retrieve data from the cache we will check if it has expired. If this is the case we will treat the expired data as if we had not found any cache entry for the given key.

When we extended our cache to hold null values we introduced the class CachedValue. This class is an excellent place to store the expiry information:

private static class CachedValue<V> {

	private final V value;
	private final ZonedDateTime expiry;

	private CachedValue(V value, ZonedDateTime expiry) {
		this.value = value;
		this.expiry = expiry;
	}

	public V getValue() {
		return this.value;
	}

	private boolean hasExpired() {
		return this.hasExpired(ZonedDateTime.now());
	}

	private boolean hasExpired(ZonedDateTime referenceDateTime) {
		return this.expiry != null && referenceDateTime != null && !this.expiry.isAfter(referenceDateTime);
	}
}

We have just added a property expiry which stores the time at which the cache entry expires and should be updated. We have also coded a method hasExpired() which returns true if the cache entry has expired.

We initialize our cache with some functionality which determines the expiry:

private final Supplier<ZonedDateTime> expirySupplier;

Cache(Supplier<ZonedDateTime> expirySupplier) {
	this.expirySupplier = new CacheValueExpirySupplier(expirySupplier);
}

The functionality is passed to the constructor as a Supplier. Passing null to the constructor means that the entries of this cache never expire.

Note that we wrap the Supplier object by a CacheValueExpirySupplier instance:

class CacheValueExpirySupplier implements Supplier<ZonedDateTime> {

    private static final long NANOSECONDS_OF_MIN_EXPIRY_AFTER_NOW = 1L;

    private final Supplier<ZonedDateTime> expirySupplier;

    CacheValueExpirySupplier(Supplier<ZonedDateTime> expirySupplier) {
        this.expirySupplier = expirySupplier;
    }

    @Override
    public ZonedDateTime get() {
        ZonedDateTime expiry = this.expirySupplier == null ? null : expirySupplier.get();
        ZonedDateTime minExpiry = ZonedDateTime.now().plusNanos(NANOSECONDS_OF_MIN_EXPIRY_AFTER_NOW);

        return ConverterToZonedDateTime.convertToTimeZoneUTC(expiry == null || minExpiry.isBefore(expiry) ? expiry : minExpiry);
    }
}

We cannot control what Supplier object is passed to the constructor. It could be null, its get() method could return null or some point of time in the past which would mean that our cache entry would have expired before being added to the cache.

The wrapper classes´ get() method returns null if a cache entry should never expire. This will happen if the backing Supplier object is null or if its get() method returns null.

If the backing Supplier object´s get() method provides a value not representing some time in the future the CacheValueExpirySupplier instance´s get() method will return a future time point close to ZonedDateTime.now() instead. This way we ensure that just proper expiry values are created.

It is not necessary to convert the expiry to UTC because the methods isBefore and isAfter of the class ZonedDateTime compare two ZonedDateTime instances by their instant. So these properties will be correctly compared even if they reside in different time zones. However, it is a good habit to work with one location independent timezone.

The constructor of the Cache<K, V> class is package-private. In order to create a cache we supply a factory class:

public class CacheFactory {

    public <K, V> Cache<K, V> createEmptyCache(Class<K> keyClass, Class<V> valueClass) {
        return this.createEmptyCache(keyClass, valueClass, null);
    }

    public <K, V> Cache<K, V> createEmptyCache(Class<K> keyClass, Class<V> valueClass, Supplier<ZonedDateTime> expirySupplier) {
        Objects.requireNonNull(keyClass, "Parameter keyClass should not be null.");
        Objects.requireNonNull(valueClass, "Parameter valueClass should not be null.");

        return new Cache<>(expirySupplier);
    }
}

We need to pass the key and value class to the createEmptyCache methods of the factory to let those methods know for what key and value types we want to create the cache for.

The functionality to create the expiry is injected to the cache by the parameter expirySupplier.

Because we cannot access the constructor of the cache directly from a calling class we have to adjust our code to create a cache:

K key = ...;
Class<K> keyClass = key.getClass();
Class<V> valueClass = ...;

CacheFactory factory = new CacheFactory();
Cache<K, V> cache = factory.createEmptyCache(keyClass, valueClass);

...

value = cache.get(key, () -> this.createValue(...));

We are now extending the code to honour the expiry of a cache entry:

this.lock.lock();
try {
	CachedValue<V> cachedValue = this.internalCache.get(key);

	if (cachedValue != null && !cachedValue.hasExpired()) {
		result = cachedValue.getValue();
	}
	else {
		result = valueSupplier.get();
		this.internalCache.put(key, new CachedValue<>(result, this.expirySupplier.get()));
	}
}
finally {
	this.lock.unlock();
}

A new cache entry will be created if either no entry is found in the cache for the given key or if the cache entry has expired (line 5). If we create a new cache entry its expiry will be passed to its constructor (line 10).

The show must go on.

Caches sometimes consume quite a bit of memory taking the JVM´s memory to its limit. Usually, caches and therefore their data cannot be removed by the garbage collector because some class always holds a reference on the cache instance and the cache object itself references all its data. This can cause an OutOfMemoryError.

JAVA offers the concept of soft references​6​. If only soft references are held on an object and the JVM is at the brink of running out of memory the garbage collector will delete that object. If some other object holds a hard reference on that instance the garbage collector will not touch it.

First, we modify the Map backing the cache:

private final Map<K, SoftReference<CachedValue<V>>> internalCache = new HashMap<>();

We get a cache entry from the cache by retrieving the soft reference from the Map and getting the cache value from the soft reference:

private CachedValue<V> getCacheValue(K key) {
	SoftReference<CachedValue<V>> reference = this.internalCache.get(key);

	return reference == null ? null : reference.get();
}

If the cache entry has been deleted by the garbage collector reference.get() will return null. In that case, we will proceed as if no cache entry has been found for the given key.

Adding a cache entry wrapped by a soft reference is straightforward, too:

private void putCacheValue(K key, CachedValue<V> cachedValue) {
	SoftReference<CachedValue<V>> reference = new SoftReference<>(cachedValue);

	this.internalCache.put(key, reference);
}

Finally, we add the methods we have just implemented to the code inside the cache´s method get(K key, Supplier valueSupplier):

this.lock.lock();
try {
	CachedValue<V> cachedValue = this.getCacheValue(key);

	if (cachedValue != null && !cachedValue.hasExpired()) {
		result = cachedValue.getValue();
	}
	else {
		result = valueSupplier.get();
		this.putCacheValue(key, new CachedValue<>(result, this.expirySupplier.get()));
	}
}
finally {
	this.lock.unlock();
}

Pre-populating the cache

Initializing a cache with a map is straightforward. we add two methods createPrePopulatedCache to our CacheFactory class.

The first method creates an empty cache and pre-populates it with the entries of the passed map. The cache created by this method does not handle expiration.

The second method has a parameter initializationData for the data to pre-populate the cache with and a second parameter expirySupplier which will determine the expiry.

If the map is null or empty these methods behave like the CacheFactory methods createEmptyCache. Any map entries with null keys are omitted to avoid exceptions.

public class CacheFactory {

    public <K, V> Cache<K, V> createPrePopulatedCache(Map<K, V> initializationData) {
        return this.createPrePopulatedCache(initializationData, null);
    }

    public <K, V> Cache<K, V> createPrePopulatedCache(Map<K, V> initializationData, Supplier<ZonedDateTime> expirySupplier) {
        Cache<K, V> cache = this.createEmptyCache(initializationData, expirySupplier);

        if (!initializationData.isEmpty()) {
            initializationData.entrySet().stream()
                    .filter(Objects::nonNull)
                    .filter(kvEntry -> kvEntry.getKey() != null)
                    .forEach(kvEntry -> cache.get(kvEntry.getKey(), kvEntry::getValue));
        }

        return cache;
    }

    private <K, V> Cache<K, V> createEmptyCache(Map<K, V> initializationData, Supplier<ZonedDateTime> expirySupplier) {
        Objects.requireNonNull(initializationData, "Parameter initializationData should not be null.");

        return new Cache<>(expirySupplier);
    }

    public <K, V> Cache<K, V> createEmptyCache(Class<K> keyClass, Class<V> valueClass) {
        return this.createEmptyCache(keyClass, valueClass, null);
    }

    public <K, V> Cache<K, V> createEmptyCache(Class<K> keyClass, Class<V> valueClass, Supplier<ZonedDateTime> expirySupplier) {
        Objects.requireNonNull(keyClass, "Parameter keyClass should not be null.");
        Objects.requireNonNull(valueClass, "Parameter valueClass should not be null.");

        return new Cache<>(expirySupplier);
    }
}

If we want to pre-populate the cache the client code will look like this:

K key = ...;
Map<K, V> prePopulationData = ...;

CacheFactory factory = new CacheFactory();
Cache<K, V> cache = factory.createPrePopulatedCache(prePopulationData);

...

value = cache.get(key, () -> this.createValue(...));

Non blocking cache access

Sometimes, retrieving a value from a source takes really long or we prefer non blocking access to the cache for some reason. For those occasions we provide a method for the Cache class allowing client code to read cache values asynchronously:

public void getAsync(K key, Supplier<V> valueSupplier, Consumer<V> callback) {
	Objects.requireNonNull(key, "Parameter key should not be null.");
	Objects.requireNonNull(valueSupplier, "Parameter valueSupplier should not be null.");
	Objects.requireNonNull(callback, "Parameter callback should not be null.");

	CompletableFuture.supplyAsync(() -> this.get(key, valueSupplier))
			.thenAccept(callback);
}

The method getAsync returns void. The result of retrieving the value from the cache is processed by the callback passed to this method.

Example for the code of a calling class:

K key = ...;
Class<K> keyClass = key.getClass();
Class<V> valueClass = ...;

CacheFactory factory = new CacheFactory();
Cache<K, V> cache = factory.createEmptyCache(keyClass, valueClass);

...

value = cache.getAsync(key, 
	() -> this.createValue(...),
	(result) -> this.handleCacheResult(result)
);

Conclusion

In this blog post we have shown the benefits of coding a cache class instead of using JAVA´s built-in Map implementations. By placing the logic to retrieve entries from and adding them to the cache in that class we have avoided to scatter boilerplate code all over our code base.

We then extended the Cache class by further features:

  • thread-safety
  • null handling
  • expiring cache entries
  • avoiding OutOfMemoryError by using soft references
  • pre-populating caches
  • non blocking cache access

Though the Cache class has evolved dramatically using it still remains straightforward:

We choose one of the four methods provided by CacheFactory to create a Cache instance.

Then we either use its get or getAsync method to retrieve values from the cache depending on whether we wish to access the cache directly or the non blocking way.

Bibliography

  1. 1.
    Westphal R, Lieser S. Grade 1 Red – Don’t Repeat Yourself (DRY). Clean Code Developer. https://clean-code-developer.com/grades/grade-1-red/#Don8217t_Repeat_Yourself_DRY. Published 2015. Accessed March 1, 2020.
  2. 2.
    Westphal R, Lieser S. Grade 2 Orange – Single Responsibility Principle (SRP). Clean Code Developer. https://clean-code-developer.com/grades/grade-2-orange/#Single_Responsibility_Principle_SRP. Published 2015. Accessed March 1, 2020.
  3. 3.
    Westphal R, Lieser S. Grade 2 Orange – Separation of Concerns (SoC). Clean Code Developer. https://clean-code-developer.com/grades/grade-2-orange/#Separation_of_Concerns_SoC. Published 2015. Accessed March 1, 2020.
  4. 4.
    Class ReentrantLock. JavaTM Platform Standard Ed. 8. https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html. Accessed March 1, 2020.
  5. 5.
    Class HashMap<K,V>. JavaTM Platform Standard Ed. 8. https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html. Accessed March 1, 2020.
  6. 6.
    Class SoftReference<T>. JavaTM Platform Standard Ed. 8. https://docs.oracle.com/javase/8/docs/api/java/lang/ref/SoftReference.html. Accessed March 1, 2020.

Leave a Reply

Your email address will not be published. Required fields are marked *