session data: HTTP Session and Stateful session beans. Although the two mechanisms are conceptually different, their implementations are quite similar as far as session synchronization across the cluster is concerned. For this reason, we will discuss them in a single section and point out the similarities between them. Here’s the cache-container configuration for the web cache and the sfsb cache:
As you can see, the web cache container configuration can contain one or more caching strategies: the replicated-cache and the distributed-cache, the former being the default one. In the next section, we will explore in detail the differences and the gotchas between these two cache modes. For the moment, bear in mind that if you want to change the clustering mode, all you have to do is point the default-cache attribute at the cache you want to use.
Data synchronization across cluster members can be carried out using either synchronous messages (SYNC) or asynchronous messages (ASYNC).
Synchronous messaging is the least efficient mode as each node will wait for message acknowledgement from all cluster members. However, synchronous mode is needed when all the nodes in the cluster may access the cached data, resulting in a high need for consistency.
Asynchronous messaging favors speed over consistency, which is particularly advantageous in use cases such as HTTP session replication with sticky sessions enabled. In these scenarios, a session is always accessed on the same cluster node, and only in case of failure is its data accessed on a different node.
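The trade-off between the two modes can be illustrated with a small plain-Java sketch. It does not use Infinispan or JGroups: the class ReplicationModes and its method are hypothetical names, and each peer is simulated by a task that sleeps for a fixed delay before acknowledging the update.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicationModes {

    /**
     * Simulates one cache update sent to `peers` other nodes and returns how many
     * acknowledgements had arrived by the time the caller got control back.
     * (Hypothetical helper, not part of any JBoss or Infinispan API.)
     */
    public static int acksSeenOnReturn(boolean sync, int peers, long ackDelayMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(peers);
        AtomicInteger acks = new AtomicInteger();
        CompletableFuture<?>[] pending = new CompletableFuture<?>[peers];
        for (int i = 0; i < peers; i++) {
            pending[i] = CompletableFuture.runAsync(() -> {
                try {
                    Thread.sleep(ackDelayMillis); // simulated network round trip
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                acks.incrementAndGet();
            }, pool);
        }
        if (sync) {
            // SYNC: block until every peer has acknowledged the update.
            CompletableFuture.allOf(pending).join();
        }
        // ASYNC: fall through immediately; peers catch up in the background.
        int seen = acks.get();
        pool.shutdown(); // tasks that were already submitted still run to completion
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("SYNC acks on return:  " + acksSeenOnReturn(true, 3, 200));
        System.out.println("ASYNC acks on return: " + acksSeenOnReturn(false, 3, 200));
    }
}
```

In SYNC mode the caller always sees all three acknowledgements, at the cost of waiting for the slowest peer. In ASYNC mode the call typically returns before any peer has answered, which is acceptable when sticky sessions keep reads on the node that performed the write.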
Inside each cache definition, you should have noticed the locking-isolation element, whose semantics correspond to the equivalent database isolation levels. Infinispan only supports the READ_COMMITTED and REPEATABLE_READ isolation levels.
Another element that is included in both caches is file-store, which configures the path in which to store the cached data. By default, data is written to a directory named after the cache container, under the jboss.server.data.dir directory. For example, here’s the default file-store path for the standalone web cache container:
You can, however, customize the file-store path using the relative-to and path attributes, just as we showed in a previous tutorial for the path element:
Now that we have a clear idea of the elements that make up the configuration, we want to stress the differences between replicated and distributed caches; without them, you will miss the whole picture.
Choosing between replication and distribution
When using a replicated cache, Infinispan will store every entry on every node in the cluster. This means that entries added to any of these cache instances will be replicated to all other cache instances in the cluster, and can be retrieved locally from any instance.
The scalability of replication is a function of cluster size and average data size, so if we have many nodes and/or large data sets, then we hit a scalability ceiling.
On the other hand, when using distributed caching, Infinispan will store every entry on a subset of the nodes in the grid, which allows the cache to scale linearly as more servers are added to the cluster.
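A rough back-of-the-envelope calculation makes the difference concrete. The sketch below is an idealized model, not a measurement: it assumes entries are spread perfectly evenly, and the class CacheFootprint, its method names, and the figure of 100,000 sessions are made up for illustration.

```java
public class CacheFootprint {

    /** Replicated cache: every node holds every entry. */
    public static long replicatedEntriesPerNode(long totalEntries) {
        return totalEntries;
    }

    /** Distributed cache: each entry lives on a fixed number of nodes, spread evenly. */
    public static long distributedEntriesPerNode(long totalEntries, int copiesPerEntry, int nodes) {
        int copies = Math.min(copiesPerEntry, nodes); // cannot keep more copies than nodes
        return (totalEntries * copies + nodes - 1) / nodes; // ceiling division
    }

    public static void main(String[] args) {
        long sessions = 100_000; // illustrative figure only
        for (int nodes : new int[] {2, 4, 8, 16}) {
            System.out.printf("nodes=%d replicated=%d distributed(2 copies)=%d%n",
                    nodes,
                    replicatedEntriesPerNode(sessions),
                    distributedEntriesPerNode(sessions, 2, nodes));
        }
    }
}
```

Under these assumptions, the per-node load of a replicated cache stays constant no matter how many servers are added, while the per-node load of a distributed cache shrinks as the cluster grows.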
Distribution makes use of a consistent hash algorithm to determine where in the cluster entries should be stored. The hashing algorithm is configured with the number of copies of each cache entry that should be maintained cluster-wide. The number of copies represents the trade-off between performance and durability of data. The more copies you maintain, the lower performance will be, but also the lower the risk of losing data due to server outages.
You can use the owners parameter (default 2) to define the number of cluster-wide replicas for each cache entry.
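To show how a consistent hash can map an entry to a fixed number of owners, here is a minimal hash-ring sketch. It is a generic textbook technique, not Infinispan's actual hashing implementation; the class HashRing, the count of 50 virtual points per node, and the node and key names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int owners;

    public HashRing(List<String> nodes, int owners) {
        this.owners = Math.min(owners, nodes.size());
        for (String node : nodes) {
            // Several virtual points per node smooth out the distribution.
            for (int v = 0; v < 50; v++) {
                ring.put(spread((node + "#" + v).hashCode()), node);
            }
        }
    }

    // Extra bit mixing so that similar strings land far apart on the ring.
    private static int spread(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    /** Walks clockwise from the key's position, collecting distinct nodes. */
    public List<String> ownersOf(String key) {
        Set<String> result = new LinkedHashSet<>();
        int start = spread(key.hashCode());
        // Points at or after the key's position first...
        for (String node : ring.tailMap(start, true).values()) {
            if (result.size() == owners) break;
            result.add(node);
        }
        // ...then wrap around to the beginning of the ring.
        for (String node : ring.headMap(start, false).values()) {
            if (result.size() == owners) break;
            result.add(node);
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(Arrays.asList("node1", "node2", "node3", "node4"), 2);
        System.out.println("session-42 -> " + ring.ownersOf("session-42"));
    }
}
```

Every node that builds the ring from the same member list computes the same owners for a given key, so an entry can be located without asking the other members where it lives.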
The choice between replication and distribution depends largely on the cluster size. For example, replication provides a quick and easy way to share state across a cluster; however, it only performs well in small clusters (fewer than ten servers), because the number of replication messages that need to be sent increases as the cluster grows.
In a distributed cache, a number of copies are maintained to provide redundancy and fault tolerance; however, this is typically far fewer than the number of nodes in the cluster. Hence, a distributed cache provides a far greater degree of scalability than a replicated cache.
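The same reasoning applies to network traffic. The following sketch counts the messages triggered by a single write under a simplified model; it assumes that the writing node is itself one of the owners, and the class WriteFanOut and its methods are hypothetical names used only for this illustration.

```java
public class WriteFanOut {

    /** Replication: the writer has to update every other node in the cluster. */
    public static int replicatedMessages(int nodes) {
        return Math.max(nodes - 1, 0);
    }

    /** Distribution: only the remaining owners need a copy (writer assumed to be an owner). */
    public static int distributedMessages(int nodes, int owners) {
        return Math.max(Math.min(owners, nodes) - 1, 0);
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {2, 5, 10, 20}) {
            System.out.printf("nodes=%d replicated=%d distributed(owners=2)=%d%n",
                    nodes, replicatedMessages(nodes), distributedMessages(nodes, 2));
        }
    }
}
```

In this model the replicated figure grows with every server added, while the distributed figure depends only on the number of owners.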
Buddy replication was the most effective solution for scaling web applications in earlier releases of JBoss AS. The major advantage of buddy replication was that, instead of replicating data across every node of the cluster, it chose a fixed number of backup nodes and only replicated data to these backups.
This was quite helpful, but with one caveat: buddy replication was specifically designed for HTTP session replication. To achieve a real performance benefit, it therefore requires session affinity, also known as sticky sessions in HTTP session replication parlance.
As the past tense suggests, buddy replication has been abandoned in JBoss AS 7. Infinispan’s distribution mode is its functional replacement.
Distribution mode provides a near-linearly scalable solution by using a fixed number of owners to hold the cache data, and it does not need session affinity. Hence, it is applicable to a wider set of use cases than HTTP session replication.