At the base of KeyDB replication there is a very simple to use and configure leader follower (master-slave) replication: it allows slave KeyDB instances to be exact copies of master instances. The slave will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
- When a master and a slave instances are well-connected, the master keeps the slave updated by sending a stream of commands to the slave, in order to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.
- When the link between the master and the slave breaks, for network issues or because a timeout is sensed in the master or the slave, the slave reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
- When a partial resynchronization is not possible, the slave will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the slave, and then continue sending the stream of commands as the dataset changes.
KeyDB uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of KeyDB use cases. However KeyDB slaves asynchronously acknowledge the amount of data they received periodically with the master. So the master does not wait every time for a command to be processed by the slaves, however it knows, if needed, what slave already processed what command. This allows to have optional syncrhonous replication.
Synchronous replication of certain data can be requested by the clients using
WAIT command. However
WAIT is only able to ensure that there are the
specified number of acknowledged copies in the other KeyDB instances, it does not
turn a set of KeyDB instances into a CP system with strong consistency: acknowledged
writes can still be lost during a failover, depending on the exact configuration
of the KeyDB persistence. However with
WAIT the probability of losign a write
after a failure event is greatly reduced to certain hard to trigger failure
You could check the Sentinel or KeyDB Cluster documentation for more information about high availability and failover. However starting with Active-Replication is encouraged as it is simple to use and a powerful tool for high availability. The rest of this document mainly describe the basic characteristics of KeyDB basic replication.
The following are some very important facts about KeyDB replication:
- KeyDB uses asynchronous replication, with asynchronous slave-to-master acknowledges of the amount of data processed.
- A master can have multiple slaves.
- Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a cascading-like structure. Since KeyDB 4.0, all the sub-slaves will receive exactly the same replication stream from the master.
- KeyDB replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more slaves perform the initial synchronization or a partial resynchronization.
- Replication is also largely non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured KeyDB to do so in keydb.conf. Otherwise, you can configure KeyDB slaves to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The slave will block incoming connections during this brief window (that can be as long as many seconds for very large datasets). Since KeyDB 4.0 it is possible to configure KeyDB so that the deletion of the old data set happens in a different thread, however loading the new initial dataset will still happen in the main thread and block the slave.
- Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, slow O(N) operations can be offloaded to slaves), or simply for improving data safety and high availability.
- It is possible to use replication to avoid the cost of having the master writing the full dataset to disk: a typical technique involves configuring your master
keydb.confto avoid persisting to disk at all, then connect a slave configured to save from time to time, or with AOF enabled. However this setup must be handled with care, since a restarting master will start with an empty dataset: if the slave tries to synchronized with it, the slave will be emptied as well.
In setups where KeyDB replication is used, it is strongly advised to have persistence turned on in the master and in the slaves. When this is not possible, for example because of latency concerns due to very slow disks, instances should be configured to avoid restarting automatically after a reboot.
To better understand why masters with persistence turned off configured to auto restart are dangerous, check the following failure mode where data is wiped from the master and all its slaves:
- We have a setup with node A acting as master, with persistence turned down, and nodes B and C replicating from node A.
- Node A crashes, however it has some auto-restart system, that restarts the process. However since persistence is turned off, the node restarts with an empty data set.
- Nodes B and C will replicate from node A, which is empty, so they'll effectively destroy their copy of the data.
When KeyDB Sentinel is used for high availability, also turning off persistence on the master, together with auto restart of the process, is dangerous. For example the master can restart fast enough for Sentinel to don't detect a failure, so that the failure mode described above happens.
Every time data safety is important, and replication is used with master configured without persistence, auto restart of instances should be disabled.
Every KeyDB master has a replication ID: it is a large pseudo random string that marks a given story of the dataset. Each master also takes an offset that increments for every byte of replication stream that it is produced to be sent to slaves, in order to update the state of the slaves with the new changes modifying the dataset. The replication offset is incremented even if no slave is actually connected, so basically every given pair of:
Identifies an exact version of the dataset of a master.
When slaves connects to masters, they use the
PSYNC command in order to send
their old master replication ID and the offsets they processed so far. This way
the master can send just the incremental part needed. However if there is not
enough backlog in the master buffers, or if the slave is referring to an
history (replication ID) which is no longer known, than a full resynchronization
happens: in this case the slave will get a full copy of the dataset, from scratch.
This is how a full synchronization works in more details:
The master starts a background saving process in order to produce an RDB file. At the same time it starts to buffer all new write commands received from the clients. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send all buffered commands to the slave. This is done as a stream of commands and is in the same format of the KeyDB protocol itself.
You can try it yourself via telnet. Connect to the KeyDB port while the
server is doing some work and issue the
SYNC command. You'll see a bulk
transfer and then every command received by the master will be re-issued
in the telnet session. Actually
SYNC is an old protocol no longer used by
newer KeyDB instances, but is still there for backward compatibility: it does
not allow partial resynchronizations, so now
PSYNC is used instead.
As already said, slaves are able to automatically reconnect when the master-slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them.
In the previous section we said that if two instances have the same replication ID and replication offset, they have exactly the same data. However it is useful to understand what exactly is the replication ID, and why instances have actually two replication IDs the main ID and the secondary ID.
A replication ID basically marks a given history of the data set. Every time an instance restarts from scratch as a master, or a slave is promoted to master, a new replication ID is generated for this instance. The slaves connected to a master will inherit its replication ID after the handshake. So two instances with the same ID are related by the fact that they hold the same data, but potentially at a different time. It is the offset that works as a logical time to understand, for a given history (replication ID) who holds the most updated data set.
For instance if two instances A and B have the same replication ID, but one with offset 1000 and one with offset 1023, it means that the first lacks certain commands applied to the data set. It also means that A, by applying just a few commands, may reach exactly the same state of B.
The reason why KeyDB instances have two replication IDs is because of slaves that are promoted to masters. After a failover, the promoted slave requires to still remember what was its past replication ID, because such replication ID was the one of the former master. In this way, when other slaves will synchronize with the new master, they will try to perform a partial resynchronization using the old master replication ID. This will work as expected, because when the slave is promoted to master it sets its secondary ID to its main ID, remembering what was the offset when this ID switch happened. Later it will select a new random replication ID, because a new history begins. When handling the new slaves connecting, the master will match their IDs and offsets both with the current ID and the secondary ID (up to a given offset, for safety). In short this means that after a failover, slaves connecting to the new promoted master don't have to perform a full sync.
In case you wonder why a slave promoted to master needs to change its replication ID after a failover: it is possible that the old master is still working as a master because of some network partition: retaining the same replication ID would violate the fact that the same ID and same offset of any two random instances mean they have the same data set.
Normally a full resynchronization requires to create an RDB file on disk, then reload the same RDB from disk in order to feed the slaves with the data.
With slow disks this can be a very stressing operation for the master. KeyDB version 2.8.18 is the first version to have support for diskless replication. In this setup the child process directly sends the RDB over the wire to slaves, without using the disk as intermediate storage.
To configure basic KeyDB replication is trivial: just add the following line to the slave configuration file:
Of course you need to replace 192.168.1.1 6379 with your master IP address (or
hostname) and port. Alternatively, you can call the
SLAVEOF command and the
master host will start a sync with the slave.
There are also a few parameters for tuning the replication backlog taken
in memory by the master to perform the partial resynchronization. See the example
keydb.conf shipped with the KeyDB distribution for more information.
Diskless replication can be enabled using the
parameter. The delay to start the transfer in order to wait more slaves to
arrive after the first one, is controlled by the
parameter. Please refer to the example
keydb.conf file in the KeyDB distribution
for more details.
Since KeyDB 2.6, slaves support a read-only mode that is enabled by default.
This behavior is controlled by the
slave-read-only option in the keydb.conf file, and can be enabled and disabled at runtime using
Read-only slaves will reject all write commands, so that it is not possible to write to a slave because of a mistake. This does not mean that the feature is intended to expose a slave instance to the internet or more generally to a network where untrusted clients exist, because administrative commands like
CONFIG are still enabled. However, security of read-only instances can be improved by disabling commands in keydb.conf using the
You may wonder why it is possible to revert the read-only setting and have slave instances that can be targeted by write operations. While those writes will be discarded if the slave and the master resynchronize or if the slave is restarted, there are a few legitimate use case for storing ephemeral data in writable slaves.
For example computing slow Set or Sorted set operations and storing them into local keys is an use case for writable slaves that was observed multiple times.
However note that writable slaves before version 4.0 were incapable of expiring keys with a time to live set. This means that if you use
EXPIRE or other commands that set a maximum TTL for a key, the key will leak, and while you may no longer see it while accessing it with read commands, you will see it in the count of keys and it will still use memory. So in general mixing writable slaves (previous version 4.0) and keys with TTL is going to create issues.
KeyDB 4.0 RC3 and greater versions totally solve this problem and now writable slaves are able to evict keys with TTL as masters do, with the exceptions of keys written in DB numbers greater than 63 (but by default KeyDB instances only have 16 databases).
Also note that since KeyDB 4.0 slave writes are only local, and are not propagated to sub-slaves attached to the instance. Sub slaves instead will always receive the replication stream identical to the one sent by the top-level master to the intermediate slaves. So for example in the following setup:
B is writable, C will not see
B writes and will instead have identical dataset as the master instance
If your master has a password via
requirepass, it's trivial to configure the
slave to use that password in all sync operations.
To do it on a running instance, use
keydb-cli and type:
To set it permanently, add this to your config file:
Starting with KeyDB 2.8, it is possible to configure a KeyDB master to accept write queries only if at least N slaves are currently connected to the master.
However, because KeyDB uses asynchronous replication it is not possible to ensure the slave actually received a given write, so there is always a window for data loss.
This is how the feature works:
- KeyDB slaves ping the master every second, acknowledging the amount of replication stream processed.
- KeyDB masters will remember the last time it received a ping from every slave.
- The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.
If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.
You may think of it as a best effort data safety mechanism, where consistency is not ensured for a given write, but at least the time window for data loss is restricted to a given number of seconds. In general bound data loss is better than unbound one.
If the conditions are not met, the master will instead reply with an error and the write will not be accepted.
There are two configuration parameters for this feature:
<number of slaves>
<number of seconds>
For more information, please check the example
keydb.conf file shipped with the
KeyDB source distribution.
KeyDB expires allow keys to have a limited time to live. Such a feature depends on the ability of an instance to count the time, however KeyDB slaves correctly replicate keys with expires, even when such keys are altered using Lua scripts.
To implement such a feature KeyDB cannot rely on the ability of the master and slave to have synchronized clocks, since this is a problem that cannot be solved and would result into race conditions and diverging data sets, so KeyDB uses three main techniques in order to make the replication of expired keys able to work:
- Slaves don't expire keys, instead they wait for masters to expire the keys. When a master expires a key (or evict it because of LRU), it synthesizes a
DELcommand which is transmitted to all the slaves.
- However because of master-driven expire, sometimes slaves may still have in memory keys that are already logically expired, since the master was not able to provide the
DELcommand in time. In order to deal with that the slave uses its logical clock in order to report that a key does not exist only for read operations that don't violate the consistency of the data set (as new commands from the master will arrive). In this way slaves avoid to report logically expired keys are still existing. In practical terms, an HTML fragments cache that uses slaves to scale will avoid returning items that are already older than the desired time to live.
- During Lua scripts executions no keys expires are performed. As a Lua script runs, conceptually the time in the master is frozen, so that a given key will either exist or not for all the time the script runs. This prevents keys to expire in the middle of a script, and is needed in order to send the same script to the slave in a way that is guaranteed to have the same effects in the data set.
Once a slave is promoted to a master it will start to expire keys independently, and will not require any help from its old master.
When Docker, or other types of containers using port forwarding, or Network Address Translation is used, KeyDB replication needs some extra care, especially when using KeyDB Sentinel or other systems where the master
ROLE commands output are scanned in order to discover slaves addresses.
The problem is that the
ROLE command, and the replication section of
INFO output, when issued into a master instance, will show slaves
as having the IP address they use to connect to the master, which, in
environments using NAT may be different compared to the logical address of the
slave instance (the one that clients should use to connect to slaves).
Similarly the slaves will be listed with the listening port configured
keydb.conf, that may be different than the forwarded port in case
the port is remapped.
In order to fix both issues, it is possible, since KeyDB 3.2.2, to force a slave to announce an arbitrary pair of IP and port to the master. The two configurations directives to use are:
And are documented in the example
keydb.conf of recent KeyDB distributions.
There are two KeyDB commands that provide a lot of information on the current
replication parameters of master and slave instances. One is
INFO. If the
command is called with the
replication argument as
INFO replication only
information relevant to the replication are displayed. Another more
computer-friendly command is
ROLE, that provides the replication status of
masters and slaves together with their replication offsets, list of connected
slaves and so forth.
Since KeyDB 4.0, when an instance is promoted to master after a failover, it will be still able to perform a partial resynchronization with the slaves of the old master. To do so, the slave remembers the old replication ID and offset of its former master, so can provide part of the backlog to the connecting slaves even if they ask for the old replication ID.
However the new replication ID of the promoted slave will be different, since it constitutes a different history of the data set. For example, the master can return available and can continue accepting writes for some time, so using the same replication ID in the promoted slave would violate the rule that a of replication ID and offset pair identifies only a single data set.
Moreover slaves when powered off gently and restarted, are able to store in the
RDB file the information needed in order to resynchronize with their master.
This is useful in case of upgrades. When this is needed, it is better to use
SHUTDOWN command in order to perform a
save & quit operation on the slave.
It is not possilbe to partially resynchronize a slave that restarted via the AOF file. However the instance may be turned to RDB persistence before shutting down it, than can be restarted, and finally AOF can be enabled again.