This KB article focuses on replication conflicts, which can occur in several different situations during replication. Because replication in the UnboundID products is eventually consistent, operations that happen on different servers in the topology can conflict with each other. This article assumes that you have a good working knowledge of replication in UnboundID Data Store; if you do not, it would be helpful to read the Managing Replication chapter in the UnboundID Data Store Administration Guide first. The documentation also covers replication conflicts, so this KB focuses on how to create them and then fix them, so that you can become familiar with understanding and working with these issues.
Here are some high-level points about replication conflicts:
Each Directory Server is responsible for handling conflict resolution at the point at which it receives changes from a Replication Server.
If a conflict arises in which the same attribute is modified at the same time on two different replicas, conflict resolution keeps the most recent change based on its timestamp. The older modification is ignored, and the latest one is the change that is reflected across all servers in the topology.
If a conflict arises in which two entries with the same DN are added at the same time on two different replicas, replication conflict resolution keeps the DN of the oldest entry (earliest createTimestamp) and renames the most recent entry by adding its entryUUID attribute to the RDN. In addition, an objectClass value is added to the renamed entry to flag it as a replication conflict entry. Conflict entries are not visible to standard LDAP client operations.
Any unresolved conflict generates an administrative alert, which is logged in the errors log or sent out through other alerting mechanisms such as SNMP or SMTP.
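The two automatic rules above can be modelled with a couple of shell functions. This is a toy sketch for illustration only, not how the server implements conflict resolution: newest_change and conflict_dn are hypothetical helper names, and the timestamps are assumed to be in LDAP generalized-time form (YYYYMMDDhhmmssZ) so that they sort correctly as plain strings.

```shell
# Toy model of the two automatic conflict-resolution rules (illustration only).

# Rule 1: the same attribute modified on two replicas -> the most recent change wins.
# Input lines are "<timestamp> <value>" with generalized-time timestamps,
# so a plain byte-wise sort puts the newest change last.
newest_change() {
  LC_ALL=C sort | tail -n 1 | cut -d ' ' -f 2-
}

# Rule 2: the same DN added on two replicas -> the newer entry is renamed by
# adding its entryUUID to the RDN: entryuuid=<uuid>+<original RDN>,<parent DN>.
conflict_dn() {
  # $1 = original DN, $2 = entryUUID of the newer entry
  printf 'entryuuid=%s+%s\n' "$2" "$1"
}
```

For example, conflict_dn uid=user.2002,ou=People,dc=example,dc=com 6201e4a6-b312-440f-89f2-032b4db6c72b produces the same renamed DN that appears in the errors log alert later in this article.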
The topology used in this KB article consists of three DS instances installed on the same physical server. They are small instances, so they do not require much memory.
In this KB we simulate adding the same entry to two servers in the topology at the same time (or as close to it as possible) and show how replication handles this scenario.
To accomplish this, we have to stop the directory servers one at a time to simulate a network outage between the two servers, so that replication cannot occur when the adds happen.
Shut down the DS2 and DS3 instances:
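A minimal sketch of this step, assuming (hypothetically) that the three instances are unpacked side by side in directories named ds1, ds2 and ds3, each stopped with its own bin/stop-ds script. The run and stop_instances helpers are illustrative names; setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Stop each named instance directory (ds1, ds2, ds3 are hypothetical names).
stop_instances() {
  for dir in "$@"; do
    run "$dir/bin/stop-ds"
  done
}
```

Usage: stop_instances ds2 ds3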
Add the following entry to DS1:
bin/ldapmodify -p 1189 -D "cn=directory manager" -w secret123 -a <<+
cn: Romina Valerio
With DS1 stopped and DS2 running, add the same entry to DS2 (copy all of the lines together, paste them into the terminal, and press Enter).
Once all of the instances are running again, check the errors log:
tail -30 logs/errors
[20/May/2016:21:25:00.035 -0500] threadID=199 category=EXTENSIONS severity=SEVERE_ERROR msgID=1880359005 msg="Administrative alert type=replication-unresolved-conflict id=5761ff92-3298-4cc5-be9c-81e2c9ed00fe class=com.unboundid.directory.server.replication.plugin.ReplicationDomain msg='An unresolved conflict was detected for DN uid=user.2002,ou=People,dc=example,dc=com. The conflicting entry has been renamed to entryuuid=6201e4a6-b312-440f-89f2-032b4db6c72b+uid=user.2002,ou=People,dc=example,dc=com'"
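Because the alert message has a fixed wording, the renamed conflict DNs can be pulled out of the errors log with standard text tools. A small sketch; the function name conflict_dns_from_log is hypothetical, and the pattern relies only on the message text shown in the alert above.

```shell
# Print the renamed DN from each replication-unresolved-conflict alert.
# Reads an errors log on stdin and relies on the wording
# "The conflicting entry has been renamed to <DN>'" seen in the alert above.
conflict_dns_from_log() {
  grep 'type=replication-unresolved-conflict' |
    sed -n "s/.*The conflicting entry has been renamed to \([^']*\)'.*/\1/p"
}
```

Usage: conflict_dns_from_log < logs/errors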
Now you can search for that entry in the directory server using the --control return-conflict-entries option, which asks the server to return conflict entries as well.
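For example, a search along these lines, reusing the port and credentials from the ldapmodify command above. The base DN and the uid filter are assumptions taken from the DN in the alert, and the --control value is the one named in the text; search_with_conflicts is a hypothetical helper name, and DRY_RUN=1 prints the command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Search for the entry by uid, asking the server to include conflict entries.
search_with_conflicts() {
  # $1 = uid to look for, e.g. user.2002
  run bin/ldapsearch -p 1189 -D "cn=directory manager" -w secret123 \
    --control return-conflict-entries \
    -b "ou=People,dc=example,dc=com" "(uid=$1)"
}
```

Usage: search_with_conflicts user.2002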
You can see that the renamed entry has its entryUUID added to the RDN. The presence of the ds-sync-conflict-entry object class means that the server hides this entry from normal operations: if you run the same search without the --control option, it returns only the first (original) entry above. You can also search for all entries matching "(objectClass=ds-sync-conflict-entry)", which returns just the replication conflict entries. The search above, however, also lets you see the original entry (earliest createTimestamp).
You can also see that the passwords are hashed to different values. This is the reason the server could not automatically resolve this replication conflict. You will also see that this replication conflict exists on all of your servers and that it is the same entry on each server. This is because of the way we added the entries, and it is probably the normal case. There can be scenarios in which a replication conflict entry exists on only one of the servers and not on the others.
You can also search the data in the directory server to find conflict entries without having to rely on the information in the error logs. You can request just the "ds-sync-conflict" attribute, which tells you which entry in the normal data this entry is in conflict with.
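A sketch of that search, returning only the ds-sync-conflict attribute of every entry with the ds-sync-conflict-entry object class. Including the return-conflict-entries control here is an assumption made to be safe, since conflict entries are otherwise hidden; list_conflicts is a hypothetical helper name, and DRY_RUN=1 prints the command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# List every conflict entry under the base DN with the DN it conflicts with.
# -T keeps each long value on a single line.
list_conflicts() {
  run bin/ldapsearch -p 1189 -D "cn=directory manager" -w secret123 \
    --control return-conflict-entries -T \
    -b "dc=example,dc=com" "(objectClass=ds-sync-conflict-entry)" ds-sync-conflict
}
```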
With this information you can compare the two entries to determine which one should be retained and which should be removed or modified.
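One way to compare the two entries is to save each one as unwrapped LDIF (for example from an ldapsearch run with -T) and compare their attribute lines with the dn: line removed. A sketch; compare_entries is a hypothetical helper, and it assumes single-line (unwrapped) values.

```shell
# Show attribute lines present in only one of two LDIF entries.
# Lines only in the first file are printed as-is; lines only in the
# second file are prefixed with a tab (standard comm -3 output).
compare_entries() {
  # $1, $2 = files each holding one unwrapped LDIF entry
  a=$(mktemp) b=$(mktemp)
  grep -v '^dn:' "$1" | LC_ALL=C sort > "$a"
  grep -v '^dn:' "$2" | LC_ALL=C sort > "$b"
  LC_ALL=C comm -3 "$a" "$b"
  rm -f "$a" "$b"
}
```

Usage: compare_entries original.ldif conflict.ldif (file names are placeholders). In this article's scenario the output would be expected to show the differing userPassword hashes plus the conflict-only attributes.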
Replication Conflict Repair
Now that we have found a replication conflict entry, we need to resolve it. Because we know that in this case both entries were added from the same data, we can simply remove the replication conflict entry.
Let's get the DN of the conflict entry that we want to delete.
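A sketch of such a search. The port, credentials and base DN are carried over from the earlier commands, the filter combines the uid with the ds-sync-conflict-entry object class, and including the return-conflict-entries control is an assumption. conflict_entry_dns is a hypothetical helper name; DRY_RUN=1 prints the command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Return only the DN of the conflict entry for one uid, unwrapped (-T) so the
# long entryuuid=...+uid=... DN stays on one line.
conflict_entry_dns() {
  # $1 = uid, e.g. user.2002
  run bin/ldapsearch -p 1189 -D "cn=directory manager" -w secret123 \
    --control return-conflict-entries -T \
    -b "dc=example,dc=com" "(&(uid=$1)(objectClass=ds-sync-conflict-entry))" dn
}
```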
Note that we added the -T option to the search, which tells it not to wrap the results, and that we added dn to the end of the search because we only need the DN of the entry back.
It is best to check each instance in your topology to see whether this conflict entry exists on each server. In some cases the conflict entry may exist on only one server; in that case we only need to delete it from that one instance.
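Checking every instance can be scripted as a loop over the LDAP ports. Only port 1189 (DS1) appears in this article, so the other ports are placeholders for your own values. The conflict DN is used as the search base with a base-level scope, so a server returns the entry only if it holds it. check_instances is a hypothetical helper, the return-conflict-entries control is included as an assumption, and DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Look up the conflict entry on each instance by DN (base-level search).
# Usage: check_instances "<conflict DN>" <port> [<port>...]
check_instances() {
  dn=$1; shift
  for port in "$@"; do
    echo "== port $port"
    run bin/ldapsearch -p "$port" -D "cn=directory manager" -w secret123 \
      --control return-conflict-entries -T \
      -b "$dn" -s base "(objectClass=*)" dn
  done
}
```

Usage: check_instances "<conflict DN>" 1189 <DS2 port> <DS3 port>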
Because this conflict entry may also reside on the other instances, we want the delete to be replicated, so we do not use any special replication repair controls.
Do not include the dn: prefix in the DN of the entry to be deleted; specify only the DN itself.
** If you want to delete this entry from only this one instance, without replicating the delete operation (for example, because the entry does not exist on the other servers), add the "--control replication-repair" option to the above command. **
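A sketch of the delete, with the DN passed as a plain argument (no dn: prefix). The REPAIR_ONLY switch and the delete_conflict name are hypothetical conveniences added here: REPAIR_ONLY=1 adds the --control replication-repair option described above so that the delete applies to this instance only. DRY_RUN=1 prints the command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Delete a conflict entry by DN (quoted in case it contains spaces).
# REPAIR_ONLY=1 adds the replication-repair control so the delete is not replicated.
delete_conflict() {
  # $1 = LDAP port, $2 = DN of the conflict entry
  if [ "${REPAIR_ONLY:-0}" = 1 ]; then
    run bin/ldapdelete -p "$1" -D "cn=directory manager" -w secret123 \
      --control replication-repair "$2"
  else
    run bin/ldapdelete -p "$1" -D "cn=directory manager" -w secret123 "$2"
  fi
}
```

Usage: delete_conflict 1189 "entryuuid=6201e4a6-b312-440f-89f2-032b4db6c72b+uid=user.2002,ou=People,dc=example,dc=com"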
The entry is now deleted from the directory, and the delete operation has also been replicated to any other servers where the entry existed.
Now let's do a final check to make sure there are no conflicts in the directory.
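For the final check, the output of a conflict search (run with -T so each DN stays on one line) can be piped into a small counter; 0 means no conflict entries are left. count_entries is a hypothetical helper that counts the dn: lines of LDIF read from standard input.

```shell
# Count the entries in unwrapped LDIF read from stdin (one "dn:" line per entry).
# grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps
# that from aborting scripts that run with set -e.
count_entries() {
  grep -c '^dn:' || true
}

# Example (requires a running server):
#   bin/ldapsearch -p 1189 -D "cn=directory manager" -w secret123 \
#     --control return-conflict-entries -T \
#     -b "dc=example,dc=com" "(objectClass=ds-sync-conflict-entry)" dn | count_entries
```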
Now we can search some of the monitoring information to see what the server statistics show about replication conflicts. We search the cn=monitor branch for the entries that represent the server's replica view; these are the entries with objectclass=ds-replica-monitor-entry. We also add the base-dn of the backend we want to see this information for; otherwise we would see every replicated backend (which may be what you want).
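A sketch of that monitor search. The object class comes from the text above; expressing the backend restriction as a base-dn filter component, and using dc=example,dc=com as the replicated base DN, are assumptions to adapt to your deployment. replica_monitor is a hypothetical helper name; no attribute list is given, so all attributes of the matching monitor entry are returned. DRY_RUN=1 prints the command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Show the replica monitor entry for one replicated base DN.
replica_monitor() {
  # $1 = replicated base DN, e.g. dc=example,dc=com
  run bin/ldapsearch -p 1189 -D "cn=directory manager" -w secret123 \
    -b "cn=monitor" "(&(objectClass=ds-replica-monitor-entry)(base-dn=$1))"
}
```

Usage: replica_monitor dc=example,dc=com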
Appendix: Creating the Data Store configuration for this use case
For the purposes of this article, the following steps create a deployment of three UnboundID Directory Servers to illustrate how to handle and work with the replication conflicts discussed above. These instances are all installed on the same physical server, so each one needs its own LDAP port and replication port. We will use the following ports for this example:
Note: You can copy the binary folder as above only before running the setup command on the server. At that point the software is not yet configured, so copying it is the same as unzipping the software again for each directory instance.
Now we will run the setup command to create the initial configuration for each data store. On the first data store we will load 2000 example entries into the backend database.
Note that we used the global admin account because it is guaranteed to have access to all servers in the topology.
Take a look at the replication log (logs/replication).
You will see a notification that the generation IDs are not equal, so replication is currently suspended.
You will also notice that the admin and schema data have been exported; this is used to synchronize both directories with identical data in these backends. You don't have to set up replication for the schema and admin data; it's automatic.
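To find the generation ID messages without reading the whole replication log, a case-insensitive grep is enough. The exact message wording may vary between versions, so this sketch simply matches "generation id" with or without a space; generation_id_messages is a hypothetical helper name, and the log path is the one given above.

```shell
# Print replication-log lines that mention generation IDs (case-insensitive).
generation_id_messages() {
  # $1 = replication log file, e.g. logs/replication
  grep -iE 'generation ?id' "$1"
}
```

Usage: generation_id_messages logs/replication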
The next step is to initialize ds2 with the user data from ds1.