The basic installation of the directory, as with other UnboundID products, is designed to be simple. The entire installation lives below a single container directory created when the distribution zip file is unpacked. The container directory may be renamed before running the setup script; renaming it later will still work, but the server will issue a severe warning the next time it is started.
Provided that the filesystem partition onto which the server is unpacked has sufficient capacity and performance, this organization is fine.
The server distribution itself doesn't require much space: about 150MB for the unpacked zip. However, the database and log files will require considerably more. The database and (most) log files can be relocated if desired, but some areas of the basic distribution will still grow over time, mainly those concerned with server configuration.
As configuration changes are made, each change is recorded in a configuration audit log (~/logs/config-audit.log). In addition, a backup copy of the server configuration is kept for each change (~/config/archived-configs).
The assumption is that on any production server, configuration changes beyond the initial setup should be rare.
It is safe to delete configuration backup files, provided you are certain the old configurations are no longer needed. The configuration audit log, however, should not be touched: it is a complete record of all changes made since the server was installed, useful not only for your own security auditing, but also to help UnboundID support track down problems, where a concise configuration history can be very useful, especially in establishing root cause.
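If a pruning policy for configuration backups is wanted, it can be as simple as a scheduled find. This is a minimal sketch: the 90-day retention period is an example policy, and the path is relative to the server root.

```shell
# Run from the server root directory (adjust the cd to your install).
# Prune archived configuration backups older than 90 days - the retention
# period here is an illustrative assumption, not a recommendation.
cd /path/to/server-root
ARCHIVE_DIR=config/archived-configs
find "$ARCHIVE_DIR" -type f -mtime +90 -print -delete
```

Run this only once you are confident the old configurations are truly no longer needed.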
For these reasons, it is probably best to plan on around 350MB minimum for the base install, excluding database and log files.
One of the strengths of the UnboundID servers in general is very complete and customisable logging, which means the volume of log data is directly under your control. That said, the default logging configuration should be considered the minimum needed to track server activity, detect and trace errors, and provide the information UnboundID support may need when investigating error or performance related issues.
Logs can be written directly in compressed (gzip) form to save considerably on disk space if required. However, this makes debugging somewhat more difficult, since you can't "tail" a compressed log file; you have to find space to copy it, decompress it, and then read it.
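The standard gzip companion tools do soften this somewhat, since a compressed log can be streamed without decompressing it to disk. A quick sketch (the filename is an example):

```shell
# Inspect a gzip-compressed log without writing a decompressed copy.
# "access.log.gz" is an example filename.
zcat access.log.gz | tail -n 50     # view the last 50 lines
zgrep 'RESULT' access.log.gz        # search within the compressed file
```

This avoids the copy-and-decompress step for quick checks, though genuinely following a live compressed log as it is written is still not practical.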
Logs can be relocated anywhere in the filesystem, and there are two ways of doing this. The old trick of using symlinks to relocate the logs directory works, but still suffers from the traditional symlink problems: links being removed, never created in the first place, and so on. There is also a small extra cost in resolving filenames, since resolution has to happen twice, once to find the symlink and again to find the actual file, though this is typically insignificant on modern systems. For these reasons UnboundID generally advises against using symlinks.
The second solution is to change the log location (independently for each log) in the server configuration. This has an added advantage that the server tools are thus aware of the physical location, so should you use the uninstall command (for example), it will delete the files from their actual location rather than potentially just deleting the symlink and leaving the logs intact. This is the preferred solution.
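As a sketch of the configuration-based approach, the log location is a property of each log publisher and can be changed with the dsconfig tool. The publisher name and target path below are illustrative assumptions; check the publisher names and properties in your own server version before use.

```shell
# Illustrative only - publisher names, property names and paths vary by
# server version; verify against your installation's dsconfig reference.
bin/dsconfig set-log-publisher-prop \
  --publisher-name "File-Based Access Logger" \
  --set log-file:/data/logs/access
```

Because the change goes through the configuration, it is also captured in the configuration audit log and archived configs like any other change.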
The volume of the log data is directly related to:
Rate of requests
Logfile retention policies
Log archiving policies
It is thus very difficult to predict how much space to dedicate to logs. As a minimum, for a medium-busy server, you should probably plan on not much less than 100GB. If log retention is not important, you can adjust the log retention policies to roll and delete logs more frequently to reduce usage.
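A back-of-envelope estimate can still be useful as a starting point. This sketch multiplies request rate, average log-line size and retention; all three figures are made-up assumptions to be replaced with your own measurements.

```shell
# Rough log-space estimate. All values are illustrative assumptions -
# substitute your measured request rate, line size and retention policy.
REQS_PER_SEC=500
BYTES_PER_LINE=300        # average uncompressed access-log line
RETENTION_DAYS=7
BYTES_PER_DAY=$((REQS_PER_SEC * 86400 * BYTES_PER_LINE))
TOTAL_BYTES=$((BYTES_PER_DAY * RETENTION_DAYS))
echo "Estimated log space: $((TOTAL_BYTES / 1024 / 1024 / 1024)) GiB"
```

With these example figures the estimate lands in the same ballpark as the 100GB guideline above; note it covers the access log only and ignores compression.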
One logfile to pay particular attention to is the errors log. Beyond start and stop information, this log should essentially never change. However, it is not unusual to find that certain applications make requests that generate basically harmless errors, which nevertheless get logged and eat up disk space, as well as making it harder to sift through the log to locate "real" errors. Any such "harmless" errors really should be fixed.
The size of the database is, of course, directly related to the volume of data it has to hold. Various techniques are used to minimize the on-disk and in-memory data size, so it is not normally possible to estimate it from (for example) the size of the LDIF used to load the data, or the average size of retrieved entries. However, it is reasonable to assume that a freshly loaded database will not be larger than the source LDIF.
The database will also grow as existing entries are modified, independently of newly added data. There is an upper limit to this growth of about 25%, beyond which the database compacts itself. For a given set of data, plan for at least 25% growth if the data is subject to modification (don't forget that a last-login-time update counts as a modification, for example).
Indexes will consume space. Adding new indexes will require additional space.
Replication will add to the database size. There are two aspects to this. The first is the replication changelog, which records changes and is used by the replication sub-system to ensure that each member of the replication topology receives all updates. This data is trimmed after a configured time: the out-of-box purge-delay is 24 hours, after which old changelog data is deleted, though many users prefer to keep this data longer (typically 3 days).
In addition, as changes are made to individual entries, the change data is recorded in the entry itself. This is used for automatic conflict resolution, which can be needed given the distributed nature of replication and the possibility of conflicting changes occurring almost simultaneously on different server instances.
The entry-level data is subject to the system-wide replication purge-delay setting, but will not be removed until the entry is next accessed for a write operation, and so may persist well beyond its expiration time.
It is difficult to predict how much impact replication data will have on the database. It is typically something like 20%, but can be much less and sometimes more, depending upon rate of change, size of changes, purge-delay setting etc.
The most accurate guide to determining required partition size for the database is to load representative data into a similar topology and run a test load simulating real traffic for a few days.
Alternatively, look at the freshly loaded database and add 50%.
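The "add 50%" rule of thumb (roughly 25% modification growth plus around 20% replication overhead, rounded up) can be sketched as simple arithmetic. The 40GB starting figure is a made-up example; measure your own freshly loaded database instead.

```shell
# Rule-of-thumb database sizing. FRESH_DB_GB is an illustrative value -
# use the measured size of your freshly loaded database.
FRESH_DB_GB=40
# ~25% modification growth + ~20% replication overhead, rounded to +50%
PLANNED_GB=$((FRESH_DB_GB * 150 / 100))
echo "Plan for at least ${PLANNED_GB}GB of database space"
```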
Choosing the right filesystem technology can make the difference between needing to relocate log and database partitions or not.
When the server is configured to cache the entire database in memory, filesystem performance mostly impacts database writes. Sometimes it is infeasible to cache the database entirely (e.g. due to its size) and entry-balancing (sharding) of the data is not an option; in these cases filesystem performance will also impact read operations.
A non-fully cached database is not necessarily a huge problem. In many cases there is a distinct “working set” of data, and as long as this can be cached most requests will not be impacted by having to go to disk to retrieve the data.
The preferred technologies are:
SSD. Solid State Disk devices are becoming more attractive all the time. For outright performance with intensive write operations or partially cached databases they cannot be beaten. With SSD drives and a modern virtual filesystem above them, there is no reason to split out log/db components.
SAN technology works well. Modern SANs detect disk "hot-spots" and move data around internally to compensate. This means the out-of-box unzip of the directory can work very well on a SAN; there is no need for separate partitions, since the busy log/db data will automatically be optimized on the SAN itself. With older fileserver technologies, which amount to little more than JBOD, you may have to optimize the layout yourself, assigning separate partitions for the database, logs, and possibly the server components, with different spindle sets for each partition. This is generally only needed for large, high-performance systems.
Local Disk. If local disk is used, we generally recommend at least a RAID 0+1 configuration. This provides redundancy and some performance advantages.
NFS will work, but practical experience shows much more frequent filesystem-related issues. Very reliable fileservers and a very reliable network are essential for this to work successfully.
For non-SAN/SSD installations where the best write performance is required, it is suggested to dedicate separate spindle sets to:
Database files
Log files
Replication changelog
These may be partitions mounted directly on the corresponding directories in the unpacked zip (requiring no configuration changes), or separate mounts (e.g. /data/logs, /data/database, /data/changelogs etc.), which will require configuration changes (or symlinks, if you must).
Whatever solution is adopted, be certain to copy the existing data to the new partitions, with the same ownership and permissions, before restarting the server.
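A minimal sketch of such a copy, assuming the server is stopped and /data/logs is the new log partition (both the paths and the layout are examples): cp's archive mode preserves ownership, permissions and timestamps in one step.

```shell
# Run from the server root with the server stopped.
# cp -a (archive mode) preserves ownership, permissions and timestamps.
# "/data/logs" is an example mount point - adjust to your layout.
cp -a logs/. /data/logs/
```

Verify ownership and permissions on the copy (e.g. with ls -l) before restarting the server, especially if the copy was performed as root.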
Java Virtual Machine
Although the server will run with a JRE, we recommend using a full JDK. This is mainly for the debugging tools available in the JDK that are not present in the JRE. The availability of these tools can make the difference between an easy, fast resolution of a problem and a protracted one.
It is not generally recommended to use the system-supplied Java. Typically, directory server and OS administration tasks are split between different teams, and it has been known for OS admins to perform a full system update (yum update) whose changes to the Java installation break the directory server.
In general, it is recommended to use a private copy of java to run the directory, and to fully regression test any new java version in a dev/qa environment before moving it to production.
In general, this testing will already have taken place within UnboundID engineering, but there have been cases of new Java versions being made publicly available without enough notice to allow testing before release, and user-created server plugins may contain code that has compatibility issues. Blind deployment of a new Java version is not recommended: it will work in 99 cases out of 100, but that one case may cause a lot of problems.
The minimum version of Java required for a given directory release is noted in the release notes for that version. E.g. for 6.0 the minimum is Java 1.8
One thing to be aware of when both a system and a private Java installation are present is that you should always use them consistently. Ensure that JAVA_HOME and the PATH pick up the private version while working on the directory.
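A minimal sketch of the environment setup, assuming the private JDK lives at /opt/directory-jdk (a placeholder path, not a convention of the product):

```shell
# /opt/directory-jdk is a placeholder - point JAVA_HOME at the private
# JDK actually installed for the directory server.
export JAVA_HOME=/opt/directory-jdk
export PATH="$JAVA_HOME/bin:$PATH"
# Confirm the private java now shadows the system one.
command -v java
```

Putting these lines in the profile of the account that administers the directory keeps every shell session consistent.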
As an example, the jps command is much more user-friendly than the system ps command for inspecting Java processes. Unfortunately, the data it reports is specific to the Java installation being used: if you run the system jps while the directory is running under its private copy of Java, no directory Java processes will be seen. This can cause a lot of confusion, to say the least.