
Dealing with Data Migration Problems

UnboundID NeilW

We've tried to make the process of migrating data from another directory server into an UnboundID server as easy and convenient as possible, but it's still very common to encounter problems during the migration process. Those problems generally fall into one of the following categories:

 

  • The UnboundID server has not yet been updated with the custom schema that your data uses.

 

  • The data doesn't strictly conform to the defined schema.

 

  • The data includes operational attributes that are specific to the directory server you were previously running but that aren't known to or used by the UnboundID server.

 

The Managing Server Schema document provides a detailed overview of how to configure schema in UnboundID servers, including how to use the migrate-ldap-schema tool to automate the process of getting your custom schema definitions into the UnboundID server. This document focuses on the other two scenarios.

 

Using the validate-ldif Tool

The best way to identify any problems that are likely to arise when trying to import data contained in an LDIF file is to use the validate-ldif tool provided with the server. This tool will read a server's schema (either over LDAP or from the LDIF files containing the schema definitions), and will then iterate through the entries in an LDIF file containing the data to be imported, examining them and identifying any potential problems. These problems include:

 

  • Malformed LDIF entries
  • Entries with malformed DNs
  • Entries without any object classes at all
  • Entries with an object class that is not defined in the schema
  • Entries that do not have a structural object class
  • Entries that have multiple structural object classes
  • Entries that include subordinate object classes without listing their superior classes
  • Entries that include an abstract object class without including one or more of its non-abstract subclasses
  • Entries with an auxiliary object class that is not allowed in conjunction with the structural class
  • Entries with an attribute that is not defined in the schema
  • Entries that are missing a required attribute
  • Entries that include an attribute that is not allowed
  • Entries with multiple values for a single-valued attribute
  • Entries with an attribute value that violates the associated attribute syntax
  • Entries with an RDN attribute that is not defined in the schema
  • Entries with an RDN attribute that is not allowed to be included in that entry
  • Entries with an RDN attribute that is not allowed by the associated name form
  • Entries with an RDN that does not include an attribute required by the associated name form

 

This tool isn't comprehensive, and there are certain kinds of problems that it can't catch. For example, because it only looks at entries in isolation, it doesn't have the ability to check for entries with missing parents, or subordinate entries that violate a DIT structure rule.
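A few of the simpler checks can be approximated outside the server. The following Python sketch is not a substitute for validate-ldif — it uses only a simplified LDIF reader and consults no schema — but it illustrates how a pre-check for entries with no object classes at all might work:

```python
# Minimal LDIF sanity check: flags records that do not start with a DN and
# entries that have no objectClass values. A rough pre-check only; it does
# not consult the schema the way validate-ldif does.

def parse_ldif_records(text):
    """Split LDIF text into records, unfolding wrapped continuation lines."""
    records = []
    for chunk in text.strip().split("\n\n"):
        lines = []
        for raw in chunk.splitlines():
            if raw.startswith(" ") and lines:
                lines[-1] += raw[1:]          # unfold a wrapped line
            elif raw and not raw.startswith("#"):
                lines.append(raw)
        if lines:
            records.append(lines)
    return records

def find_problems(text):
    problems = []
    for lines in parse_ldif_records(text):
        attrs = [line.split(":", 1)[0].lower() for line in lines]
        if attrs[0] != "dn":
            problems.append(("<no dn>", "record does not start with a DN"))
        elif "objectclass" not in attrs:
            dn = lines[0].split(":", 1)[1].strip()
            problems.append((dn, "entry has no object classes"))
    return problems

sample = """\
dn: uid=jdoe,ou=People,dc=example,dc=com
objectClass: inetOrgPerson
uid: jdoe

dn: uid=broken,ou=People,dc=example,dc=com
uid: broken
"""
print(find_problems(sample))
```

A real migration pre-check would layer schema-aware tests (required attributes, syntaxes, structural classes) on top of this kind of record iteration, which is exactly what validate-ldif does for you.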

 

To run the tool, specify the path to the LDIF file to be validated, and provide either the path to the server's schema directory or the arguments needed to connect to the server over LDAP. You may also want to use the "--rejectFile" argument to specify that entries that violate schema should be written to a file with information about the problem, and the "--numThreads" argument to specify that the tool should use multiple threads to perform the processing in parallel across multiple CPUs. For example:

 

$ bin/validate-ldif --ldifFile mydata.ldif \
      --schemaDirectory config/schema \
      --rejectFile mydata-rejects.ldif \
      --numThreads 8

 

The tool also offers a number of options that can be used to customize the type of validation to perform. This can be helpful if you're aware of certain problems with the data and don't want them flagged, but you'd still like to perform validation to identify other problems. Run the tool with "--help" to see the list of available arguments, but the following are among the most likely problems you may want to overlook:

 

  • --ignoreMissingSuperiorObjectClasses — Indicates that the tool should not flag entries that include subordinate object classes without also including all of their superior classes. When performing an LDIF import, the server will automatically add any missing superior object classes, so this won't cause an error during the import, and some directory servers don't automatically include superior object classes in entries, so it is not uncommon for data migrated from another server to be missing these classes.

 

  • --ignoreMissingRDNValues — Indicates that the tool should not flag entries that include attribute values in their RDN that are not present in the set of attributes for the entry. The set of attributes for an entry must include the attribute values used in the DN, but RFC 4511 section 4.7 indicates that clients don't necessarily need to include RDN values in the set of entry attributes when adding entries, and the server will automatically add any missing RDN values to the set of entry attributes.

 

  • --ignoreStructuralObjectClasses — Indicates that the tool should not flag entries that do not include exactly one structural object class. Although LDAP standards dictate that each entry must have exactly one structural class, some directory servers do not enforce this constraint, so it is not uncommon for migrated data to include entries that either don't have any structural object class, or that have multiple structural classes. Although the best way to address this issue when migrating data is to fix the data so that every entry does include exactly one structural class or to alter the schema to eliminate the violations, the server does offer the ability to relax this validation if necessary.

 

  • --ignoreSyntaxViolationsForAttribute — Specifies the name or OID of an attribute type for which the tool should ignore attribute syntax violations. Some LDAP servers do not require attribute values to conform to the associated attribute syntax, so it is not uncommon for migrated data to include values that violate this syntax. As with violations of the single structural object class policy, the best way to address this problem is to fix the data or alter the schema, but the server can be configured to ignore syntax violations for a specified set of attributes. This argument can be provided multiple times to indicate that syntax violations should be ignored for multiple attributes, and that is generally preferred over the --ignoreAttributeSyntax argument that disables syntax validation for all attribute types.

 

Dealing with Data That Violates the Schema

There can be any number of reasons that your data may not strictly comply with the defined schema, but it usually boils down to some combination of the server not enforcing all of the appropriate constraints (whether it simply doesn't check everything that it's supposed to, or it's been configured to relax some of that checking) and clients trying to store data that violates those constraints.

 

As indicated above, the most common types of violations encountered when migrating data, especially from servers that are particularly lax in their enforcement of schema constraints, are entries that do not have exactly one structural object class, and attribute values that violate the associated attribute syntax. Recommendations for dealing with these violations are described below.

 

Dealing with Structural Object Class Violations

Some directory servers don't enforce the LDAP requirement that each entry have exactly one structural object class, and many application developers also seem to be unaware of this constraint. As such, when migrating from an LDAP server that doesn't enforce this requirement, it is common to encounter entries that either have multiple structural object classes, or that don't have a structural object class at all. The validate-ldif tool can detect these entries before attempting the import so that you can identify and fix any such problems in the data.

 

The best way to address these problems is to update the data and/or alter the schema so that each entry ends up with exactly one structural object class. There are several ways that this can be done. If you have entries with multiple structural object classes:

 

  • Consider altering the schema so that one of those structural classes is a subclass of the other. This doesn't require any change to entries or to applications that create those entries, but you should only consider it if you can do so with only changes to your custom schema. You should not do this if it would require altering any of the schema definitions shipped with the server.

 

  • Consider altering the schema to redefine all but one of those classes as auxiliary. An entry can only have one structural object class, but can have any number of auxiliary classes. This also doesn't require any changes to entries or applications. Again, you should only do this if it doesn't require altering any of the schema definitions shipped with the server.

 

  • Consider defining a custom structural object class that contains all of the required and optional attributes of all of the structural object classes that are currently in use. You can then update the entries to use this new object class instead of the existing structural classes. This may also require updating applications that create entries to use the new object class instead of the existing set of structural classes.

 

If you have entries without any structural object classes:

 

  • Consider altering the schema to redefine a custom auxiliary class so that it is structural. This doesn't require any change to entries or to applications that create entries, but you should only consider it if you can do so without altering any of the schema shipped with the server.

 

  • Consider updating the entries to add a structural class to them. If an offending entry already includes the cn attribute, then you could use either the namedObject or untypedObject object class (both of which are included with the schema shipped with the server) for this purpose. You could also use the untypedObject object class if the entry contains any of the following additional attributes: c, dc, l, o, ou, st, street, uid, description, owner, or seeAlso. If neither of these is appropriate, then you could define your own custom structural class for this purpose that depends only on the objectClass attribute. Any of these changes may also require updating applications that create entries to include the new structural class.
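To illustrate the "exactly one structural class" rule that the checks above are enforcing, here is a hedged Python sketch. The class names and superior relationships are hard-coded for illustration only; a real check would read them from the server's schema:

```python
# Sketch of the "exactly one structural object class" check. The sets below
# are illustrative, not pulled from real schema files.

STRUCTURAL = {"person", "organizationalperson", "inetorgperson",
              "organizationalunit", "domain", "groupofnames"}

# Subclass -> direct superior, so that a superior chain such as
# inetOrgPerson -> organizationalPerson -> person counts as one class.
SUPERIOR = {"inetorgperson": "organizationalperson",
            "organizationalperson": "person"}

def effective_structural(object_classes):
    """Return an entry's structural classes, minus inherited superiors."""
    present = {oc.lower() for oc in object_classes} & STRUCTURAL
    superiors = set()
    for oc in present:
        parent = SUPERIOR.get(oc)
        while parent:
            superiors.add(parent)
            parent = SUPERIOR.get(parent)
    return present - superiors

# A superior chain collapses to one effective structural class:
print(effective_structural(["top", "person", "organizationalPerson",
                            "inetOrgPerson"]))
# No structural class at all is a violation:
print(effective_structural(["top"]))
```

An entry passes the check only when the returned set has exactly one member; an empty set means no structural class, and two or more unrelated classes (for example, person plus domain) means multiple structural classes.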

 

If you are unwilling or unable to alter the schema or the data, then you can configure the server so that it does not enforce the requirement that all entries have exactly one structural class. This is controlled by the single-structural-objectclass-behavior property of the global configuration, and it can have one of the following values:

 

  • reject — The server will reject attempts to create or update an entry so that it has multiple structural classes or no structural class. This is the default setting.

 

  • accept — The server will silently accept attempts to create or update an entry so that it has multiple structural classes or no structural class.

 

  • warn — The server will accept attempts to create or update an entry so that it has multiple structural classes or no structural class, but will log an error message for each operation that violates this constraint. Note that this log message will have a "mild error" severity, so it will not appear in the error log by default unless you adjust the default severity.

 

For example, to update the server to silently accept entries that do not have exactly one structural class, apply the following configuration change:

 

dsconfig set-global-configuration-prop \
    --set single-structural-objectclass-behavior:accept

 

Dealing with Attribute Syntax Violations

The options for migrating data containing attribute syntax violations are similar to the options for migrating data that violates the single structural object class constraint. The recommended approach is one of the following:

 

  • If the associated attribute type has an appropriate syntax, but the data includes values that violate that syntax, then the best thing to do is to identify the entries with malformed values and attempt to replace them with appropriate values. It may also be necessary to update applications that may write to that attribute to ensure that they don't attempt to store malformed values.

 

  • If the associated attribute type has a syntax that is too restrictive for the values you want to store, and if that attribute type is a custom attribute type not included in the schema included with the server by default, then you can update the definition for that attribute type to use a less restrictive syntax. Changing the syntax for attribute type definitions that are shipped with the server is not recommended.

 

  • If the associated attribute type has a syntax that is too restrictive for the values you want to store, but that attribute type is included in the default schema shipped with the server, then you can define a new attribute type to use in place of the attribute type provided with the server. You can use whatever syntax you want for this new attribute type, although you will also need to update an existing custom object class (or add a new custom class) so that the new attribute type is allowed. You will also need to update any applications that interact with those values so that they use the new attribute type instead.
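Before deciding to relax validation, it can help to locate the offending values. This Python sketch applies a naive well-formedness test to candidate values of a DN-syntax attribute such as seeAlso; it is a rough filter for finding values to fix, far short of a real RFC 4514 DN parser (it does not handle OID-named attribute types or all escape sequences):

```python
# Naive check for whether a value is plausibly a DN, to help find
# syntax-violating values (e.g. in seeAlso) before migration.
import re

def looks_like_dn(value):
    # Split on commas that are not escaped, then require each component
    # to be an attr=value pair. Illustrative only, not RFC 4514-complete.
    components = re.split(r"(?<!\\),", value)
    return all(
        re.fullmatch(r"\s*[A-Za-z][A-Za-z0-9-]*=.+", c) for c in components)

print(looks_like_dn("cn=Jane Doe,ou=People,dc=example,dc=com"))  # True
print(looks_like_dn("Jane Doe - see HR"))                        # False
```

Running a filter like this over exported data identifies the entries to clean up, which is preferable to turning off the server's own syntax validation.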

 

If it is not possible to change the schema or the data to correct these problems, then the server will allow you to relax validation constraints for a specified set of attribute types. This can be accomplished through the permit-syntax-violations-for-attribute global configuration property. For example, if you wanted to configure the server to allow the seeAlso attribute to have values that are not valid DNs, you could apply the following configuration change:

 

dsconfig set-global-configuration-prop \
    --add permit-syntax-violations-for-attribute:seeAlso

 

While the global configuration also includes an invalid-attribute-syntax-behavior property that allows syntax validation to be disabled for all attribute types, it is recommended that you use the permit-syntax-violations-for-attribute property instead to control this behavior on a per-attribute basis.

 

Dealing with Unsupported Operational Attributes

Even if your data perfectly conforms to the associated schema, you may still run into problems when attempting to migrate it into an UnboundID server because of differences in operational attributes used to keep state information for the user's entry or otherwise control how the server should interact with that entry.

 

The operational attributes most likely to cause these kinds of problems are those used to hold password policy state information, since there is no official standard that governs this, and different servers have different password policy capabilities. This section will describe the operational attributes that UnboundID servers use to hold password policy state information so that you can better determine how to migrate password policy state information from your existing server into a form that can be used by the UnboundID server.

 

Key operational attributes used for maintaining password policy state include:

 

  • ds-pwp-password-policy-dn — This attribute is used to specify the DN of the password policy that governs the user. It may refer to an entry with the ds-cfg-password-policy object class in the configuration or in user-supplied data. If the value refers to an entry that does not exist, then that user will not be able to authenticate. If this attribute is missing from a user's entry, then the user will be governed by the server's default password policy, as specified by the default-password-policy property in the global configuration.

 

  • pwdChangedTime — This attribute is used to specify the time, in generalized time format, that the user's password was last changed. It is used for a number of purposes, including:
    • To determine when a user's password will expire (computed from the password changed time and the max-password-age value from the user's password policy).
    • To determine how long a user has to change their password after an administrative reset (computed from the password changed time and the max-password-reset-age value from the user's password policy).
    • To determine the earliest time the user will be permitted to change their password again after a previous self-change (computed from the password changed time and the min-password-age value from the user's password policy).
    If a user's entry does not include a pwdChangedTime value, the server will use the value of the createTimestamp attribute instead. If the entry contains neither pwdChangedTime nor createTimestamp, then the server will assume that the user's password has never been changed.
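The expiration arithmetic for pwdChangedTime can be sketched in Python. This assumes the common YYYYMMDDHHMMSSZ generalized time form (real generalized time also allows fractional seconds and UTC offsets) and a max-password-age expressed in seconds:

```python
# Sketch: compute a password expiration time from a pwdChangedTime value
# and a max-password-age, per the description above. Handles only the
# common "YYYYMMDDHHMMSSZ" generalized time form.
from datetime import datetime, timedelta, timezone

def parse_generalized_time(value):
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(
        tzinfo=timezone.utc)

def password_expiration(pwd_changed_time, max_password_age_seconds):
    return parse_generalized_time(pwd_changed_time) + timedelta(
        seconds=max_password_age_seconds)

# A 90-day max-password-age:
print(password_expiration("20240101000000Z", 90 * 86400))
```

The same pattern applies to the max-password-reset-age and min-password-age computations, just with a different interval added to the same base timestamp.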

 

  • pwdReset — This attribute is used to indicate whether a user is required to change their password after an administrative reset. If this attribute exists with a value of true, then the user will be allowed to authenticate (unless a configured maximum password reset age has elapsed), but will not be permitted to request any other operations until they change the password.

 

  • ds-pwp-warned-time — This attribute is used to indicate when the user received the first warning about an upcoming password expiration. If the user's password policy is configured with expire-passwords-without-warning set to true, then the user's password will not actually expire until the user has received at least one warning, and then the actual expiration time will be computed from the values of the ds-pwp-warned-time operational attribute and the password-expiration-warning-interval from the user's password policy.

  • ds-pwp-auth-failure — This attribute is used to maintain information about any failed authentication attempts since the user's last successful bind. Values for this attribute can take two basic forms:
    • A timestamp in generalized time format, indicating the time of the failed authentication attempt.
    • A timestamp in generalized time format indicating when the failed attempt occurred, followed by an octothorpe (#) character and an encoded representation of the password used for the failed authentication attempt. This format allows the server to ignore subsequent failed attempts with the same wrong password as a previous failed attempt.

 

  • pwdAccountLockedTime — This attribute is used to hold a timestamp, in generalized time format, of the time that the user's account was locked as a result of too many failed authentication attempts. If the user's password policy has a nonzero lockout-duration value, then the pwdAccountLockedTime value will be used to determine when account lockout will end.

 

  • pwdHistory — This attribute is used to hold a list of the user's previous passwords, to prevent them from changing their password to a value that they had already used at some point in the recent past. The server writes values in the format {timestamp}#{syntaxOID}#{encodedPassword}, where {timestamp} is a timestamp in generalized time format indicating when the password was last used, {syntaxOID} is the OID of the attribute syntax used for the password (either 1.3.6.1.4.1.30221.1.3.1 for userPassword values or 1.3.6.1.4.1.4203.1.1.2 for authPassword values), and {encodedPassword} is the encoded representation of the password. However, for compatibility purposes, the server will also support values formatted as {timestamp}{encodedPassword}.
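A small Python sketch of parsing the three-part pwdHistory format described above; values in the older {timestamp}{encodedPassword} compatibility form are returned unparsed, since there is no delimiter to split on:

```python
# Sketch parser for pwdHistory values written as
# {timestamp}#{syntaxOID}#{encodedPassword}. Legacy two-part values
# without "#" separators are returned as-is.

def parse_pwd_history(value):
    parts = value.split("#", 2)
    if len(parts) == 3:
        timestamp, syntax_oid, encoded = parts
        return {"timestamp": timestamp, "syntaxOID": syntax_oid,
                "encodedPassword": encoded}
    return {"raw": value}  # legacy {timestamp}{encodedPassword} form

example = "20240101000000Z#1.3.6.1.4.1.30221.1.3.1#{SSHA256}abc123=="
print(parse_pwd_history(example))
```

Using maxsplit=2 in the split keeps any "#" characters inside the encoded password intact, which matters because encoded password representations are opaque strings.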

 

  • ds-pwp-account-disabled — This attribute is used to indicate whether the user's account has been disabled by an administrator. The account will be disabled if this attribute exists in the entry with a value of true.

 

  • ds-pwp-account-activation-time — This attribute may hold a timestamp, in generalized time format, indicating the time that the user's account will become active. If this attribute exists in a user entry, the user will not be allowed to authenticate until this time has passed.

 

  • ds-pwp-account-expiration-time — This attribute may hold a timestamp, in generalized time format, indicating the time that the user's account will expire. If this attribute exists in a user entry, the user will not be allowed to authenticate once this time has passed.

 

Some of these attributes, like ds-pwp-password-policy-dn, ds-pwp-account-disabled, ds-pwp-account-activation-time, and ds-pwp-account-expiration-time, can be set by an administrator. Most of the others, however, are defined in the schema with the NO-USER-MODIFICATION constraint, which means that clients cannot directly manipulate their values with LDAP add or modify operations, although they can be provided in an LDIF file used with the import-ldif tool. If it is necessary to manipulate these values over LDAP, that can be accomplished with the password policy state extended request or the manage-account command-line tool.