In this article I will discuss the role of Microsoft Active Directory in a multi-site deployment using VMware Site Recovery Manager for disaster recovery services. This article will also explain when to transfer or seize Active Directory FSMO (old name) or Operations Masters (new name) roles in case of a disaster.
I am assuming a stretched layer 2 LAN multi-site configuration with one active and one passive (failover) site. From an AD perspective you might want to create two sites in AD. For simplicity, and because of the stretched LAN configuration, I am assuming just one domain.
It is a best practice not to include Microsoft AD Domain Controllers in a VMware Site Recovery Manager (SRM) recovery plan. The reason for this is that there is a potential risk of data corruption (depending on the exact scenario). On top of this, DCs don’t need SRM protection because AD’s built-in replication already keeps a current copy of the directory on every DC.
Note: I can think of a scenario in which you want to include AD, but this is out of scope in this article.
Depending on the exact SRM multi-site scenario you will have at least three or four Domain Controllers spread out across the two sites. These DCs will spread the load and will also provide some HA, preventing a single point of failure in case a site is lost. It’s a good idea to have (AD integrated) DNS servers available in both sites as well; in case a site is lost you will still have at least one DNS server available. The same goes for the DHCP service.
About Active Directory FSMO / Operations Masters Roles
Active Directory itself will keep the DCs in sync. By default AD will also assign the so-called Flexible Single Master Operation (FSMO) / Operations Masters roles to (only) one of the DCs; most of the time this is the first domain controller in the forest root domain. The various FSMO/Operations Masters roles can be distributed over different domain controllers if preferred.
The 5 FSMO or Operations Masters roles are (from the Microsoft website):
- Schema master – The Schema master role is forest-wide and there is one for each forest. This role is required to extend the schema of an Active Directory forest or to run the adprep /forestprep command.
- Domain naming master – The Domain naming master role is forest-wide and there is one for each forest. This role is required to add or remove domains or application partitions to or from a forest.
- RID master – The RID master role is domain-wide and there is one for each domain. This role is required to allocate the RID pool so that new or existing domain controllers can create user accounts, computer accounts or security groups.
- PDC emulator – The PDC emulator role is domain-wide and there is one for each domain. This role is required for the domain controller that sends database updates to Windows NT backup domain controllers. The domain controller that owns this role is also targeted by certain administration tools and updates to user account and computer account passwords.
- Infrastructure master – The Infrastructure master role is domain-wide and there is one for each domain. This role is required for domain controllers to run the adprep /domainprep command successfully and to update SID attributes and distinguished name attributes for objects that are referenced across domains.
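To make the forest-wide versus domain-wide distinction concrete, here is a minimal Python sketch that records the scope of each role and counts how many role instances exist in a forest. The names `Scope`, `FSMO_ROLES` and `role_instances` are my own, chosen for illustration only; they do not come from any Microsoft tool.

```python
from enum import Enum


class Scope(Enum):
    FOREST = "forest"   # one role holder for the whole forest
    DOMAIN = "domain"   # one role holder in every domain


# Scope of each role, as described in the list above.
FSMO_ROLES = {
    "Schema master": Scope.FOREST,
    "Domain naming master": Scope.FOREST,
    "RID master": Scope.DOMAIN,
    "PDC emulator": Scope.DOMAIN,
    "Infrastructure master": Scope.DOMAIN,
}


def role_instances(domain_count: int) -> int:
    """Return the number of FSMO role instances in a forest with domain_count domains."""
    if domain_count < 1:
        raise ValueError("a forest contains at least one domain")
    forest_wide = sum(1 for s in FSMO_ROLES.values() if s is Scope.FOREST)
    domain_wide = sum(1 for s in FSMO_ROLES.values() if s is Scope.DOMAIN)
    return forest_wide + domain_wide * domain_count


if __name__ == "__main__":
    # The single-domain setup assumed in this article.
    print(role_instances(1))
```

For the single-domain setup assumed in this article that gives five roles to look after; every additional domain adds three more.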
Although a domain will remain operational when the domain controller containing the FSMO/Operations Masters roles is not available, you might end up with some limited functionality.
What to do with the FSMO / Operations Masters roles in case of a disaster?
When a disaster occurs or is about to occur, you have to take care of the FSMO / Operations Masters roles; especially when the disaster strikes the data-center which contains the Domain Controller containing these roles.
We can distinguish two scenarios:
- Disaster strikes (e.g. a fire or flooding) and you don’t have any time to gracefully shut down the original active datacenter containing the DC running the FSMO roles.
- A disaster is about to occur (e.g. there’s a power cut and you’re running on UPS/backup generator), but you still have time to gracefully shut down the original data-center within a limited period of time.
For the FSMO roles there are two possible actions: transferring them or seizing them.
Transferring FSMO roles is the right choice when both data-centers are still operational. You can connect to the DC that is holding the FSMO/Operations Masters roles and move these roles to a DC in your failover datacenter. According to Microsoft, roles can be reassigned in the following ways:
- An administrator reassigns the role by using a GUI administrative tool.
- An administrator reassigns the role by using the ntdsutil /roles command.
- An administrator gracefully demotes a role-holding domain controller by using the Active Directory Installation Wizard. This wizard reassigns any locally-held roles to an existing domain controller in the forest. Demotions that are performed by using the dcpromo /forceremoval command leave FSMO roles in an invalid state until they are reassigned by an administrator.
After the transfer you can shut down the original DC and start your SRM recovery plan (if it was not already running).
In case the primary data-center is not available anymore (and thus the DC holding FSMO roles is down/unreachable) you have to run the seize procedure. Don’t forget to read some important considerations on this at the end of this article.
Both the transfer and the seize procedure are explained in this KB article by Microsoft. Although the article lists MS Windows 2000 and 2003, the procedure also applies to Windows 2008 and 2012.
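As a rough illustration of what both procedures boil down to, the following Python sketch builds the list of commands you would type at the interactive ntdsutil prompt. The function name `ntdsutil_role_commands` and the server name `dc02.example.local` are hypothetical. The keywords are the ones I know from ntdsutil on Windows 2008 and later; as far as I know Windows 2000/2003 uses "domain naming master" instead of "naming master", so check the KB article for your version. ntdsutil also asks for confirmation on every role, so treat this as a checklist rather than as tested automation.

```python
from typing import List

# Role keywords as typed at the "fsmo maintenance:" prompt (Windows 2008 and later).
ROLE_KEYWORDS = (
    "schema master",
    "naming master",
    "RID master",
    "PDC",
    "infrastructure master",
)


def ntdsutil_role_commands(target_dc: str, seize: bool = False) -> List[str]:
    """Build the ntdsutil input that moves all five roles to target_dc.

    target_dc is the DC that should RECEIVE the roles (hypothetical name in
    the example below). With seize=False the roles are transferred, which
    requires the current role holder to be online.
    """
    verb = "seize" if seize else "transfer"
    commands = [
        "roles",                            # enter fsmo maintenance
        "connections",                      # enter server connections
        f"connect to server {target_dc}",   # bind to the new role holder
        "quit",                             # back to fsmo maintenance
    ]
    commands += [f"{verb} {role}" for role in ROLE_KEYWORDS]
    commands += ["quit", "quit"]            # leave fsmo maintenance, then ntdsutil
    return commands


if __name__ == "__main__":
    # Hypothetical DC in the failover site; both sites still operational.
    for line in ntdsutil_role_commands("dc02.example.local"):
        print(line)
```

Calling the function with `seize=True` produces the same sequence with `seize` in place of `transfer`, for the case where the original role holder is gone.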
Important: do not reconnect or restore a domain controller that formerly held FSMO/Operations Masters roles once those roles have been seized: “A domain controller whose FSMO roles have been seized should not be permitted to communicate with existing domain controllers in the forest. In this scenario, you should either format the hard disk and reinstall the operating system on such domain controllers or forcibly demote such domain controllers on a private network and then remove their metadata on a surviving domain controller in the forest by using the ntdsutil /metadata cleanup command.
The risk of introducing a former FSMO role holder whose role has been seized into the forest is that the original role holder may continue to operate as before until it inbound-replicates knowledge of the role seizure. Known risks of two domain controllers owning the same FSMO roles include creating security principals that have overlapping RID pools, and other problems.”
To seize or not? That’s the question!
The choice to seize FSMO roles depends on whether you expect the original DC holding the FSMO roles to return. If you expect this original DC to become available again within hours (or even days), you might want to leave the FSMO roles in the original location and run the domain without FSMO roles for a period of time. But be careful: the absence of the “PDC Emulator” role in particular might become a hassle. The PDC Emulator is responsible for time keeping, something which is very important in virtualized environments.
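The reasoning in this article can be summarised as a small decision helper. This is only a sketch: the function name `fsmo_action` and the `tolerated_outage_hours` parameter are my own assumptions, and how long your environment can run without its role holders is something you have to decide yourself.

```python
def fsmo_action(
    holder_reachable: bool,
    expected_outage_hours: float,
    tolerated_outage_hours: float,
) -> str:
    """Suggest what to do with the FSMO roles.

    holder_reachable:       the current role holder is still online.
    expected_outage_hours:  how long you expect the role holder to be gone;
                            use float("inf") if it will never come back.
    tolerated_outage_hours: how long you accept running without the roles
                            (an assumption you must set for your environment).
    """
    if holder_reachable:
        # Both sites are still up: move the roles gracefully.
        return "transfer"
    if expected_outage_hours <= tolerated_outage_hours:
        # The original DC is expected back in time: leave the roles alone.
        return "wait"
    # The original DC is gone for too long (or for good): seize,
    # and never let the old role holder rejoin the forest.
    return "seize"


if __name__ == "__main__":
    # Site lost in a fire, original DC will not return.
    print(fsmo_action(False, float("inf"), 24))
```

Because of the time-keeping role of the PDC Emulator, you will probably want to keep the tolerated outage short in a virtualized environment.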
I hope this article cleared some things up regarding AD and SRM in Disaster Recovery scenarios. Thanks to my colleagues Geurt Dijker and Peter van der Meijden of PQR for their valuable input.
Great article. I have spent many an hour designing stretched VLANs for SRM. As a general rule we have Primary Subnet A and DR Subnet B.
The Primary Subnet A is stretched across sites so that when any VMs power up they are on the same IP address schema, including default gateway. However, we do not replicate any AD DCs.
Instead, in the DR Subnet we have an AD DC (non-FSMO, but a Global Catalog) which is live all of the time. We have found this is the most flexible model, as it gives you the most choices in the event of DR, e.g. you can log in and seize FSMO roles if required, removing metadata.
The main thing that we always do to control everything is to disable the intersite link on the Layer2/3 switches.
Sounds interesting…regarding disabling the ISL…you’re probably pointing at split scenarios, which are not discussed in this article (can be another good post).
Interesting article. Do you think you could automate the reassignment of roles as part of the recovery plan?
Yes, I think you can…although I haven’t tried it. On this website:
you will find some PowerShell scripts which will reassign Operations Masters roles. SRM will let you start a batch script as part of a recovery plan, so just let this batch file start the PowerShell script and you’re done.
Curious. I am cloning 2 DCs (root + child domain) to bring up in a test network (large test). The domain has many child domains (global company). When the DCs are brought online in the segmented DR network, they are extremely sluggish and sometimes unresponsive. I am assuming this is in response to the DCs having no communication with outside DCs. I need these to function for authentication and DNS in order to conduct a lengthy test process (3+ days). For the record, 2008 R2 DCs. Thoughts? The DC process should be straightforward (seize FSMO, change IP to SR subnet, register DNS, etc).