Book: VMware vCloud® Director™ Infrastructure Resiliency Case Study

VMware vCloud® Director™ Infrastructure Resiliency Case Study

VMware vSphere® 5.0, VMware® vCenter™ Site Recovery Manager™ 5.0 and VMware vCloud Director 1.5

TECHNICAL WHITE PAPER
v1.0, February 2012

Table of Contents

Design Subject Matter Experts
Purpose and Overview
Target Audience
Interpreting This Document
Case Requirements and Assumptions
Infrastructure Logical Architectural Overview
Management Cluster Overview
Resource Cluster Overview
Creating a Disaster Recovery Solution for vCloud Director
Logical Architecture Overview
Failover Procedure
Conclusion
About the Authors

Design Subject Matter Experts

The following people provided key input into this design:

Name            Title                                         Role
Duncan Epping   Principal Architect – Technical Marketing     Author
Chris Colotti   Consulting Architect – Center of Excellence   Contributor

Purpose and Overview

VMware vCloud® Director™ 1.5 (vCloud Director) gives enterprise organizations the ability to build secure private clouds that dramatically increase datacenter efficiency and business agility. Coupled with VMware vSphere® (vSphere), vCloud Director delivers cloud computing for existing datacenters by pooling vSphere virtual resources and delivering them to users as catalog-based services. vCloud Director helps build agile infrastructure-as-a-service (IaaS) cloud environments that greatly accelerate the time to market for applications and increase the responsiveness of IT organizations.

Resiliency is a key aspect of any infrastructure, and it is even more important in IaaS solutions. This case study was developed to provide additional insight into how to increase the availability and recoverability of a vCloud Director–based infrastructure using VMware® vCenter™ Site Recovery Manager™ (SRM) together with common disaster recovery methodologies and tools. SRM facilitates fast and reliable recovery and helps you meet your recovery time objectives (RTOs) by automating the failover of your vCloud Director management environment.

Target Audience

This document is intended for readers with a technical background who will be designing, deploying or managing a vCloud Director infrastructure, including but not limited to technical consultants, infrastructure architects, IT managers, implementation engineers, partner engineers, sales engineers and customer staff.
This case study is not intended to replace or override existing certified designs for vCloud Director. Instead, it is meant to supplement that knowledge and provide additional information for implementing a disaster recovery strategy for vCloud Director infrastructures. Architectural guidance for vCloud Director infrastructures is provided in the VMware vCloud® Architecture Toolkit.

Interpreting This Document

The overall structure of this design document is largely self-explanatory. However, throughout this document several key points are highlighted to the reader by means of the following label:

• NOTE – A point of general importance or a further explanation of a particular section.

This document captures a solution developed for a specific scenario and set of requirements. It is assumed that the reader is familiar with vCloud Director, VMware vCenter Server™, SRM and vSphere reference architectures, technology and terminology.

Case Requirements and Assumptions

Requirements are the key demands on the design. Sources include both business and technical representatives.

ID     Requirement
R101   Increase availability of the vCloud Director infrastructure
R102   Fail over the management cluster workload to the secondary site
R103   Fail over the vCloud Director resource cluster workload to the secondary site
R104   Must be a fully supported solution

Table 1. Customer Requirements

The following list includes the overall assumptions for this particular scenario and the considerations to be made before using the information contained in this document:

ID     Assumption
A101   Stretched layer-2 network
A102   Storage-based replication technology
A103   Use of SRM to orchestrate management cluster disaster recovery

Table 2. Assumptions

Infrastructure Logical Architectural Overview

vCloud Director infrastructures must be deployed according to version 2.0.1 of the VMware vCloud Architecture Toolkit (vCAT). The vCAT prescribes a scenario in which the vCloud Director elements are explicitly separated into two groups, a management cluster and a resource cluster:

• The management cluster contains the elements required to operate and manage the vCloud Director environment. This typically includes vCloud Director cells, vCenter Server(s) (used for resource clusters), VMware® vCenter™ Chargeback Manager™, VMware® vCenter™ Orchestrator™, VMware® vShield Manager™ and one or more database servers.
• The resource cluster represents dedicated resources for end-user consumption. Each resource group consists of VMware® ESXi™ hosts managed by a vCenter Server and is under the control of vCloud Director, which can manage the resources of multiple clusters, resource pools and vCenter Servers.

This separation is recommended primarily to facilitate quicker troubleshooting and problem resolution, because the management components are strictly contained in a relatively small and manageable cluster. Troubleshooting and managing management components that run on a large cluster alongside other workloads can be time consuming and difficult. Separation of function also enables consistent and transparent management of infrastructure resources, which is critical for scaling vCloud Director environments. It increases flexibility, because upgrades of the management and resource clusters are not tied to each other. And it prevents security attacks or intensive provisioning activities from affecting the availability of the management components. Figure 1 depicts this scenario.

Figure 1. VMware vCloud Infrastructure Logical Overview (management cluster: RES VC, two VCD cells, MGMT VC, DB Server and VSM on ESXi hosts; resource cluster: end-user VMs on ESXi hosts)

Management Cluster Overview

The management cluster hosts all the necessary vCloud infrastructure components. In our scenario, it contains at a minimum the following virtual machines:

• Two vCenter Server systems
   –– One vCenter Server instance for management (of the management cluster)
   –– One vCenter Server instance for cloud resources, managing the resource cluster
• One vCenter Server database
• Two vCloud Director cells
• One vCloud Director database
• One vShield Manager

The management vCenter Server runs as a virtual machine in the cluster it manages. The black arrow in Figure 2 depicts this.

Figure 2. Management Cluster Overview (RES VC, two VCD cells, MGMT VC, DB Server and VSM on ESXi hosts)

Resource Cluster Overview

A resource cluster is a set of resources dedicated to end-user workloads and managed by a single vCenter Server instance. vCloud Director manages all the resource clusters through the vCenter Server instances attached to it. All provisioning tasks are initiated through vCloud Director and passed down to the appropriate vCenter Server instance residing in the management cluster.

A resource cluster contains only virtual machines instantiated by vCloud Director, including the VMware® vShield Edge™ appliances for network services. Figure 3 depicts a resource cluster. The vCenter Server virtual machine that manages this environment is not depicted because it resides within the management cluster.

Figure 3. Resource Cluster Overview (end-user VMs on ESXi hosts)

Creating a Disaster Recovery Solution for vCloud Director

As of this writing, VMware vCenter Site Recovery Manager 5.0 (and earlier versions) does not support the protection of vCloud Director workloads (resource clusters). To facilitate disaster recovery (DR) in a vCloud Director environment, the proposed solution applies standard DR concepts, using conventional replication technologies and vSphere features for the vCloud Director workloads, and uses SRM to orchestrate the failover of the vCloud management infrastructure (management cluster).

SRM currently does not support vCloud Director because it is designed specifically to control the destination vCenter Server in order to manage rapid failover of virtual machines to the recovery site. To control the failover of one vSphere environment to another, SRM requires some vCenter Server objects, such as resource pools, folders and port groups, to be precreated on the destination side. Because vCloud Director is designed to fully control a resource cluster, it maintains complete authority and surveillance over all objects created in the cluster. Attempting to use SRM to fail over a resource cluster would result in SRM precreating placeholder objects at the recovery site for the resources at the primary site, and those placeholder objects would be unknown to vCloud Director.

In testing SRM and vCloud Director together, we have identified the following automatically created objects as posing a challenge when newly created:

• Virtual machines
• Resource pools
• Folders
• Port groups

In addition to those that are automatically created, the following objects are used and referenced by vCloud Director:

• Clusters
• Datastores

Both vCloud Director and vCenter Server rely heavily on managed object reference identifiers (MoRef IDs) to correlate objects between the two platforms.
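The MoRef ID dependency can be illustrated with a toy model. The Python sketch below is hypothetical and uses no VMware API: a dictionary stands in for the vCenter Server inventory, which hands out a new MoRef ID every time a virtual machine is registered, and another dictionary stands in for the references vCloud Director stores in its database. The `vm-<n>` format mirrors the style of vCenter Server IDs, but the class and function names are invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class VCenterInventory:
    """Toy stand-in for a vCenter Server inventory (hypothetical, not a VMware API).

    Every registration receives a new MoRef ID, even for a virtual machine
    that was registered earlier under a different ID.
    """
    _next_id: count = field(default_factory=lambda: count(100))
    vms: dict = field(default_factory=dict)  # MoRef ID -> VM name

    def register(self, name: str) -> str:
        moref = f"vm-{next(self._next_id)}"
        self.vms[moref] = name
        return moref

    def unregister(self, moref: str) -> None:
        del self.vms[moref]


def dangling_references(vcd_refs: dict, inventory: VCenterInventory) -> list:
    """Return the vCloud Director objects whose stored MoRef ID no longer
    exists in vCenter Server, i.e. objects vCloud Director cannot manage."""
    return [obj for obj, moref in vcd_refs.items() if moref not in inventory.vms]


inventory = VCenterInventory()
vcd_refs = {"vapp-a/web01": inventory.register("web01")}  # stored as "vm-100"

# Simulate a re-registration, e.g. after the datastore was resignatured.
inventory.unregister(vcd_refs["vapp-a/web01"])
inventory.register("web01")  # same VM, now "vm-101"

print(dangling_references(vcd_refs, inventory))  # ['vapp-a/web01']
```

The virtual machine itself is intact after re-registration; only the identifier changed, and that alone is enough to orphan the reference held on the vCloud Director side.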
Any unplanned change to these identifiers results in a loss of functionality, because vCloud Director can no longer manage the affected objects. The screenshot in Figure 4 shows a MoRef ID in use within the vCloud Director database. Using SRM would change the MoRef ID at the vCenter Server layer, leaving an incorrect reference in the vCloud Director database and making the object (for instance, a virtual machine) unmanageable from a vCloud Director perspective.

Figure 4. Example of a vCloud Director SQL Database Table Containing MoRef IDs

Another common issue when discussing SRM and vCloud Director together is the resignaturing of VMware vSphere® VMFS volumes. In a standard SRM environment, resignaturing is a common DR best practice that is performed when replication has been stopped and the replicated volume is presented to the hosts in the recovery site. It ensures that the volume on the recovery side has a unique ID, which prevents a scenario where two volumes with the same unique ID (UUID) are presented to the same host, a situation that can potentially lead to data corruption. When a datastore is resignatured, all virtual machines on it must be reregistered within vCenter Server. vCloud Director references these virtual machines by MoRef ID and cannot handle these changes. As such, avoiding resignaturing volumes is a requirement.

The proposed solution prevents changes to any of these objects. This simplifies the recovery of a vCloud Director infrastructure and increases management infrastructure resiliency.

Logical Architecture Overview

vCloud Director disaster recovery can be achieved through various scenarios and configurations. This case study focuses on a single scenario as a simple explanation of the concept, which can then easily be adapted and applied to other scenarios.
This case study focuses on an "active/standby" DR approach in which hosts at the recovery site are not utilized under normal conditions. To ensure that all management components are restarted in the correct order and in the least amount of time, SRM is used to orchestrate the failover. This requires that each site contain a management vCenter Server and an SRM server, as depicted in Figure 5.

Figure 5. Management Cluster Overview (Management Cluster A, protected site, active: VC, VC, SRM, VCD, VSM and DB; Management Cluster B, recovery site, standby: SRM and VC)

Figure 6 depicts the full vCloud Director infrastructure architecture used for this case study.

Figure 6. Management Cluster Overview (the management clusters of Figure 5 plus the resource clusters: vApps on active ESXi hosts at the protected site, standby ESXi hosts in maintenance mode at the recovery site)

Both the protected site and the recovery site have a management cluster that contains a vCenter Server instance and an SRM server. These servers facilitate the DR procedures for the other components within the management cluster by including those components in an SRM protection group. By creating an SRM recovery plan, the virtual machines within this protection group can be failed over to the recovery site. More details regarding the creation of recovery plans, protection groups and SRM configuration in general can be found at http://www.vmware.com/support/pubs/srm_pubs.html.

Storage is replicated, not stretched, in this environment. Hosts in the resource cluster at the recovery site cannot see the storage at the protected site, so they cannot run vCloud Director workloads in a normal situation. They are depicted as hosts that are placed in maintenance mode.
They can also be standalone hosts that are added to the vCloud Director resource cluster during the failover. For simplification and visualization purposes, this scenario describes the situation where the hosts are part of the cluster and are placed in maintenance mode.

Storage replication technology is used to replicate LUNs from the protected site to the recovery site. Replication can be asynchronous or synchronous; the choice typically depends on the recovery point objective (RPO) defined in the service-level agreement (SLA) and on the distance between the protected site and the recovery site. In our case study, synchronous replication was used.

SRM manages the LUNs/datastores that host the vCloud Director management infrastructure by means of a storage replication adapter (SRA). Using an SRA makes it possible to fully automate and orchestrate the failover of the vCloud Director management infrastructure. The LUNs/datastores on which the vCloud Director workloads run are not managed by SRM, because this is currently not supported. As a result, manual steps might be required during the failover. Depending on the type of storage used, these steps can be automated by using storage system API calls.
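The requirement that SRM restart the management components in the correct order can be treated as a dependency problem. The Python sketch below is a hypothetical planning aid, not an SRM API: it groups the failed-over management virtual machines into start-up waves with the standard library's `graphlib.TopologicalSorter` (Python 3.9 and later), so that each wave could be mapped to one priority group in an SRM recovery plan. The dependency map and VM names are assumptions made for illustration; this case study does not prescribe this exact order. The limit of five groups reflects the five priority groups offered by SRM 5.0 recovery plans.

```python
from graphlib import TopologicalSorter

# Assumed start-up dependencies (illustrative, not prescribed by the case study):
# each management VM maps to the VMs that must be running before it starts.
# The recovery site's own management vCenter Server and SRM server are
# already running, so they are not part of the plan.
DEPENDENCIES = {
    "db-server": set(),
    "resource-vcenter": {"db-server"},
    "vshield-manager": {"resource-vcenter"},
    "vcd-cell-1": {"db-server", "resource-vcenter", "vshield-manager"},
    "vcd-cell-2": {"db-server", "resource-vcenter", "vshield-manager"},
}

SRM_PRIORITY_GROUPS = 5  # SRM 5.0 recovery plans provide five priority groups


def priority_groups(dependencies: dict) -> list:
    """Split VMs into start-up waves; VMs in the same wave can power on in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    if len(waves) > SRM_PRIORITY_GROUPS:
        raise ValueError(
            f"{len(waves)} waves do not fit in {SRM_PRIORITY_GROUPS} priority groups"
        )
    return waves


for number, wave in enumerate(priority_groups(DEPENDENCIES), start=1):
    print(f"Priority group {number}: {', '.join(wave)}")
```

With the assumed map, this prints four groups: the database server first, then the resource vCenter Server, then vShield Manager, and finally both vCloud Director cells together, which can start in parallel.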

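The manual steps for the workload datastores could be scripted along the following lines. This is a minimal sketch under stated assumptions: `promote_replica` stands for whatever call a particular storage vendor's API offers to stop replication and make a replica LUN writable, and `RecoveryHost` is a toy record rather than a vSphere object; all of these names are hypothetical. The sketch encodes two points from this case study: the replicated VMFS volumes must keep their existing signature (no resignaturing, so MoRef IDs stay valid), and the standby hosts leave maintenance mode only after the datastores are visible to them. Because storage is replicated rather than stretched, the recovery hosts never see the original volume, so keeping the signature does not present two volumes with the same UUID to one host; the sketch refuses to continue if that assumption is violated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class StorageArray(Protocol):
    """Hypothetical wrapper around a storage vendor's API (not a real library)."""

    def promote_replica(self, lun_id: str) -> str:
        """Stop replication, make the replica LUN writable, return its VMFS UUID."""
        ...


@dataclass
class RecoveryHost:
    """Toy record for an ESXi host in the recovery-site resource cluster."""
    name: str
    in_maintenance_mode: bool = True
    mounted_vmfs_uuids: List[str] = field(default_factory=list)


def fail_over_workload_datastores(array: StorageArray,
                                  expected_uuids: Dict[str, str],
                                  hosts: List[RecoveryHost]) -> None:
    """Promote each workload LUN and mount it with its existing signature.

    expected_uuids maps LUN ID -> VMFS UUID recorded at the protected site.
    """
    for lun_id, expected in expected_uuids.items():
        uuid = array.promote_replica(lun_id)
        if uuid != expected:
            # A new signature means re-registered VMs and new MoRef IDs,
            # which vCloud Director cannot handle.
            raise RuntimeError(f"{lun_id}: VMFS UUID changed ({expected} -> {uuid})")
        for host in hosts:
            # The original volume should never be visible at the recovery site;
            # a duplicate UUID on one host risks data corruption.
            if uuid in host.mounted_vmfs_uuids:
                raise RuntimeError(f"{host.name}: duplicate VMFS UUID {uuid}")
            host.mounted_vmfs_uuids.append(uuid)  # stand-in for rescan + mount
    # Only now can the standby hosts accept vCloud Director workloads.
    for host in hosts:
        host.in_maintenance_mode = False
```

Taking the hosts out of maintenance mode last means that a failed or resignatured LUN leaves the recovery-site resource cluster in its standby state instead of partially online.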