VMware Cloud on AWS (VMC) is the natural extension of your on-premises vSphere-based datacenter. Because VMC runs vSphere as well, no conversion of your VMs is required. You can run an offline or online (live) migration and move your VMs to the cloud without any change. Moving your VMs back on-premises is not a challenge either; you’re free to run your VMs wherever you want.
Next to vSphere, VMC also leverages vSAN for storage and NSX for networking. VMC actually follows the VMware Cloud Foundation (VCF) architecture. VCF is a standardized architecture for your SDDC and can be deployed on-premises. With VMC you don’t have to do any initial installation and configuration, because VMware manages this for you. VMware also takes care of patching and lifecycle management.
You can see VMC as “VCF-as-a-service”; a ready to run vSphere/vSAN/NSX environment which allows you to run VMs in the cloud using your existing skills and toolset.
VMC with NSX-T SDDC
So, enough background on VMC; let’s have a closer look at VMC networking. VMC can be backed by NSX-V or NSX-T. NSX-V is the original implementation, but for the past few months NSX-T has been the standard. New SDDCs in VMC use NSX-T, and existing SDDCs that use NSX-V will be migrated to NSX-T:
We will migrate all existing NSX-V SDDC to NSX-T as part of the migration effort that will start in early 2019. You don’t have to do anything special. We will work with you during the migration (source).
In this article I will focus on NSX-T. NSX-T provides a broader feature set than NSX-V, including: route-based IPSEC VPN, AWS Direct Connect for all traffic types, distributed firewall, IPFIX/port mirroring, and Management Appliance and ESXi access to and from the overlay networks and AWS VPC. This is on top of the features that are supported by both NSX-V and NSX-T, such as: policy-based IPSEC VPN, L2 VPN, Edge Firewall, support for logical networks and much more. Check page 6 of this PDF.
The basic building blocks of VMC are detailed in the following picture:
On the left we have our own data center running on-premises, in the middle we have VMware Cloud on AWS that is running in an AWS datacenter of choice close to AWS native services that can be accessed directly (on the right). Connectivity is essential to make this all work. We need connectivity between the on-premises datacenter and VMC on the one hand, and connectivity between VMC and native AWS services on the other hand.
As stated before, VMC relies heavily on NSX for networking. The VMs that are running in your datacenter will leverage logical networks that are actually Geneve-backed L2 network segments. NSX-V used the VXLAN protocol for this; NSX-T uses Geneve, a further improved overlay protocol.
VMC Networking Architecture
Because NSX-T is used [in newer versions of VMC], VMC can now use the multi-tier routing model of NSX-T. In this model we can distinguish a provider router and one or more tenant routers. The concept of multi-tenancy is built straight into NSX-T; the top-tier logical router is referred to as a Tier-0 (T0), while the bottom-tier logical router is a Tier-1 (T1). Although it’s not mandatory to (always) use this model, in the case of VMC it’s very beneficial to separate the management networks from the compute (user/customer) networks.
In a VMC environment VMware is responsible for the vSphere/vSAN/NSX management components, and guess what… these components (including the ESXi hosts) are all connected to a management T1 router. The management T1 router is called the Management Gateway (MGW). User/customer workloads are all connected to the Compute Gateway (CGW) T1 router. These two T1 routers are both connected to the T0 router. Notice that “multi-tenancy” is currently limited to just one CGW and one MGW.
The following diagram depicts the different (networking) components that are used in VMC:
The different routers in this logical view are not actual service VMs as you would expect with NSX-V. In NSX-T we have the concept of transport nodes and edge nodes. A transport node is capable of participating in an NSX-T overlay. A transport node runs an NSX-T distributed virtual switch (called the N-VDS) and is capable of switching packets. An ESXi host that participates in an NSX-T overlay acts as a transport node and will have an N-VDS installed on the host. NSX-T Edge nodes are service appliances with pools of capacity dedicated to running network services that cannot be distributed to hypervisors, for example: connectivity to the physical network, NAT, DHCP server, proxy services and edge firewall. When an Edge node is deployed, it’s initially an empty container. An Edge node can also act as a transport node; in that case an N-VDS is deployed to the Edge node.
The Distributed Router (DR) is a key component that is responsible for the routing in NSX-T. DRs are logical components that can be deployed to virtualization (transport) hosts and Edge nodes, and they are used specifically for East/West network traffic. For non-distributed services (physical networking, NAT, DHCP etc.) a Services Router (SR) is required; an SR can only run on an Edge node or a pool of Edge nodes.
For all these different components just one Edge node (appliance) is required. This appliance runs in HA mode, which implies one extra appliance to provide high availability. The different services and routers can run on the same appliance, although some components will be distributed across the connected ESXi hosts.
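The split between DR and SR can be illustrated with a minimal sketch. It assumes a simplified model in which a logical router gets an SR only when it has at least one non-distributed service configured; the service labels and node-group names are hypothetical placeholders and not NSX-T identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the non-distributed services the article lists.
CENTRALIZED_SERVICES = {"physical-uplink", "nat", "dhcp", "proxy", "edge-firewall"}


@dataclass
class LogicalRouter:
    name: str
    services: set = field(default_factory=set)

    def needs_service_router(self) -> bool:
        # An SR is needed as soon as one configured service cannot be distributed.
        return bool(self.services & CENTRALIZED_SERVICES)

    def placement(self) -> dict:
        # The DR component can live on ESXi transport nodes and on Edge nodes.
        result = {"DR": ["esxi-transport-nodes", "edge-nodes"]}
        # The SR component can only run on Edge nodes.
        if self.needs_service_router():
            result["SR"] = ["edge-nodes"]
        return result


if __name__ == "__main__":
    print(LogicalRouter("router-with-nat", {"nat", "dhcp"}).placement())
    print(LogicalRouter("routing-only").placement())
```

In this model a router that only does East/West routing stays fully distributed, while a router with NAT or DHCP also gets a component pinned to the Edge nodes.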
Let’s have another look at the routing and other components that are in the diagram:
- As we’ve learned the T0 router in the middle of this picture is the connecting piece of all the different elements in VMC. The T0 router is connected to the IGW, VGW, VPC router, CGW and MGW.
- The CGW is the Compute Gateway and a T1 router; this is the link between the customer workloads (VMs) and the T0 router.
- The MGW is the Management Gateway and a T1 router; this is the link between the management workloads (vCenter, NSX Manager, ESXi servers) and the T0 router.
- The IGW is the Internet Gateway; this is the link between the T0 router and the internet.
- The VGW is the Virtual Private Gateway; this is the link that is responsible for the VPN and/or Direct Connect connection.
- The VPC router is the link to the customer’s Virtual Private Cloud in AWS.
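The list above can be captured as a small graph to see which logical hops traffic passes through. This is a sketch of the logical view only, assuming each link is bidirectional; the node labels "workload VMs" and "management appliances" are illustrative names, and the model says nothing about where packets are physically forwarded.

```python
from collections import deque

# Logical links as described in the list above.
LINKS = {
    "T0": ["IGW", "VGW", "VPC router", "CGW", "MGW"],
    "CGW": ["workload VMs"],
    "MGW": ["management appliances"],
}


def build_graph(links):
    """Turn the link table into an undirected adjacency map."""
    graph = {}
    for node, neighbours in links.items():
        for neighbour in neighbours:
            graph.setdefault(node, set()).add(neighbour)
            graph.setdefault(neighbour, set()).add(node)
    return graph


def logical_path(graph, source, destination):
    """Breadth-first search for the shortest chain of logical hops."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    graph = build_graph(LINKS)
    print(" -> ".join(logical_path(graph, "workload VMs", "IGW")))
    print(" -> ".join(logical_path(graph, "workload VMs", "management appliances")))
```

Every path between the compute side and the management side crosses the T0 router, which is why separating the two T1 gateways keeps management and customer traffic apart.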
VPN and Direct Connect
The IGW, VGW and VPC router all sit on the edge of VMC and form the connection to the outside world. The Direct Connect connection is configured in the AWS environment, and the initial configuration is done in the AWS management console. AWS Direct Connect (DX) provides a dedicated high-speed, low-latency connection between your on-premises datacenter and AWS VPC/VMC. You can use DX alone or with a VPN. For DX you can use a private VIF (virtual interface), a public VIF, or both. To connect from your on-premises datacenter to AWS public endpoints, such as EC2 or S3, use a public VIF. A private VIF is used to connect from your on-premises datacenter to your SDDC in VMC. After the initial DX configuration in AWS, you run through a few extra steps in VMC to complete the setup. More details are available here. A DX connection can be used for vMotion, ESXi management, management appliance and workload traffic.
When it comes to VPN, VMC supports route-based VPN and policy-based VPN. Both VPN types use IPSEC. With a route-based VPN the SDDC routing table dictates traffic routes. This option provides resilient and secure access to multiple subnets. New networks are automatically discovered and propagated through the Border Gateway Protocol (BGP). You have to configure BGP on the on-premises side as well as in VMC to make this option work.
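To make "the routing table dictates traffic routes" concrete, here is a minimal sketch of a routing table with longest-prefix-match lookup, which is how IP routing tables are generally evaluated. The class, the prefixes and the tunnel names are hypothetical; `learn` and `withdraw` stand in for what BGP would do automatically.

```python
import ipaddress


class RouteTable:
    """Toy routing table: prefix -> next hop, with longest-prefix-match lookup."""

    def __init__(self):
        self.routes = {}

    def learn(self, prefix, next_hop):
        # With a route-based VPN, BGP would add entries like this automatically.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        matches = [network for network in self.routes if addr in network]
        if not matches:
            return None
        # The most specific (longest) prefix wins.
        best = max(matches, key=lambda network: network.prefixlen)
        return self.routes[best]


if __name__ == "__main__":
    table = RouteTable()
    table.learn("10.0.0.0/8", "vpn-tunnel-1")
    table.learn("10.2.0.0/16", "vpn-tunnel-2")
    print(table.lookup("10.2.3.4"))
    print(table.lookup("10.9.9.9"))
    print(table.lookup("192.168.1.1"))
```

When a new on-premises network is advertised, only a `learn` call is needed; no tunnel policy has to be edited by hand.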
The policy-based VPN option requires manual configuration steps: an administrator must update the routing table on both ends when new networks/routes are added. This option is beneficial if you have just a few networks that travel from/to VMC and/or you don’t have BGP available. Both L2 and L3 VPN are supported as depicted in the following diagram:
You can configure a VPN connection leveraging AWS Direct Connect, or just use the public internet for this. Notice that you will need Direct Connect for vMotion, although HCX will help with a live migration from on-premises to VMC without Direct Connect (more details on that in a future article).
Compute- and Management Gateway
The Compute and Management Gateways, depicted as CGW and MGW in the diagram, are responsible for routing the user/customer and management network traffic.
The CGW includes network segments (logical networks) that are used for workload VM network traffic in your SDDC. Three types of logical networks are supported:
- A routed network that can connect to other logical networks in your SDDC and/or external network through a firewall.
- An extended network allows you to create a stretched L2 network segment across the SDDC in the cloud and your on-premises SDDC.
- A disconnected network is an internal-only network and cannot communicate with the outside world.
For an extended network an L2VPN connection is used; you can extend up to 100 on-premises L2 networks to your cloud-based SDDC. The CGW also provides DNS, DHCP and firewall configurations that manage network traffic for your workload VMs.
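The three segment types and the limit on extended networks can be sketched as a small planning check. The enum, function names and segment names are hypothetical and only model what the text above states; they are not part of any VMC API.

```python
from collections import Counter
from enum import Enum

# Limit taken from the text above.
MAX_EXTENDED_NETWORKS = 100


class SegmentType(Enum):
    ROUTED = "routed"
    EXTENDED = "extended"
    DISCONNECTED = "disconnected"


def needs_l2vpn(segment_type):
    # Only extended (stretched L2) segments rely on an L2VPN connection.
    return segment_type is SegmentType.EXTENDED


def validate_plan(segments):
    """segments maps a segment name to its SegmentType; returns counts per type."""
    counts = Counter(segments.values())
    if counts[SegmentType.EXTENDED] > MAX_EXTENDED_NETWORKS:
        raise ValueError(
            f"{counts[SegmentType.EXTENDED]} extended networks planned, "
            f"limit is {MAX_EXTENDED_NETWORKS}"
        )
    return counts


if __name__ == "__main__":
    plan = {
        "web": SegmentType.ROUTED,
        "legacy-app": SegmentType.EXTENDED,
        "lab": SegmentType.DISCONNECTED,
    }
    print(validate_plan(plan))
    print([name for name, kind in plan.items() if needs_l2vpn(kind)])
```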
The MGW is responsible for management network traffic: vCenter, NSX and ESXi host network traffic. The initial configuration of the MGW is part of the automated SDDC deployment by VMware; however, you need to configure your DNS servers and vCenter Server FQDN and set some MGW firewall rules. By default the MGW blocks traffic to all destinations from all sources.
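The default-deny behaviour of the MGW firewall can be sketched as a first-match rule list that falls through to "deny". This is a simplified model, not the NSX-T rule engine; the rule fields, addresses and port are made-up examples.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    source: str       # CIDR of allowed sources
    destination: str  # CIDR of the protected destination
    port: int
    action: str       # "allow" or "deny"


def evaluate(rules, source, destination, port):
    """Return the action of the first matching rule, or "deny" if none match."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    for rule in rules:
        if (
            src in ipaddress.ip_network(rule.source)
            and dst in ipaddress.ip_network(rule.destination)
            and port == rule.port
        ):
            return rule.action
    # Nothing matched: all sources to all destinations are blocked by default.
    return "deny"


if __name__ == "__main__":
    # Hypothetical rule: let an on-premises admin subnet reach vCenter over HTTPS.
    rules = [Rule("admin-to-vcenter", "192.168.10.0/24", "10.2.224.4/32", 443, "allow")]
    print(evaluate(rules, "192.168.10.15", "10.2.224.4", 443))
    print(evaluate(rules, "172.16.0.9", "10.2.224.4", 443))
```

With an empty rule list every lookup returns "deny", which mirrors the state of a freshly deployed SDDC until you add your own MGW rules.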
I hope this article helps you understand VMware Cloud on AWS networking. If you want to learn more, there is a lot of interesting content available:
- VMware Cloud on AWS: Advanced Networking and Security with NSX-T SDDC by Humair Ahmed.
- VMware Cloud on AWS Networking & Security whitepaper.
- VMware Cloud on AWS with NSX: Use cases, design and implementation VMworld session.
- VMware NSX-T Reference Design whitepaper, not yet available for NSX-T 2.4 but still very valuable.
- Learn more about NSX-T from my fellow vExpert Ronald de Jong: more about the N-VDS, setting up Transport Nodes, routing configuration (also read part 2 and part 3) and configure load balancing.
Some of these sources were used to create this article.
Good to see more Client<->Cloud integrations instead of the 100% cloud push.
Thanks for sharing.