Although the Nexus 1000v has been with us for a couple of years now, it’s still a bit of a mystery as to what it is, why it was created and how it works or gets deployed. So I thought I’d take the time to give you all my explanation and thoughts about the product.
So WHAT is it?
The Nexus 1000v is a virtual switch, meaning it is able to provide network communication like a switch but runs at the virtual machine level and is completely independent of a hardware device, so a switch without a switch! It works in conjunction with VMware vSphere 4.0 (and upwards) and has already been ratified to work with Hyper-V 3.0 when it launches. Anyone who is familiar with VMware should know that in order to provide switching for Virtual Machines, each vSphere host runs a local vSwitch within its hypervisor to distribute traffic from multiple VMs out through the shared network cards. The Nexus 1000v forms part of Cisco’s Data Center class Nexus series of switches and runs the NX-OS operating system.
But WHY did Cisco make it?
Well, IT departments can be a little “siloed”, with different teams of people responsible for different areas of the infrastructure. The Networking team manage the Network, the Server team manage the Servers and the VMware team manage the virtual environment… simple, yes?
Unfortunately not… and I know from experience. When a new Blade chassis arrives at a customer site, the Server team or a Server engineer will start the installation. When he gets to the switches in the back of the chassis he asks the Network engineer to help with the configuration, and the network engineer simply states “I’m a networking guy, we don’t configure the chassis switches, that’s your job”. So the Server guy clicks his way through the options until he’s happy that some communication is working. He then passes the new chassis over to the VMware engineer, who installs ESXi and starts setting up the basic config; when he gets to the local vSwitch configuration he asks the networking engineer to help, and I’m sure you can guess the response (and I am one of the few people who can get away with saying this, as many of you will know me as a Server guy, a VMware guy and a Network guy). So what we’re left with is three sections of switching, all configured and managed by three separate teams, with limited visibility of the whole environment. Cisco have never been too happy about this, mainly because as soon as someone has a performance issue with their application, computer or server, the first thing to be blamed is the Network, and as it’s split up into all these different sections it’s very hard to state categorically that it’s not the network without a few days’ investigation, by which time the issue has magically disappeared and normal service is resumed.
So Cisco and the Networking team want to take back control of all
areas of switching and have the visibility of all traffic on their network to
reduce the chances of misconfiguration and simplify troubleshooting. At the
same time, VMware were aware that the major network vendors weren’t happy with the functionality of their vSwitch, so they created an open API for their DVS and invited all the vendors to see if they could do better. Cisco were first to
respond to the invite with the Nexus 1000v and I have to say they have done a
pretty good job with it.
So HOW does it work?
Well first off you need to make sure you can deploy it. The Nexus
1000v utilises the VMware Distributed Switch technology to function and as such
requires you to have VMware Enterprise Plus licensing. The Nexus 1000v can then
be purchased in line with the CPU model VMware use. It can be a little
expensive if purchase on its own, but Cisco do offer it for free if purchase
alongside Cisco UCS. I think because of this, people seem to think the 1000v
can only be used with UCS, but this isn’t the case. The 1000v only requires
vSphere (or Hyper-V 3.0) and can be deployed on any hardware platform. It is
made up of two parts, the Virtual Supervisor Module (VSM) and the Virtual
Ethernet Module (VEM). The VSM is the control plane for the 1000v and integrates with multiple VEMs, which are embedded into each vSphere host. The VSM runs as a Virtual Machine and can be deployed via an OVA template, and it can be made redundant across a pair of VSM virtual machines configured as a cluster. The VSM also benefits from features such as HA and vMotion within the
virtual environment. Once the OVA is deployed, the configuration and management of the 1000v is all done via the command line interface (CLI), which is how networking people like to interact with things. The CLI commands are NX-OS based and simple to use. The setup wizard allows for pairing with a vCenter Server, and the 1000v then presents itself as a DVS within vCenter. The Network team are then able to create each of the port-groups required by the virtual environment and apply standard networking policies right down to the virtual machine level. Two key features here are NetFlow and QoS, meaning that you can take the existing Quality of Service settings running in the rest of the network and apply them even to the traffic between two VMs on the same host!
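To give a flavour of what this looks like from the VSM CLI, here is a rough sketch of the vCenter pairing and a basic VM-facing Port-Profile (the IP address, datacenter name, VLAN ID and the profile/policy names are made-up placeholders, not values from any particular deployment):

    ! Pair the VSM with vCenter (placeholder values)
    svs connection vcenter
      protocol vmware-vim
      remote ip address 10.0.0.50
      vmware dvs datacenter-name MyDatacenter
      connect

    ! A "vethernet" Port-Profile, which shows up in vCenter as a Port-Group
    port-profile type vethernet VM-Data
      vmware port-group
      switchport mode access
      switchport access vlan 100
      ! an existing QoS policy could be attached here too, e.g.
      ! service-policy type qos input VM-QOS
      no shutdown
      state enabled

It’s the “state enabled” line that actually pushes the Port-Group through to vCenter.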
Once the configuration is done and all the Port-Groups are available in vCenter, the VMware team (working with the network team this time) can begin to migrate the Hosts and VMs over to the 1000v. When a Host is added to the 1000v, Update Manager is used to deploy the VEM component into the hypervisor. The VEM itself handles all the switching independently of the VSM, although all configuration is referenced through the VSM. When new Port-Profiles are created at the VSM level, the VEM in each connected host is updated, and the resulting Port-Groups become available for the VMware team to select from the drop-down list when configuring a network card for a VM.
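A quick sanity check is worthwhile at this point; as a rough guide (output will obviously vary per environment), each host whose VEM is installed and talking to the VSM appears as a module on the VSM, and the VEM software can also be checked from the host itself:

    ! On the VSM - each registered VEM shows up as a module
    show module

    # On the ESXi host (SSH or local console)
    vem status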
So in conclusion the Nexus 1000v passes control of the network
back to the network team and provides a variety of enhancements over the
traditional vSwitch.
The product can be a little tricky to install and falls in a grey
area between Virtualisation and Networking.
There is a lot of development by Cisco and other vendors in this
area at the moment and I aim to follow this article up with more about the
Virtual Security Gateway (VSG) and VM-FEX as an alternative way to control
Virtual Machine networking.
For anyone who’s interested in testing the Nexus 1000v out, it is
available for a free 60-day trial and can be downloaded from either Cisco or
VMware’s websites.
There are some considerations and best practices for Nexus 1000v,
and below are my thoughts that may be of interest to the more technical people
reading this.
Considerations:
COPY RUN START!
The Nexus 1000v is a switch and needs to be treated as one. Anyone deploying this for the first time will see that as you enable a new Port-Profile, the relevant Port-Group is created in vCenter immediately. However, just because it’s written to vCenter doesn’t mean it’s saved on the Nexus 1000v. Always remember to copy the running configuration to the startup configuration whenever you make a change or create a new Port-Profile.
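In other words, once you’re happy with a change, save it straight away:

    copy running-config startup-config
    ! or the short form that gives this section its title
    copy run start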
SYSTEM VLANs
When setting up the “Ethernet” Port-Profile for the uplink ports, you are asked to specify the “System VLANs”. The deployment instructions will tell you to make sure that the Control, Packet and Management VLANs for the Nexus 1000v are set as System VLANs, but the purpose of a System VLAN is not always explained. A System VLAN is a VLAN that is able to become active when a host server first powers on and activates the VEM, without the VEM first having to communicate with the VSM. So whatever VLAN the VSM sits on must be a System VLAN, but at the same time whatever VLAN the vCenter server is on must also be set as a System VLAN, otherwise communication to the vCenter server may not be established! Also, if you’re using NFS or iSCSI to access datastores, it’s a must to make these VLANs System VLANs as well. It is also essential that a System VLAN is configured on any “vethernet” Port-Profile used by one of these VMkernel ports or by the Nexus 1000v and vCenter virtual machines.
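As a rough sketch of where those statements live (the VLAN numbers and profile names below are purely illustrative), both the uplink Port-Profile and the relevant “vethernet” Port-Profiles carry a system vlan line:

    ! Uplink ("ethernet") Port-Profile trunking to the physical network
    port-profile type ethernet SYSTEM-UPLINK
      vmware port-group
      switchport mode trunk
      switchport trunk allowed vlan 10,20,30,40
      ! e.g. 10 = control/packet, 20 = management/vCenter, 30 = IP storage
      system vlan 10,20,30
      no shutdown
      state enabled

    ! "vethernet" Port-Profile for the Management VMkernel port
    port-profile type vethernet VMK-MGMT
      vmware port-group
      switchport mode access
      switchport access vlan 20
      system vlan 20
      no shutdown
      state enabled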
BACKDOOR
Always think about an alternative method for accessing the local
ESXi console in case you make a mistake during the installation of the Nexus
1000v and the migration of the hosts and VMkernel ports. UCS will provide KVM
Console access directly to the host, but other vendor hardware may need an iLO or DRAC to be enabled first.
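If you do end up working from that out-of-band console, the standard ESXi commands are enough to see what state the host’s networking has been left in, for example:

    # List standard and distributed switches, their uplinks and port-groups
    esxcfg-vswitch -l
    # List the VMkernel interfaces and their IP addresses
    esxcfg-vmknic -l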
HOW MANY NICs
It used to be that VMware designs required around 6 to 8 NICs to provide all the different types of communication, and when the Nexus 1000v was first launched this design was the recommended way to go. Recently, however, things have changed, and with 10Gb capabilities and several reviews of QoS designs, the new “Best Practice” from Cisco for the Nexus 1000v is to utilise only 2x 10Gb NICs and have the Nexus 1000v manage all traffic across all port-groups. In the case of Cisco UCS, it is recommended that the two NICs created in the Service Profile are NOT enabled for hardware failover, as failover convergence handled by the Nexus 1000v will be quicker and more stable.
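With the two-NIC design, one common way of letting the 1000v handle that failover itself is to bundle the uplinks using MAC pinning rather than a traditional port-channel; a minimal sketch (names and VLANs are placeholders) looks something like this:

    port-profile type ethernet UPLINK-10G
      vmware port-group
      switchport mode trunk
      switchport trunk allowed vlan 10,20,30,40
      ! pin traffic to an uplink without needing a port-channel on the upstream switches
      channel-group auto mode on mac-pinning
      system vlan 10,20,30
      no shutdown
      state enabled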
DON’T RUSH!
So you’re ready to migrate your Hosts and VMs over to the Nexus 1000v, and the wizard in vCenter will let you do every single host and VM all at once, which is pretty cool… but don’t ever do that!!! The worst thing that can happen when migrating hosts and VMs is to lose communication mid-way through, so in order to mitigate this you should follow these steps:
1 Disable HA and DRS on the cluster before migration
2 Start with a single host, and not the one hosting vCenter or the VSM
3 Move one of the two NICs first, along with the VMkernel ports for Management
4 Test communication with each VMkernel port
5 Move a single test VM to the Nexus 1000v
6 Test communication
7 Move the second NIC in the first host and any other VMs on the host (then test)
8 Do steps 3 and 4 on a second host
9 Test vMotion between the first two hosts
10 Move the second NIC in the second host along with its VMs
11 Once you’re happy everything is working on these two hosts, move the rest of the Hosts, VMkernel ports and Virtual Machines
12 Leave vCenter till last
And hopefully that covers everything!