Thursday, December 20, 2007

WINDOWS 2000 CLUSTERING AND LOAD BALANCING

When Windows 2000 was finally released to the public, I had already been running all the beta and Release Candidate (RC) versions into the ground. Early on, I realized Microsoft had a winner. The system was noticeably less prone to the blue screen of death (BSOD), and real reliability was finally within reach. Now, years later, and after a few service packs that fixed quite a few clustering bugs, it is still a force to be reckoned with. You should know that Windows 2000 Server doesn't contain the services needed for clustering or load balancing. To mirror Windows NT 4.0 Enterprise Edition, Windows 2000 Server has an "advanced" version, conveniently named Windows 2000 Advanced Server, and this is the product you can cluster and load balance with. To compete in the high-end server arena, Microsoft also released a high-end version of Windows 2000 called Windows 2000 Datacenter Server, which allows not only clustering and load balancing, but also more flexibility, supporting four clustered nodes instead of the two-node limit of Advanced Server. Important design tips to remember are the following: when clustering and load balancing with Windows 2000, Windows 2000 Server won't support clustering and load balancing unless Application Center 2000 is installed; Windows 2000 Advanced Server will support a two-node cluster and load balancing; and Windows 2000 Datacenter Server will support a four-node cluster and load balancing.
To understand Microsoft’s position on this service, you should know Microsoft
offers four types of clustering services. With Windows 2000, you have the Microsoft
Cluster Server (MSCS), network load balancing (NLB), component load balancing
(CLB), and a product called Application Center 2000. When you read about Application
Center 2000 in detail, you’ll realize it can help tie all the components together for you
under one management umbrella. The Windows 2000 Clustering Service is thoroughly
covered in Chapter 2 and an example of it can be seen in Figure 1-6. In the next chapter,
you go step-by-step through the configuration and implementation of Windows 2000
Advanced Server Clustering and load balancing.
Windows 2000 Clustering Services
Windows 2000 Clustering Services enable you to implement some of the solutions mentioned thus far. You've learned about clustering, and Windows 2000 has state-of-the-art clustering capability for your enterprise solutions. Windows 2000 helps you by offering some great services, such as failover, Active/Active clustering, and rolling upgrades.
Failover and Failback Clustering
Failover is the act of another server in the cluster group taking over where the failed
server left off. An example of a failover system can be seen in Figure 1-7. If you have a
two-node cluster for file access and one node fails, the service will fail over to another server in the cluster. Failback is the capability of the failed server to come back online and take the load back from the node it failed over to. Again, this chapter simply lays the groundwork for the other chapters because, as you get into the actual configuration and testing of, say, SQL Server 2000, you could find that failover and failback might not always work. This is important to anyone who wants to run a SQL Server cluster.
Stateless vs. Stateful Clustering
Windows 2000 clustering functions as stateful, which means the application state and user state are managed during and through a failover. This is an important design question to ask yourself in the early stages of planning a high-availability solution: do you want stateful failover? Most would answer "yes," so application state isn't lost. Think of application state as "what you were doing" at the moment of failure. A stateless solution is the one provided by network load balancing and component load balancing, where the state of the user and the application aren't managed. An example of stateless versus stateful can be seen in Figure 1-8. As you become more involved with Application Center 2000, this distinction is explored in more depth.
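To make the difference concrete, here is a minimal Python sketch of the two ideas. Everything in it is hypothetical (the class names, the node names, the shared-session dictionary); it only illustrates that a stateless farm forgets the user between requests, while a stateful cluster carries the session across a failover.

# Illustrative sketch only -- the class and method names are hypothetical,
# not part of any Windows 2000 API.

class StatelessFarm:
    """Each request can land on any node; nothing about the user survives a failure."""

    def __init__(self, nodes):
        self.nodes = nodes

    def handle(self, client_ip, request):
        # Any node may answer; no session data is kept anywhere.
        node = self.nodes[hash(client_ip) % len(self.nodes)]
        return f"{node} served {request} (no session kept)"


class StatefulCluster:
    """Session state is managed through failover, so the user resumes where they left off."""

    def __init__(self, active, passive):
        self.active, self.passive = active, passive
        self.sessions = {}   # state both nodes can reach, e.g. on the cluster's shared disk

    def handle(self, user, request):
        self.sessions.setdefault(user, []).append(request)
        return f"{self.active} served {request}; session so far: {self.sessions[user]}"

    def failover(self):
        # The passive node takes over and still sees every session.
        self.active, self.passive = self.passive, self.active


farm = StatelessFarm(["WEB1", "WEB2", "WEB3"])
print(farm.handle("10.0.0.15", "view catalog"))   # any node; nothing is remembered

cluster = StatefulCluster("NODE1", "NODE2")
cluster.handle("alice", "open order form")
cluster.failover()                                # NODE1 fails; NODE2 takes over
print(cluster.handle("alice", "submit order"))    # Alice's session state is still there

Windows 2000 clustering gives you the stateful behavior; NLB and CLB, discussed later, behave like the stateless farm.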
Active/Passive
Active/Passive is defined as a cluster group where one server handles the entire load and, in case of failure or disaster, a passive node stands by waiting for failover (as seen in Figure 1-9). This is commonly used, but most would argue that you're still wasting the resources of the server standing by. Wouldn't it be helpful if both servers were somehow working to serve clients the data they need, while still keeping the benefits of failover? That's exactly what Windows 2000 Clustering Services can offer you: it's called Active/Active clustering.
Active/Active
Active/Active clustering is when you want all servers in the cluster group to service
clients and still be able to take up the load of a failed server in case of disaster, as seen
in Figure 1-10. That said, a downside exists to using this technology. In Active/Passive
clustering, you have one server providing 100 percent of its resources to clients. In case of disaster, the failed server fails over to the standby passive server. That node picks up the load and, other than a few seconds of changeover time, there isn't any difference to the client. The client still gets 100 percent of a server's resources. In Active/Active clustering, this wouldn't be the case. You have nodes in the cluster sharing the load, so when one node fails and the other nodes must take up its load, you lose some of that percentage. In other words, if you have two nodes providing services to the network clients, that's 100 percent of served resources. If one server fails, the clients have only one server left to access, which cuts the percentage to 50 percent. This might not be noticeable in low-demand scenarios, but it is something to think about when planning your overall design. The best way to go about this is to determine the demand your servers will face and design your cluster solution around that demand. You also need to think about future demand, which brings us back to scalability. You learn about this in the section "Designing a Clustered Solution," where you can look at step-by-step design ideas you might need to consider.
Rolling Upgrades
A rolling upgrade is a fantastic way to upgrade software on your production servers one at a time, without a full-blown outage. Rolling upgrades are used for many reasons, including upgrading a complete OS or applying service packs and hot fixes.
The cluster node that needs work can be brought offline for maintenance, and then
brought back online when the maintenance is complete, with no interruptions or only
minor disruptions of service. You learn about performing a rolling upgrade in Chapter 2
of this book.
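The idea can be expressed as a simple loop, sketched here in Python. The helper functions are placeholders for whatever your real drain, upgrade, and rejoin procedures would be, not Windows 2000 commands; the point is only that one node at a time leaves the cluster.

# Hypothetical sketch of a rolling upgrade -- the helper functions are placeholders
# for your real maintenance procedure, not Windows 2000 commands.

def drain(node):
    print(f"{node}: pausing, moving its load to the remaining cluster members")

def rejoin(node):
    print(f"{node}: maintenance done, back online and rejoining the cluster")

def apply_service_pack(node):
    print(f"{node}: applying the service pack")

def rolling_upgrade(nodes, do_maintenance):
    """Upgrade one node at a time so the cluster as a whole never goes down."""
    for node in nodes:
        drain(node)              # take only this node out of service
        do_maintenance(node)     # OS upgrade, service pack, or hot fix
        rejoin(node)             # bring it back before touching the next node

rolling_upgrade(["NODE1", "NODE2"], apply_service_pack)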
Network Load Balancing
Windows 2000 allows for load balancing of services as well. As just discussed, in an
Active/Active cluster, you have load-balancing functionality. Another form of load
balancing exists, though, in which one IP address represents an entire load-balanced cluster (with Windows 2000 Advanced Server, this scales to 32 nodes) and, using an algorithm, each node in the cluster takes a share of the overall data-traffic load. You can also use
third-party solutions for load balancing in this manner, which you learn about shortly.
Network load balancing (NLB) works by having a driver sit between the TCP/IP stack and your network interface card (NIC). This driver is installed when you apply the service on every node in the cluster. All nodes participate by using one Internet Protocol (IP) address, called a virtual IP address (VIP). Only one node responds to any given request, but which node responds varies across the cluster. An affinity feature is used to weight the balance of the load when you configure NLB with Application Center 2000. (Application Center 2000, as you learn in Chapter 4, adds to the native NLB service that Windows 2000 Advanced Server provides.) Windows 2000 load-balanced solutions offer multiple benefits, including, of course, balancing the load, transparent convergence, adding and removing servers as needed, assigning certain servers in the load-balanced cluster certain shares of the overall load, and multicast-based messaging between nodes. You can see an example of an NLB solution in Figure 1-11.
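Conceptually, the filtering works roughly like the following Python sketch: every node sees the traffic sent to the VIP, applies the same hash to the client address, and only the node that owns that hash bucket answers. This is a simplified illustration, not the actual NLB driver algorithm, and the host names are invented.

# Simplified illustration of NLB-style filtering -- not the real driver algorithm.
# Every node runs the same hash on the client address; only the "owner" answers.
import zlib

NODES = ["WEB1", "WEB2", "WEB3", "WEB4"]   # all four share one virtual IP (VIP)

def owner(client_ip, nodes=NODES):
    """Return the node that answers this client; every other node silently drops the packet."""
    bucket = zlib.crc32(client_ip.encode()) % len(nodes)
    return nodes[bucket]

for ip in ("10.0.0.15", "10.0.0.16", "10.0.0.17"):
    print(ip, "-> answered by", owner(ip))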
Convergence
Windows 2000 has the intelligence to know which nodes are in the cluster and, if one of them fails, it can reconverge the cluster based on the new number of nodes to continue balancing the load correctly. All Network Load Balancing (NLB) hosts exchange heartbeat messages to let the default host know they're still active in the cluster. When a host doesn't send or respond to heartbeat messages, a process called convergence begins. During convergence, the hosts that are still active are determined, as well as whether they can accept load. When a new host joins the cluster, it sends out heartbeat messages, which also trigger convergence. Once all cluster hosts agree on the current status of the cluster, the load is repartitioned and convergence ends.
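As a rough illustration, here is that process in Python. The host names, the timeout, and the equal load shares are invented for the example rather than taken from the NLB specification; the sketch only shows silent hosts being dropped and the load being repartitioned among the survivors.

# Rough illustration of convergence -- names, timeout, and load shares are invented.
import time

HEARTBEAT_TIMEOUT = 5.0   # seconds of silence before a host is presumed gone

def converge(last_heartbeat, now=None):
    """Drop silent hosts and repartition the load equally among the survivors."""
    now = time.time() if now is None else now
    alive = [h for h, seen in last_heartbeat.items() if now - seen <= HEARTBEAT_TIMEOUT]
    share = 100.0 / len(alive) if alive else 0.0
    return {host: share for host in alive}

# WEB2's last heartbeat is 10 seconds old, so convergence removes it and
# repartitions the load across WEB1 and WEB3.
heartbeats = {"WEB1": 100.0, "WEB2": 90.0, "WEB3": 100.0}
print(converge(heartbeats, now=100.0))   # {'WEB1': 50.0, 'WEB3': 50.0}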
NLB also tracks which node is the default node (the node with the highest priority, which keeps track of balancing the load to all the other nodes in the group) and, if that node is affected, can reconverge the group to elect a new default node. You see this in great detail while configuring load-balanced clusters and Application Center 2000-based clusters in Chapter 4.
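A sketch of that election follows, assuming the usual NLB convention that the lowest priority number marks the highest-priority host; the host names and priority values are invented.

# Illustrative election of a new default node -- hosts and priority values are invented;
# the convention assumed here is that the lowest number means the highest priority.

def elect_default_host(priorities, failed=()):
    """Pick the surviving host with the best (lowest-numbered) priority."""
    candidates = {h: p for h, p in priorities.items() if h not in failed}
    return min(candidates, key=candidates.get) if candidates else None

priorities = {"WEB1": 1, "WEB2": 2, "WEB3": 3}
print(elect_default_host(priorities))                    # WEB1 is the default host
print(elect_default_host(priorities, failed={"WEB1"}))   # WEB1 fails; WEB2 takes over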
Adding and Removing Servers
With Windows 2000 load balancing, you can easily add nodes to and remove nodes from the cluster. Windows 2000 Advanced Server allows for up to 32 nodes, so you can start off with 8 nodes and increase that number when necessary. When you configure Application Center 2000, you'll see this is an integral part of producing appropriate high-availability solutions. Your load won't always be the same. Take, for instance, an e-commerce site that sells gifts on the Internet. In December, around Christmas time, the number of hits, requests, and sales for the site generally increases exponentially. That said, you'd want to design your load-balanced solution to function normally with eight servers (you see how to baseline and monitor performance in Chapter 8), and then add servers to the group when demand increases. You'll also want to be able
to remove these servers when you finish. The beauty of this solution is you can lease
server hardware when and where you need it, instead of keeping equipment you need
to account for on hand all year. What's important to understand here is that you have this flexibility, so you can plan for it now, because this chapter is where your initial design work takes place. If you need four servers to begin with, you'll have to baseline the servers on hand and then, during periods of high activity and use, baseline again. You'll find either that your load is over what you expected and you'll need to add a server, or that it's under your expectations and you can absorb the additional hits with the hardware you have. Either way, you can only determine this by monitoring the performance of the systems and knowing how many hits you get a month. All of this is
covered in the last chapter of the book.
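The arithmetic behind that decision is simple enough to sketch in Python. Every number below is a stand-in you would replace with your own baseline figures; the point is only comparing normal and peak load against what one server was measured to handle.

# Hypothetical capacity check -- all the numbers are examples, not measurements.
import math

requests_per_server = 200      # sustained requests/sec one server handled in the baseline
baseline_load       = 600      # requests/sec in a normal month
holiday_peak        = 1800     # requests/sec projected for the December rush

def servers_needed(load, per_server=requests_per_server):
    return math.ceil(load / per_server)

print("Normal months: ", servers_needed(baseline_load), "servers")    # 3
print("Holiday peak:  ", servers_needed(holiday_peak), "servers")     # 9
print("Extra servers to lease for the season:",
      servers_needed(holiday_peak) - servers_needed(baseline_load))   # 6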
Port Rules and Priority Assignments
The most difficult configurations in load-balanced solutions are Port Rules, affinity,
and weighted assignments. These take a little time to plan and a lot of reading to
understand fully if you aren’t familiar with them. The mission of this book is to
demystify these configurations for you, so you can plan, design, and implement them.
The load of every node in the load-balanced cluster can be customized with Port Rules,
which are used to specify load weight and priority. In Chapter 2, you learn about port
assignments and affinity when you configure NLB on Windows 2000 Advanced Server.
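As a preview of what those settings amount to, here is a hedged sketch of two port rules expressed as plain Python data. The field names follow the concepts just described (port range, protocol, filtering mode, affinity, load weight, handling priority), not the exact layout of any Windows 2000 dialog box or API.

# Conceptual view of port rules -- field names follow the concepts, not an exact
# Windows 2000 dialog box or API; hosts and percentages are invented.

port_rules = [
    {
        "port_range": (80, 80),          # plain HTTP traffic
        "protocol": "TCP",
        "filtering_mode": "multiple",    # every host handles a share of this traffic
        "affinity": "single",            # keep each client IP pinned to the same host
        "load_weight": {"WEB1": 50, "WEB2": 30, "WEB3": 20},  # percent of the load per host
    },
    {
        "port_range": (443, 443),        # HTTPS traffic
        "protocol": "TCP",
        "filtering_mode": "single",      # one host handles it; the others stand by
        "handling_priority": {"WEB1": 1, "WEB2": 2, "WEB3": 3},  # lower number wins
    },
]

for rule in port_rules:
    print(rule["port_range"], rule["protocol"], rule["filtering_mode"])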
