Thursday, December 20, 2007

SERVER 2003 CLUSTERING AND LOAD BALANCING

With the release of Server 2003 on the horizon, now’s the time to start
thinking about using this platform for your clustered solutions as well. Windows
2000 will be around for quite some time; many companies haven’t even moved away
from NT 4 yet and have little to no intention of doing so. At some point in the next
decade, Microsoft will also have to take a stance on Windows 2000 and its end-of-life
(EOL) sequence. What’s next, you ask? A product called Server 2003 will eventually
replace Windows 2000. This book looks at clustering and load-balancing Server 2003.
One of the most confusing pieces of Microsoft’s new naming convention is that it has
also retired its BackOffice branding and renamed that product line the Server 2003
Enterprise servers. (“BackOffice” is the name that applied to running Exchange 5.5 or
Proxy 2.0 on top of Windows NT 4.0.) Windows 2000 also has services that can be
added to it, such as Exchange 2000 and Internet Security and Acceleration (ISA)
Server 2000, which are the subsequent upgrades of the previously mentioned products.
Windows Server 2003 Enterprise Servers
The name Server 2003 can be confusing. I want to demystify this term, so you
understand how it will be referenced throughout the remainder of this book. You
have the OS itself, which is slated to succeed Windows 2000 Server, and you have the
Server 2003 Enterprise server line, which includes products such as SQL 2000, as well
as the products just mentioned, like Exchange 2000 and ISA 2000. My goal is to cover
the configuration and installation of clustered solutions that incorporate most of these
products. SQL 2000 is covered in great detail because it’s a big player in N-tier
architecture. You’ll most likely be involved with N-tier architecture while configuring
High Availability solutions.
Windows Server 2003
At press time, the full version of Windows Server 2003 hadn’t yet been released and
was in RC2, almost out of testing and ready for full production. After you
read this book, you’ll already know how to configure and cluster the full version of
Windows Server 2003; the product’s release should be in sync with this book’s release.
What I want to accomplish is to lay out the overall strategy and enhancements, so you
can consider this product in your upgrade or migration path for the future. Or, even
more important, you could find the product’s enhancements so compelling that you
want to implement it as soon as it’s released. Let’s look at where Server 2003
is going with clustering and load balancing.
Server 2003 Clustering Enhancements
First, your clustered node count went up. In Windows 2000 Advanced Server, you were
locked down to a two-node cluster, but the Server 2003 Enterprise version will allow
four-node clusters. (Datacenter Server moves up to eight nodes.) Also new is load
balancing in all the Server 2003 versions: the base Windows 2000 Server was incapable
of NLB, but every edition of Windows Server 2003 is capable of it. Another huge
addition is the integration of the Windows Cluster Service in Server 2003 with Active
Directory. A virtual computer object is created, which allows applications to use
Kerberos authentication, as well as delegation. And, provided you have the hardware
to take advantage of it, 64-bit support is now available. New configuration and
management tools have been added, which you read about in great detail in Chapter 3;
they do make life easier. Network enhancements have also been made to smooth
cluster traffic, including a default multicast heartbeat option in which unicast traffic is
used only if multicasting fails entirely. You have options to make communication more
secure as well. New storage enhancements have been worked into the product to allow
more flexibility with a shared quorum device. And you have new cluster-based
troubleshooting tools, which you’ll look at closely as an enhancement.
Server 2003 Load-Balanced Enhancements
A brand new management utility is offered with Server 2003 load-balancing
services: you now have a central management utility from which to manage NLB
clusters. You see this in detail in Chapter 3 and make comparisons to Application
Center 2000, as necessary. You can now configure virtual clusters. This is a huge step
up because you previously had limitations on how you performed IP addressing on
load-balanced clusters, but now you can configure clustering almost like switch-based
virtual local area networks (VLANs). You learn about this in Chapter 3. You also have
Internet Group Management Protocol (IGMP) support, which allows multicast
groups to be configured for NLB clusters. Another greatly needed enhancement is the
introduction of Bidirectional Affinity, which you need in order to do server
publishing with ISA Server 2000. Bidirectional Affinity creates
multiple instances of NLB on the same host to make sure that responses from servers
published via ISA Server are routed back through the correct ISA server in the
cluster. Two separate algorithms are used, one on the internal and one on the external
interface of the servers, to determine which node services the request.
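As a rough illustration of that idea, here is a minimal Python sketch, with invented node names and a toy hash standing in for Microsoft's actual (proprietary) algorithm. Hashing on the client's IP in both directions ensures both halves of a conversation land on the same ISA node:

    # Toy sketch of bidirectional affinity. Inbound packets are hashed on
    # the source (client) IP; outbound replies are hashed on the destination
    # (again the client) IP, so both directions pick the same cluster node.
    import zlib

    NODES = ["isa1", "isa2", "isa3"]          # hypothetical ISA cluster

    def owner(client_ip: str) -> str:
        return NODES[zlib.crc32(client_ip.encode()) % len(NODES)]

    def external_owner(src_ip, dst_ip):
        return owner(src_ip)                  # inbound: hash on client IP

    def internal_owner(src_ip, dst_ip):
        return owner(dst_ip)                  # reply: hash on client IP again

    client, published = "203.0.113.7", "10.0.0.20"
    assert external_owner(client, published) == internal_owner(published, client)
    print("both directions owned by", owner(client))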
As you can see, the new Server 2003 technology brings huge enhancements, which
you learn about in great detail in Chapter 3 when we discuss load balancing and
clustering Windows Server 2003. Review the basics here so you can plan
for them, if necessary. All the major differences will be highlighted as we configure
the clustered and load-balanced solutions. Chapter 3 covers the granular details of
configuration and implementation of Server 2003.
APPLICATION CENTER 2000
With the creation and shipment of Application Center 2000, Microsoft put itself
on a map few others could reach. Application Center 2000 is the future of cluster
management. This Server 2003 Enterprise server product adds massive functionality
to your clustered and load-balanced solutions. You already know Windows 2000
Advanced Server can provide for you with load balancing and clustering, so now
you’ll learn about the benefits Application Center 2000 can add. Microsoft wanted to
expand on the NLB and clustering functionality of Windows 2000 Advanced Server,
and it created the ultimate package to get that done. Microsoft Application Center 2000
is used to manage and administer web and COM+ components from one central
console, something that was a problem in the past. Many customers
complained about how archaic it was to manage their clusters and load-balanced
solutions, so Microsoft obliged them with the Application Center 2000 Management
Console. Through this console, you can manage all your cluster nodes and all your
clusters in one Microsoft Management Console (MMC) snap-in. Health monitoring was
also an unmanageable chore before. As you see in Chapter 8, you can monitor
the entire cluster from one console, instead of having to do performance monitoring
on every cluster node separately with Microsoft Health Monitor. You’ll also see that
configuring a cluster without Application Center 2000 can be difficult.
In the next few chapters, you learn to configure clustered and load-balanced
solutions, and then, in later chapters, you do the same thing using Application Center
2000. You’ll see clearly that the management of difficult settings becomes much easier
to configure and manage. Application Center 2000 also provides the power to manage
your web sites and COM components, all within the same console. This is important
because, many times, most of what you’ll be load balancing is your web site and
ecommerce solutions. You also have some other great add-ons, such as the capability
to use alerting, and so forth. You can use Windows 2000 and Application Center 2000
together to manage your cluster.

Component Load Balancing
When High Availability is required, you might need to cluster and load balance not
only entire server platforms, but also critical applications that use Component Object
Model (COM) and COM+ services. Most high-availability demands come from the
need to produce services quickly and reliably, like application components for an online
store. You might need to load balance specific servers and pages, as well as the COM+
components shared by all servers within the group. With component load balancing
(CLB), the possibilities are endless. CLB comes to Windows 2000 once you install
Application Center 2000, and it offers something that wasn’t available in the past with
NT 4.0: the capability to scale up to 16 clustered nodes
dedicated to processing the code for COM and COM+ objects. CLB clustering and
routing also requires Application Center 2000, which you use to implement this solution.
Chapters 4, 6, and 7 cover the granular details of configuration and implementation
of Application Center with Microsoft servers.

HIGHLY AVAILABLE DATABASES WITH SQL SERVER 2000
SQL Server is by far the most up-and-coming database product today. With its lower-
than-average cost compared to bigger players like Oracle, SQL Server eats up more and
more market share as it continues to be promoted and moved into more infrastructures.
That said, more companies are relying on its uptime. For those who don’t know what
SQL Server is, it’s the Microsoft database server product. SQL Server 2000 (a Server 2003
Enterprise product) is mentioned here and is covered in depth throughout the book
because it’s an integral part of web-based commerce sites and it’s finding its way into
nearly every product available that does some form of logging or network management.
I think it’s clear why this product needs to be clustered and highly available. An
example of SQL clustering can be seen in Figure 1-14. Chapter 5 covers the clustering
in granular detail. You also learn some little-known facts about what clustering this
product costs, how to convince management this product is relatively cheap to cluster,
and why clustering it makes sense.

DESIGNING A HIGHLY AVAILABLE SOLUTION
Now that you know all the basics of High Availability, clustering, and load balancing,
you need to learn how to develop a design. This is, by far, the most important phase
in any project. Many networks have been built with good intentions but, because of a
lack of design in the early stages of rolling out the solution, wound up costing more,
taking longer, or not panning out as expected.
In this book, I hope to get you to a point where you can completely bypass that
scenario. I want you to be the one who designs the proper solution and correctly budgets
for it in the early stages of development and project planning. First, you need to define
what you’re trying to accomplish. This section gives you an overall approach to any
solution you might need to deliver. In other words, I won’t go into deep detail here about
Application Center per se, but you’ll get a thorough overall process to follow up until
you need to design the Application Center task within the project. When you get to the
appropriate chapters where each technology is different, I’ll include a design phase
section to help you incorporate that piece of technology into your overall design and
the project plan you want to create. For this section, you need to get that overall 40,000-foot
view of the entire project. This is critical because, without the proper vision, you might
overlook some glaring omissions in the beginning stages of the plan that could come
back to haunt you later.
To create a great solution, you first need to create a vision of what you want to
accomplish. If this is merely a two-node cluster, then you should take into account what
hardware solution you want to purchase. Getting involved with a good vendor is crucial
to the success of your overall design. You could find that one vendor’s costs won’t meet
your budget, or that another vendor’s clustering hardware packages with shared storage
solutions meet your needs more closely than the rest. For instance, you could find you’d
like to have servers with three power
supplies instead of two within each server. You might decide you want your management
network connection to be connected via fiber or Gigabit Ethernet and have your shared
storage at the same speeds. You have much to think about at this stage of overall design.
Something else to think about is what service you want to provide. You must
understand that the product you’re delivering needs to function properly, and you
need to know the client’s level of expectations. You could have a client who
has a specific Service Level Agreement (SLA), which he expects you to honor. When
I shop for new services, I always want to know what’s in the contract based on my
own expectations. You might also want to get an overall feel of the expected deadlines.
By what date does this solution need to be rolled out live into production? This is
important to plan for because, based on what pieces of hardware you need to purchase,
you could have lead time on ordering it. Remember, if the hardware is sizable and
pricey, you might need to account for a little more time to get it.
Another consideration is budget. This is covered in its own section because budget
warrants its own area of discussion. You also need to consider the surrounding
infrastructure. I once encountered a design where the entire clustered solution was laid
out in Visio format and looked outstanding, but the planners forgot to order the
separate switch for the Management VLAN. Although this
was a painless oversight, my hope is this book can eliminate most of these types of
errors from ever occurring.
Creating a Project Plan
By creating a project plan like the one seen in Figure 1-15, you have a way to keep track
of your budget needs, your resources—whether the resources are actual workers or
technicians of server-based hardware—and many other aspects of rolling out a Highly
Available solution. Make no mistake, creating a High Availability solution is no small
task. There is much to account for, and many things need to be addressed during every
step of the design, setup, and rollout of this type of solution. Having at least
a documented project plan can keep you organized and on track. You don’t necessarily
need a dedicated project manager (unless you feel the tasks are so numerous, and spread
over many locations and business units that it warrants the use of one), but you should
at least have a shared document for everyone in your team to monitor and sign off on.
Pilots and Prototypes
You need to set up a test bed to practice on. If you plan on rolling anything at all out
into your production network, you need to test it in an isolated environment first. To
do this you can set up a pilot. A pilot is simply a scaled-down version of the real
solution, where you can quite easily get an overall feel for what you’ll be rolling out into
your live production network. A prototype is almost an exact duplicate, set to the proper
scale of the actual solution you’ll be rolling out. A prototype can be costly to implement
because of the hardware involved, so you can often propose a pilot instead to simulate
the environment you’ll be designing. Working with a
hardware vendor directly is helpful and, during the negotiation phase of the hardware,
ask the vendor what other companies have implemented their solutions. I can usually
get a list of companies using their products and make contacts within those companies,
so I can see their solutions in action. I also hit newsgroups and forums to post general
questions and see what answers turn up on specific vendors and their solutions. You
could also find the vendors themselves are willing to arrange for you to visit
one of their clients to see the solutions in action. This has worked for me and I’m sure it
could also be helpful to you.
Designing a Clustered Solution
Now that you’ve seen the 40,000-foot view, let’s come down to 10,000 feet. Don’t worry.
In upcoming chapters (and starting with the next chapter), you get into specific
configurations. To understand all the new terminology, though, it’s imperative for you
to look at basic topology maps and ideas, so we can share this terminology as we cover
the actual solution configurations. As you look at clustering Windows 2000 Advanced
Server in the next chapter, we’ll be at ground level, looking at all the dialog boxes and
check boxes we’ll need to manipulate. First, you need to consider the design of a general
cluster, no matter how many nodes it will service. Let’s use a two-node cluster for a
simple overview and walk through some analysis points.
Addressing the Risks
When I mention this in meetings, I usually get a weird look. If we’re implementing
a cluster, is that what we’re using to eliminate the single point of failure that was the
original problem? Why would you now have to consider new risks? Although you
might think this type of question is ridiculous, it isn’t. It takes experience to answer.
I’ve set up clustering only to find out that the service running on the cluster was now
redundant, but much slower than it was
without the clustering. This is a risk. Your user community will, of course, make you
aware of the slow-down in services. They know because they deal with it all day.
Another risk is troubleshooting. Does your staff know how to troubleshoot and
solve cluster-based problems? I’ve seen problems where a clustered Exchange Server 2000
solution took 12 people to determine what the problem was because too many areas
of expertise were needed for just one problem. You needed someone who knew
network infrastructure to look through the routers and switches, you needed an e-mail
specialist, and you needed someone who knew clustering. That doesn’t include the
systems administrators for the Windows 2000 Advanced Servers that were implemented.
Training of personnel on new systems is critical to the system’s success . . . and yours.
Have power concerns been addressed? I got to witness the most horrifying, yet hilarious,
phenomenon ever to occur in my experience as an IT professional. One of the junior
administrators on staff brought up a server to mark the beginning of the age of
Windows 2000 in our infrastructure, only to find out the power to that circuit was
already at its peak. The entire network went down—no joke. (Was that a sign or what?)
This was something I learned the hard way. Consider power and uninterruptible power
supplies as well. Power design is covered in more detail in Chapter 2.
Designing Applications and Proper Bandwidth
What will you be running on this cluster? This is going to bring you back to planning
your hardware solution appropriately. In each of the following chapters, you’ll be
given a set of basic requirements, which you’ll need to get your job done with the
solution you’re implementing. Of course, when you add services on top of the cluster
itself, you’ll also need to consider adding resources to the hardware.
You should also consider the bandwidth connections based on the application.
Bandwidth and application flows can be seen in Figure 1-16. Some services will use
more bandwidth than others and this must be planned by watching application flows.
In later chapters, we’ll discuss how to test your clustered solutions with a network and
protocol analyzer to make sure you’re operating at peak performance, instead of trying
to function on an oversaturated and overused network segment.
You also need to consider whether your applications are cluster aware, which means
they support the cluster API (application programming interface). Applications that
are cluster aware will be registered with the Cluster Service. Applications that are
noncluster aware can still be failed over, but will miss out on some of the benefits of
cluster-aware applications. That said, you might want to consider this if the whole
reason you’re clustering is for a mission-critical application that might not be cluster
aware. Most of Microsoft’s product line is cluster aware, but you might want to check
with a vendor of a third-party solution to see if their applications function with the
cluster API.
Determining Failover Policies
Failover will occur through disaster or testing and, when it does, what happens is
based on a policy. Until now, we’ve covered the fundamentals of what failover entails,
but now we can expound on the features a bit. You can set up polices for failover and
failback timing, as well as configuring a policy for preferred node. Failover, failback,
and preferred nodes are all based on setting up MSCS (Microsoft Cluster Service) or
simply the Cluster Service.
Failover Timing
Failover timing is used for simple failover to another standby node in
the group upon failure. Another option is to have the Cluster Service make attempts
to restart the failed node before failing over to a passive node. In situations
where you might want to have the primary node brought back online immediately, this
is the policy to implement. Failover timing design is based on what is an acceptable
amount of downtime any node can experience. If you’re looking at failover timing
for critical systems, where nodes can’t be down at all because they’re held to the 99.999
percent standard, then you need to test your systems to make sure your failover timing
is quick enough that your clients aren’t caused any disruption.
Failback Timing
Failing back is the process of returning to the original primary node,
the one that failed in the first place. Failback can be immediate, or you can set a policy
that schedules the failback to occur in off-hours, so the network isn’t disturbed
again by a changeover in the clustered nodes.

Preferred Node
A preferred node can be set via policy, so if that node is available, then
that will be the Active node. You’d want to design this so your primary node could
be set up with high-end hardware. This is the node you’d want to serve the
clients at all times.
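To tie these three policies together, here is a minimal Python sketch of how failover, failback, and preferred-node settings might be modeled and evaluated. The attribute and function names are invented for illustration; they are not the actual MSCS API:

    # Illustrative model of MSCS-style failover policies; all names here
    # are made up for the example, not the real Cluster Service interfaces.
    from dataclasses import dataclass
    from datetime import time

    @dataclass
    class FailoverPolicy:
        restart_attempts: int = 3                 # try restarting locally first
        failback_start: time = time(2, 0)         # failback only in off-hours...
        failback_end: time = time(4, 0)           # ...between 2 A.M. and 4 A.M.
        preferred_node: str = "node-a"            # active node whenever available

    def choose_active(policy: FailoverPolicy, nodes_online: set) -> str:
        # Prefer the designated node; otherwise fail over to any survivor.
        if policy.preferred_node in nodes_online:
            return policy.preferred_node
        return min(nodes_online) if nodes_online else None

    policy = FailoverPolicy()
    print(choose_active(policy, {"node-b"}))             # node-a down -> node-b
    print(choose_active(policy, {"node-a", "node-b"}))   # failback -> node-a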
Selecting a Domain Model
I’ve been asked many times about clustering domain controllers and how this affects the
design. You can cluster your domain controllers (or member servers), but an important
design rule to consider is this: all nodes must be part of the same domain. A simple
design consideration is that you never install services like SQL on top of a domain
controller; otherwise, your hardware requirements will go sky high. When designing a
Windows 2000 clustered solution, you’ll want to separate services as much as possible.
Make sure when you cluster your domain controllers that you also take traffic overhead
into consideration. Now, you’ll not only have to worry about replication and
synchronization traffic, but also about management heartbeat traffic. Be cautious
about how you design your domain controllers and, when they’re clustered in future
chapters, I’ll point this out to you again.
Limitations of Clusters
But I thought clustering would be the total solution to my problems? Wrong! Clustering
works wonders, but it has limits. When designing the cluster, it’s imperative for you
to look at what you can and can’t do. Again, it all comes down to design. What if you
were considering using Encrypting File System (EFS) on your clustered data? Could
you set that up, or would you need to forgo EFS in favor of the clustered solution? This
question usually doesn’t come up when you’re thinking about clustering a service
because all you can think about are the benefits of clustering. You should highlight
what you might have to eliminate to support the clustered service. In the case of EFS,
you can’t use it on cluster storage. Note, too, that disks on cluster
storage must be configured as basic disks. You can’t use dynamic disks, and you must always
use NT file system (NTFS), so you won’t be able to use FAT or any of its variations.
You must also only use TCP/IP. Although in this day and age, this might not be
shocking to you, it could be a surprise to businesses that want to use Windows 2000
clustering while only running IPX/SPX in their environments. This is something you
should consider when you design your clustered solution.
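As a quick way to internalize these restrictions, here is a hypothetical pre-flight check written in Python. The configuration keys are invented for illustration; the rules themselves are the ones just listed:

    # Hypothetical pre-flight check for the cluster-storage rules above:
    # NTFS only, basic disks only, no EFS on cluster storage, TCP/IP only.
    def check_cluster_config(cfg):
        problems = []
        if cfg["filesystem"] != "NTFS":
            problems.append("cluster storage must be NTFS, not " + cfg["filesystem"])
        if cfg["disk_type"] != "basic":
            problems.append("cluster storage must use basic disks, not dynamic")
        if cfg.get("efs_enabled"):
            problems.append("EFS can't be used on cluster storage")
        if cfg["protocols"] != ["TCP/IP"]:
            problems.append("cluster communication must use TCP/IP only")
        return problems

    quorum = {"filesystem": "NTFS", "disk_type": "basic", "protocols": ["TCP/IP"]}
    print(check_cluster_config(quorum) or "configuration looks clusterable")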
Capacity Planning
Capacity planning involves memory, CPU utilization, and hard disk structure. After
you choose what kind of clustered model you want, you need to know how to equip
it. You already know you need to consider the hardware vendors, but when you’re
capacity planning, this is something that needs to be fully understood and designed
specifically for your system.

WINDOWS 2000 CLUSTERING AND LOAD BALANCING

When Windows 2000 was finally released to the public, I’d been running all beta and
Release Candidate (RC) versions into the ground. Early on, I realized a winner was
here. The system suddenly seemed less prone to the blue screen of death (BSOD) and
reliability could be obtained. Now, years later, and after a few service-pack releases for
quite a few bug fixes on clustering, this is still a force to be reckoned with. You should
know that Windows 2000 Server doesn’t contain the services to be clustered or load
balanced. To mimic the Windows NT 4.0 Enterprise Edition, Windows 2000 Server has
an “advanced” version, conveniently named Windows 2000 Advanced Server. This is
the product you can cluster and load balance with. To compete in the high-end server
arena, Microsoft also released a high-end version of Windows 2000 called Windows
2000 Datacenter Server, which allows not only clustering and load balancing, but also
more flexibility, permitting four clustered nodes instead of the limit of
two with Advanced Server. Important design tips to remember are the following:
when clustering and load balancing with Windows 2000, Windows 2000 Server won’t
support clustering and load balancing unless Application Center 2000 is installed;
Windows 2000 Advanced Server will support a two node cluster and load balancing;
and Windows 2000 Datacenter Server will support a four-node cluster and load
balancing.
To understand Microsoft’s position on this service, you should know Microsoft
offers four types of clustering services. With Windows 2000, you have the Microsoft
Cluster Server (MSCS), network load balancing (NLB), component load balancing
(CLB), and a product called Application Center 2000. When you read about Application
Center 2000 in detail, you’ll realize it can help tie all the components together for you
under one management umbrella. The Windows 2000 Clustering Service is thoroughly
covered in Chapter 2 and an example of it can be seen in Figure 1-6. In the next chapter,
you go step-by-step through the configuration and implementation of Windows 2000
Advanced Server Clustering and load balancing.
Windows 2000 Clustering Services
Windows 2000 Clustering Services enable you to implement some of the solutions
mentioned thus far. You’ve learned about clustering, and Windows 2000 has
state-of-the-art clustering capability for your Enterprise solutions. Windows 2000 helps you
by offering some great services, such as failover, Active/Active clustering, and rolling
upgrades.
Failover and Failback Clustering
Failover is the act of another server in the cluster group taking over where the failed
server left off. An example of a failover system can be seen in Figure 1-7. If you have a
two-node cluster for file access and one fails, the service will failover to another server in
the cluster. Failback is the capability of the failed server to come back online and take the
load back from the node the original server failed over to. Again, this chapter simply laysthe groundwork for the other chapters because, as you get into the actual configuration
and testing of, say, SQL2000, you could find that failover and failback might not always
work. This is important to anyone who wants to run a SQLServer cluster.Stateless vs. Stateful Clustering
Windows 2000 clustering functions as stateful, which means the application state and
user state are managed during and through the failover. This is an important design
question to ask yourself in the early stages of planning the High Availability solution.
Do you want stateful failover? Most would answer “yes,” so application state isn’t lost.
State can be thought of as “what you were doing” at the time of failure. A stateless solution
is one provided by network and component load balancing, where the state of the user
and application aren’t managed. An example of stateless versus stateful can be seen in
Figure 1-8. As you become more involved with Application Center 2000, the explanation
gets deeper.
Active/Passive
Active/Passive is defined as a cluster group where one server is handling the entire load
and, in case of failure and disaster, a Passive node is standing by waiting for failover
(as seen in Figure 1-9). This is commonly used, but most would argue that you’re still
wasting the resources of that standby server. Wouldn’t it be helpful if both servers
were somehow working to serve clients the data they need, while still keeping the
benefits of failover? That’s what Windows 2000 clustering services can offer you: it’s
called Active/Active clustering.

Active/Active
Active/Active clustering is when you want all servers in the cluster group to service
clients and still be able to take up the load of a failed server in case of disaster, as seen
in Figure 1-10. That said, a downside exists to using this technology. In Active/Passive
clustering, you have a server producing 100 percent resources to clients. In case of
disaster, the failed server fails over to the standby passive server. That node picks up
the load and, other than a few seconds of changeover time, there isn’t any difference to
the client. The client is still using 100 percent of the server’s resources. In Active/Active
clustering, this wouldn’t be the case. You have nodes in the cluster sharing the load,
thus, when one node fails and the other nodes must take up the load, this means you
lost some of that percentage. In other words, you have two nodes providing services to
the network clients. That’s 100 percent of served resources. If one server fails, then the
clients have only one server left to access, which cuts the available capacity to
50 percent. This might not be noticeable in low-demand scenarios, but it is something
to think about when planning your overall design. The best way to go about this is to
determine the demand your servers will need and design your cluster solution around
that demand. You also need to think about future demand, which brings us back to
scalability. You learn about this in the section “Designing a Clustered Solution,” where
you can look at step-by-step design ideas you might need to consider.
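Before moving on, here is a quick worked example of that capacity arithmetic in Python, assuming the load is spread evenly across the active nodes:

    # With N active nodes sharing load evenly, a single failure forces the
    # survivors to absorb the failed node's share, multiplying each
    # survivor's utilization by N / (N - 1).
    def utilization_after_failure(nodes: int, per_node_util: float) -> float:
        return per_node_util * nodes / (nodes - 1)

    print(utilization_after_failure(2, 0.70))   # 1.40 -> overloaded at 140%
    print(utilization_after_failure(4, 0.70))   # ~0.93 -> survives a failure
    # Rule of thumb: keep per-node utilization under (N - 1) / N.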
Rolling Upgrades
Rolling upgrades are a fantastic way to upgrade software on your production servers
one at a time, without having a full-blown outage. Rolling upgrades are used for many
reasons, including upgrading complete OSs or applying service packs and hot fixes.
The cluster node that needs work can be brought offline for maintenance, and then
brought back online when the maintenance is complete, with no interruptions or only
minor disruptions of service. You learn about performing a rolling upgrade in Chapter 2
of this book.
Network Load Balancing
Windows 2000 allows for load balancing of services as well. As just discussed, in an
Active/Active cluster, you have load-balancing functionality. Another form of load
balancing exists, though, in which one IP address serves an entire load-balanced
cluster (with Windows 2000 Advanced Server, this scales to 32 nodes) and, using an
algorithm, each node in the cluster takes a share of the total traffic load. You can also use
third-party solutions for load balancing in this manner, which you learn about shortly.
The way network load balancing (NLB) works is by having a driver sit between
the TCP/IP stack and your NIC. This driver is installed when you apply the
service on every node in the cluster. All nodes participate by using one Internet
Protocol (IP) address, which is called a virtual IP address (VIP). Only one node
responds to any given request, but which node that is varies within the cluster. An
affinity feature is used to weight the balance of the load when you configure NLB with
Application Center 2000. (Application Center 2000, as you learn in Chapter 4, adds
to the native NLB service that Windows 2000 Advanced Server provides.) You have
multiple benefits in using Windows 2000 load-balanced solutions, including, of course,
balancing the load, transparent convergence, adding and removing servers
as needed, assigning certain servers in the load-balanced cluster specific shares
of the overall load, and multicast-based messaging between nodes. You can see an
example of an NLB solution in Figure 1-11.
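Here is a minimal Python sketch of that filtering idea, with placeholder host names and a toy hash standing in for Microsoft's proprietary algorithm. The key point is that there is no central dispatcher: every node runs the same computation and exactly one "wins" each client:

    # Illustrative sketch of NLB-style distributed filtering. Every node
    # sees every packet sent to the virtual IP (VIP); each runs the same
    # hash, and only the winning node passes the packet up its stack.
    import zlib

    CLUSTER = ["web1", "web2", "web3", "web4"]   # hypothetical host names

    def i_should_handle(my_name: str, client_ip: str) -> bool:
        winner = CLUSTER[zlib.crc32(client_ip.encode()) % len(CLUSTER)]
        return my_name == winner

    for host in CLUSTER:
        print(host, i_should_handle(host, "198.51.100.9"))
    # Exactly one host prints True; no central dispatcher is involved.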
Convergence
Windows 2000 has the intelligence to be able to know what nodes are in the cluster
and, if one of them fails, it can reconverge the cluster based on this new number of
nodes to continue balancing the load correctly. All Network Load Balancing (NLB)
hosts exchange heartbeat messages to inform the default host that they’re still active in the
cluster. When a host doesn’t send or respond to the heartbeat message, a process begins
called convergence. During convergence, the hosts that are still active are determined, as
well as whether they can accept load. When a new host joins the cluster, it
sends out heartbeat messages, which also trigger convergence. Once all cluster
hosts agree on the current status of the cluster, the load is repartitioned and
convergence ends.
NLB also tracks which node is the default node (the node with the highest
priority, which keeps track of balancing the load to all the other nodes in the group) and,
if that node is affected, can reconverge the group to elect a new default node. You see
this in great detail while configuring load-balanced clusters and Application Center
2000-based clusters in Chapter 4.
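Here is a toy Python model of the process just described. The miss threshold, host names, and priority rule are invented for the demonstration:

    # Toy model of convergence: hosts exchange heartbeats; when one goes
    # silent past a miss threshold, the survivors agree on a new membership
    # and the load is repartitioned among them.
    MISS_LIMIT = 5   # heartbeats missed before a host is declared dead

    def converge(members, missed_counts):
        alive = [h for h in members if missed_counts.get(h, 0) < MISS_LIMIT]
        default_host = min(alive)              # e.g., highest-priority host
        share = 1.0 / len(alive)               # repartition the load evenly
        return default_host, {h: share for h in alive}

    hosts = ["host1", "host2", "host3"]
    default, loads = converge(hosts, {"host2": 7})   # host2 went silent
    print(default, loads)   # host1 {'host1': 0.5, 'host3': 0.5}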
Adding and Removing Servers
With Windows 2000 load balancing, you can easily add and remove nodes to the
cluster. Windows 2000 Advanced Server allows for up to 32 nodes, so you can start off
with 8 nodes and increase that number when necessary. When you configure Application
Center 2000, you’ll see this is an integral part of producing appropriate High-Availability
solutions. Your load won’t always be the same. Take, for instance, an ecommerce site
that sells gifts on the Internet. In December, around Christmas time, the number of hits,
requests, and sales for the site generally increases exponentially. That said, you’d want
to design your load-balanced solution to be able to function normally with eight servers
(you see how to baseline and monitor performance in Chapter 8), and then add servers to
the group when availability needs to increase. You’ll also want to be able
to remove these servers when you finish. The beauty of this solution is you can lease
server hardware when and where you need it, instead of keeping equipment you need
to account for on hand all year. What’s important to understand here is that this
functionality is available to you, so you can plan for it now, while your initial
design work takes place. If you need four servers to begin with, you’ll have to baseline
the servers on hand, and then, during periods of high activity and use, baseline again.
You’ll find your load is either over what you expected and you’ll need to add a server
or you’ll find it’s under your expectations and you can survive the additional hits
with the hardware you have. Either way, you can only determine this by performance
monitoring the systems and knowing how many hits you get a month. All of this is
covered in the last chapter of the book.
Port Rules and Priority Assignments
The most difficult configurations on load-balanced solutions are Port Rules, affinity,
and weighted assignments. These take a little time to plan and a lot of reading to
understand fully if you aren’t familiar with them. The mission of this book is to
demystify these configurations for you, so you can plan, design, and implement them.
The load of every node in the load-balanced cluster can be customized with Port Rules,
which are used to specify load weight and priority. In Chapter 2, you learn about port
assignments and affinity when you configure NLB on Windows 2000 Advanced Server.
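Here is an illustrative Python representation of a port rule, using field names chosen for the example rather than the exact Windows property names:

    # Illustrative NLB-style port rule: a port range, a protocol, an
    # affinity mode, and a per-host load weight. Field names are for the
    # example only, not the exact Windows property names.
    from dataclasses import dataclass

    @dataclass
    class PortRule:
        port_start: int
        port_end: int
        protocol: str      # "TCP", "UDP", or "Both"
        affinity: str      # "None", "Single" (client IP), or "Class C"
        load_weight: int   # relative share of traffic for this host

    # A beefier host takes the larger slice of HTTP traffic:
    web_rule_big = PortRule(80, 80, "TCP", "Single", load_weight=70)
    web_rule_small = PortRule(80, 80, "TCP", "Single", load_weight=30)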

INTRODUCTION TO HIGH AVAILABILITY

High Availability
High Availability is the essence of mission-critical applications being provided quickly
and reliably to clients looking for your services. If a client can’t get to your services,
then they’re unavailable. The money your company makes to sustain the life of its
business may depend on only one thing: whether your client base can shop online.
Nerve-racking? You bet.
Not to sound overly simplistic, but systems up, servers serving, and the business
running is what High Availability is all about. Systems will fail, so how will your
company handle this failure? Anyone who has ever been in charge of a service that
needed to be up all the time and watched it crash knows how the company’s CEO or
vice presidents look at their angriest. High Availability, the industry term for systems
available 99.999 (called “Five Nines”) percent of the time, is the way around this. Five
Nines is the term for saying a service or system will be up almost 100 percent of the
time. To achieve this level of availability, you need to deploy systems that can survive
failure. The ways to perform this are through clustering and load balancing.
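It's worth doing the Five Nines arithmetic once to see what the number really allows. A quick calculation:

    # What each availability level allows in downtime per year.
    minutes_per_year = 365 * 24 * 60          # 525,600 minutes
    for availability in (0.99, 0.999, 0.9999, 0.99999):
        downtime = minutes_per_year * (1 - availability)
        print(f"{availability:.5%} available -> {downtime:8.1f} minutes down/year")
    # Five Nines (99.999%) works out to roughly 5.3 minutes of downtime a year.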
Throughout the book, you also learn about other forms of High Availability, such
as Redundant Array of Inexpensive Disks (RAID) and redundancy, in all aspects of
hardware and software components. You can see a simple example of a Highly Available
infrastructure in Figure 1-1. Although this book focuses on clustering and load-balancing
solutions, you’re given the big picture, so you can prepare almost all your components
for High Availability and redundancy.
Clustering and Load Balancing Defined
Clustering is a means of providing High Availability: a group of machines
acting as a single entity to provide resources and services to the network. In time of
failure, a failover occurs to another system in the group, which maintains availability
of those resources to the network. You can be alerted to the failure, repair the system
failure, and bring the system back online to participate as a provider of services once
more. You learn about many forms of clustering in this chapter. Clustering can allow
for failover to other systems and it can also allow for load balancing between systems.
Load balancing is using a device, which can be a server or an appliance, to balance the
load of traffic across multiple servers waiting to receive that traffic. The device sends
incoming traffic based on an algorithm to the most underused machine or spreads the
traffic out evenly among all machines that are on at the time. A good example of using
this technology would be if you had a web site that received 2,000 hits per day. If, in
the months of November and December, your hit count tripled, you might be unable to
sustain that type of increased load. Your customers might experience timeouts, slow
response times or, worse, they might be unable to get to the site at all. With that picture
fresh in your mind, consider two servers providing the same web site. Now you have
an alternative to slow response times: by adding a second or a third server, the
response time improves for the customer. High Availability is provided because, with
this technology, you can always have your web site or services available to the
visiting Internet community. You have also systematically removed the single point
of failure from the equation. In Figure 1-2, you can see what a clustered solution can
provide you: the single point of failure is removed because you now have a form of
redundancy built in.
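Here is a minimal Python sketch of the two dispatch strategies just described, with placeholder server names: spread the traffic evenly in turn, or send each request to the most underused server:

    # Two simple dispatch strategies a load balancer might use.
    from itertools import cycle

    servers = ["web1", "web2"]                # placeholder server names
    rr = cycle(servers)                       # round-robin: evenly, in turn

    active = {"web1": 0, "web2": 0}           # current connections per server

    def least_loaded():
        return min(active, key=active.get)    # most underused machine

    print(next(rr), next(rr), next(rr))       # web1 web2 web1 ...
    for _ in range(4):                        # four incoming requests
        active[least_loaded()] += 1
    print(active)                             # {'web1': 2, 'web2': 2}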
Pros and Cons to Clustering and Load Balancing
You could now be asking yourself, which is better to implement, clustering or load
balancing? You can decide this for yourself after you finish this book, when you know
all the details necessary to implement either solution. To give you a quick rundown of
the high-level pros and cons to each technology, consider the following. With clustering,
you depend on the actual clustered nodes to make a decision about the state of the
network and what to do in a failure. If Node A in a cluster senses a problem with Node
B (Node B is down), then Node A comes online. This is done with heartbeat traffic,
which is a way for Node A to know that Node B is no longer available and it must
come online to take over the traffic. With load balancing, a single device (a network
client) sends traffic to any available node in the load-balanced group of nodes. Load
balancing uses heartbeat traffic as well but, in this case, when a node comes offline, the
“load” is recalculated among the remaining nodes in the group. Also, with clustering
(not load balancing), you’re normally tied down or restricted to a small number of
participating nodes. For example, if you want to implement a clustered solution with
Windows 2000 Advanced Server, you might use a two-node cluster. With load balancing,
you can implement up to 32 nodes and, if you use a third-party utility, you can scale
way beyond that number. You can even mix up the operating system (OS) platforms, if
needed, to include Sun Solaris or any other system you might be running your services
on. Again, this is something that’s thoroughly explained as you work your way through
the book. This section is simply used to give you an idea of your options. Finally,
you have the option to set up tiered access to services and to mix both architectures
(clustering and load balancing) together. You can set up the first tier of access to your
web servers as load balanced and the last tier of access as your clustered SQL databases.
This is explained in more detail in the upcoming section on N-tier architecture, “N-Tier
Designs.”
Hot Spare
A hot spare is a machine you can purchase and configure to be a mirror image of the
machine you want to replace if a failure occurs. Figure 1-3 shows an example of a hot
spare in use. A hot spare can be set aside for times of disaster, but it could sit there unused,
waiting for a failure. When the disaster occurs, the hot spare is brought online to participate
in the place of the systems that failed. This isn’t a good idea because the system sitting idle
isn’t being used and, in many IT shops, it will be “borrowed” for other things. This means
you never have that hot spare. For those administrators who could keep the hot spare as a
spare, you’re missing out on using that spare machine to balance the load. Also, why
wait until a failure to bring the hot spare online? Your clients lose connectivity and you have to
remove the old machine, and then replace it with the new one and have all your clients
reconnect to it. Or, worse yet, the angry client shopping online could be gone forever to
shop somewhere else online if it’s a web server hosting an ecommerce site. Setting up a
second server as a hot spare is redundant, but there is a better way. Set this second machine
up in a cluster. Although the hot spare method might seem a little prehistoric, it’s still
widely used in IT shops that can’t afford highly available systems, but still need some form
of backup solution.

A Need for Redundancy
You already learned about some forms of redundancy in the first few portions of this
chapter in the discussion on clustering. Now let’s look at why redundancy of systems
is so important and what options you have besides a cluster. Being redundant (or
superfluous) is the term used to explain exceeding what’s necessary. If this is applied
to an IT infrastructure, then it would be easy to say that if you need a power supply to
power your server, then two power supplies would exceed what’s necessary. Of course,
in time of failure, you always wish you’d exceeded what you need, correct? The need
for redundancy is obvious if you want to have your business continue operations in
time of disaster.
The need for redundancy is apparent in a world of High Availability. Your options
today are overwhelming. You can get redundant “anything” in the marketplace. You
can purchase servers from Dell and Compaq with redundant power supplies: if one
fails, the other takes over. You have redundant power supplies in Cisco Catalyst switches,
for example. For a Catalyst 4006, you can put in up to three redundant power supplies.
This is exactly the design you want when configuring your core network. A redundant
network can go beyond hardware components and into the logical configuration of
routes in your routers and wide area network (WAN) protocol technologies, such as
having your router dial around your frame relay network using ISDN when the frame
relay cloud drops off the face of the Earth. All in all, redundant services are key to a Highly Available
network design.
Manageability
With clustered solutions, you have the benefit of managing your systems as one
system. When you configure clustering with network load balancing (NLB) and with
Application Center 2000, you find that setting up and managing systems under one
console, and monitoring performance under one console, makes your life much easier.
Because we all know life as a Network and Systems administrator is far from easy, this
can be an incredible help to your efforts.
Reliability
Reliability is being able to guarantee that services will be available to client requests.
Think about it: you buy a brand new car—don’t you want it to be reliable?
The theory is the same when dealing with mission-critical network services. When server
components fail, you can usually plan outages at night and in off-hours. What
if you run 24-hour-a-day operations? You want to be able to absorb the disaster that
occurs and reliably deliver the service you offer.
Scalability
Scalability is your option to grow above and beyond what you’ve implemented today.
For instance, say you purchased two servers to configure into a cluster with a separate
shared storage device. To call that solution scalable, you
would need to be able to add two more servers to the clustered group when the need for
growth arrives. Scalability (or being able to scale) is the term used to describe
that capability to grow either up or out from your current solution.
Scale Up
Scaling up is the term for building up a single machine. If you have one server—
and that server provides printing services to all the clients on your network—you
might want to increase its memory because, while performance monitoring the server,
you see that virtual memory is constantly paged from your hard disk. The fact that you
are “adding” to a single system to build it up and not adding more systems to share the
load means you are scaling up, as seen in Figure 1-4.
Scale Out
Scaling out is clustering as seen in Figure 1-5. You have one server providing a web site
to clients and, while performance monitoring, you notice page hits have increased by
50 percent in one month. You are exceeding limits on your current hardware, but you
don’t want to add more resources to this single machine. You decide to add another
machine and create a cluster.

CLUSTERING WITH NT 4.0
Before you get into the high-level overview of clustering and load balancing with
Windows 2000 and the Server 2003 platforms, you should know where this all started.
I won’t go over the history of clustering and how Microsoft got involved, but I’ll give
you an overview on why Windows 2000 clustering is a worthy solution to implement
on your network.
Windows 2000 Clustering Services were first born on the Windows NT 4.0 Server
Enterprise Edition. On hearing of its arrival and implementing the services, those
involved quickly discovered this wasn’t something they wanted to implement on their
mission-critical applications. Microsoft Cluster Server, also code-named “Wolfpack,”
wasn’t reliable. A plethora of problems occurred while running the service, including
slow performance when using Fibre Channel and large numbers of hard disks, and
clusters that stopped serving clients altogether for no apparent reason, only for you to
discover later that it was another bug. This defeated the entire purpose of clustering in
the first place, and many quickly lost faith in the solution Microsoft had provided. Faith
wasn’t restored when most of the fixes Microsoft supplied came in the form of a familiar
remedy: “Install the latest service pack.”
Fast-forward to Windows 2000 and you have a whole different solution, which you
discover throughout this book. All in all, the service has grown exponentially with the
newer releases of Windows server-based OSs, and has become a reliable and applicable
solution in your network infrastructure. If you plan to design an NT cluster, be aware
that NT Server 4.0 doesn’t support clustering, but it will work with load balancing.
Windows NT 4.0 Enterprise Edition will work with load balancing and can be clustered
with two nodes.