Failover is a critical process that can be complex and time-consuming. Manual failover is possible, but the number of steps involved (see below) often makes it impractical and too slow to meet HA requirements. Some type of clustering system is usually needed. OS clustering and application clustering are options, but these products don’t always meet the typical needs of an HA database infrastructure: support for most major applications, cross-platform operation, multi-tier dependency handling, automated disaster recovery, reduced system complexity and centralized management.

Oracle RAC is a standard HA solution for Oracle databases. It provides failover in seconds, multi-platform support and centralized management, but at a price that may not be appropriate for all HA applications. Many companies need another option, one that provides adequate failover performance (a few minutes, instead of seconds) with the required platform support and management functionality - but won’t break the bank.

Failover includes many steps that must be carried out in a specific sequence in order to succeed. As an illustration, typical steps in a manual failover are:

  1. Detect failure

  2. Unmount file system

  3. Deport disk group

  4. Confirm target (server membership calculation)

  5. Transfer I/O control to target server

  6. Import disk group

  7. Mount file system

  8. Start application - recover redo logs

  9. Reconnect clients
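The strict ordering of these steps can be sketched in a few lines of Python. This is only an illustration of the sequencing, not any vendor's implementation: the `run_failover` function and the `action` callback are hypothetical names, and a real callback would invoke the platform's own storage and cluster commands.

```python
from typing import Callable, List, Optional, Tuple

# The nine manual steps listed above, in the order they must run.
FAILOVER_STEPS: List[str] = [
    "detect failure",
    "unmount file system",
    "deport disk group",
    "confirm target",
    "transfer I/O control to target server",
    "import disk group",
    "mount file system",
    "start application and recover redo logs",
    "reconnect clients",
]


def run_failover(
    steps: List[str], action: Callable[[str], bool]
) -> Tuple[List[str], Optional[str]]:
    """Run each step in order and stop at the first one that fails.

    `action` is a hypothetical callback that performs one step and
    returns True on success. The result is (completed_steps, failed_step);
    failed_step is None when every step succeeded.
    """
    completed: List[str] = []
    for step in steps:
        if not action(step):
            # A later step must never run if an earlier one failed.
            return completed, step
        completed.append(step)
    return completed, None


if __name__ == "__main__":
    # Simulated run in which every step succeeds.
    done, failed = run_failover(FAILOVER_STEPS, lambda step: True)
    print(f"{len(done)} steps completed, failed step: {failed}")
```

Stopping at the first failed step reflects the point made above: importing a disk group that was never deported, for example, would put data integrity at risk.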

An HA system should accomplish these steps in a timeframe that meets the required service levels. The HA system should also address the need for multi-tier application support since many enterprises run Oracle databases in a multi-application stack, often on different platforms, all of which must be included in the failover process. DR system support should also be included since there is often the desire to have a cluster failover to a remote location. Finally, given current IT workloads, care must be taken to reduce the potential complexity of clustered systems and keep administrative costs down.

Third-party clustering solutions like Veritas Clustered File System HA provide fast failover of Oracle database single instances at a reasonable cost. They automate the process to ensure integrity and eliminate some of the steps in the failover sequence to reduce downtime. A clustered volume manager allows disk groups containing Oracle data files to be shared, and the clustered file system enables standby servers to have the file system already mounted. Every node in the cluster maintains membership and I/O control, eliminating more steps from the recovery sequence.

They also reduce costs by eliminating 1:1 failover topology. Traditional clustering architectures allocate one failover server for each production server. One alternative is to use an ‘n + 1’ configuration allowing a single ‘roaming spare’ server to be used as the failover server by all cluster nodes. Another option is an ‘n to n’ configuration, where each node can act as the failover node for any other node. The clustering application must manage the choice of which node is the target of a failover, based upon workload and resource availability, but this arrangement can also save costs over the 1:1 configuration. Finally, dynamic multi-pathing functionality should also be included, to provide SAN load balancing and facilitate path failover.
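As a rough illustration of the hardware arithmetic and of target selection, consider the Python sketch below. The function names, the load figures and the "least-loaded node with enough headroom" rule are assumptions made for this example; the n-to-n count also assumes the existing nodes have spare capacity, and real clustering products apply their own policies.

```python
from typing import Dict, Optional


def servers_needed(production_nodes: int, topology: str) -> int:
    """Total servers required for a given failover topology."""
    if topology == "1:1":
        # One dedicated failover server per production server.
        return 2 * production_nodes
    if topology == "n+1":
        # A single roaming spare shared by all production servers.
        return production_nodes + 1
    if topology == "n-to-n":
        # No dedicated spare; assumes nodes keep enough headroom.
        return production_nodes
    raise ValueError(f"unknown topology: {topology}")


def choose_target(
    loads: Dict[str, float],
    failed: str,
    demand: float,
    capacity: float = 1.0,
) -> Optional[str]:
    """Pick the least-loaded surviving node that can absorb `demand`.

    `loads` maps node name to current utilization (0.0 to 1.0).
    Returns None when no surviving node has enough headroom.
    """
    candidates = {
        node: load
        for node, load in loads.items()
        if node != failed and load + demand <= capacity
    }
    if not candidates:
        return None
    # Break ties on node name so the choice is deterministic.
    return min(candidates, key=lambda node: (candidates[node], node))


if __name__ == "__main__":
    for topology in ("1:1", "n+1", "n-to-n"):
        print(topology, servers_needed(4, topology))
    print(choose_target({"a": 0.5, "b": 0.3, "c": 0.7}, failed="a", demand=0.4))
```

For four production servers, the sketch gives eight servers for 1:1, five for n + 1 and four for n to n, which is where the cost saving comes from.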

Oracle is often the back-end tier of an ‘n-tier’ application stack, which can include web servers and other front-end applications all hitting this high-transaction database. When these tiers are running on separate OS platforms the clustering solution should provide agents to preserve the dependencies between tiers and maintain application control. Without these agents, clustering software can only control applications running on similar platforms. The other servers must be restarted manually and the connections to the Oracle database reestablished. All these steps add time and complexity to the failover process, along with cost.
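One way to picture what such agents preserve is a dependency graph between tiers. The Python sketch below, which uses the standard library's `graphlib` module (Python 3.9 or later), derives a start order and a stop order from that graph. The tier names and the three-tier layout are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each tier maps to the set of tiers it depends on (hypothetical names).
TIER_DEPENDENCIES: Dict[str, Set[str]] = {
    "oracle-db": set(),
    "app-server": {"oracle-db"},
    "web-server": {"app-server"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return tiers so that each one starts after the tiers it depends on."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return tiers in reverse, so dependents stop before the database."""
    return list(reversed(start_order(deps)))


if __name__ == "__main__":
    print("start:", start_order(TIER_DEPENDENCIES))
    print("stop: ", stop_order(TIER_DEPENDENCIES))
```

After a failover, restarting in dependency order means the database is available before the front-end tiers try to reconnect to it.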

Another requirement of this clustered system is the reduction of cost and complexity by simplifying administration. It must support multiple OS platforms with a single clustering application and support multiple clusters locally and remotely from a single pane of glass. Simulation tools for configuration, testing and upgrades are also important.

While Oracle RAC does meet many of these requirements, it does so at a cost and complexity level that many organizations find unacceptable. RAC software licenses and support costs are higher than those of a standard Oracle database license. In order to be highly available, RAC also requires a native Dynamic Multi-Pathing license, at additional cost. Without n + 1 or n to n clustering topologies, more servers are also required to provide failover hardware.

Making an Oracle database system highly available can be a complex job. Oracle’s RAC is a solution that provides automated failover of this system in seconds, but at a high price. Some more economical alternatives, like a manual failover process or OS-based clustering products, aren’t always adequate, either in the time they take to fail over, their complexity or their ability to support the different platforms that are common to Oracle application stacks.

Third-party clustering solutions, especially those that include a clustered volume manager and clustered file system, can be cost-effective alternatives. They can simplify the failover as well as support different operating systems and applications. Unlike Oracle RAC, which fails over in seconds, they provide a comprehensive system failover in a few minutes, but at an affordable price. For multi-node environments that do use Oracle RAC for high availability, a third-party volume management solution, like Veritas Storage Foundation for Oracle RAC, is still an option that can be used to share files between nodes and simplify storage management.

Eric Slack, Senior Analyst

Symantec is a client of Storage Switzerland