Monday, April 9, 2012

Installing Oracle Database 10g with Real Application Cluster (RAC) on Red Hat Enterprise Linux Advanced Server 3



The following procedure is a step-by-step guide (Cookbook) with tips and information for installing Oracle Database 10g with Real Application Cluster (RAC) on Red Hat Enterprise Linux Advanced Server 3. The primary objective of this article is to demonstrate a quick installation of Oracle 10g with RAC on RH AS 3. This article covers Oracle Cluster File System (OCFS), Oracle's Automatic Storage Management (ASM), and FireWire-based Shared Storage. Note that OCFS is not required for 10g RAC. In fact, I never use OCFS for RAC systems. However, this article covers OCFS since some people want to know how to configure and use OCFS.

If you have never installed Oracle10g on Linux before, then I'd recommend that you first try to install an Oracle Database 10g on Linux by following my other guide Installing Oracle Database 10g on Red Hat Linux.

I welcome emails from any readers with comments, suggestions, or corrections. You can find my email address at the bottom of this website.


This article covers the following subjects and steps:

Introduction

* General
* Important Notes
* Oracle 10g RAC Setup
* Shared Disks Storage
General
FireWire-based Shared Storage for Linux

Pre-Installation Steps for All Clustered RAC Nodes

* Downloading Oracle 10g Software and Burning Oracle 10g CDs
* Installing Red Hat Advanced Server
Installing Software Packages (RPMs)
* Upgrading the Linux Kernel
General
Upgrading the Linux Kernel for FireWire Shared Disks Only
* Configuring the Network
General
Setting Up the /etc/hosts File
Configuring the Network Interfaces (NICs)
* Configuring Shared Storage Devices
General
Configuring FireWire-based Shared Storage
* Creating Oracle User Accounts
* Setting Oracle Environments
* Sizing Oracle Disk Space for Database Software
* Creating Oracle Directories
* Creating Partitions on Shared Storage Devices
General
Creating Partitions for OCFS
Creating Partitions for Raw Devices
* Installing and Configuring Oracle Cluster File Systems (OCFS)
Installing OCFS
Configuring and Loading OCFS
Creating OCFS File Systems
Mounting OCFS File Systems
Configuring the OCFS File Systems to Mount Automatically at Startup
* Installing and Configuring Automatic Storage Management (ASM) and Disks
General
Installing ASM
Configuring and Loading ASM
Creating ASM Disks
* Configuring the "hangcheck-timer" Kernel Module
* Setting up RAC Nodes for Remote Access
* Checking Packages (RPMs)
* Adjusting Network Settings
* Sizing Swap Space
* Setting Shared Memory
* Checking /tmp Space
* Setting Semaphores
* Setting File Handles

Installing Cluster Ready Services (CRS)

* General
* Automating Authentication for oracle ssh Logins
* Checking OCFS and Oracle Environment Variables
Checking OCFSs
Checking Oracle Environment Variables
* Installing Oracle 10g Cluster Ready Services (CRS) R1 (10.1.0.2)

Installing Oracle Database 10g Software with Real Application Clusters (RAC)

* General
* Automating Authentication for oracle ssh Logins
* Checking Oracle Environment Variables
* Installing Oracle Database 10g Software R1 (10.1.0.2) with Real Application Clusters (RAC)

Installing Oracle Database 10g with Real Application Cluster (RAC)

* General
* Automating Authentication for oracle ssh Logins
* Setting Oracle Environment Variables
* Installing Oracle Database 10g with Real Application Cluster (RAC)

Post-Installation Steps

* Transparent Application Failover (TAF)
Introduction
Setup
Example of a Transparent Application Failover (TAF)
* Checking Automatic Storage Management (ASM)
* Oracle 10g RAC Issues, Problems and Errors
* References


Introduction
General
Oracle Real Application Cluster (RAC) is a cluster system at the application level. It uses a shared disk architecture that provides scalability for all kinds of applications. Applications can use a RAC database without any modifications.

Since the requests in a RAC cluster are spread evenly across the RAC instances, and since all instances access the same shared storage, servers can be added without any architectural changes. And the failure of a single RAC node results only in the loss of scalability, not in the loss of data, since a single database image is used.

Important Notes
There are a few important notes that might be useful to know before installing Oracle 10g RAC:

(*) If you want to install Oracle 10g with RAC using FireWire-based shared storage, make sure to read FireWire-based Shared Storage for Linux first!

(*) See also Oracle 10g RAC Issues, Problems and Errors


Oracle 10g RAC Setup
This article covers the installation of Oracle 10g with RAC on three RHELAS 3 servers (including the use of FireWire-based shared storage):
RAC node Database Name Oracle SID $ORACLE_BASE Oracle Datafile Directory
--------------- ------------- ---------- --------------- ----------------------------------
rac1pub/rac1prv orcl orcl1 /u01/app/oracle Automatic Storage Management (ASM)
rac2pub/rac2prv orcl orcl2 /u01/app/oracle Automatic Storage Management (ASM)
rac3pub/rac3prv orcl orcl3 /u01/app/oracle Automatic Storage Management (ASM)


For this documentation I used Oracle Cluster File System (OCFS) for Oracle's Cluster Ready Services (CRS) since some people want to know how to configure and use OCFS. However, OCFS is not required for 10g RAC. In fact, I never use OCFS for RAC systems. CRS requires two files, the "Oracle Cluster Registry (OCR)" file and the "CRS Voting Disk" file, which must be shared across all RAC nodes. You can also use raw devices for these files. Note, however, that you cannot use ASM for the CRS files. These CRS files need to be available before any RAC instance can run, and for ASM to become available, the ASM instance needs to run first.

For Oracle's data files, control files, etc. I used Oracle's Automatic Storage Management (ASM).


Shared Disks Storage
General

A requirement for an Oracle Database 10g RAC cluster is a set of servers with shared disk access and interconnect connectivity. Since each instance in a RAC system must have access to the same database files, shared storage is required that can be accessed from all RAC nodes concurrently.

The shared storage space can be used as raw devices, or through a cluster file system or ASM. This article will address Oracle's Cluster File System (OCFS) and ASM. Note that Oracle 10g RAC provides its own locking mechanisms and therefore does not rely on other cluster software or on the operating system for handling locks.

FireWire-based Shared Storage for Linux

Shared storage can be expensive. If you just want to check out the features of Oracle10g RAC without spending too much, I'd recommend buying external FireWire-based shared storage for Oracle10g RAC.
NOTE: You can download a kernel from Oracle for FireWire-based shared storage for Oracle10g RAC, but Oracle does not provide support if you have problems. It is intended for testing and demonstration only! See Setting Up Linux with FireWire-based Shared Storage for Oracle Database 10g RAC for more information.

NOTE: It is very important to get an external FireWire drive that allows concurrent access for more than one server! Otherwise the disk(s) and partitions can only be seen by one server at a time. Therefore, make sure the FireWire drive(s) have a chipset that supports concurrent access for at least two servers. If you already have a FireWire drive, you can check the maximum number of supported logins (concurrent accesses) by following the steps at Configuring FireWire-based Shared Storage.

For test purposes I used external 250 GB and 200 GB Maxtor hard drives which support a maximum of 3 concurrent logins. The technical specifications for these FireWire drives are:
- Vendor: Maxtor
- Model: OneTouch
- Mfg. Part No. or KIT No.: A01A200 or A01A250
- Capacity: 200 GB or 250 GB
- Cache Buffer: 8 MB
- Spin Rate: 7200 RPM
- "Combo" Interface: IEEE 1394 and SPB-2 compliant (100 to 400 Mbits/sec) plus USB 2.0 and USB 1.1 compatible


Here are links where these Maxtor drives can be bought:
Maxtor 200GB One Touch Personal Storage External USB 2.0/FireWire Hard Drive
Maxtor 250GB One Touch Personal Storage External USB 2.0/FireWire Hard Drive

The FireWire adapters I'm using are StarTech 4 Port IEEE-1394 PCI Firewire Cards. Don't forget that you will also need a FireWire hub if you want to connect more than 2 RAC nodes to the FireWire drive(s).



Pre-Installation Steps for All Clustered RAC Nodes

The following steps need to be performed on all nodes of the RAC cluster unless it says otherwise!


Downloading Oracle 10g Software and Burning Oracle 10g CDs
To install Oracle 10g with RAC, you will need the images "ship.crs.cpio.gz" (Cluster Ready Services 10.1.0.2) and "ship.db.cpio.gz" (Oracle Database 10g 10.1.0.2).
For more information on downloading the images and burning CDs, see Downloading Oracle10g Software and Burning Oracle10g CDs.


Installing Red Hat Advanced Server
You can find the installation guide for installing Red Hat Linux Advanced Server at Red Hat Enterprise Linux Manuals.

You cannot download the Red Hat Enterprise Linux Advanced Server binaries; you can only download the source code. If you want to get the binary CDs, you can buy licenses at http://www.redhat.com/software/rhel/.

Installing Software Packages (RPMs)

You don't have to install all RPMs to run an Oracle Database 10g with RAC on Red Hat Linux Advanced Server. It is sufficient to select the Installation Type "Advanced Server" without selecting the Package Group "Software Development". The few additional RPMs that are required for installing Oracle 10g RAC are covered in this article.


Upgrading the Linux Kernel
General

It is recommended to use newer Red Hat Enterprise Linux kernels since newer kernels might fix known database performance problems and other issues. Unless you are using FireWire-based shared drives (see below), I recommend downloading the latest RHEL AS 3 kernel from the Red Hat Network and using Upgrading the Linux Kernel as a guide for upgrading the kernel. However, you also need to make sure that the OCFS and ASM drivers are compatible with the kernel version!
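For example, you can quickly check which kernel is currently running and which kernel packages are installed before and after an upgrade (a trivial sketch):
su - root
uname -r                  # kernel that is currently running
rpm -qa | grep ^kernel    # kernel packages that are installed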


Upgrading the Linux Kernel for FireWire Shared Disks ONLY

You can download a kernel from Oracle for FireWire-Based Shared Storage for Oracle Database 10g RAC, but Oracle does not support it. It is intended for testing and demonstration only! See Setting Up Linux with FireWire-based Shared Storage for Oracle10g RAC for more information.

Download the experimental kernel for FireWire shared drives from http://oss.oracle.com/projects/firewire/files.

There are two experimental kernels for FireWire shared drives, one for UP machines and one for SMP machines. To install the kernel for a single CPU machine, run the following command:
su - root
rpm -ivh kernel-2.4.21-15.ELorafw1.i686.rpm
Note that the above command does not upgrade your existing kernel. This is the preferred method since you always want the option to go back to the old kernel if the new kernel causes problems or doesn't come up.

To make sure that the right kernel is booted, check the /etc/grub.conf file if you use GRUB, and change the "default" attribute if necessary. Here is an example:
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Enterprise Linux AS (2.4.21-15.ELorafw1)
root (hd0,0)
kernel /vmlinuz-2.4.21-15.ELorafw1 ro root=LABEL=/
initrd /initrd-2.4.21-15.ELorafw1.img
title Red Hat Enterprise Linux AS (2.4.21-4.EL)
root (hd0,0)
kernel /vmlinuz-2.4.21-4.EL ro root=LABEL=/
initrd /initrd-2.4.21-4.EL.img

In this example, the "default" attribute is set to "0" which means that the the experimental FireWire kernel 2.4.21-9.0.1.ELorafw1 will be booted. If the "default" attribute would be set to "1", the 2.4.21-9.EL kernel would be booted.

After you installed the new kernel, reboot the server:
su - root
reboot
Once you are sure that you don't need the old kernel anymore, you can remove the old kernel by running:
su - root
rpm -e <old-kernel-package>
When you remove the old kernel you shouldn't have to update the "default" parameter in the /etc/grub.conf file. However, I have seen cases where this didn't work. So I recommend checking the "default" setting after you remove the old kernel.
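For example, after removing the old kernel you could verify the remaining kernel packages and the GRUB default entry like this (a minimal sketch):
su - root
rpm -qa | grep ^kernel          # remaining kernel packages
grep "^default" /etc/grub.conf  # entry that will be booted by default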


Configuring the Network
General

Each RAC node should have at least one static IP address for the public network and one static IP address for the private cluster interconnect.

The private networks are critical components of a RAC cluster. The private network should only be used by Oracle to carry Cluster Manager and Cache Fusion inter-node traffic. A RAC database does not require a separate private network, but using the public network can degrade database performance (high latency, low bandwidth). Therefore the private network should have high-speed NICs (preferably gigabit or faster) and it should only be used by Oracle.

You might want to manage the network addresses using the /etc/hosts file. This avoids the problem of making DNS, NIS, etc. a single point of failure for the database cluster.

Make sure that no firewall is running, or at least that it does not interfere with RAC traffic.
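For example, on RHEL AS 3 you could check the firewall status and, if your security policy allows it, disable it (a minimal sketch; alternatively, open the ports required by Oracle instead):
su - root
service iptables status   # show current firewall rules, if any
service iptables stop     # stop the firewall for the current session
chkconfig iptables off    # keep it from starting at the next reboot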

Setting Up the /etc/hosts File

Here is an example of how the /etc/hosts file could look:
# Public hostnames for e.g. eth0 interfaces (public network)

192.168.1.1 rac1pub.puschitz.com rac1pub # RAC node 1
192.168.1.2 rac2pub.puschitz.com rac2pub # RAC node 2
192.168.1.3 rac3pub.puschitz.com rac3pub # RAC node 3

# Private hostnames, private network for e.g. eth1 interfaces (Interconnect)

192.168.2.1 rac1prv.puschitz.com rac1prv # RAC node 1
192.168.2.2 rac2prv.puschitz.com rac2prv # RAC node 2
192.168.2.3 rac3prv.puschitz.com rac3prv # RAC node 3

# Public virtual IP address for e.g. eth0 interfaces (public Virtual Internet Protocol (VIP))

192.168.1.51 rac1vip.puschitz.com rac1vip # RAC node 1
192.168.1.52 rac2vip.puschitz.com rac2vip # RAC node 2
192.168.1.53 rac3vip.puschitz.com rac3vip # RAC node 3


The public virtual IP addresses are configured automatically by Oracle when you run OUI, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA), see Installing Oracle Database 10g Software R1 (10.1.0.2) with Real Application Clusters (RAC).


NOTE:

Make sure that the name of the RAC node is not listed for the loopback address in the /etc/hosts file similar to this example:
127.0.0.1 rac1pub localhost.localdomain localhost
The entry should look like this:
127.0.0.1 localhost.localdomain localhost


If the RAC node is listed for the loopback address, you might later get the following errors:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation
For more information, see Oracle 10g RAC Issues, Problems and Errors.


Configuring the Network Interfaces (NICs)

To configure the network interfaces (in this example eth0 and eth1), run the following command on each node.
su - root
redhat-config-network
NOTE: You do not have to configure the network alias names for the public VIP. This will be done by Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA).

NOTE: When the network configuration is done, it is important to make sure that the hostname command displays the public node name of each RAC node:
$ hostname
rac1pub


You can verify the newly configured NICs by running the command:
/sbin/ifconfig
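If you prefer to edit the interface configuration files directly instead of using redhat-config-network, the files under /etc/sysconfig/network-scripts would look roughly like this on rac1pub (a sketch based on the addresses used in this article; device names and IPs are examples, adjust them for your nodes):
# /etc/sysconfig/network-scripts/ifcfg-eth0 (public network)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.1
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1 (private interconnect)
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.2.1
NETMASK=255.255.255.0
ONBOOT=yes

# activate the new settings
su - root
service network restart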


Configuring Shared Storage Devices
General

For instructions on how to set up a shared storage device on Red Hat Advanced Server, see the installation instructions of the manufacturer.


Configuring FireWire-based Shared Storage

First make sure that the experimental kernel for FireWire drives was installed and that the server was rebooted (see Upgrading the Linux Kernel for FireWire Shared Disks Only):
# uname -r
2.4.21-15.ELorafw1


To load the kernel modules/drivers for the FireWire drive(s), add the following entry to the /etc/modules.conf file:
alias ieee1394-controller ohci1394
post-install ohci1394 modprobe sd_mod


The alias directive ieee1394-controller is used by Red Hat during the boot process. When you check the /etc/rc.d/rc.sysinit file, which is invoked by /etc/inittab during the boot process, you will find the following code that searches for the ieee1394-controller stanza in /etc/modules.conf:
if ! strstr "$cmdline" nofirewire ; then
aliases=`/sbin/modprobe -c | awk '/^alias ieee1394-controller/ { print $3 }'`
if [ -n "$aliases" -a "$aliases" != "off" ]; then
for alias in $aliases ; do
[ "$alias" = "off" ] && continue
action $"Initializing firewire controller ($alias): " modprobe $alias
done
LC_ALL=C grep -q "SBP2" /proc/bus/ieee1394/devices 2>/dev/null && \
modprobe sbp2 >/dev/null 2>&1
fi
fi
This means that all the kernel modules for the FireWire drive(s) will be loaded automatically during the next reboot and your drive(s) should be ready for use.

To load the modules or the firewire stack right away without rebooting the server, execute the following commands:
su - root
modprobe ieee1394-controller; modprobe sd_mod


If everything worked fine, the following modules should be loaded:
su - root
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sbp2 19724 0
ohci1394 28008 0 (unused)
ieee1394 62884 0 [sbp2 ohci1394]
sd_mod 13424 0
scsi_mod 104616 5 [sbp2 sd_mod sg sr_mod ide-scsi]
#


And when you run dmesg, you should see entries similar to this example:
# dmesg
...
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[11] MMIO=[f2000000-f20007ff] Max Packet=[2048]
ieee1394: Device added: Node[00:1023] GUID[0010b9f70089de1c] [Maxtor]
scsi1 : SCSI emulation for IEEE-1394 SBP-2 Devices
blk: queue cf172e14, I/O limit 4095Mb (mask 0xffffffff)
ieee1394: ConfigROM quadlet transaction error for node 01:1023
ieee1394: Host added: Node[02:1023] GUID[00110600000032a0] [Linux OHCI-1394]
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 0
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[00:1023]: Max speed [S400] - Max payload [2048]
Vendor: Maxtor Model: OneTouch Rev: 0200
Type: Direct-Access ANSI SCSI revision: 06
blk: queue cd0fb014, I/O limit 4095Mb (mask 0xffffffff)
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
SCSI device sda: 398295040 512-byte hdwr sectors (203927 MB)
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 >


In this example, the kernel reported that the FireWire drive can be shared concurrently by 3 servers (see "Maximum concurrent logins supported:"). It is very important that you have a drive with a chipset and firmware that supports concurrent access for the nodes. The "Number of active logins:" shows how many servers are already sharing/using the drive before this server added the drive to its system.
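If the drive is already attached, you can, for example, check the reported login limits again at any time by searching the kernel messages (a small sketch; the exact message text may vary with the sbp2 driver version):
su - root
# dmesg | grep -i "logins"
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 0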


If everything worked fine, you should now be able to see your FireWire drive(s):
su - root
# fdisk -l

Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 6375 51207156 83 Linux
/dev/sda2 6376 12750 51207187+ 83 Linux
...
And if you power your FireWire drive(s) off or on, the drives should be removed from or added to the system automatically, which can take about 5-10 seconds.

If everything worked fine without any errors or problems, I recommend rebooting all RAC nodes to verify that all FireWire drive(s) are automatically added to the system during the next boot process:
su - root
reboot
And after the reboot, execute the fdisk command again to verify that the FireWire drive(s) were added to the system:
su - root
# fdisk -l

PROBLEMS:

Note that if you have a USB device attached, the system might not be able to recognize your FireWire drive!

If the ieee1394 module was not loaded, then your FireWire adapter might not be supported. I'm using the StarTech 4 Port IEEE-1394 PCI Firewire Card which works fine:
# lspci
...
00:14.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
...


Creating Oracle User Accounts
If you use OCFS, it is important that the UID of "oracle" and the GID of "oinstall" are the same across all RAC nodes. Otherwise the Oracle files on the OCFS filesystems could be "unowned" on some nodes, or they could even be owned by another user account. In my setup the UID of oracle is 700, the GID of oinstall is 701, and the GID of dba is 700.
su - root
groupadd -g 700 dba # group of users to be granted the SYSDBA system privilege
groupadd -g 701 oinstall # group owner of Oracle files
useradd -c "Oracle software owner" -u 700 -g oinstall -G dba oracle
passwd oracle
To verify the oracle account, enter the following command:
# id oracle
uid=700(oracle) gid=701(oinstall) groups=701(oinstall),700(dba)


For more information on the "oinstall" group account, see When to use "OINSTALL" group during install of oracle.


Setting Oracle Environments
Since the Oracle Universal Installer (OUI) "runInstaller" is executed from the oracle account, some environment variables must be configured for the oracle account before OUI is started.

Note: When you set the Oracle environment variables for the RAC nodes, make sure to assign each RAC node a unique Oracle SID! In my test setup, the database name is "orcl" and the Oracle SIDs are "orcl1" for RAC node one, "orcl2" for RAC node two, and so on.

If you use bash, which is the default shell on Red Hat Linux (to verify your shell run: echo $SHELL), execute the following commands:
# Oracle Environment
export ORACLE_BASE=/u01/app/oracle
export ORACLE_SID=orcl1 # Each RAC node must have a unique Oracle SID! E.g. orcl1, orcl2,...
export LD_LIBRARY_PATH=$ORACLE_HOME/lib


NOTE: If ORACLE_BASE is used, then Oracle recommends that you don't set the ORACLE_HOME environment variable but that you choose the default path suggested by the OUI. You can set and use ORACLE_HOME after you finish installing the Oracle Database 10g Software with RAC, see Installing Oracle Database 10g Software R1 (10.1.0.2) with Real Application Clusters (RAC).

The environment variables ORACLE_HOME and TNS_ADMIN should not be set. If you already set these environment variables, you can unset them by executing the following commands:
unset ORACLE_HOME
unset TNS_ADMIN

To have these environment variables set automatically each time you login as oracle, you can add these environment variables to the ~oracle/.bash_profile file for the Bash shell on Red Hat Linux. To do this you could simply copy/paste the following commands to make these settings permanent for the oracle Bash shell:
su - oracle
cat >> ~oracle/.bash_profile << 'EOF'
export ORACLE_BASE=/u01/app/oracle
export ORACLE_SID=orcl1   # Each RAC node must have a unique Oracle SID! E.g. orcl1, orcl2,...
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
EOF
Note that the heredoc delimiter is quoted ('EOF') so that the variable references are written literally to the file instead of being expanded by the current shell.


Sizing Oracle Disk Space for Database Software
You will need about 2.5 GB for the Oracle 10g RAC database software.

At the time of this writing, OCFS only supports Oracle Datafiles and a few other files. Therefore OCFS should not be used for Shared Oracle Home installs. See Installing and Configuring Oracle Cluster File Systems (OCFS) for more information.


Creating Oracle Directories
At the time of this writing, OCFS only supports Oracle Datafiles and a few other files. Therefore OCFS should not be used for Shared Oracle Home installs. See Installing and Configuring Oracle Cluster File Systems (OCFS) for more information.

For Oracle10g you only need to create the directory for $ORACLE_BASE:
su - root
mkdir -p /u01/app/oracle
chown -R oracle.oinstall /u01
But if you want to comply with Oracle's Optimal Flexible Architecture (OFA), then you don't want to place the database files in the /u01 directory but in another directory like /u02. This is not a requirement, but if you want to comply with OFA, then you might want to create the following directories as well:
su - root
mkdir -p /u02/oradata/orcl
chown -R oracle.oinstall /u02
At this point I recommend taking a quick look at Oracle's new Optimal Flexible Architecture (OFA).

NOTE: In my example I will not place the database files into the OCFS directory /u02/oradata/orcl since I will use Automatic Storage Management (ASM). However, I will use /u02/oradata/orcl for the cluster manager files, see Installing Cluster Ready Services (CRS).


Creating Partitions on Shared Storage Devices

The partitioning of a shared disk needs to be performed on only one RAC node!

General

Note that it is important for the Redo Log files to be on the shared disks as well.

To partition the disks, you can use the fdisk utility:
su - root
fdisk <device>
For SCSI disks (including FireWire disks), <device> stands for device names like /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, etc. Be careful to use the right device name!

Here is an example how to create a new 50 GB partition on drive /dev/sda:
su - root
# fdisk /dev/sda

The number of cylinders for this disk is set to 30515.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      6375  51207156   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (6376-30515, default 6376):
Using default value 6376
Last cylinder or +size or +sizeM or +sizeK (6376-30515, default 30515): +50GB

Command (m for help): p

Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      6375  51207156   83  Linux
/dev/sda2          6376     12750  51207187+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Syncing disks.
#
For more information on fdisk, see the fdisk(8) man page.
After you finished creating the partitions, inform the kernel of the partition table changes:
su - root
partprobe


Creating Partitions for OCFS

If you use OCFS for database files and other Oracle files, you can create several partitions on your shared storage for the OCFS filesystems. If you use a FireWire disk, you could create one large partition on the disk which should make things easier.

For more information on how to install OCFS and how to mount OCFS filesystems on partitions, see Installing and Configuring Oracle Cluster File Systems (OCFS).

Creating Partitions for Raw Devices

If you want to use raw devices, see Creating Partitions for Raw Devices for more information. This article does not cover raw devices.


Installing and Configuring Oracle Cluster File Systems (OCFS)

Note that OCFS is not required for 10g RAC. In fact, I never use OCFS for RAC systems. However, this article covers OCFS since some people want to know how to configure and use OCFS.

The Oracle Cluster File System (OCFS) was developed by Oracle to overcome the limits of Raw Devices and Partitions. It also eases administration of database files because it looks and feels just like a regular file system.

At the time of this writing, OCFS only supports Oracle Datafiles and a few other files:
- Redo Log files
- Archive log files
- Control files
- Database datafiles
- Shared quorum disk file for the cluster manager
- Shared init file (srv)

Oracle says that they will support Shared Oracle Home installs in the future. So don't install the Oracle software on OCFS yet. See Oracle Cluster File System for more information. In this article I'm creating a separate, individual ORACLE_HOME directory on local server storage for each and every RAC node.

NOTE: If files on the OCFS file system need to be moved, copied, tar'd, etc., or if directories need to be created on OCFS, then the standard file system commands mv, cp, tar,... that come with the OS should not be used. These OS commands can have a major OS performance impact if they are being used on the OCFS file system. Therefore, Oracle's patched file system commands should be used instead. It is also important to note that some 3rd vendor backup tools make use of standard OS commands like tar.

Installing OCFS

NOTE: In my example I will use OCFS only for the cluster manager files since I will use ASM for datafiles.

Download the OCFS RPMs (drivers, tools) for RHEL3 from http://oss.oracle.com/projects/ocfs/files/RedHat/RHEL3/i386/ (you can use the same RPMs for FireWire shared disks).
To find out which OCFS driver you need for your server, run:
$ uname -a
Linux rac1pub 2.4.21-9.ELsmp #1 Thu Jan 8 17:24:12 EST 2004 i686 i686 i386 GNU/Linux

To install the OCFS RPMs for SMP kernels (including FireWire SMP kernels), execute:
su - root
rpm -Uvh ocfs-2.4.21-EL-smp-1.0.12-1.i686.rpm \
         ocfs-tools-1.0.10-1.i386.rpm \
         ocfs-support-1.0.10-1.i386.rpm
To install the OCFS RPMs for uniprocessor kernels (including FireWire UP kernels), execute:
su - root
rpm -Uvh ocfs-2.4.21-EL-1.0.12-1.i686.rpm \
         ocfs-tools-1.0.10-1.i386.rpm \
         ocfs-support-1.0.10-1.i386.rpm

Configuring and Loading OCFS

To generate the /etc/ocfs.conf file, you can run the ocfstool tool:
su - root
ocfstool
- Select "Task"
- Select "Generate Config"
- Select the interconnect interface (private network interface)
  In my example for rac1pub I selected: eth1, rac1prv
- Confirm the values displayed and exit

The generated /etc/ocfs.conf file will appear similar to the following example:
$ cat /etc/ocfs.conf
#
# ocfs config
# Ensure this file exists in /etc
#
node_name = rac1prv
ip_address = 192.168.2.1
ip_port = 7000
comm_voting = 1
guid = 84D43BC8FB7A2C1B88C3000D8821CC2C

The guid entry is the unique group user ID. This ID has to be unique for each node. You can create the above file without the ocfstool tool by editing the /etc/ocfs.conf file manually and by running ocfs_uid_gen -c to assign/update the guid value in this file.

To load the ocfs.o kernel module, execute:
su - root
# /sbin/load_ocfs
/sbin/insmod ocfs node_name=rac1prv ip_address=192.168.2.1 cs=1795 guid=84D43BC8FB7A2C1B88C3000D8821CC2C comm_voting=1 ip_port=7000
Using /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o
#
To verify that the ocfs module was loaded, execute:
# /sbin/lsmod |grep ocfs
ocfs                  305920   0  (unused)
Note that the load_ocfs command does not have to be executed again once everything has been set up for the OCFS filesystems, see Configuring the OCFS File Systems to Mount Automatically at Startup.

If you run load_ocfs on a system with the experimental FireWire kernel, you might get the following error message:
su - root
# load_ocfs
/sbin/insmod ocfs node_name=rac1prv ip_address=192.168.2.1 cs=1843 guid=AA12637FAABFB354371C000D8821CC2C comm_voting=1 ip_port=7000
insmod: ocfs: no module by that name found
load_ocfs: insmod failed
#
The ocfs.o module for the "FireWire kernel" can be found here:
su - root
# rpm -ql ocfs-2.4.21-EL-1.0.12-1
/lib/modules/2.4.21-EL-ABI/ocfs
/lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o
#
So for the experimental kernel for FireWire drives, I manually created a link for the ocfs.o module file:
su - root
mkdir /lib/modules/`uname -r`/kernel/drivers/addon/ocfs
ln -s `rpm -qa | grep ocfs-2 | xargs rpm -ql | grep "/ocfs.o$"` \
      /lib/modules/`uname -r`/kernel/drivers/addon/ocfs/ocfs.o
Now you should be able to load the OCFS module using the "FireWire kernel", and the output should look similar to this example:
su - root
# /sbin/load_ocfs
/sbin/insmod ocfs node_name=rac1prv ip_address=192.168.2.1 cs=1843 guid=AA12637FAABFB354371C000D8821CC2C comm_voting=1 ip_port=7000
Using /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o
Warning: kernel-module version mismatch
        /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o was compiled for kernel version 2.4.21-4.EL
        while this kernel is version 2.4.21-15.ELorafw1
Warning: loading /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o will taint the kernel: forced load
  See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Module ocfs loaded, with warnings
#
I would not worry about the above warning.
However, if you get the following error, then you have to upgrade the modutils RPM:
su - root
# /sbin/load_ocfs
/sbin/insmod ocfs node_name=rac2prv ip_address=192.168.2.2 cs=1761 guid=1815F1C57530339EA00E000D8825B058 comm_voting=1 ip_port=7000
Using /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o
/lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o: kernel-module version mismatch
        /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o was compiled for kernel version 2.4.21-4.EL
        while this kernel is version 2.4.21-15.ELorafw1.
#
To remedy the "loading" problem, download the latest modutils RPM and enter e.g.:
rpm -Uvh modutils-2.4.25-11.EL.i386.rpm
To verify that the ocfs module was loaded, enter:
# /sbin/lsmod |grep ocfs
ocfs                  305920   0  (unused)
Note that the load_ocfs command does not have to be executed again once everything has been set up for the OCFS filesystems, see Configuring the OCFS File Systems to Mount Automatically at Startup.

Creating OCFS File Systems

Before you continue with the next steps, make sure you've created all needed partitions on your shared storage.

Under Creating Oracle Directories I created the /u02/oradata/orcl mount directory for the cluster manager files. In the following example I will create one OCFS filesystem and mount it on /u02/oradata/orcl.

The following steps for creating the OCFS filesystem(s) should only be executed on one RAC node!

To create the OCFS filesystems, you can use the ocfstool:
su - root
ocfstool
- Select "Task"
- Select "Format"
Alternatively, you can execute the "mkfs.ocfs" command to create the OCFS filesystems:
su - root
mkfs.ocfs -F -b 128 -L /u02/oradata/orcl -m /u02/oradata/orcl \
          -u `id -u oracle` -g `id -g oracle` -p 0775 <device>
Cleared volume header sectors
Cleared node config sectors
Cleared publish sectors
Cleared vote sectors
Cleared bitmap sectors
Cleared data block
Wrote volume header
#
For SCSI disks (including FireWire disks), <device> stands for devices like /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, etc. Be careful to use the right device name! For this article I created an OCFS filesystem on /dev/sda1.

mkfs.ocfs options:
-F  Force format of an existing OCFS volume
-b  Block size in kB. The block size must be a multiple of the Oracle block size.
    Oracle recommends setting the block size for OCFS to 128.
-L  Volume label
-m  Mount point for the device (in this article "/u02/oradata/orcl")
-u  UID for the root directory (in this article "oracle")
-g  GID for the root directory (in this article "oinstall")
-p  Permissions for the root directory

Mounting OCFS File Systems

As I mentioned previously, for this article I created one large OCFS filesystem on /dev/sda1. To mount the OCFS filesystem, I executed:
su - root
# mount -t ocfs /dev/sda1 /u02/oradata/orcl
or
# mount -t ocfs -L /u02/oradata/orcl /u02/oradata/orcl
Now run the ls command on all RAC nodes to check the ownership:
# ls -ld /u02/oradata/orcl
drwxrwxr-x    1 oracle   oinstall   131072 Jul  4 23:25 /u02/oradata/orcl
#
NOTE: If the above ls command does not display the same ownership on all RAC nodes (oracle:oinstall), then the "oracle" UID and the "oinstall" GID are not the same across the RAC nodes, see Creating Oracle User Accounts for more information.

Configuring the OCFS File Systems to Mount Automatically at Startup

To ensure the OCFS filesystems are mounted automatically during reboots, the OCFS mount points need to be added to the /etc/fstab file.
Add lines to the /etc/fstab file similar to the following example:
/dev/sda1    /u02/oradata/orcl    ocfs    _netdev    0 0
The "_netdev" option prevents the OCFS filesystem from being mounted until the network, which provides access to the storage device, has first been enabled on the system (see mount(8)).

To make sure the ocfs.o kernel module is loaded and the OCFS file systems are mounted during the boot process, enter:
su - root
# chkconfig --list ocfs
ocfs      0:off   1:off   2:off   3:on    4:on    5:on    6:off
If the flags for run levels 3, 4, and 5 are not set to "on", run the following command:
su - root
# chkconfig ocfs on
You can also start the "ocfs" service manually by running:
su - root
# service ocfs start
When you run this command it will not only load the ocfs.o kernel module but it will also mount the OCFS filesystems as configured in /etc/fstab.

At this point you might want to reboot all RAC nodes to ensure that the OCFS filesystems are mounted automatically after reboots:
su - root
reboot


Installing and Configuring Automatic Storage Management (ASM) and Disks
General

For information about what Automatic Storage Management is, see Configuring and Using Automatic Storage Management. See also Installing Oracle ASMLib for Linux.

Installing ASM

Download the latest Oracle ASM RPMs from http://otn.oracle.com/tech/linux/asmlib/index.html. Make sure that you download the right ASM driver for your kernel (UP or SMP).

To install the ASM RPMs on a UP server, run:
su - root
rpm -Uvh oracleasm-2.4.21-EL-1.0.0-1.i686.rpm \
         oracleasm-support-1.0.2-1.i386.rpm \
         oracleasmlib-1.0.0-1.i386.rpm
To install the ASM RPMs on a SMP server, run:
su - root
rpm -Uvh oracleasm-2.4.21-EL-smp-1.0.0-1.i686.rpm \
         oracleasm-support-1.0.2-1.i386.rpm \
         oracleasmlib-1.0.0-1.i386.rpm

Configuring and Loading ASM

To load the ASM driver oracleasm.o and to mount the ASM driver filesystem, enter:
su - root
# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting ENTER without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: oinstall
Start Oracle ASM library driver on boot (y/n) [n]: y
Fix permissions of Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration            [  OK  ]
Creating /dev/oracleasm mount point                        [  OK  ]
Loading module "oracleasm"                                 [  OK  ]
Mounting ASMlib driver filesystem                          [  OK  ]
Scanning system for ASM disks                              [  OK  ]
#

Creating ASM Disks

NOTE: Creating ASM disks is done on one RAC node! The following commands should only be executed on one RAC node!

I executed the following commands to create my ASM disks. Make sure to change the device names to match your own devices! In this example I used partitions (/dev/sda2, /dev/sda3, /dev/sda5) instead of whole disks (/dev/sda, /dev/sdb, /dev/sdc,...):
su - root
# /etc/init.d/oracleasm createdisk VOL1 /dev/sda2
Marking disk "/dev/sda2" as an ASM disk                    [  OK  ]
# /etc/init.d/oracleasm createdisk VOL2 /dev/sda3
Marking disk "/dev/sda3" as an ASM disk                    [  OK  ]
# /etc/init.d/oracleasm createdisk VOL3 /dev/sda5
Marking disk "/dev/sda5" as an ASM disk                    [  OK  ]
#
To list all ASM disks, enter:
# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
#
On all other RAC nodes, you just need to notify the system about the new ASM disks:
su - root
# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks                              [  OK  ]
#


Configuring the "hangcheck-timer" Kernel Module

Oracle uses the Linux kernel module hangcheck-timer to monitor the system health of the cluster and to reset a RAC node in case of failures. The hangcheck-timer module uses a kernel-based timer to periodically check the system task scheduler. This timer resets the node when the system hangs or pauses. The module uses the Time Stamp Counter (TSC) CPU register, which is a counter that is incremented at each clock signal. The TSC offers very accurate time measurements since this register is updated by the hardware automatically.

The hangcheck-timer module now comes with the kernel:
find /lib/modules -name "hangcheck-timer.o"

The hangcheck-timer module has the following two parameters:

hangcheck_tick    This parameter defines the period of time between checks of system
                  health. The default value is 60 seconds. Oracle recommends setting
                  it to 30 seconds.

hangcheck_margin  This parameter defines the maximum hang delay that should be
                  tolerated before hangcheck-timer resets the RAC node. It defines the
                  margin of error in seconds. The default value is 180 seconds. Oracle
                  recommends keeping it at 180 seconds.

These two parameters indicate how long a RAC node must hang before the hangcheck-timer module will reset the system. A node reset will occur when the following is true:
system hang time > (hangcheck_tick + hangcheck_margin)
With the recommended settings (30 + 180 seconds), a node is therefore reset if it hangs for more than 210 seconds.

To load the module with the right parameter settings, add the following line to the /etc/modules.conf file:
# su - root
# echo "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180" >> /etc/modules.conf
Now you can run modprobe to load the module with the configured parameters in /etc/modules.conf:
# su - root
# modprobe hangcheck-timer
# grep Hangcheck /var/log/messages |tail -2
Jul 5 00:46:09 rac1pub kernel: Hangcheck: starting hangcheck timer 0.8.0 (tick is 30 seconds, margin is 180 seconds).
Jul 5 00:46:09 rac1pub kernel: Hangcheck: Using TSC.
#
Note: To ensure the hangcheck-timer module is loaded after each reboot, add the modprobe command to the /etc/rc.local file.
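For example, you could append the modprobe command to /etc/rc.local like this (a minimal sketch):
su - root
echo "/sbin/modprobe hangcheck-timer" >> /etc/rc.local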


Setting up RAC Nodes for Remote Access
When you run the Oracle Installer on a RAC node, it will use ssh to copy Oracle software and data to other RAC nodes. Therefore, the oracle user on the RAC node where Oracle Installer is launched must be able to login to other RAC nodes without having to provide a password or passphrase.

The following procedure shows how ssh can be configured so that no password is requested for oracle ssh logins.

To create an authentication key for oracle, enter the following command on all RAC nodes:
(the ~/.ssh directory will be created automatically if it doesn't exist yet)
su - oracle
$ ssh-keygen -t dsa -b 1024
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa): Press ENTER
Created directory '/home/oracle/.ssh'.
Enter passphrase (empty for no passphrase): Enter a passphrase
Enter same passphrase again: Enter a passphrase
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
The key fingerprint is:
e0:71:b1:5b:31:b8:46:d3:a9:ae:df:6a:70:98:26:82 oracle@rac1pub


Copy the public key for oracle from each RAC node to all other RAC nodes.
For example, run the following commands on all RAC nodes:
su - oracle
ssh rac1pub cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac2pub cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac3pub cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys


Now verify that oracle on each RAC node can login to all other RAC nodes without a password. Make sure that ssh only asks for the passphrase. Note, however, that the first time you ssh to another server you will get a message stating that the authenticity of the host cannot be established. Enter "yes" at the prompt to continue the connection.
For example, run the following commands on all RAC nodes to verify that no password is asked:
su - oracle
ssh rac1pub hostname
ssh rac1prv hostname
ssh rac2pub hostname
ssh rac2prv hostname
ssh rac3pub hostname
ssh rac3prv hostname


And later, before runInstaller is launched, I will show how ssh can be configured so that no passphrase has to be entered for oracle ssh logins.


Checking Packages (RPMs)
Some packages will be missing if you selected the Installation Type "Advanced Server" during the Red Hat Advanced Server installation.

The following additional RPMs are required:
rpm -q gcc glibc-devel glibc-headers glibc-kernheaders cpp compat-libstdc++
To install these RPMS, run:
su - root
rpm -ivh gcc-3.2.3-24.i386.rpm \
glibc-devel-2.3.2-95.6.i386.rpm \
glibc-headers-2.3.2-95.6.i386.rpm \
glibc-kernheaders-2.4-8.34.i386.rpm \
cpp-3.2.3-24.i386.rpm \
compat-libstdc++-7.3-2.96.123.i386.rpm
The openmotif RPM is also required, otherwise you won't pass Oracle's recommended operating system packages test. If it's not installed on your system, run:
su - root
rpm -ivh openmotif-2.2.2-16.i386.rpm
I recommend using the latest RPM version.


Adjusting Network Settings
Oracle now uses UDP as the default protocol on Linux for interprocess communication, such as cache fusion buffer transfers between the instances.
It is strongly suggested to adjust the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB. The receive buffers are used by TCP and UDP to hold received data until it is read by the application. For TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the advertised window size. For UDP, however, datagrams that do not fit into the socket receive buffer are discarded, which means a fast sender can overwhelm a slow receiver.

The default and maximum window size can be changed in the proc file system without reboot:
su - root
sysctl -w net.core.rmem_default=262144 # Default setting in bytes of the socket receive buffer
sysctl -w net.core.wmem_default=262144 # Default setting in bytes of the socket send buffer
sysctl -w net.core.rmem_max=262144 # Maximum socket receive buffer size which may be set by using the SO_RCVBUF socket option
sysctl -w net.core.wmem_max=262144 # Maximum socket send buffer size which may be set by using the SO_SNDBUF socket option
To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is used during the boot process:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
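After adding the lines, you can, for example, reload and verify the settings like this:
su - root
sysctl -p        # reload the settings from /etc/sysctl.conf
sysctl net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max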


Sizing Swap Space
It is important to follow the steps as outlined in Sizing Swap Space.


Setting Shared Memory
It is important to follow the steps as outlined in Setting Shared Memory.


Checking /tmp Space
It is important to follow the steps as outlined in Checking /tmp Space.


Setting Semaphores
It is recommended to follow the steps as outlined in Setting Semaphores.


Setting File Handles
It is recommended to follow the steps as outlined in Setting File Handles.
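As a quick reference, the following sketch summarizes how the settings from the last five sections are typically checked and adjusted for a 10g installation. The values shown are common examples only, not authoritative; use the values from the referenced guides for your system:
su - root
free                                      # check physical RAM and swap space
df -h /tmp                                # Oracle 10g needs roughly 400 MB of free space in /tmp
sysctl -w kernel.shmmax=2147483648        # example shared memory segment size in bytes
sysctl -w kernel.sem="250 32000 100 128"  # semaphores: semmsl semmns semopm semmni
sysctl -w fs.file-max=65536               # example system-wide file handle limit
# To make the kernel.shmmax, kernel.sem, and fs.file-max settings permanent,
# add the same entries to the /etc/sysctl.conf file.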




Installing Cluster Ready Services (CRS)
General
Cluster Ready Services (CRS) contains cluster and database configuration information for RAC, and it provides many system management features. CRS accepts registration of Oracle instances to the cluster and it sends ping messages to other RAC nodes. If the heartbeat fails, CRS will use shared disk to distinguish between a node failure and a network failure.

Once CRS is running on all RAC nodes, OUI will automatically recognize all nodes on the cluster. This means that you can run OUI on one RAC node to install the Oracle software on all other RAC nodes.

Note that Automatic Storage Management (ASM) cannot be used for the "Oracle Cluster Registry (OCR)" file or for the "CRS Voting Disk" file. These files must be accessible before any Oracle instances are started. And for ASM to become available, the ASM instance needs to run first.

In the following example I will use OCFS for the "Oracle Cluster Registry (OCR)" file and for the "CRS Voting Disk" file. The Oracle Cluster Registry file has a size of about 100 MB, and the CRS Voting Disk file has a size of about 20 MB. These files must reside on OCFS, on a shared raw device, or on any other clustered filesystem.


Automating Authentication for oracle ssh Logins
Make sure that the oracle user can ssh to all RAC nodes without ssh asking for a passphrase. This is very important because otherwise OUI won't be able to install the Oracle software on other RAC nodes. The following example shows how ssh-agent can do the authentication for you when the oracle account logs in to other RAC nodes using ssh.

Open a new terminal for the RAC node where you will execute runInstaller and use this terminal to login from your desktop using the following command:
$ ssh -X oracle@rac?pub
The "X11 forward" feature (-X option) of ssh will relink X to your local desktop. For more information, see Installing Oracle10g on a Remote Linux Server.

Now configure ssh-agent to handle the authentication for the oracle account:
oracle$ ssh-agent $SHELL
oracle$ ssh-add
Enter passphrase for /home/oracle/.ssh/id_dsa: Enter your passphrase
Identity added: /home/oracle/.ssh/id_dsa (/home/oracle/.ssh/id_dsa)
oracle$


Now make sure the oracle user can ssh into each RAC node. It is very important that NO text is displayed and that you are not asked for a passphrase. Only the server name of the remote RAC node should be displayed:
oracle$ ssh rac1pub hostname
rac1pub
oracle$ ssh rac1prv hostname
rac1pub
oracle$ ssh rac2pub hostname
rac2pub
oracle$ ssh rac2prv hostname
rac2pub
oracle$ ssh rac3pub hostname
rac3pub
oracle$ ssh rac3prv hostname
rac3pub


NOTE: Keep this terminal open since this is the terminal that will be used for running runInstaller!


Checking OCFS and Oracle Environment Variables
Checking OCFSs

Make sure the OCFS filesystem(s) are mounted on all RAC nodes:
oracle$ ssh rac1pub df |grep oradata
/dev/sda1 51205216 33888 51171328 1% /u02/oradata/orcl
oracle$ ssh rac2pub df |grep oradata
/dev/sda1 51205216 33888 51171328 1% /u02/oradata/orcl
oracle$ ssh rac3pub df |grep oradata
/dev/sda1 51205216 33888 51171328 1% /u02/oradata/orcl


Checking Oracle Environment Variables

Run the following command on all RAC nodes:
su - oracle
$ set | grep ORA
ORACLE_BASE=/u01/app/oracle
ORACLE_SID=orcl1
$
It is important that $ORACLE_SID is different on each RAC node!
It is also recommended that $ORACLE_HOME is not set but that OUI selects the home directory.


Installing Oracle 10g Cluster Ready Services (CRS) R1 (10.1.0.2)
In order to install Cluster Ready Services (CRS) R1 (10.1.0.2) on all RAC nodes, OUI has to be launched on only one RAC node. In my example I will always run OUI on rac1pub.

To install CRS, insert the "Cluster Ready Services (CRS) R1 (10.1.0.2)" CD (downloaded image name: "ship.crs.cpio.gz"), and mount it on e.g. rac1pub:
su - root
mount /mnt/cdrom


Use the oracle terminal that you prepared for ssh at Automating Authentication for oracle ssh Logins and execute runInstaller:
oracle$ /mnt/cdrom/runInstaller

- Welcome Screen: Click Next
- Inventory directory and credentials:
Click Next
- Unix Group Name: Use "oinstall".
- Root Script Window: Open another window, login as root, and run /tmp/orainstRoot.sh
on the node where you launched runInstaller.
After you've run the script, click Continue.
- File Locations: I used the recommended default values:
Destination Name: OraCr10g_home1
Destination Path: /u01/app/oracle/product/10.1.0/crs_1
Click Next
- Language Selection: Click Next
- Cluster Configuration:
Cluster Name: crs
Cluster Nodes: Public Node Name: rac1pub Private Node Name: rac1prv
Public Node Name: rac2pub Private Node Name: rac2prv
Public Node Name: rac3pub Private Node Name: rac3prv
Click Next
- Private Interconnect Enforcement:
Interface Name: eth0 Subnet: 192.168.1.0 Interface Type: Public
Interface Name: eth1 Subnet: 192.168.2.0 Interface Type: Private
Click Next
- Oracle Cluster Registry:
OCR Location: /u02/oradata/orcl/OCRFile
Click Next
- Voting Disk: Voting disk file name: /u02/oradata/orcl/CSSFile
Click Next
- Root Script Window:
Open another window, login as root, and execute
/u01/app/oracle/oraInventory/orainstRoot.sh on ALL RAC Nodes!

NOTE: For some reason Oracle does not create the log directory
"/u01/app/oracle/product/10.1.0/crs_1/log". If there are problems with
CRS, it will create log files in this directory, but only if it exists.
Therefore make sure to create this directory as oracle:
oracle$ mkdir /u01/app/oracle/product/10.1.0/crs_1/log

After you've run the script, click Continue.
- Setup Privileges Script Window:
Open another window, login as root, and execute
/u01/app/oracle/product/10.1.0/crs_1/root.sh on ALL RAC Nodes one by one!
Note that this can take a while. On the last RAC node, the output of the
script was as follows:
...
CSS is active on these nodes.
rac1pub
rac2pub
rac3pub
CSS is active on all nodes.
Oracle CRS stack installed and running under init(1M)
Click OK
- Summary: Click Install
- When installation is completed, click Exit.


One way to verify the CRS installation is to display all the nodes where CRS was installed:
oracle$ /u01/app/oracle/product/10.1.0/crs_1/bin/olsnodes -n
rac1pub 1
rac2pub 2
rac3pub 3
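You can also check that the CRS daemons were started on each RAC node, for example (a small sketch; in a 10.1 installation the daemons are ocssd, crsd, and evmd):
oracle$ ps -ef | egrep "ocssd|crsd|evmd" | grep -v grep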




Installing Oracle Database 10g Software with Real Application Clusters (RAC)
General
The following procedure shows the installation of the software for Oracle Database 10g Software R1 (10.1.0.2) with Real Application Clusters (RAC).

Note that Oracle Database 10g R1 (10.1) OUI will not be able to discover disks that are marked as Linux ASMLib. Therefore it is recommended to complete the software installation and then to use dbca to create the database, see http://otn.oracle.com/tech/linux/asmlib/install.html#10gr1 for more information.


Automating Authentication for oracle ssh Logins
Before you install the Oracle Database 10g Software with Real Application Clusters (RAC) R1 (10.1.0.2), it is important that you followed the steps as outlined in Automating Authentication for oracle ssh Logins.


Checking Oracle Environment Variables
Run the following command on all RAC nodes:
su - oracle
$ set | grep ORA
ORACLE_BASE=/u01/app/oracle
ORACLE_SID=orcl1
$
It is important that $ORACLE_SID is different on each RAC node!
It is also recommended that $ORACLE_HOME is not set but that OUI selects the home directory.


Installing Oracle Database 10g Software R1 (10.1.0.2) with Real Application Clusters (RAC)
In order to install the Oracle Database 10g R1 (10.1.0.2) Software with Real Application Clusters (RAC) on all RAC nodes, OUI has to be launched on only one RAC node. In my example I will run OUI on rac1pub.

To install the RAC Database software, insert the Oracle Database 10g R1 (10.1.0.2) CD (downloaded image name: "ship.db.cpio.gz"), and mount it on e.g. rac1pub:
su - root
mount /mnt/cdrom


Use the oracle terminal that you prepared for ssh at Automating Authentication for oracle ssh Logins, and execute runInstaller:
oracle$ /mnt/cdrom/runInstaller

- Welcome Screen: Click Next
- File Locations: I used the default values:
Destination Name: OraDb10g_home1
Destination Path: /u01/app/oracle/product/10.1.0/db_1
Click Next.
- Hardware Cluster Installation Mode:
Select "Cluster Installation"
Click "Select All" to select all servers: rac1pub, rac2pub, rac3pub
Click Next

NOTE: If it stops here and the status of a RAC node is "Node not reachable",
then perform the following checks:
- Check if the node where you launched OUI is able to do ssh without a
passphrase to the RAC node where the status is set to "Node not reachable".
- Check if CRS is running on this RAC node.
- Installation Type:
I selected "Enterprise Edition".
Click Next.
- Product-specific Prerequisite Checks:
Make sure that the status of each Check is set to "Succeeded".
Click Next
- Database Configuration:
I selected "Do not create a starter database" since we have to create the
database with dbca. Oracle Database 10g R1 (10.1) OUI will not be able to
discover disks that are marked as Linux ASMLib. For more information, see
http://otn.oracle.com/tech/linux/asmlib/install.html#10gr1
Click Next
- Summary: Click Install
- Setup Privileges Window:
Open another window, login as root, and execute
/u01/app/oracle/product/10.1.0/db_1/root.sh on ALL RAC Nodes one by one!
NOTE: Also make sure that X is redirected to your local desktop since this
script will launch the "VIP Configuration Assistant" tool which is a
GUI based utility!

VIP Configuration Assistant Tool:
(This Assistant tool will come up only once when root.sh is executed the
first time in your RAC cluster)
- Welcome Click Next
- Network Interfaces: I selected both interfaces, eth0 and eth1.
Click Next
- Virtual IPs for cluster nodes:
(for the alias names and IP address, see Setting Up the /etc/hosts File)
Node Name: rac1pub
IP Alias Name: rac1vip
IP address: 192.168.1.51
Subnet Mask: 255.255.255.0

Node Name: rac2pub
IP Alias Name: rac2vip
IP address: 192.168.1.52
Subnet Mask: 255.255.255.0

Node Name: rac3pub
IP Alias Name: rac3vip
IP address: 192.168.1.53
Subnet Mask: 255.255.255.0
Click Next
- Summary: Click Finish
- Configuration Assistant Progress Dialog:
Click OK after configuration is complete.
- Configuration Results:
Click Exit

Click OK to close the Setup Privilege Window.

- End of Installation:
Click Exit


If OUI terminates abnormally (it happened to me several times), or if anything else goes wrong, remove the following directory and start over again:
su - oracle
rm -rf /u01/app/oracle/product/10.1.0/db_1
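If a RAC node was reported as "Node not reachable" during the Hardware Cluster Installation Mode step, the following quick checks can be run from the node where OUI was launched. This is only a minimal sketch; it assumes the node names and the CRS home (/u01/app/oracle/product/10.1.0/crs_1) used in this article:
su - oracle
$ ssh rac2pub hostname     # must print "rac2pub" without asking for a password or passphrase
$ /u01/app/oracle/product/10.1.0/crs_1/bin/olsnodes     # CRS should list all RAC nodes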




Installing Oracle Database 10g with Real Application Cluster (RAC)
General
The following steps show how to use dbca to create the database and its instances. Oracle recommends using dbca to create the RAC database since the preconfigured databases are optimized for ASM, the server parameter file, and automatic undo management. dbca also makes it much easier to create new ASM disk groups, etc.


Automating Authentication for oracle ssh Logins
Before you install a RAC database, it is important that you have followed the steps outlined in Automating Authentication for oracle ssh Logins.


Setting Oracle Environment Variables
Since the Oracle RAC software is already installed, $ORACLE_HOME can now be set to the home directory that was chosen by OUI.

The following steps should now be performed on all RAC nodes! It is very important that these environment variables are set permanently for oracle on all RAC nodes!

To make sure $ORACLE_HOME and $PATH are set automatically each time oracle logs in, add these environment variables to the ~oracle/.bash_profile file, which is the user startup file for the Bash shell on Red Hat Linux. To do this, you can simply copy and paste the following commands to make these settings permanent for oracle's Bash shell (the path might differ on your system!):
su - oracle
cat >> ~oracle/.bash_profile << 'EOF'
export ORACLE_HOME=$ORACLE_BASE/product/10.1.0/db_1
export PATH=$PATH:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
EOF


Installing Oracle Database 10g with Real Application Cluster (RAC)
To install the RAC database and the instances on all RAC nodes, OUI has to be launched on only one RAC node. In my example I will run OUI on rac1pub.

Use the oracle terminal that you prepared for ssh at Automating Authentication for oracle ssh Logins, and execute dbca. But before you execute dbca, make sure that $ORACLE_HOME and $PATH are set:
oracle$ . ~oracle/.bash_profile
oracle$ dbca

- Welcome Screen:
Select "Oracle Real Application Clusters database"
Click Next
- Operations:
Select "Create Database"
Click Next
- Node Selection:
Click "Select All". Make sure all your RAC nodes show up and are selected!
If dbca hangs here, then you probably didn't follow the steps as outlined
at Automating Authentication for oracle ssh Logins.
Click Next
- Database Templates:
I selected "General Purpose".
Click Next
- Database Identification:
Global Database Name: orcl
SID Prefix: orcl
Click Next
- Management Option:
I selected "Use Database Control for Database Management".
Click Next
- Database Credentials:
I selected "Use the Same Password for All Accounts".
Enter the password and make sure the password does not start with a digit.
Click Next
- Storage Options:
I selected "Automatic Storage Management (ASM)", see
Installing and Configuring Automatic Storage Management (ASM) and Disks.
Click Next
- Create ASM Instance:
Enter the SYS password for the ASM instance.
I selected the default parameter file (IFILE):
"{ORACLE_BASE}/admin/+ASM/pfile/init.ora"
Click Next
At this point DBCA will create and start the ASM instance on all RAC nodes.
Click OK to create and start the ASM instance.
An error will come up that oratab can't be copied to /tmp. I ignored this error.
If you get "ORACLE server session terminated by fatal error", then you
probably didn't follow the steps at Setting Up the /etc/hosts File.
- ASM Disk Groups:
- Click "Create New"
Create Disk Group Window:
- Click "Change Disk Discovery Path".
- Enter "ORCL:VOL*" for Disk Discovery Path. The discovery string for
finding ASM disks must be prefixed with "ORCL:", and in my example I
called the ASM disks VOL1, VOL2, VOL3.
- I entered an arbitrary Disk Group Name: ORCL_DATA1
- I checked the candidates "ORCL:VOL1" and "ORCL:VOL2", which together
provide about 60 GB of space in my configuration.
- Click OK.
- Check the newly created disk group "ORCL_DATA1".
- Click Next
- Database File Locations:
Select "Use Oracle-Managed Files"
Database Area: +ORCL_DATA1
Click Next
- Recovery Configuration:
Using recovery options like the Flash Recovery Area is out of scope for
this article, so I did not select any recovery options.
Click Next
- Database Content:
I did not select Sample Schemas or Custom Scripts.
Click Next
- Database Services:
Click "Add" and enter a Service Name: I entered "orcltest".
I selected TAF Policy "Basic".
Click Next
- Initialization Parameters:
Change settings as needed.
Click Next
- Database Storage:
Change settings as needed.
Click Next
- Creation Options:
Check "Create Database"
Click Finish
- Summary:
Click OK

Now the database is being created. The following error message came up:
"Unable to copy the file "rac2pub:/etc/oratab" to "/tmp/oratab.rac2pub"".
I clicked "Ignore". I have to investigate this.

Your RAC cluster should now be up and running.

To verify, try to connect to each instance from one of the RAC nodes:
$ sqlplus system@orcl1
$ sqlplus system@orcl2
$ sqlplus system@orcl3

After you have connected to an instance, enter the following SQL command to verify your connection:
SQL> select instance_name from v$instance;
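
Alternatively, a single query against gv$instance lists all instances of the cluster at once (assuming the orcl database created above):
SQL> select inst_id, instance_name, host_name, status from gv$instance;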




Post-Installation Steps


Transparent Application Failover (TAF)
Introduction

Transparent Application Failover (TAF) is controlled by the Oracle Net client, that is, by processes external to the Oracle 10g RAC cluster. This means that the failover type and method can be configured individually for each Oracle Net client. The reconnection happens automatically within the OCI library, which means that you do not need to change the client application to use TAF.

Setup

To test TAF on the newly installed RAC cluster, configure the tnsnames.ora file for TAF on a non-RAC server where you have either the Oracle Database software or the Oracle Client software installed.

Here is an example of how my /opt/oracle/product/9.2.0/network/admin/tnsnames.ora file looks:
ORCLTEST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac3vip)(PORT = 1521))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
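
Before testing the failover itself, it is worth checking that the client resolves the new alias and can reach one of the virtual IPs. A quick check, assuming the ORCLTEST entry above:
$ tnsping orcltest
$ sqlplus system@orcltest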


The following SQL statement can be used to check the session's failover type, failover method, and whether a failover has occurred:
select instance_name, host_name,
NULL AS failover_type,
NULL AS failover_method,
NULL AS failed_over
FROM v$instance
UNION
SELECT NULL, NULL, failover_type, failover_method, failed_over
FROM v$session
WHERE username = 'SYSTEM';


Example of a Transparent Application Failover (TAF)

Here is an example of a Transparent Application Failover:
su - oracle
$ sqlplus system@orcltest

SQL> select instance_name, host_name,
2 NULL AS failover_type,
3 NULL AS failover_method,
4 NULL AS failed_over
5 FROM v$instance
6 UNION
7 SELECT NULL, NULL, failover_type, failover_method, failed_over
8 FROM v$session
9 WHERE username = 'SYSTEM';


INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ---------- ------------- ---------- ---
orcl1 rac1pub
SELECT BASIC NO

SQL>


The above SQL statement shows that I'm connected to "rac1pub" for instance "orcl1".
To simulate an instance failure, execute shutdown abort on "rac1pub" for instance "orcl1":
SQL> shutdown abort
ORACLE instance shut down.
SQL>



Now rerun the SQL statement:
SQL> select instance_name, host_name,
2 NULL AS failover_type,
3 NULL AS failover_method,
4 NULL AS failed_over
5 FROM v$instance
6 UNION
7 SELECT NULL, NULL, failover_type, failover_method, failed_over
8 FROM v$session
9 WHERE username = 'SYSTEM';


INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ---------- ------------- ---------- ---
orcl2 rac2pub
SELECT BASIC YES

SQL>


The SQL statement shows that the session has failed over to instance "orcl2". Note that the failover can take a few seconds.



Checking Automatic Storage Management (ASM)
Here are a couple of SQL statements to verify ASM.

Run the following command to see which data files are in which disk group:
SQL> select name from v$datafile
2 union
3 select name from v$controlfile
4 union
5 select member from v$logfile;

NAME
--------------------------------------------------------------------------------
+ORCL_DATA1/orcl/controlfile/current.260.3
+ORCL_DATA1/orcl/datafile/sysaux.257.1
+ORCL_DATA1/orcl/datafile/system.256.1
+ORCL_DATA1/orcl/datafile/undotbs1.258.1
+ORCL_DATA1/orcl/datafile/undotbs2.264.1
+ORCL_DATA1/orcl/datafile/users.259.1
+ORCL_DATA1/orcl/onlinelog/group_1.261.1
+ORCL_DATA1/orcl/onlinelog/group_2.262.1
+ORCL_DATA1/orcl/onlinelog/group_3.265.1
+ORCL_DATA1/orcl/onlinelog/group_4.266.1

10 rows selected.

SQL>


Run the following command to see which ASM disk(s) belong to the disk group 'ORCL_DATA1':
(ORCL_DATA1 was specified in Installing Oracle Database 10g with Real Application Cluster)
SQL> select path from v$asm_disk where group_number in
2 (select group_number from v$asm_diskgroup where name = 'ORCL_DATA1');

PATH
--------------------------------------------------------------------------------
ORCL:VOL1
ORCL:VOL2

SQL>
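
To see how much space is left in the disk group, you can also query v$asm_diskgroup (a simple additional check):
SQL> select name, state, total_mb, free_mb from v$asm_diskgroup;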



Oracle 10g RAC Issues, Problems and Errors
This section describes other issues, problems, and errors pertaining to installing Oracle 10g with RAC that have not been covered so far.


Gtk-WARNING **: libgdk_pixbuf.so.2: cannot open shared object file: No such file or directory

This error can come up when you run ocfstool. To fix this error, install the gdk-pixbuf RPM:
rpm -ivh gdk-pixbuf-0.18.0-8.1.i386.rpm


/u01/app/oracle/product/10.1.0/crs_1/bin/crs_stat.bin: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory
/u01/app/oracle/product/10.1.0/crs_1/bin/crs_stat.bin: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory
PRKR-1061 : Failed to run remote command to get node configuration for node rac1pup
PRKR-1061 : Failed to run remote command to get node configuration for node rac1pup


This error can come up when you run root.sh. To fix this error, install the compat-libstdc++ RPM and rerun root.sh:
rpm -ivh compat-libstdc++-7.3-2.96.122.i386.rpm


mount: fs type ocfs not supported by kernel

The OCFS kernel module was not loaded. See Configuring and Loading OCFS for more information.


ORA-00603: ORACLE server session terminated by fatal error
or
SQL> startup nomount
ORA-29702: error occurred in Cluster Group Service operation


If the trace file looks like this:
/u01/app/oracle/product/10.1.0/db_1/rdbms/log/orcl1_ora_7424.trc
...
kgefec: fatal error 0
*** 2004-03-13 20:50:28.201
ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:gethostbyname failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: sskgxpmyip4
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+493 call ksedst()+0 0 ? 0 ? 0 ? 1 ? 0 ? 0 ?
ksfdmp()+14 call ksedmp()+0 3 ? BFFF783C ? A483593 ?
BF305C0 ? 3 ? BFFF8310 ?


Make sure that the name of the RAC node is not listed for the loopback address in the /etc/hosts file similar to this example:
127.0.0.1 rac1pub localhost.localdomain localhost
The entry should rather look like this:
127.0.0.1 localhost.localdomain localhost

Tuesday, January 17, 2012

SRVCTL Commands for RAC

Use the following syntax to enable an ASM instance:

srvctl enable asm -n node_name [-i asm_instance_name]


Use the following syntax to disable an ASM instance:


srvctl disable asm -n node_name [-i asm_instance_name]

The above statement is generally required when you want to disable ASM so that it does not start automatically on reboot.
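
For example, assuming the node and ASM instance names used earlier in this article (rac1pub and +ASM1), the call could look like this:

srvctl disable asm -n rac1pub -i +ASM1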

You can also use SRVCTL to start, stop, and obtain the status of an ASM instance as in the following examples.



Use the following syntax to start an ASM instance:

srvctl start asm -n node_name [-i asm_instance_name] [-o start_options] [-c | -q]



Use the following syntax to stop an ASM instance:

srvctl stop asm -n node_name [-i asm_instance_name] [-o stop_options] [-c | -q]



Use the following syntax to show the configuration of an ASM instance:

srvctl config asm -n node_name



Use the following syntax to obtain the status of an ASM instance:

srvctl status asm -n node_name
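
For example, on the cluster used in this article (again assuming the ASM instance +ASM1 on node rac1pub), a start and status check could look like this:

srvctl start asm -n rac1pub -i +ASM1
srvctl status asm -n rac1pub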



Use the following syntax to stop the database on any one of the nodes in a RAC environment:

srvctl stop listener -n node_name

srvctl stop instance -d db_name -i instance_name -o immediate



Use the following syntax to start the database on any one of the nodes in a RAC environment:

srvctl start listener -n node_name

srvctl start instance -d db_name -i instance_name -o immediate



Use the following command to stop the entire database with a single command:

srvctl stop database -d db_name -o immediate
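
For example, to stop the orcl database created in this guide on all nodes at once (adjust the database name to your configuration):

srvctl stop database -d orcl -o immediate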


In order to stop the node applications, such as the Oracle VIP, GSD, and ONS services, use the following command:

srvctl stop nodeapps -n node_name


Similarly, to start them:

srvctl start nodeapps -n node_name


If you wish to turn SRVM tracing on, which can be useful for diagnosis during instance startup or shutdown, use the following commands:

SRVM_TRACE=TRUE; export SRVM_TRACE

srvctl stop asm -n node_name -i +ASM1 -o immediate > srvctl_stop_asm.log

Similarly

srvctl stop instance -d db_name -i instance_name -o immediate > srvctl_stop_instance.log

Oracle Database Performance Results with Smart Flash Cache on Sun SPARC Enterprise Midrange Server

This article examines the improvements to Oracle database performance that were observed by adding smart flash cache to the configuration. Measurements were made using the iGEN-OLTP 1.6 benchmark, which was formulated to simulate a lightweight Global Order System. Tests were run with and without the smart flash cache, each time varying the size of the SGA buffer cache at 10%, 16%, and 20% the size of the database. The results demonstrate that an intelligent database that knows how to efficiently take advantage of flash-based storage can experience significant improvements in performance.
Contents

Introduction
Database Smart Flash Cache
Benchmark Description
System Configuration Details
Test Results
Conclusion
Appendix: Oracle Initialization File init.ora

Introduction

Today’s complex business applications typically house massive volumes of data and serve large numbers of users—a trend that drives performance requirements that are increasingly difficult to attain. To achieve fast response times for data-intensive applications, systems must be able to access data rapidly and transfer it quickly from storage to compute resources for processing. Many data-driven applications suffer from long latencies and slow response times due to I/O bottlenecks that limit throughput between storage and servers. Traditional remedies, such as increasing memory size or short-stroking disk drives by placing data on outer sectors, can help up to a point, but they are costly and power intensive. Even then the fundamental problem remains: CPUs process data in nanoseconds, while disk drives deliver data in milliseconds.

As flash technology moves into the enterprise, it holds promise for accelerating application performance, reducing bottlenecks, and helping to lower data center energy consumption. With a layer of flash-based storage in the form of solid-state drives (SSDs) between traditional disk media and host processors, today’s powerful CPUs can experience less idle time waiting for I/O operations to complete. SSDs deliver data in microseconds and can thus contribute to major improvements in application performance. The question still remains, however, about how to take advantage of flash technology intelligently and efficiently without imposing the additional overhead of actively managing and constantly positioning data into the proper storage tier. To address this challenge, Oracle created the Database Smart Flash Cache feature, which aims to take advantage of this new storage tier, while reducing complexity and without over-burdening administrators in the data center.
Database Smart Flash Cache

Oracle's Database Smart Flash Cache is available in Oracle Database 11g Release 2 for both Oracle Solaris and Oracle Enterprise Linux. It intelligently caches data from the Oracle Database, replacing slow mechanical I/O operations to disk with much faster flash-based storage operations. The Database Smart Flash Cache feature acts as a transparent extension of the database buffer cache using solid-state drive (SSD) or “flash” technology. The flash acts as a level-two cache to the database buffer cache. If a process doesn’t find the block it needs in the buffer cache, it performs a physical read from the level-two SGA buffer pool residing on flash. The read from flash will be quite fast, in the order of microseconds, when compared to performing a physical read operation from a traditional hard disk drive (HDD), which takes milliseconds.

Database Smart Flash Cache with flash storage gives administrators the ability to greatly improve the performance of Oracle databases by reducing the required amount of traditional disk I/O at a much lower cost than adding an equivalent amount of RAM. Software intelligence determines how and when to use the flash storage, and how best to incorporate flash into the database as part of a coordinated data caching strategy to deliver improved performance to applications.
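
Enabling the feature requires only two initialization parameters. The values below mirror the test configuration shown in the appendix; the flash device or ASM disk group name is, of course, site specific:

db_flash_cache_file = +FLASH/test
db_flash_cache_size = 20G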

Database Smart Flash Cache technology allows frequently accessed data to be kept in very fast flash storage while most of the data is kept in very cost-effective disk storage. This happens automatically without you taking any action. Oracle Database Smart Flash is smart because it knows when to avoid trying to cache data that will never be reused or will not fit in the cache.

Random reads against tables and indexes are likely to have subsequent reads and normally are cached and have their data delivered from the flash cache, if the data is not found in the buffer cache. Scans, or sequentially reading tables, generally are not cached since sequentially accessed data is unlikely to be subsequently followed by reads of the same data. Write operations are written through to the disk and staged back to cache if the software determines they are likely to be subsequently re-read. Knowing what not to cache is of great importance for realizing the performance potential of the cache. For example, the software avoids caching blocks when writing redo logs, writing backups, or writing to a mirrored copy of a block. Since these blocks will not be re-read in the near term, there is no reason to devote valuable cache space to these objects or blocks.

In addition, Oracle allows you to provide directives at the database table, index, and segment level to ensure that Database Smart Flash Cache is used where desired. Tables can be moved in and out of flash with a simple command, without the need to move the table to different tablespaces, files, or LUNs, as is typically done in traditional storage with flash disks. Only the Oracle Database has this functionality and understands the nature of all the I/O operations taking place on the system. Having knowledge of the complete I/O stack allows optimized use of Database Smart Flash Cache to store only the most frequently accessed data.
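
For example, a table can be pinned in (or kept out of) the flash cache with the standard Oracle Database 11g Release 2 storage clause; the table names below are only placeholders:

ALTER TABLE orders STORAGE (FLASH_CACHE KEEP);
ALTER TABLE order_history STORAGE (FLASH_CACHE NONE);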

All this functionality occurs automatically without administrator configuration or tuning. This paper discusses the advantages of using Database Smart Flash Cache when running an in-house developed OLTP benchmark on the Sun SPARC Enterprise M4000 server from Oracle using Oracle Solaris 10 and the Sun Flash Accelerator F20 PCIe Card. The following topics are covered in the remaining sections of the paper:

Detailed description of the iGEN-OLTP benchmark
System configuration details
Capabilities of the Sun SPARC Enterprise M4000 and M5000 servers
Sun Flash Accelerator F20 PCIe Card features
Results of the benchmark, including the impact of Database Smart Flash Cache

Benchmark Description

The iGEN-OLTP 1.6 benchmark is an internally developed transaction processing database workload. This workload simulates a lightweight Global Order System, and it was developed from a variety of customer workloads. It has a high degree of concurrency and stresses database commit operations. It is completely random in table row selections and, therefore, it is difficult to 'localize' or optimize the SQL processing. The transactions used in the iGEN benchmark require more computational work when compared to the transactions used in a TPC-C benchmark.

The database has 1.25 million customers residing in it and is approximately 50 GB in size. It consists of six tables: customer, location, industry, product, order, and activity. Each table has no more than six columns, and each has an index.

The application executes five transactions: light, medium, average, DSS, and heavy. The transactions comprise various SQL statements: read-only selects, joins, averages, updates, and inserts. All tests are completed using a mix of these transactions. The description and distribution mix of these transactions are shown in Table 1.
Table 1. iGEN-OLTP 1.6 TRANSACTIONS DESCRIPTION AND DISTRIBUTION
TRANSACTION MIX (PERCENTAGE) DESCRIPTION
Light 16% 1 select for update, 1 select, and 1 update
Medium 35% 2 selects for update, 1 select, and 1 update
Average 6% 1 select for update and 1 compute average from another select
DSS 11% 1 select for update and 1 compute sum from join on 3 tables
Heavy 30% 1 select for update, 1 select, 1 update, and 1 insert

The client driver program, which could be run in a middle tier application server, is designed as a Java multithreaded load generator program with each thread making one connection to the database server via the JDBC program API to simulate a client connection.

The load generator driver requires the following user-supplied options for execution:

User count or connection count
Time for ramp-up to allow all users to establish connections to the database, typically 60 seconds
Time to run in steady state whereby all users start issuing transactions to the database
Think time, which is the time to sleep between individual transactions and is used by each client, typically 100 milliseconds

A number of iGEN-OLTP benchmark runs were performed. Each workload ran for five minutes in steady state with a fixed number of connections for each test. The core metrics measured are transactions per minute (TPM), number of users supported, and average response time. The average response time for any of the transactions must be less than 100 milliseconds for a run to be considered valid.
System Configuration Details

The benchmark was run using Oracle Database 11g Release 2. The hardware and software configuration for both the database server and the load generator are described below:
Database Server: Sun SPARC Enterprise M4000 Server

The database server hardware specifications consisted of the following:

CPU: 4 x SPARC64 VI 2.15-GHz dual-core processor with 2 strands per core
Cache memory:
L1 cache: 128 KB (instruction) / 128 KB (data) per core
L2 cache: 5 MB shared per processor
Memory size: 16384 megabytes
Network: 2 × Gigabit Ethernet ports, 2 × 10/100 Ethernet ports for accessing CLI, and browser-based interface for management
PCI-X and PCI Express (PCIe): 1 × PCI-X slot and 4 × PCIe slots per I/O tray
Disks: 2 internal 73-GB SAS disk drives
One Sun Flash Accelerator F20 PCIe Card with 4 SSDs in a special form factor on board, each with 24 GB of addressable capacity

The following software versions were used:

Operating System: Oracle Solaris 10 09/10
Database software: Oracle 11g Release 2 Enterprise Edition for Solaris Operating System (SPARC) (64-bit)
Database configuration: See appendix A for the Oracle initialization file, init.ora

Oracle Automatic Storage Management was used to create two disk groups with normal redundancy and without any filesystems. One disk group was for data files and was called "DATA"; the other disk group consisted of the flash storage and was called "FLASH."
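
As an illustration only, disk groups of this kind can be created from the ASM instance with SQL similar to the following; the device paths are hypothetical placeholders, not the ones used in the benchmark:

SQL> create diskgroup DATA normal redundancy
  2  disk '/dev/rdsk/c2t0d0s6', '/dev/rdsk/c2t1d0s6';
SQL> create diskgroup FLASH normal redundancy
  2  disk '/dev/rdsk/c3t0d0s6', '/dev/rdsk/c3t1d0s6';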
iGEN-OLTP Load Generator: Sun Fire X4440 Server

The iGEN-OLTP load generator server hardware specifications consisted of the following:

CPU: 4 x AMD Opteron 2.3-GHz quad-core processor
L2 cache memory: 512 KB per processor core
Memory size: 16384 megabytes
Network: Four 10/100/1000 Base-T Ethernet ports and one dedicated 10/100Base-T Ethernet port for management
PCI-X and PCIe: One PCIe x16 slot, four PCIe x 8 slot, and one PCIe x 4 slot
Disks: Eight 2.5" 73-GB SAS internal hot-swap disks drives

The operating system software version used was Oracle Solaris 10 05/08.
Sun SPARC Enterprise M4000 and M5000 Midrange Servers

Oracle’s Sun SPARC Enterprise M4000 and M5000 servers are highly reliable, easy to manage, and vertically scalable systems with many of the benefits of traditional mainframes— without the associated cost or complexity. These midrange enterprise servers were designed to be extremely flexible, scalable, and robust with mainframe-class reliability and availability capabilities. These servers feature a balanced, scalable symmetric multiprocessing (SMP) design that uses the latest generation of SPARC64 processors connected to memory and I/O by a high-speed, low-latency system interconnect that delivers exceptional throughput to applications.

Architected to reduce planned and unplanned downtime, the Sun SPARC Enterprise M4000 and M5000 servers include mainframe-class reliability, availability, and serviceability (RAS) capabilities to avoid outages, reduce recovery time, and improve overall system uptime. These servers can deliver enterprise-class service levels for mission critical workloads, supporting medium to large databases, business processing applications (ERP, SCM, CRM, OLTP), BIDW (database, datamart, DSS), scientific/engineering applications, and consolidation/virtualization projects.
Capabilities Overview

The Sun SPARC Enterprise M4000 server can be configured with up to four dual-core SPARC64 VI processors or four quad-core SPARC64 VII processors, with two simultaneously executing threads per core (each thread is seen as a processor by the operating system) and up to 256-GB error correcting code (ECC) memory in a dense, rack-optimized, six rack-units (RU) system.

The Sun SPARC Enterprise M5000 server offers double the number of cores and memory in a 10 RU system. The SPARC64 processors incorporate the symmetric multiprocessing (SMP) architecture, which allows any CPU to access any memory board on the system regardless of location. These processors also feature advanced multithreading technologies that improve system performance by maximizing processor utilization.

In both servers, a high-performance system backplane interconnects processors and local memory with the I/O subsystem. The system interconnect or bus was designed to minimize latency and provide maximum throughput, regardless of whether the workload is compute, I/O, or memory intensive. Implemented as point-to-point connections that utilize packet-switched technology, this interconnect delivers 32 GB/second of peak bandwidth in the Sun SPARC Enterprise M4000 server and 64 GB/second in the Sun SPARC Enterprise M5000 server.
Dynamic Domains and Dynamic Reconfiguration

The Sun SPARC Enterprise M4000 and M5000 servers can be partitioned into two and four independent Dynamic Domains, respectively. These Dynamic Domains are electrically isolated partitions, each running independent instances of Oracle Solaris. These servers feature advanced resource control supporting the allocation of sub-system board resources, including CPUs, memory, and I/O trays, either in their entirety to one domain or divided logically between domains. Domains are used for server consolidation and to run separate parts of a solution, such as an application server, Web server, and database server. Hardware or software failures in one Dynamic Domain do not affect applications running in other domains.

Dynamic Reconfiguration technology provides added value to Dynamic Domains by providing administrators with the ability to shift computing resources between domains in accordance with changes in the workload without taking the system offline. This technology enhances system availability by allowing administrators to perform maintenance, live upgrades, and physical changes to system hardware resources, while the server continues to execute applications and without the need for system reboots.
Advanced Reliability, Availability, and Serviceability Features

Specifically designed to support complex, network computing solutions and stringent high-availability requirements, the Sun SPARC Enterprise M4000 and M5000 servers include redundant and hot-swap system components, diagnostic and error recovery features throughout their design, and built-in remote management features.

The Sun SPARC Enterprise M4000 and M5000 servers feature important technologies that detect and correct failures early and keep faulty components from causing repeated downtime. This advanced architecture fosters high levels of application availability and rapid recovery from many types of hardware faults, often with no impact to users or system functionality.

The following features work together to raise application availability:

End-to-end data protection detects and corrects errors throughout the system, ensuring complete data integrity. This includes support for error marking, instruction retry, L1 and L2 cache dynamic degradation, up to 128-GB error-correcting code (ECC) protection, total SRAM and register protection, ECC and Extended ECC protection for memory, and optional memory mirroring.
Mainframe-class fault isolation helps the server isolate errors within component boundaries and offline only the relevant chips instead of the entire component. This feature applies to CPUs, memory access controllers, crossbar ASICs, system controllers, and I/O ASICs. For example, persistent CPU soft errors can be resolved by automatically offlining either a thread, core, or entire CPU. Similarly, memory pages can be taken offline proactively in response to multiple corrections for data access for a specific memory DIMM.
Dynamic CPU resource deallocation provides processor fault detection, isolation, and recovery. This feature dynamically reallocates CPU resources to an operational system using Dynamic Reconfiguration without interrupting the applications that are running.
Periodic component status checks are performed to determine the status of many system devices to detect signs of an impending fault. Recovery mechanisms are triggered to prevent system and application failure.

Reliability and Availability Features of Oracle Solaris 10

The ability to rapidly diagnose, isolate, and recover from hardware and application faults is essential to increase reliability and availability of the system. In addition to the error detection and recovery features provided by the hardware, Oracle Solaris 10 takes a big leap forward in self-healing with the introduction of Oracle Solaris Fault Manager and Oracle Solaris Service Manager technology.

Oracle Solaris Fault Manager promotes availability by automatically diagnosing faults in the system and initiating self-healing actions to help prevent service interruptions. The Oracle Solaris Fault Manager diagnosis engine produces a fault diagnosis once discernible patterns are observed from a stream of incoming errors. Following error identification, the Oracle Solaris Fault Manager provides information to agents that know how to respond to specific faults. Problem components can be configured out of a system before a failure occurs—and in the event of a failure, this feature initiates automatic recovery and application re-start. For example, an agent designed to respond to a memory error might determine the memory addresses affected by a specific chip failure and remove the affected locations from the available memory pool.

Oracle Solaris Service Manager converts the core set of services packaged with the operating system into first-class objects that administrators can manipulate with a consistent set of administration commands, including start, stop, restart, enable, disable, view status, and snapshot. Oracle Solaris Service Manager unifies service control by managing the interdependency between services, ensuring that they are started (or restarted following service failure) in the appropriate order. It is integrated with Oracle Solaris Fault Manager and is activated in response to fault detections.

With Oracle Solaris 10, business-critical applications and essential system services can continue uninterrupted in the event of software failures, major hardware component breakdowns, and software misconfiguration problems.
Sun Flash Accelerator F20 PCIe Card

Oracle’s Sun Flash Accelerator F20 PCIe Card is an innovative, low-profile PCIe card that supports onboard, enterprise-quality, solid-state based storage. The Sun Flash Accelerator F20 PCIe Card delivers a tremendous performance boost to applications using flash storage technology—up to 100 K I/O operations per second (IOPS) for random 4-K reads, compared to about 330 IOPS for traditional disk drives—in a compact PCIe form factor. Thus, a single Sun Flash Accelerator F20 PCIe Card delivers about the same number of IOPS as three hundred 15-K RPM disk drives. At the same time, it consumes a fraction of the power and space that those disk drives require. Adding one or more cards to an Oracle rack mounted server turns virtually any Sun x86 or UltraSPARC processor-based system into a high-performance storage server.

The Sun Flash Accelerator F20 PCIe Card, shown in Figure 1, combines four flash modules—known as Disk on Module (DOM) units—each containing 24 GB of enterprise-quality SLC NAND flash and 64 MB of dynamic random access memory (DRAM), for a total of 96 GB flash and 256 MB DRAM per PCIe card. Each card also incorporates a supercapacitor module that provides enough energy to flush DRAM contents to persistent flash storage in the event of a sudden power outage, which helps to enhance data integrity.

Figure 1: Sun Flash Accelerator F20 PCIe Card
Sun Flash Accelerator F20 PCIe Card Highlights

The Sun Flash Accelerator F20 PCIe Card provides these benefits:

Low latency. Flash technology can complete an I/O operation in microseconds, placing it between hard disk drives (HDDs) and DRAM in terms of latency. Because flash technology contains no moving parts, it avoids the long seek times and rotational latencies inherent with traditional HDD technology. As a result, data transfers to and from the onboard flash devices are significantly faster than what electromechanical disk drives can provide. A single Sun Flash Accelerator F20 PCIe Card can provide up to 100 K IOPS for read operations, compared to mere hundreds of IOPS for HDDs.
Enterprise-level reliability. Sun engineers worked closely with NAND manufacturers to make specific reliability enhancements to the flash devices. These enterprise-quality SLC NAND devices exhibit greater endurance than commercially available flash components used in consumer products, such as MP3 players and digital cameras, and they are rated for more than 2 million hours MTBF (mean time between failures), which is greater than most disk drives. The onboard flash devices are managed by a flash memory controller. Each controller provides internal RAID, sophisticated wear leveling, error correction code (ECC), and bad block mapping to provide the highest level of longevity and endurance. Each flash module includes an additional 8 GB (or 25 percent) of reserved internal storage that is used by the controller to replace worn out blocks. In addition, a supercapacitor unit flushes DRAM contents to flash storage if a power loss occurs. Even if a supercapacitor fails, the design maintains data integrity because it automatically enables write-through mode.
Simplified management. The Sun Flash Accelerator F20 PCIe Card presents itself as an HBA to the server, because the four DOMs are treated as four separate 24-GB disks. OS commands that manage disk drives apply equally to the DOM storage modules, so no special device drivers are required, and no re-compilation of applications is necessary. In addition, firmware upgrades for the flash controller can be easily downloaded and applied as needed.
Flexible configurations. The Sun Flash Accelerator F20 PCIe Card can be deployed in virtually any qualified Sun server that accepts a PCIe-based HBA.
Leading eco-responsibility. The solid-state DOMs operate at low power (approximately 2 watts for each 24-GB module), which is especially low compared to disk devices (typically around 12 watts each). The card itself consumes about 16.5 watts during normal operation.

While several other flash-based storage solutions exist today, the Sun Flash Accelerator F20 PCIe Card provides the performance benefit of flash storage in a convenient and compact low-profile PCIe form factor. Occupying a single slot on the motherboard, the card’s dense PCIe form factor is particularly beneficial for existing servers with a limited number of available disk slots or when you do not wish to replace existing disk drives with SSDs. And since it is a PCIe card, the I/O operations do not have to suffer from disk controller limitation.
Test Results

We ran several tests on the Sun SPARC Enterprise M4000 server, each time varying the size of the SGA buffer cache at 10%, 16%, and 20% the size of the database. In each of these tests, the same workload was used in testing with flash and without flash. The size of the flash storage was also varied. A test was considered to be valid only if it completed with an average response time less than 100 milliseconds. The results obtained from the various runs are detailed below.
Results with SGA Buffer Cache Size 10% of Database
Table 2. Results with SGA Buffer Cache Size 10% of Database
SGA buffer cache size = 5 GB NO Flash WITH FLASH 15 GB WITH FLASH 20 GB
Number of users 400 570 575
Maximum qualified throughput: TPM 56659.17 68102.67 73070.83
Avg. response time in sec (must be < 0.1 for a valid run): all runs qualified, between 0.070 and 0.09
Results with Database Smart Flash Cache with 15-GB SGA Size

Using Database Smart Flash Cache with 3 times the original SGA buffer cache size (15 GB) yielded the following improvement over the original run without flash:

42.5% increase in number of users
20% more TPM

Results with Database Smart Flash Cache with 20-GB SGA Size

Using Database Smart Flash Cache with 4 times the original SGA buffer cache size (20 GB) yielded the following improvement over the original run without flash:

43.75% increase in number of users
29% more TPM

These results are displayed in the graphs in Figure 2.

Figure 2: Test Results with 5-GB SGA
Results with SGA Buffer Cache Size 16% of Database
Table 3. Results with SGA Buffer Cache Size 16% of Database
SGA buffer cache size = 8 GB NO Flash WITH FLASH 16 GB WITH FLASH 20 GB
Number of users 480 575 600
Maximum qualified throughput: TPM 64479.50 72437.33 75576.50
Avg. response time in sec (must be < 0.1 for a valid run): all runs qualified (0.079, 0.089)
Results with Database Smart Flash Cache 16 GB SGA Size

Using Database Smart Flash Cache with 2 times the original SGA buffer cache size (16 GB) yielded the following improvement over the original run without flash:

20% increase in number of users
12.3% more TPM

Results with Database Smart Flash Cache 20 GB SGA Size

Using Database Smart Flash Cache with 2.5 times the original SGA buffer cache size (20 GB) yielded the following improvement over the original run without flash:

25% increase in number of users
17.2% more TPM

These results are displayed in the graphs in Figure 3.

Figure 3: Test Results with 8-GB SGA
Results with SGA Buffer Cache Size 20% of Database
Table 4. Results with SGA Buffer Cache Size 20% of Database
SGA buffer cache size = 10 GB NO Flash WITH FLASH 20 GB WITH FLASH 22 GB
Number of users 575 590 595
Maximum qualified throughput: TPM 75175.83 72571.67 75901.00
Avg. response time in sec (must be < 0.1 for a valid run): all runs qualified (0.083, 0.086, 0.087)
Results with Database Smart Flash Cache 20GB SGA Size

Using Database Smart Flash Cache with two times the original SGA buffer cache size (20 GB) yielded the following improvement over the original run without flash:

2.6% increase in number of users
3.5% less TPM

Results with Database Smart Flash Cache 22GB SGA Size

Using Database Smart Flash Cache with 2.2 times the original SGA buffer cache size (22 GB) yielded the following improvement over the original run without flash:

3.4% increase in number of users
0.9% more TPM

In this case, most of the operations are already cached, and the Database Smart Flash Cache size needs to be at least four times the size of SGA to make a difference.

These results are displayed in the graphs in Figure 4.

Figure 4: Test Results with 10-GB SGA
Conclusion

A key metric for any OLTP database application is the number of transactions that can be executed over a given period of time. In addition to the number of transactions per minute (TPM), it is also imperative that as many users as possible can be served within acceptable response times. Otherwise, organizations would have to deploy many more systems to provide a positive end-user experience.

The results from Oracle’s iGEN-OLTP benchmark tests, which are shown in the tables above, suggest that when the SGA buffer cache size in memory is equal to 10% of the total database size, the system can scale to support 43% more users and 29% greater TPM than on a Sun SPARC Enterprise M4000 server without Database Smart Flash Cache technology. This was achieved by taking advantage of less expensive, reliable, and more power efficient flash-based storage at four times the capacity of the SGA buffer size. These results are equivalent to those obtained when doubling the SGA buffer cache size in memory to 20% of the total database size and without flash-based storage, which is a more expensive solution due to the cost of additional memory and power requirements.

A Sun SPARC Enterprise M5000 server was not available for this test. However, this larger server offers double the number of cores and memory and twice the system bus bandwidth of the Sun SPARC Enterprise M4000 server. Because of the proven scalability of Oracle Solaris 10 and Oracle Database, it is possible to extrapolate that the results on the Sun SPARC Enterprise M5000 server would be just as good, if not better, enabling it to support double the number of users and TPM in each of the tests described above.

Database Smart Flash Cache technology from Oracle thus provides scalability to meet the demands placed by ever larger workloads and an increasing number of users, delivering breakthrough advantages for application performance. Just by adding the Sun Flash Accelerator F20 PCIe card to an existing server, and without downloading any special driver or re-compiling any applications, an existing setup can be scaled to support many more users, handle many more transactions, accelerate application performance, increase business productivity, improve ROI, and enhance the end-users’ experience.

By using the Sun SPARC Enterprise M4000 and M5000 servers running Oracle Solaris 10 with the Sun Flash Accelerator F20 PCIe card and Oracle Database 11g Release 2, an intelligent database that knows how to efficiently take advantage of flash-based storage, you can experience significant breakthroughs in performance and business agility. This computing environment also supports high service levels with the mainframe-class reliability, availability, and serviceability features in the Sun SPARC Enterprise M4000 and M5000 servers, as well as the highly reliable Sun Flash Modules in the Sun Flash Accelerator F20 PCIe card, which have been tested and certified for an MTBF of more than 2 million hours. These innovations from Oracle bring enterprise computing even closer to the ideal of complete automation in the data center.
References

For more information, visit the Web resources listed in Table 5.
Table 5. Web resources for further information
Web Resource Description Web Resource URL
Sun Flash Accelerator F20 PCIe Card www.oracle.com/us/products/servers-storage/storage/disk-storage/043966.html
Sun SPARC Enterprise M4000 Server www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/m-series/031646.htm
Sun SPARC Enterprise M5000 Server www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/m-series/031732.htm
Oracle Solaris www.oracle.com/solaris

Appendix: Oracle Initialization File init.ora

############################################################
# Copyright (c) 1991, 2001, 2002 by Oracle Corporation
############################################################
_array_update_vector_read_enabled = TRUE
parallel_max_servers = 64
parallel_min_servers = 0
db_writer_processes = 3
_imu_pools =0
_in_memory_undo =FALSE
_smm_advice_enabled =FALSE
_undo_autotune =FALSE
thread = 1
db_block_checksum = false
db_cache_size = 5000m
db_file_multiblock_read_count = 128
db_files = 1023
dml_locks = 8000
global_names = FALSE
java_pool_size = 20m
job_queue_processes = 4
log_buffer = 4194304
log_checkpoints_to_alert = TRUE
nls_date_format = DD-MON-RR
nls_numeric_characters = ".,"
nls_sort = binary
nls_language = american
nls_territory = america
replication_dependency_tracking = FALSE
shared_pool_size = 1200m
shared_pool_reserved_size = 150m
cursor_space_for_time = FALSE
sort_area_size = 512000
sort_area_retained_size = 512000
undo_retention = 30
_in_memory_undo=false
undo_management = AUTO
filesystemio_options = setall
_library_cache_advice = FALSE
_smm_advice_enabled = FALSE
db_cache_advice = OFF
_db_mttr_advice = OFF
timed_statistics = TRUE
_trace_files_public=true
cursor_space_for_time = TRUE
transactions_per_rollback_segment = 1
session_cached_cursors = 200
cursor_sharing = similar
_db_block_hash_latches = 65536

###########################################
# Smart Flash Cache fields
###########################################
db_flash_cache_file="+FLASH/test"
db_flash_cache_size=20G

###########################################
# Cache and I/O
###########################################
db_block_size=8192

###########################################
# Cursors and Library Cache
###########################################
open_cursors=3024
###########################################
# Database Identification
###########################################
db_domain=""
db_name=wcb

###########################################
# File Configuration
###########################################
db_recovery_file_dest=+RECOVERY
db_recovery_file_dest_size=4070572032

###########################################
# Miscellaneous
###########################################
compatible=11.2.0.0.0
diagnostic_dest=/u01/app/oracle

###########################################
# Processes and Sessions
###########################################
processes=2200

###########################################
# Security and Auditing
###########################################
audit_file_dest=/u01/app/oracle/admin/wcb-sav/adump
audit_trail=db
remote_login_passwordfile=EXCLUSIVE
###########################################
# Shared Server
###########################################
dispatchers="(PROTOCOL=TCP) (SERVICE=wcbXDB)"


Revision 1.0, 04/15/2011

Oracle traces including 10053 and 10046 trace
