What is RAC?
RAC stands for Real Application Clusters. It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover and media failover features.
How many nodes are supported in a RAC Database?
Oracle 10g Release 2 supports up to 100 nodes in a cluster using Oracle Clusterware, and up to 100 instances in a RAC database.
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change their configuration when you add or remove nodes in the cluster.
Mention the Oracle RAC software components:-
· An environment that supports two or more database instances accessing the same database is a RAC.
· Each instance is composed of memory structures and background processes.
· Oracle RAC instances use two services, GES (Global Enqueue Service) and GCS (Global Cache Service), that enable Cache Fusion.
· Oracle RAC instances include the following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor
What is GRD?
GRD stands for Global Resource Directory. The GES and GCS record the status of each datafile and each cached block in the Global Resource Directory. Cache Fusion relies on these records to keep data consistent across instances.
Cache Fusion in Detail:-
Oracle RAC is composed of two or more instances. When an instance within the cluster has read a block from a datafile and another instance needs the same block, it is faster to get the block image from the SGA of the instance that already holds it than to read it from disk again. Oracle RAC uses the interconnect for this inter-instance communication. The Global Cache Service (GCS), through the LMS processes, manages the Cache Fusion block transfers, while the Global Enqueue Service (GES) coordinates the other global resources.
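The read path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class names, the single "directory" dictionary and the block id 42 are hypothetical, and real Cache Fusion also handles lock modes, dirty blocks and consistent-read images, which are left out here.

```python
class Instance:
    """One RAC instance with its own buffer cache (part of its SGA)."""

    def __init__(self, name):
        self.name = name
        self.buffer_cache = {}  # block_id -> block image


class ToyCluster:
    """Toy model: a directory records which instance caches each block."""

    def __init__(self, instance_names, disk):
        self.instances = {n: Instance(n) for n in instance_names}
        self.disk = disk        # block_id -> block image on shared storage
        self.directory = {}     # block_id -> name of an instance holding it

    def read_block(self, requester, block_id):
        """Return (image, source); source is 'local', 'interconnect' or 'disk'."""
        inst = self.instances[requester]
        if block_id in inst.buffer_cache:
            return inst.buffer_cache[block_id], "local"
        holder = self.directory.get(block_id)
        if holder is not None:
            # Another instance already caches the block: ship its image
            # over the interconnect instead of reading from disk.
            image = self.instances[holder].buffer_cache[block_id]
            source = "interconnect"
        else:
            image = self.disk[block_id]
            source = "disk"
            self.directory[block_id] = requester
        inst.buffer_cache[block_id] = image
        return image, source


cluster = ToyCluster(["orcl1", "orcl2"], disk={42: "row data"})
first = cluster.read_block("orcl1", 42)   # nobody caches it yet
second = cluster.read_block("orcl2", 42)  # orcl1 already holds it
third = cluster.read_block("orcl2", 42)   # now cached locally on orcl2
```

In this sketch only the first read touches the disk; the second read is served from the other instance's cache, which is the point of Cache Fusion.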
What are the Oracle database background processes specific to RAC?
•LMS—Global Cache Service Process
•LMD—Global Enqueue Service Daemon
•LMON—Global Enqueue Service Monitor
•LCK0—Instance Enqueue Process
To ensure that each Oracle RAC
database instance obtains the block that it needs to satisfy a query or
transaction, Oracle RAC instances use two processes, the Global Cache
Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file
and each cached block using a Global Resource Directory (GRD). The GRD contents
are distributed across all of the active instances.
RAC Background Processes in Detail.
ACMS in Detail:-
ACMS stands for Atomic Controlfile to Memory Service. In an Oracle RAC environment, ACMS is an agent that ensures that a distributed SGA memory update is either globally committed on success or globally aborted if a failure occurs.
GTX0-j in Detail:-
This process provides transparent support for XA global transactions in a RAC environment. The database automatically tunes the number of these processes based on the workload of XA global transactions.
LMON in Detail:-
This process is called the Global Enqueue Service Monitor. It monitors global enqueues and resources across the cluster and performs global enqueue recovery operations.
LMD in Detail:-
This process is called the Global Enqueue Service Daemon. It manages incoming remote resource requests within each instance.
LMS in Detail:-
This process is called the Global Cache Service Process. It maintains the status of datafiles and of each cached block by recording information in the Global Resource Directory (GRD). It also controls the flow of messages to remote instances, manages global data block access and transmits block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.
LCK0 in Detail:-
This process is called the Instance Enqueue Process. It manages non-Cache Fusion resource requests such as library and row cache requests.
RMSn in Detail:-
This process is called the Oracle RAC Management Process. These processes perform manageability tasks for Oracle RAC, including the creation of Oracle RAC related resources when new instances are added to the cluster.
RSMN in Detail:-
This process is called the Remote Slave Monitor. It manages background slave process creation and communication on remote instances. These background slave processes perform tasks on behalf of a coordinating process running in another instance.
What are the Oracle Clusterware processes for 10g on Unix and Linux?
Cluster Synchronization Services (ocssd) - Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) - The crs process manages cluster resources (which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.
Event manager daemon (evmd) - A background process that publishes events that crs creates.
Process Monitor Daemon (OPROCD) - This process monitors the cluster and provides I/O fencing. OPROCD performs its check, stops running, and if the wake-up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
RACG (racgmain, racgimon) - Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
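The OPROCD behaviour listed above (sleep, wake up, compare the wake-up time with the expected time) can be illustrated with a small sketch. The function name, the interval and the margin are hypothetical values chosen for the example, not OPROCD's actual settings, and the real daemon fences the node instead of returning a flag.

```python
import time


def wakeup_was_late(interval, margin, sleep=time.sleep, clock=time.monotonic):
    """Sleep for `interval` seconds and report whether the wake-up came
    more than `margin` seconds later than expected (a sign of a hang)."""
    start = clock()
    sleep(interval)
    delay = clock() - start - interval
    return delay > margin
```

The `sleep` and `clock` arguments are only there so the check can be exercised with fake timings; a monitoring loop would call the function repeatedly with the defaults.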
What are the Oracle Clusterware components?
Voting Disk - Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Oracle Cluster Registry (OCR) - Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.
What components in RAC must reside in shared storage?
All datafiles, controlfiles, SPFILEs and redo log files must reside on cluster-aware shared storage.
What is the significance of using cluster-aware shared storage in an Oracle RAC environment?
All instances of an Oracle RAC database can access all the datafiles, controlfiles, SPFILEs and redo log files when these files are hosted on cluster-aware shared storage, which is a group of shared disks.
Give a few examples of solutions that support cluster storage:-
· ASM (Automatic Storage Management),
· raw disk devices,
· network file system (NFS),
· OCFS2 and
· OCFS (Oracle Cluster File System).
What is an interconnect network?
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses one or more switches that only the nodes in the cluster can access.
How can we configure the cluster interconnect?
· Configure User Datagram Protocol (UDP) on Gigabit Ethernet for cluster interconnects.
· On UNIX and Linux systems, Oracle Clusterware uses the UDP and RDS (Reliable Datagram Sockets) protocols.
· Windows clusters use the TCP protocol.
Can we use crossover cables with Oracle Clusterware
interconnect?
No, crossover cables are not supported
with Oracle Clusterware interconnects.
What is the use of cluster interconnect?
The cluster interconnect is used by Cache Fusion for inter-instance communication.
What is the purpose of the private interconnect?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP).
Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.
How do users connect to the database in an Oracle RAC environment?
Users can access a RAC database using a client/server configuration or through one or more middle tiers, with or without connection pooling. Users can use the Oracle services feature to connect to the database.
What is the use of a service in Oracle RAC environment?
Applications should use the services
feature to connect to the Oracle database. Services enable us to define rules
and characteristics to control how users and applications connect to database
instances.
What are the characteristics controlled by Oracle
services feature?
The characteristics include a unique
name, workload balancing, failover options, and high availability.
What enables the load balancing of applications in RAC?
Oracle Net Services enables the load balancing of application connections across all of the instances in an Oracle RAC database.
What is a virtual IP address or VIP?
A virtual IP address (VIP) is an alternate IP address that client connections use instead of the standard public IP address. To configure VIP addresses, we need to reserve a spare IP address for each node, and these IP addresses must be on the same subnet as the public network.
What is the use of VIP?
If a node fails, the node's VIP address fails over to another node, where it can accept TCP connections but does not accept Oracle connections.
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 minutes) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it automatically fails over to another node, and the new node re-ARPs the network to announce a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which sends RST packets back to the clients, so the clients get errors immediately.
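The difference the VIP makes can be shown with a toy model of what a client sees when it connects to a node's address. The function name and outcome labels are hypothetical, and the 600-second figure simply restates the "up to 10 minutes" above; real timeouts depend on the platform and its TCP settings.

```python
# Assumption: worst-case TCP timeout taken from the text ("up to 10 min").
TCP_TIMEOUT_SECONDS = 600


def connect_outcome(node_up, vip_failed_over):
    """Toy model: return (outcome, seconds waited) for a connection attempt."""
    if node_up:
        return ("connected", 0)
    if vip_failed_over:
        # A surviving node now answers for the VIP and replies with RST,
        # so the client learns about the failure right away.
        return ("refused", 0)
    # Nobody answers for the address: the client waits for the TCP timeout.
    return ("timeout", TCP_TIMEOUT_SECONDS)
```

With the VIP failed over, the client gets its error immediately and can move on to the next address instead of hanging.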
Give situations under which VIP address failover happens:-
VIP address failover happens when the node on which the VIP address runs fails, when all interfaces for the VIP address fail, or when all interfaces for the VIP address are disconnected from the network.
What is the significance of VIP address failover?
When a VIP address failover happens, clients that attempt to connect to the VIP address receive a rapid connection refused error. They don't have to wait for TCP connection timeout messages.
What are the administrative tools used for Oracle RAC environments?
An Oracle RAC cluster can be administered as a single system image using the following tools:
· OEM (Enterprise Manager),
· SQL*PLUS,
· Server control (SRVCTL),
· Cluster Verification Utility (CLUVFY),
· DBCA,
· NETCA
How do we verify that RAC instances are running?
Issue the following query from any one node after connecting through SQL*Plus:
SQL> connect sys/sys as sysdba
SQL> select * from V$ACTIVE_INSTANCES;
The query returns the instance number in the INST_NUMBER column and the host and instance name in the INST_NAME column.
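As a small illustration of reading that result, the sketch below splits each INST_NAME value into host and instance. It assumes INST_NAME has the form host:instance and that the rows were already fetched by some client; the host and instance names are made up for the example.

```python
def parse_active_instances(rows):
    """Map INST_NUMBER to host and instance, given (INST_NUMBER, INST_NAME) rows."""
    result = {}
    for inst_number, inst_name in rows:
        # Assumption: INST_NAME looks like "host:instance".
        host, _, instance = inst_name.partition(":")
        result[inst_number] = {"host": host, "instance": instance}
    return result


# Hypothetical rows as they might come back from V$ACTIVE_INSTANCES.
sample_rows = [(1, "racnode1:orcl1"), (2, "racnode2:orcl2")]
active = parse_active_instances(sample_rows)
```

If an expected instance number is missing from the result, that instance is not currently running.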
What is FAN?
FAN stands for Fast Application Notification. It covers events related to instances, services and nodes. It is a notification mechanism that Oracle RAC uses to notify other processes about configuration and service level information, including service status changes such as UP or DOWN events. Applications can respond to FAN events and take immediate action.
Where can we apply FAN UP and DOWN events?
FAN UP and FAN DOWN events can be
applied to instances, services and nodes.
State the use of FAN events in case of a cluster
configuration change?
During cluster configuration changes, the Oracle RAC high availability framework publishes a FAN event immediately when a state change occurs in the cluster, so applications can receive FAN events and react immediately. This saves applications from having to poll the database and detect the problem only after the state change has occurred.
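The publish-and-react idea can be sketched with a toy publisher. This is not the FAN or ONS API: the class, the event fields and the instance names are hypothetical and only show why a callback lets the application react without polling.

```python
class ToyFanPublisher:
    """Toy stand-in for the framework that publishes UP/DOWN events."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event_type, name, status):
        event = {"type": event_type, "name": name, "status": status}
        for callback in self._subscribers:
            callback(event)


# The application keeps a set of instances it is willing to use.
available = {"orcl1", "orcl2"}


def on_event(event):
    """React to instance events as soon as they are published."""
    if event["type"] != "instance":
        return
    if event["status"] == "DOWN":
        available.discard(event["name"])
    elif event["status"] == "UP":
        available.add(event["name"])


publisher = ToyFanPublisher()
publisher.subscribe(on_event)
publisher.publish("instance", "orcl2", "DOWN")
```

After the DOWN event the application stops routing work to orcl2 at once, without having issued a single probe query.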
Why should we have a separate home for the ASM instance?
It is a good practice to keep the ASM home separate from the database home (ORACLE_HOME). This helps in upgrading and patching ASM and the Oracle database software independently of each other. Also, we can deinstall the Oracle database software independently of the ASM instance.
What is the advantage of using ASM?
ASM is the Oracle recommended storage option for RAC databases, because ASM maximizes performance by managing the storage configuration across the disks. ASM does this by distributing the database files across all of the available storage within our cluster database environment.
What is rolling upgrade?
It is a new ASM feature in Oracle Database 11g. ASM instances in Oracle Database 11g (from 11.1) can be upgraded or patched using the rolling upgrade feature. This enables us to patch or upgrade ASM nodes in a clustered environment without affecting database availability. During a rolling upgrade we can maintain a functional cluster while one or more of the nodes in the cluster are running different software versions.
Can rolling upgrade be used to upgrade from 10g to 11g database?
No, it can be used only for Oracle Database 11g releases (from 11.1).
State the initialization parameters that must have the same value for every instance in an Oracle RAC database:-
Some initialization parameters are critical at database creation time and must have the same value on every instance. Their values must be specified in the SPFILE or PFILE for every instance. The parameters that must be identical on every instance are listed below:
· ACTIVE_INSTANCE_COUNT
· ARCHIVE_LAG_TARGET
· COMPATIBLE
· CLUSTER_DATABASE
· CLUSTER_DATABASE_INSTANCES
· CONTROL_FILES
· DB_BLOCK_SIZE
· DB_DOMAIN
· DB_FILES
· DB_NAME
· DB_RECOVERY_FILE_DEST
· DB_RECOVERY_FILE_DEST_SIZE
· DB_UNIQUE_NAME
· INSTANCE_TYPE (RDBMS or ASM)
· PARALLEL_MAX_SERVERS
· REMOTE_LOGIN_PASSWORD_FILE
· UNDO_MANAGEMENT
Must DML_LOCKS and RESULT_CACHE_MAX_SIZE be identical on all instances?
These parameters must be identical on all instances only if their values are set to zero.
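A consistency check over the must-match parameters listed above can be sketched as follows. The input is assumed to be a dictionary of parameter values per instance, collected by whatever means is available; the instance names and values in the example are hypothetical.

```python
# Parameters that the list above says must be identical on every instance.
MUST_MATCH = [
    "ACTIVE_INSTANCE_COUNT", "ARCHIVE_LAG_TARGET", "COMPATIBLE",
    "CLUSTER_DATABASE", "CLUSTER_DATABASE_INSTANCES", "CONTROL_FILES",
    "DB_BLOCK_SIZE", "DB_DOMAIN", "DB_FILES", "DB_NAME",
    "DB_RECOVERY_FILE_DEST", "DB_RECOVERY_FILE_DEST_SIZE", "DB_UNIQUE_NAME",
    "INSTANCE_TYPE", "PARALLEL_MAX_SERVERS", "REMOTE_LOGIN_PASSWORD_FILE",
    "UNDO_MANAGEMENT",
]


def find_mismatches(params_by_instance):
    """Return {parameter: {instance: value}} for parameters whose values differ."""
    mismatches = {}
    for name in MUST_MATCH:
        values = {inst: params.get(name) for inst, params in params_by_instance.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


# Hypothetical settings for two instances.
settings = {
    "orcl1": {"DB_BLOCK_SIZE": "8192", "UNDO_MANAGEMENT": "AUTO"},
    "orcl2": {"DB_BLOCK_SIZE": "8192", "UNDO_MANAGEMENT": "MANUAL"},
}
problems = find_mismatches(settings)
```

An empty result means the listed parameters agree; DML_LOCKS and RESULT_CACHE_MAX_SIZE are deliberately left out because they only need to match when set to zero.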
What two parameters must be set at the time of starting
up an ASM instance in a RAC environment?
The parameters CLUSTER_DATABASE and INSTANCE_TYPE must be set.
Mention the components of Oracle Clusterware:-
Oracle Clusterware is made up of
components like voting disk and Oracle Cluster Registry (OCR).
What is a CRS resource?
Oracle Clusterware is used to manage high-availability operations in a cluster. Anything that Oracle Clusterware manages is known as a CRS resource. Some examples of CRS resources are a database, an instance, a service, a listener, a VIP address and an application process.
What is the use of OCR?
The OCR (Oracle Cluster Registry) stores the configuration information of CRS resources, which Oracle Clusterware uses to manage those resources.
How does an Oracle Clusterware manage CRS resources?
Oracle Clusterware manages CRS resources based on the configuration information
of CRS resources stored in OCR (Oracle Cluster Registry).
Name some Oracle Clusterware tools and their uses.
· OIFCFG - Allocating and deallocating network interfaces.
· OCRCONFIG - Command-line tool for managing the Oracle Cluster Registry.
· OCRDUMP - Dumps the contents of the OCR; can be used to identify the interconnect being used.
· CVU - Cluster Verification Utility, used to verify the cluster configuration and the status of CRS resources.
What are the modes of deleting instances from Oracle Real Application Clusters databases?
We can delete instances in silent mode or interactive mode using DBCA (Database Configuration Assistant).
How do we remove ASM from an Oracle RAC environment?
We need to stop and delete the
instance in the node first in interactive or silent mode. After that ASM can be
removed using srvctl tool as follows:
srvctl stop asm -n node_name
srvctl remove asm -n node_name
We can verify if ASM has been removed by issuing the following command:
srvctl config asm -n node_name
How do we verify that an instance has been removed from OCR after deleting an instance?
Issue the following srvctl command, then check the resource status with crs_stat:
srvctl config database -d database_name
cd CRS_HOME/bin
./crs_stat
How do we verify an existing current backup of OCR?
We can verify the current backup of OCR using the following command:
ocrconfig -showbackup
What are the performance views in an Oracle RAC environment?
We have V$ views that are instance specific. In addition, we have GV$ views, called global views, which have an INST_ID column of numeric data type. GV$ views obtain their information from the individual V$ views of each instance.
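Conceptually, a GV$ view is the combination of each instance's V$ rows tagged with the instance id. The sketch below models that idea with plain dictionaries; the function name, column names and values are hypothetical and no database is involved.

```python
def build_gv_rows(v_rows_by_instance):
    """Combine per-instance V$ rows into GV$-style rows with an INST_ID column."""
    combined = []
    for inst_id in sorted(v_rows_by_instance):
        for row in v_rows_by_instance[inst_id]:
            combined.append({"INST_ID": inst_id, **row})
    return combined


# Hypothetical per-instance rows, as each instance's V$ view might report them.
gv_rows = build_gv_rows({
    2: [{"NAME": "sessions", "VALUE": 40}],
    1: [{"NAME": "sessions", "VALUE": 25}],
})
```

Filtering the combined rows on INST_ID gives back exactly what the corresponding instance's V$ view would show.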
What are the types of connection load-balancing?
There are two types of connection
load-balancing: server-side load balancing and client-side load balancing.
What is the difference between server-side and
client-side connection load balancing?
Client-side load balancing happens at the client, which spreads its connection requests across the listeners in its address list. With server-side load balancing, the listener uses a load-balancing advisory to redirect connections to the instance providing the best service.
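The two approaches can be contrasted in a minimal sketch. Both functions are simplified assumptions: the client picks a listener at random, and the listener is modelled as choosing the instance with the lowest load figure, whereas the real advisory is based on service metrics. All names and numbers are hypothetical.

```python
import random


def client_side_pick(listener_addresses, rng=random):
    """Client-side: the client picks one listener address at random."""
    return rng.choice(listener_addresses)


def server_side_pick(load_by_instance):
    """Server-side: the listener redirects to the least loaded instance."""
    return min(load_by_instance, key=load_by_instance.get)


listeners = ["node1-vip", "node2-vip"]
chosen_listener = client_side_pick(listeners, random.Random(0))
chosen_instance = server_side_pick({"orcl1": 0.9, "orcl2": 0.2})
```

Client-side balancing spreads connection requests without knowing the load; server-side balancing uses load information that only the cluster has.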
Give the usage of srvctl:-
· srvctl start instance -d db_name -i "inst_name_list" [-o start_options]
· srvctl stop instance -d name -i "inst_name_list" [-o stop_options]
· srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
· srvctl start database -d name [-o start_options]
· srvctl stop database -d name [-o stop_options]
· srvctl start database -d orcl -o mount
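When these commands are driven from a script, it helps to build the argument list in one place. The helper below is a hypothetical wrapper that only assembles the srvctl instance syntax shown above; it does not run anything, and the database and instance names are the examples from the list.

```python
def srvctl_instance_cmd(action, db_name, instances, options=None):
    """Build the argument list for 'srvctl start|stop instance'."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    cmd = ["srvctl", action, "instance", "-d", db_name, "-i", ",".join(instances)]
    if options:
        cmd += ["-o", options]
    return cmd


stop_cmd = srvctl_instance_cmd("stop", "orcl", ["orcl3", "orcl4"], "immediate")
```

The resulting list can be passed to subprocess.run(stop_cmd, check=True) on a node where srvctl is on the PATH.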
How do you troubleshoot a node reboot?
Please check the following Metalink notes:
Note 265769.1 Troubleshooting CRS Reboots
Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.
How do you back up the OCR?
There is an automatic backup mechanism for OCR. The default location is: $ORA_CRS_HOME/cdata/"clustername"/
To display backups:
# ocrconfig -showbackup
To restore a backup:
# ocrconfig -restore backup_file_name
With Oracle RAC 10g Release 2 or later, you can also use the export command:
# ocrconfig -export export_file_name -s online
and use the -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manual backup of the OCR with the command:
# ocrconfig -manualbackup
How do you back up the voting disk?
# dd if=voting_disk_name of=backup_file_name
How do I identify the voting disk location?
# crsctl query css votedisk
How do I identify the OCR file location?
Check /var/opt/oracle/ocr.loc or /etc/ocr.loc (depending on the platform)
or
# ocrcheck
Is ssh required for normal Oracle RAC operation?
ssh is not required for normal Oracle RAC operation. However, ssh should be enabled for Oracle RAC and patchset installation.
What do you do if you see GC CR BLOCK LOST in the top 5 Timed Events in an AWR report?
This is most likely due to a fault in the interconnect network.
Check netstat -s
If you see "fragments dropped" or "packet reassemblies failed", work with your system administrator to find the fault in the network.
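The netstat check can be scripted as a simple text scan. The sketch below assumes each counter line starts with a number followed by a description, which is common but not guaranteed, and the exact wording differs between platforms (some print "reassembles" rather than "reassemblies"), so the match is deliberately loose. The sample output is made up.

```python
def find_interconnect_symptoms(netstat_output):
    """Return the non-zero counter lines that hint at lost fragments."""
    hits = []
    for line in netstat_output.splitlines():
        text = line.strip()
        lowered = text.lower()
        matched = "fragments dropped" in lowered or (
            "reassembl" in lowered and "failed" in lowered
        )
        if not matched:
            continue
        first = text.split()[0]
        if first.isdigit() and int(first) == 0:
            continue  # the counter exists but nothing was dropped
        hits.append(text)
    return hits


# Hypothetical excerpt of 'netstat -s' output.
sample_output = """Ip:
    1200 total packets received
    0 fragments dropped after timeout
    17 packet reassembles failed
"""
symptoms = find_interconnect_symptoms(sample_output)
```

Running the scan twice a few minutes apart and comparing the counters shows whether the losses are still happening.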
What is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware as part of the nodeapps. There is one ONS daemon started per clustered node.
The ONS daemon receives a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forwards those events to application subscribers and to the local listeners.
This is done to facilitate:
a. FAN (Fast Application Notification), the feature that allows applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permits load balancing across different RAC nodes depending on the load on each node. The RDBMS MMON process creates an advisory for the distribution of work every 30 seconds and forwards it via racgimon and ONS to listeners and applications.
Srvctl cannot start an instance and I get the errors PRKP-1001 and CRS-0215, but sqlplus can start it on both nodes. How do you identify the problem?
Set the environment variable SRVM_TRACE to true and start the instance with srvctl. You will now get a detailed error stack.
What is the use of a Virtual IP (VIP) in Oracle Real Application Clusters (RAC)?
When installing Oracle 10g/11g R1 RAC, three network interfaces (IPs) are required for each node in the RAC cluster:
- Public Interface: Used for normal network communications to the node
- Private Interface: Used as the cluster interconnect
- Virtual (Public) Interface: Used for failover and RAC management
When installing Oracle 11g R2 RAC, one more network interface (IP) is required for the cluster:
- SCAN Interface (IP): Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature, which provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change their configuration when you add or remove nodes in the cluster.
When a client connects to a tns-alias, it uses a TCP connection to an IP address defined in the tnsnames.ora file. When using RAC, we define multiple addresses in our tns-alias so that the client can fail over when an IP address, listener or instance is unavailable. TCP timeouts can differ from platform to platform and from implementation to implementation, which makes it difficult to predict the failover time.
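A tns-alias with several addresses can be generated with a short helper. The function below is a hypothetical sketch that renders one tnsnames.ora entry; the alias, host names, service name and port 1521 are placeholder assumptions, not values from a real configuration.

```python
def render_tns_alias(alias, hosts, service_name, port=1521):
    """Render a tnsnames.ora entry with one ADDRESS per host and failover enabled."""
    lines = [
        f"{alias} =",
        "  (DESCRIPTION =",
        "    (ADDRESS_LIST =",
        "      (FAILOVER = ON)",
    ]
    for host in hosts:
        lines.append(
            f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        )
    lines += [
        "    )",
        f"    (CONNECT_DATA = (SERVICE_NAME = {service_name}))",
        "  )",
    ]
    return "\n".join(lines)


# Hypothetical VIP host names for a two-node cluster.
entry = render_tns_alias("ORCL", ["node1-vip", "node2-vip"], "orcl")
```

Listing the VIP host names rather than the physical host names is what lets the client move on to the next address quickly when a node is down.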
Oracle 10g Cluster Ready Services enables databases to use a Virtual IP address on which the listener is configured. This feature ensures that Oracle clients fail over quickly when a node fails. In Oracle Database 10g RAC, the use of a virtual IP address to mask the individual IP addresses of the clustered nodes is required. The virtual IP addresses are used to simplify failover and are automatically managed by CRS.
To create a Virtual IP (VIP) address, the Virtual IP Configuration Assistant (VIPCA) is called from the root.sh script of a RAC install, and it then configures the virtual IP addresses for each node specified during the installation process. To be able to run VIPCA, there must be an unused public IP address available for each node, configured in the /etc/hosts file.
Each node needs one public IP address to use as its Virtual IP address for client connections and for connection failover. This IP address is in addition to the operating system managed public host IP address that is already assigned to the node by the operating system. The public Virtual IP must be associated with the same interface name on every node that is a part of the cluster. The VIP addresses of all nodes in a cluster must be on the same subnet as the public host network addresses. The host names for the VIP addresses must be registered with the domain name server (DNS). The Virtual IP address should not be in use at the time of the installation, because Oracle manages it internally to the RAC processes. The virtual IP address does not require a separate NIC. Each Virtual IP (VIP) configured requires an unused and resolvable IP address.
Using a virtual IP avoids the TCP/IP timeout problem, because the Oracle Notification Service (ONS) maintains communication between the nodes and listeners. Once ONS finds a listener or node down, it notifies the other nodes and listeners. When a new connection tries to reach the failed node or listener, the virtual IP of the failed node is automatically diverted to a surviving node, and the session is established on that surviving node. This process does not wait for the TCP/IP timeout, so new connections are established faster on the surviving nodes and listeners.
The Virtual IP (VIP) is for fast connection establishment in failover situations. We can still use the physical IP address in the listener in Oracle 10g if failover time is not a concern. We can also reduce the default TCP/IP timeout using operating system utilities or commands. However, taking advantage of the VIP in an Oracle 10g RAC database is advisable.