Jul 17

Oracle Performance: Measuring RAC Cache Fusion Internode Time

The Overhead of Running RAC

Each RAC cluster relies on a fast private interconnect amongst the nodes in the cluster. Blocks needed by one node can quickly be sent from a node already having that block cached. This is called "Cache Fusion."

There IS a cost to sending these blocks around the nodes; the transmission is not instantaneous, and in some cases it can actually become a bottleneck. Of course, modern Cache Fusion is FAR faster than in the old days of OPS (Oracle Parallel Server), when a block had to be written to disk by one node and then read by another node. That "ping" could easily cause a 10 ms delay just for one block. Well, we are much better now!

If you check the AWR report for a node on your cluster, you can see the SQL statements that are slowed by cluster wait time. On the large systems I have analyzed, some SQL statements are slowed by 10% or more due to these delays.
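If you would rather pull these numbers with a query than read them off the report, something like the following is a rough sketch. It assumes the AWR view dba_hist_sqlstat and its clwait_delta and elapsed_time_delta columns (both in microseconds). The tbd placeholders stand for your own snapshot range, and the column aliases are only illustrative.

```sql
-- Sketch: SQL statements ranked by cluster wait time over a snapshot range.
-- Replace each tbd with your own starting and ending snap_id.
SELECT sql_id,
       ROUND(SUM(clwait_delta)/1e6, 1)       cluster_secs,
       ROUND(SUM(elapsed_time_delta)/1e6, 1) elapsed_secs,
       -- NULLIF avoids a divide-by-zero when no elapsed time was recorded
       ROUND(100 * SUM(clwait_delta) /
             NULLIF(SUM(elapsed_time_delta), 0), 1) pct_cluster
FROM   dba_hist_sqlstat
WHERE  snap_id BETWEEN tbd AND tbd
GROUP  BY sql_id
ORDER  BY cluster_secs DESC
/
```

A pct_cluster value of 10 or more would correspond to the kind of slowdown described above.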

It is not unusual to blame "the network" for RAC performance issues. The only problem with that idea is that it's often tough to prove. So, how do you figure out how fast your Cache Fusion really is?

An Easy Way to Measure Cache Fusion

Here is an easy way to check the RAC internode time. One of the quickest events that Oracle uses to communicate is the 2-way grant, recorded as the wait event "gc cr grant 2-way." It's normally very fast (typically 1 ms or less). This is similar to a fast network "ping."
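For a quick look before digging into history, a sketch like the one below reads the same event from gv$system_event on every node. Note the assumption: these figures are averages since each instance started, so they will smooth over any short-lived spikes. The avg_ms alias is just for illustration.

```sql
-- Sketch: average 'gc cr grant 2-way' wait since instance startup, per node.
SELECT inst_id node,
       total_waits,
       -- time_waited_micro is in microseconds; convert to ms per wait
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 2) avg_ms
FROM   gv$system_event
WHERE  event = 'gc cr grant 2-way'
ORDER  BY inst_id
/
```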

Here's the key point: Just think of what would happen if the time to send a block in the cluster took much longer than 1 ms. If that time doubled, for instance, your application could be seriously degraded.

We can get a historical listing, ordered by snapshot, of this fast "grant" event from the AWR history. In this way, you can see if RAC has been having trouble communicating amongst the nodes.

-- Average 'gc cr grant 2-way' wait per AWR snapshot interval, for one instance.
-- AWR stores cumulative totals, so LAG subtracts the previous snapshot's values.
-- Replace each tbd with your own starting and ending snap_id.
WITH base AS (
  SELECT instance_number, snap_id, total_waits,
         time_waited_micro/1000 timemsec,
         LAG(time_waited_micro/1000, 1) OVER (ORDER BY snap_id) AS prev_time_msec,
         LAG(total_waits, 1) OVER (ORDER BY snap_id) AS prev_waits
  FROM   dba_hist_system_event
  WHERE  event_name = 'gc cr grant 2-way'
  AND    instance_number = 1
  AND    snap_id BETWEEN tbd AND tbd
)
SELECT b.snap_id, b.instance_number node,
       TO_CHAR(s.begin_interval_time, 'dd-mon-yy-hh24:mi') beg,
       (total_waits - prev_waits) "#WAITS",
       -- the .001 guards against division by zero
       ROUND((timemsec - prev_time_msec)/(.001 + total_waits - prev_waits), 1) "RATE"
FROM   base b, dba_hist_snapshot s
WHERE  b.instance_number = s.instance_number
AND    b.snap_id = s.snap_id
AND    (total_waits - prev_waits) > 99900   -- skip intervals with too few waits
ORDER  BY 1
/

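In the above script, I use an analytic function, LAG, to pick up the values from the previous snapshot's row. The AWR table holds cumulative totals, so the difference between two consecutive rows gives the waits and the wait time for just that one snapshot interval. The "RATE" column is then the average time per wait, in milliseconds.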
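To see the LAG arithmetic in isolation, here is a small sketch that runs against made-up cumulative counters instead of the AWR tables. The demo rows and their values are invented purely for illustration.

```sql
-- Sketch with made-up cumulative counters for three snapshots.
WITH demo AS (
  SELECT 1 snap_id, 1000000 total_waits, 300000 timemsec FROM dual UNION ALL
  SELECT 2,         3000000,             900000          FROM dual UNION ALL
  SELECT 3,         4000000,             1300000         FROM dual
)
SELECT snap_id,
       -- waits that occurred during this interval only
       total_waits - LAG(total_waits) OVER (ORDER BY snap_id) waits_in_interval,
       -- interval wait time divided by interval waits = average ms per wait
       ROUND((timemsec - LAG(timemsec) OVER (ORDER BY snap_id)) /
             (total_waits - LAG(total_waits) OVER (ORDER BY snap_id)), 1) avg_ms
FROM   demo
ORDER  BY snap_id
/
```

The first row has no previous snapshot, so both differences come back null. The second row works out to 2,000,000 waits at 600,000 ms, or .3 ms each, and the third to 1,000,000 waits at 400,000 ms, or .4 ms each.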

Expected Output

On most systems I analyze, the internode time is 1 ms or less. In the output below, you can see that the internode time is rock-steady at just .3 ms. In my experience, that is about the best possible.

  SNAP_ID       NODE BEG                 #WAITS       RATE
--------- ---------- --------------- ---------- ----------
    17236          7 13-apr-09-05:00    1942375         .3
    17237          7 13-apr-09-06:00    1913682         .3
    17238          7 13-apr-09-07:00    3763238         .3
    17239          7 13-apr-09-08:00    2360403         .4
    17240          7 13-apr-09-09:00    1694804         .3
    17241          7 13-apr-09-10:00    1564779         .3
    17242          7 13-apr-09-11:00     551387         .3

On a well-designed system, the interconnect rate doesn't change much. I typically see a few spikes to about 1.5 ms, but that's about it.

In the script above, be sure to replace the tbd placeholders with your own snap_id values. Also, you may want to check the internode performance on all the nodes, not just node 1 as shown in the script above.
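One way to cover every node in a single pass is sketched below. It is the same calculation, with the instance filter dropped and a PARTITION BY added so that each node's differences are taken only against that node's own previous snapshot. Treat it as a starting point under those assumptions rather than a tested script; the tbd placeholders still need your own snap_id values.

```sql
-- Sketch: same calculation for every node in one pass.
-- PARTITION BY keeps each instance's deltas separate.
WITH base AS (
  SELECT instance_number, snap_id, total_waits,
         time_waited_micro/1000 timemsec,
         LAG(time_waited_micro/1000) OVER
           (PARTITION BY instance_number ORDER BY snap_id) prev_time_msec,
         LAG(total_waits) OVER
           (PARTITION BY instance_number ORDER BY snap_id) prev_waits
  FROM   dba_hist_system_event
  WHERE  event_name = 'gc cr grant 2-way'
  AND    snap_id BETWEEN tbd AND tbd
)
SELECT b.snap_id, b.instance_number node,
       TO_CHAR(s.begin_interval_time, 'dd-mon-yy-hh24:mi') beg,
       (total_waits - prev_waits) "#WAITS",
       ROUND((timemsec - prev_time_msec)/(.001 + total_waits - prev_waits), 1) "RATE"
FROM   base b, dba_hist_snapshot s
WHERE  b.instance_number = s.instance_number
AND    b.snap_id = s.snap_id
AND    (total_waits - prev_waits) > 99900
ORDER  BY b.instance_number, b.snap_id
/
```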