Saturday, May 25, 2019

I/O Issues between DB and Storage tiers in Exadata?

How storage servers detect, cancel, or repair slow and hung I/Os, and confine sick disks.

I/Os are pumped between the database and storage tiers all the time.
Let's see which problems can be handled at the storage tier.

1. Slow IO?    ->  Cell IO Latency Capping

  What happens if we are hit by slow I/Os in the storage tier, also known as cell I/O latency issues? Exadata has a feature called Cell IO Latency Capping, which monitors I/O timings. If a disk is taking too long, it redirects the read to a mirror copy and the write to an alternate healthy disk.
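
  As a rough illustration of the idea (not Exadata's actual implementation), latency capping can be pictured as a routing decision made per I/O. The `Disk` class, the `route_io` function, and the 100 ms cap below are hypothetical names and values chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    latency_ms: float  # simulated service time of this disk's next I/O


def route_io(kind, primary, mirror, alternate, cap_ms=100.0):
    """Return the name of the disk that should serve the I/O.

    Hypothetical model: an I/O that stays under the cap is served by the
    primary disk; a capped read goes to the mirror copy, and a capped
    write goes to an alternate healthy disk.
    """
    if primary.latency_ms <= cap_ms:
        return primary.name
    if kind == "read":
        return mirror.name
    if kind == "write":
        return alternate.name
    raise ValueError(f"unknown I/O kind: {kind}")


if __name__ == "__main__":
    slow = Disk("disk0", 800.0)
    mirror = Disk("disk1", 6.0)
    spare = Disk("disk2", 5.0)
    print(route_io("read", slow, mirror, spare))   # disk1
    print(route_io("write", slow, mirror, spare))  # disk2
```

  The point of the sketch is only that the decision is made per I/O: a healthy primary keeps serving requests, and only the I/Os that exceed the cap are redirected.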

2. Hung IO?      ->  IO Hang detection

  A truly hung I/O can be much worse: if it escalates all the way up to a controller-level problem, it can stall your entire system. IO Hang detection helps detect and repair such I/Os, and may even reset a whole cell if the problem is bad enough, to make sure the system doesn't stop.
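
  The escalation described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the cell software's real logic: the function name, the 10-second hang threshold, and the limit of two repair attempts are all hypothetical.

```python
def hang_action(outstanding_s, repair_attempts=0,
                hang_after_s=10.0, max_repairs=2):
    """Decide what to do with an I/O that has been outstanding for a while.

    Hypothetical escalation ladder:
      - below the hang threshold: keep waiting
      - hung, repairs still allowed: try to repair (e.g. cancel and retry)
      - hung, repairs exhausted: reset the whole cell
    """
    if outstanding_s < hang_after_s:
        return "wait"
    if repair_attempts < max_repairs:
        return "repair"
    return "reset_cell"


if __name__ == "__main__":
    print(hang_action(2.0))                       # wait
    print(hang_action(30.0))                      # repair
    print(hang_action(30.0, repair_attempts=2))   # reset_cell
```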

3. Sick disk?     -> Predictive failure / confinement

Consider a situation where a disk is about to die and its I/O service times are really bad.
The predictive failure feature is built into the controllers. It uses heuristics to tell when a disk is going to fail and puts that disk into predictive failure mode. Confinement monitors how well I/Os to disks and flash are being serviced across all the different components. If they aren't being serviced well, it may take the sick disk offline.
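
One simple way to picture such a heuristic is to compare each disk's recent service times with those of its peers. The sketch below is an assumption made for illustration only: the real heuristics are not described in this post, and the `sick_disks` function and the factor of 5 are hypothetical.

```python
from statistics import median


def sick_disks(latencies_ms, factor=5.0):
    """Return the disks whose typical latency is far above their peers'.

    latencies_ms maps a disk name to a list of recent service times (ms).
    A disk is flagged when its median latency exceeds `factor` times the
    median of all per-disk medians. Both the rule and the factor are
    hypothetical, chosen only to illustrate the idea.
    """
    per_disk = {disk: median(times) for disk, times in latencies_ms.items()}
    baseline = median(per_disk.values())
    return sorted(d for d, m in per_disk.items() if m > factor * baseline)


if __name__ == "__main__":
    samples = {
        "disk0": [5, 6, 5],
        "disk1": [4, 5, 6],
        "disk2": [80, 90, 100],
    }
    print(sick_disks(samples))  # ['disk2']
```

Using medians rather than means keeps a single slow I/O from flagging an otherwise healthy disk.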


What happens if there is an undiscovered hardware or software issue on the storage tier, such as a bug or a glitch on the InfiniBand network connecting to the cells?

4. Undiscovered hardware / Software issue?  -> Database tier I/O latency capping

The database tier monitors how long its I/Os are taking. If it detects a problem, it cancels the slow I/Os and redirects them to a healthy cell.
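
The cancel-and-redirect pattern can be sketched with a timeout around each read. This is a minimal simulation, not the database's real code path: `read_from_cell`, `db_capped_read`, the cell names, and the delays are all hypothetical, and the "read" is just a sleep.

```python
import asyncio


async def read_from_cell(cell, delay_s):
    """Simulate a read from a cell that takes delay_s seconds to answer."""
    await asyncio.sleep(delay_s)
    return f"data from {cell}"


async def db_capped_read(cells, cap_s):
    """Try each (cell, simulated_delay) in order, capping every attempt.

    asyncio.wait_for cancels the pending read when the cap expires, so a
    slow cell is abandoned and the request moves on to the next cell.
    """
    for cell, delay_s in cells:
        try:
            return await asyncio.wait_for(read_from_cell(cell, delay_s),
                                          timeout=cap_s)
        except asyncio.TimeoutError:
            continue  # this cell was too slow; redirect to the next one
    raise TimeoutError("no cell answered within the latency cap")


if __name__ == "__main__":
    cells = [("cell1", 0.5), ("cell2", 0.01)]  # cell1 is the slow one
    print(asyncio.run(db_capped_read(cells, cap_s=0.1)))  # data from cell2
```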



