Thursday, June 28, 2018

Physical disk in failed state but online on MegaRaid on CELL


BUG ID:  25632147
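
Before applying the workaround it is worth confirming the mismatch the bug describes: CellCLI reports the physical disk as failed while the MegaRAID controller still shows it as online. A minimal check, assuming the MegaCli binary is in its usual Exadata location under /opt/MegaRAID:

 [root@jfclcx0024 ~]# cellcli -e list physicaldisk attributes name, status, slotNumber
 [root@jfclcx0024 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -iE "slot|firmware state"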

Workaround:



 1. Copy the cell_disk_config.xml files to take a backup.


 [root@jfclcx0024 config]# cp cell_disk_config.xml* /tmp
 [root@jfclcx0024 config]# ls /tmp/cell_disk_config.xml*
 /tmp/cell_disk_config.xml /tmp/cell_disk_config.xml_
 /tmp/cell_disk_config.xml__



 2. Remove the config files.


 [root@jfclcx0024 config]# rm cell_disk_config.xml*
 rm: remove regular file `cell_disk_config.xml'? y
 rm: remove regular file `cell_disk_config.xml_'? y
 rm: remove regular file `cell_disk_config.xml__'? y




 3. Stop the celld services.


 [root@jfclcx0024 config]# service celld stop
 Stopping the RS, CELLSRV, and MS services...
 The SHUTDOWN of services was successful.



 4. Start the cell services


 [root@jfclcx0024 config]# service celld start
 1: 108 usec
 1: 76 usec

 Starting the RS, CELLSRV, and MS services...
 Getting the state of RS services... running
 Starting CELLSRV services...
 The STARTUP of CELLSRV services was not successful.
 CELL-01537: Unable to read the cell_disk_config.xml file because the file is missing or empty.
 Starting MS services...
 The STARTUP of MS services was successful.




 5. Check the status of the cell services; CELLSRV will not be started because the config file is missing.


 [root@jfclcx0024 config]# service celld status
 rsStatus: running
 msStatus: running
 cellsrvStatus: stopped

 [root@jfclcx0024 config]#
 [root@jfclcx0024 config]#

 6. Copy the backup config.xml file to its original location.


 [root@jfclcx0024 config]# cp /tmp/cell_disk_config.xml .



 7. Restart the cell services.


 [root@jfclcx0024 config]# service celld restart
 Stopping the RS, CELLSRV, and MS services...
 The SHUTDOWN of services was successful.
 Starting the RS, CELLSRV, and MS services...
 Getting the state of RS services... running
 Starting CELLSRV services...
 The STARTUP of CELLSRV services was successful.
 Starting MS services...
 The STARTUP of MS services was successful.

 8. Confirm the cell services are up
 [root@jfclcx0024 config]# service celld status
 rsStatus: running
 msStatus: running
 cellsrvStatus: running
 [root@jfclcx0024 config]#



 9. Now check the status of the physical disk and confirm that it is normal.


 [root@jfclcx0024 config]# cellcli -e list physicaldisk 8:10 detail
 name: 8:10
 deviceId: 25
 deviceName: /dev/sdk
 diskType: HardDisk
 enclosureDeviceId: 8
 errOtherCount: 0
 luns: 0_10
 makeModel: "HGST H7280A520SUN8.0T"
 physicalFirmware: P9E2
 physicalInsertTime: 2017-02-26T07:23:06+00:00
 physicalInterface: sas
 physicalSerial: P1PMBV
 physicalSize: 7.153663907200098T
 slotNumber: 10
 status: normal

 

Wednesday, June 27, 2018

CLSU-00107: operating system function: open failed; failed with error data: 2; at location: SlfFopen1


CLSU-00101: operating system error message: No such file or directory



When executing the asmcmd command in GI, the error Can't open '/opt/oracle/log/diag/asmcmd/user_grid/weasel1xa.rjf.com/alert/alert.log' for append was returned.
The path points to the Oracle base, not the Oracle home, as the directory location.
 
[grid@weasel1xa ~]$ export DBI_TRACE=1
[grid@weasel1xa ~]$ asmcmd
   DBI 1.616-ithread default trace level set to 0x0/1 (pid 2193 pi 7fa010) at DBI.pm line 278 via asmcmdshare.pm line 270
Can not create path /opt/oracle/log/diag/asmcmd/user_grid/weasel1xa.rjf.com/alert
Can not create path /opt/oracle/log/diag/asmcmd/user_grid/weasel1xa.rjf.com/trace
Can't open '/opt/oracle/log/diag/asmcmd/user_grid/weasel1xa.rjf.com/alert/alert.log' for append
CLSU-00100: Operating System function: open failed failed with error data: 2
CLSU-00101: Operating System error message: No such file or directory
CLSU-00103: error location: SlfFopen1


As per documentation:

Under certain circumstances, $ORACLE_BASE and $ORACLE_HOME can be set to override the default locations of the alert.log and trace.trc files.
http://docs.oracle.com/cd/E11882_01/server.112/e18951/asm_util001.htm#OSTMG94362
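
To see which locations a given session will use, check the environment of the grid user before running asmcmd, for example:

[grid@weasel1xa ~]$ echo $ORACLE_BASE
[grid@weasel1xa ~]$ echo $ORACLE_HOME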
 
 
"log" directory was missing in asmcmd log location path /opt/oracle/log/diag/asmcmd ($ORACLE_BASE/log/diag/asmcmd)

$ ls -l /opt/oracle

total 48
drwxr-x--- 4 grid oinstall 4096 Oct 31 11:27 admin
drwxr-x--- 2 grid oinstall 4096 Oct 31 11:27 audit
drwxr-x--- 6 grid oinstall 4096 Oct 31 11:29 cfgtoollogs
drwxr-xr-x 2 grid oinstall 4096 Oct 31 11:32 checkpoints
drwxr-xr-x 3 oracle oinstall 4096 Nov 4 16:48 core
drwxrwxr-x 3 grid oinstall 4096 Oct 30 17:54 crsdata
drwxrwxr-x 12 grid oinstall 4096 Nov 1 16:26 diag
drwxr-xr-x 3 oracle oinstall 4096 Oct 31 13:44 opatchauto
drwxr-xr-x 10 oracle oinstall 4096 Oct 31 16:17 ora_sw
drwxr-xr-x 5 oracle oinstall 4096 Nov 5 14:10 product
drwxr-xr-x 3 grid oinstall 4096 Oct 24 14:14 weasel1xa
drwxr-xr-x 3 root root 4096 Oct 31 13:41 weasel1xa.rjf.com


Solution: The issue was resolved after creating the directory structure /opt/oracle/log/diag/asmcmd, which includes the missing "log" subdirectory.
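
A minimal sketch of the fix, assuming ORACLE_BASE is /opt/oracle as shown above and that the grid user should own the new directory:

[grid@weasel1xa ~]$ mkdir -p /opt/oracle/log/diag/asmcmd
[grid@weasel1xa ~]$ ls -ld /opt/oracle/log/diag/asmcmd
[grid@weasel1xa ~]$ asmcmd

On the next invocation asmcmd should be able to create the remaining user_grid/<hostname>/alert and trace subdirectories itself; if it cannot, they can be created the same way with mkdir -p.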
 

Tuesday, June 19, 2018

Hybrid Storage Pools


Using Hybrid Storage Pools:



 Most of the storage available on the market today utilizes a small amount of NVRAM (usually 1, 8 or even 16 GB in size) as the first cache tier of the system, backed by hard disk drives. NVRAM devices are expensive, and disk drive performance is affected by seek operations, rotation and transfer times, which can result in I/O bottlenecks and performance problems. Oracle ZFS Storage Appliance implements a Hybrid Storage Pool architecture designed to work with multiple tiers of storage media to maximize performance for the virtualized environment.

First Tier: DRAM (large L1 cache) – DRAM memory and highly optimized, low-latency solid state disks (ReadZilla), combined with the ZFS file system architecture, accelerate read-cache operations for the virtualized environment. Unlike traditional NAS architectures, Oracle ZFS Storage Appliance and its Hybrid Storage Pools utilize DRAM as the main cache device of the system. DRAM is cheaper, faster and delivers higher performance than NVRAM, so it is well matched to random I/O workloads – the kind of workloads performed by hypervisors and virtualized environments. DRAM is also used by the Adaptive Replacement Cache (ARC), which is part of the ZFS file system architecture and is intelligently managed by multiple cache algorithms.

Second Tier: SSDs (large L2 cache) – Highly optimized, low-latency solid state disks, combined with the ZFS log architecture (ZIL or LogZilla), accelerate write-cache operations. They provide excellent performance and fast response for write operations performed by applications and databases running in virtualized environments. Within the Hybrid Storage Pool, these SSD devices are designed to deliver fast write operations (100 times faster than traditional disk drives) with low latency. Inside the Hybrid Storage Pool architecture, SSD devices host the ZFS ZIL log (known as LogZilla or ZFS Intent Log), which is part of the ZFS file system architecture and is mainly responsible for accelerating the synchronous write operations requested by critical applications and databases running in virtualized environments. SSDs are also utilized by the Level 2 Adaptive Replacement Cache (L2ARC), which is an extension of the ARC (the main cache of the system) and hosts the read cache devices for the ZFS architecture.

Third Tier: Disk Pools – Disk pools are composed of high-performance (15000 rpm) and/or high-capacity (7200 rpm) disks that are protected by different RAID levels and intelligently managed by the ZFS file system. Disk pools are designed to store the application data, providing continuously high I/O rates for different types of workloads, even when utilizing high-capacity (7200 rpm) disks. Disk pools can optionally be configured with 15000 rpm disks, which provide the highest performance and thousands of IOPS for the datastores typical of virtualized environments.
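
For readers more familiar with the ZFS command line than with the appliance management interface, the same tiering maps onto log and cache vdevs in a plain ZFS pool. A hypothetical layout with illustrative device names (the DRAM-backed ARC, the first tier, needs no configuration because it is sized automatically):

    # zpool create tank mirror c1t1d0 c1t2d0 mirror c1t3d0 c1t4d0 \
        log mirror c2t1d0 c2t2d0 \
        cache c2t3d0 c2t4d0
    # zpool status tank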


Wednesday, June 6, 2018

Software RAID with NVME devices

  1. Assuming there is no data on the devices (if there is data, DO NOT run these commands; data will be lost!), let's wipe the existing superblocks
    # mdadm --zero-superblock /dev/nvme1n1
    # mdadm --zero-superblock /dev/nvme2n1

    If that fails with the error below or similar:
    mdadm: /dev/nvme1n1 appears to be part of a raid array:
    level=raid0 devices=0 ctime=Fri Jun  1 07:00:00 2018

    clean up the first 1 GB of each disk:
    # dd if=/dev/zero of=/dev/nvme1n1 bs=1M count=1000
    # dd if=/dev/zero of=/dev/nvme2n1 bs=1M count=1000

  2. And revert mdadm.conf back to its defaults
    # echo "DEVICE /no/device" > /etc/mdadm.conf

  3. Ensure the nvme devices are still blacklisted in /etc/multipath.conf if you are using multipath.
    blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|nbd)[0-9]*"
      devnode "^hd[a-z][0-9]*"
      devnode "^etherd"
      devnode "^nvme.*"
      wwid 3600605b00d9c63204000540309e0e2c9150

    }

    That should prevent device-mapper-multipath from using the NVMe disks.
  4. Reboot
  5. Assemble the software raid after the reboot
    # mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1

  6. Add the nvme devices to mdadm.conf
    # echo "DEVICE /dev/nvme1n1" > /etc/mdadm.conf
    # echo "DEVICE /dev/nvme2n1" >> /etc/mdadm.conf

  7. And add the md0 config to mdadm.conf
    # mdadm --detail --scan >> /etc/mdadm.conf

  8. Remove the DEVICE entries from /etc/mdadm.conf afterwards, so that it only contains the ARRAY line
    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=78d45ac:0434da03:ce4a9b23:c946bb65

  9. Reboot
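
After the final reboot, the array can be verified; a quick check, using the device names from the steps above:
    # cat /proc/mdstat
    # mdadm --detail /dev/md0
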
The UUID displayed for the same md device by the blkid command and the one recorded in the /etc/mdadm.conf file differ in value:


$ blkid |grep -i md5
/dev/md5: UUID="b26359eb-51ce-42b2-9ec4-ed39b4fd9e2f" TYPE="ext3"

$ cat /etc/mdadm.conf |grep -i md5

ARRAY /dev/md5 level=raid1 num-devices=2 uuid=21b778e3:e267c7dd:c96f4304:db5d882f


  The UUID reported by blkid identifies the filesystem on the md block device, while the one in /etc/mdadm.conf identifies the md block device itself.

The blkid UUID is stored in the filesystem superblock (and is what /etc/fstab can reference) and uniquely identifies the filesystem among the filesystems available on the system, while the mdadm.conf UUID resides in the md device metadata and helps the md subsystem uniquely identify that particular RAID device.

In particular, it helps identify all the member block devices that belong to the RAID array. Similar UUID distinctions also exist for RAID arrays, devices, partitions, LVM physical volumes, volume groups, and so on.
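
Both identifiers can be read directly; a quick check, using the /dev/md5 example above (run as root if needed):

$ blkid /dev/md5
$ mdadm --detail /dev/md5 | grep -i uuid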

Cache Vault versus Disk controller Batteries


  • Older generations of RAID controllers used lithium-ion battery backup units to keep the data resident in cache memory until power could be restored.
  • Cached data was only preserved for up to 72 hours. Over the life of a controller, the battery needs to be replaced numerous times, as it is only good for about 1 ½ years, which increases the cost of ownership.
  • Unlike supercapacitors, batteries cannot sit on the shelf for a long period of time without requiring re-charging, making inventory management costly. 
  • Further, shipping and disposing of lithium-ion batteries requires special considerations and fees. RAID controller cards temporarily cache data from the host system until it is successfully written to the storage media.
  • While cached, data can be lost if system power fails, jeopardising the data's permanent integrity. RAID caching is a cost-effective way to improve I/O performance by writing data to a controller's cache before it is written to disk.
  • However, in the event of a power or server failure, the writes in cache may be lost.
  • CacheVault flash cache protection modules and battery backup units (BBUs) protect the integrity of cached data by storing cached data in non-volatile flash cache storage or by providing battery power to the controller.
  • CacheVault technology prevents data loss by powering critical components of the card long enough to automatically transfer the cached data to NAND flash. 
  • Once power returns, the data is restored to the cache and normal operation resumes. By using a supercapacitor instead of lithium-ion batteries, CacheVault technology virtually eliminates the hardware maintenance costs associated with batteries, lowers the total cost of ownership over the life of the controller card and provides more environmentally friendly cache protection, all while maintaining optimal RAID performance.
  • Oracle Exadata Database Machine X6 has no batteries. Newer machines have CVPM02 (Cache Vault), which is a super cap and not a battery.
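
On systems with these LSI/Broadcom controllers, the presence and health of the BBU or CacheVault module can usually be queried with MegaCli; a hedged example (the binary path and adapter number vary by system):

  # /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL

On a CacheVault-equipped card the output typically reports the BBU type as CVPM02 rather than a battery model, matching the note about Exadata X6 above.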