Tuesday, July 10, 2018

M.2 devices in X7 and how to monitor and replace faulty disks

Oracle Exadata Database Machine X7 systems come with two internal M.2 devices that contain the system area. In all previous systems, the first two disks of the Oracle Exadata Storage Server are system disks, and the portions of these system disks reserved for system use are referred to as the system area.



Note:
Oracle Exadata Rack and Oracle Exadata Storage Servers can remain online and available while replacing an M.2 disk.

This section contains the following topics:

Monitoring the Status of M.2 Disks

You can monitor the status of an M.2 disk by checking its attributes with the CellCLI LIST PHYSICALDISK command.
The disk firmware maintains the error counters, and marks a drive with Predictive Failure when the disk is about to fail. The drive, not the cell software, determines if it needs replacement.
  • Use the CellCLI command LIST PHYSICALDISK to determine the status of an M.2 disk:
 
CellCLI> LIST PHYSICALDISK WHERE disktype='M2Disk' DETAIL
         name:                  M2_SYS_0
         deviceName:            /dev/sdm
         diskType:              M2Disk
         makeModel:             "INTEL SSDSCGJK150G7"
         physicalFirmware:      N2010112
         physicalInsertTime:    2017-07-14T08:42:24-07:00
         physicalSerial:        PHDW7082000M150A
         physicalSize:          139.73558807373047G
         slotNumber:            "M.2 Slot: 0"
         status:                failed

         name:                  M2_SYS_1
         deviceName:            /dev/sdn
         diskType:              M2Disk
         makeModel:             "INTEL SSDSCKJB150G7"
         physicalFirmware:      N2010112
         physicalInsertTime:    2017-07-14T12:25:05-07:00
         physicalSerial:        PHDW708204SZ150A
         physicalSize:          139.73558807373047G
         slotNumber:            "M.2 Slot: 1"
         status:                normal
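
For scripted checks, the DETAIL output can be parsed into one record per disk. The following is a minimal Python sketch, not an Oracle-provided tool. The function names parse_detail and disks_needing_attention are hypothetical, the sample text is abbreviated from the listing above, and the sketch assumes that every record begins with a name attribute and that any status other than normal deserves a closer look.

```python
# Sample text abbreviated from the CellCLI listing in this post.
SAMPLE = """\
CellCLI> LIST PHYSICALDISK WHERE disktype='M2Disk' DETAIL
         name:                  M2_SYS_0
         deviceName:            /dev/sdm
         physicalInsertTime:    2017-07-14T08:42:24-07:00
         slotNumber:            "M.2 Slot: 0"
         status:                failed

         name:                  M2_SYS_1
         deviceName:            /dev/sdn
         physicalInsertTime:    2017-07-14T12:25:05-07:00
         slotNumber:            "M.2 Slot: 1"
         status:                normal
"""


def parse_detail(text):
    """Split CellCLI DETAIL output into one dict per disk (hypothetical helper)."""
    disks, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the echoed CellCLI prompt.
        if not line or line.startswith("CellCLI>"):
            continue
        # Split at the first colon only, so timestamps and slot labels stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip().strip('"')
        # Assumption: a "name" attribute marks the start of the next record.
        if key == "name" and current:
            disks.append(current)
            current = {}
        current[key] = value
    if current:
        disks.append(current)
    return disks


def disks_needing_attention(disks):
    """Return every disk whose status is anything other than 'normal'."""
    return [d for d in disks if d.get("status") != "normal"]


for disk in disks_needing_attention(parse_detail(SAMPLE)):
    print(disk["name"], disk["slotNumber"], disk["status"])
```

On a storage server, the text would come from the real command output rather than the embedded sample. The slot number in each flagged record is the value needed later to find the disk inside the chassis.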


Replacing an M.2 Disk Due to Failure or Other Problems

Failure of an M.2 disk reduces redundancy of the system area, and can impact patching, imaging, and system rescue. Therefore, the disk should be replaced with a new disk as soon as possible. When an M.2 disk fails, the storage server automatically and transparently switches to using the software stored on the inactive system disk, making it the active system disk.



An Exadata alert is generated when an M.2 disk fails. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address. An M.2 disk is hot-pluggable and can be replaced while the power is on. After the M.2 disk is replaced, Oracle Exadata System Software automatically adds the new device to the system partition and starts the rebuilding process.
 
  1. Identify the failed M.2 disk.
    CellCLI> LIST PHYSICALDISK WHERE diskType=M2Disk AND status!=normal DETAIL
             name:                  M2_SYS_0
             deviceName:            /dev/sda
             diskType:              M2Disk
             makeModel:             "INTEL SSDSCKJB150G7"
             physicalFirmware:      N2010112
             physicalInsertTime:    2017-07-14T08:42:24-07:00
             physicalSerial:        PHDW7082000M150A
             physicalSize:          139.73558807373047G
             slotNumber:            "M.2 Slot: 0"
             status:                failed - dropped for replacement
    
  2. Locate the cell that has the white LED lit.
  3. Open the chassis and identify the M.2 disk by the slot number in Step 1.
  4. Verify that the amber LED for this disk is lit, which indicates that service is needed.
    M.2 disks are hot pluggable, so you do not need to power down the cell before replacing the disk.
  5. Remove the M.2 disk:
    1. Rotate both riser board socket ejectors up and outward as far as they will go.
      The green power LED on the riser board turns off when you open the socket ejectors.
    2. Carefully lift the riser board straight up to remove it from the sockets.
  6. Insert the replacement M.2 disk:
    1. Unpack the replacement flash riser board and place it on an antistatic mat.
    2. Align the notch in the replacement riser board with the connector key in the connector socket.
    3. Push the riser board into the connector socket until the riser board is securely seated in the socket.
      Caution:
      If the riser board does not easily seat into the connector socket, verify that the notch in the riser board is aligned with the connector key in the connector socket. If the notch is not aligned, damage to the riser board might occur.
    4. Rotate both riser board socket ejectors inward until the ejector tabs lock the riser board in place.
      The green power LED on the riser board turns on when you close the socket ejectors.
  7. Confirm the M.2 disk has been replaced.
    CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL
             name:                  M2_SYS_0
             deviceName:            /dev/sdm
             diskType:              M2Disk
             makeModel:             "INTEL SSDSCKJB150G7"
             physicalFirmware:      N2010112
             physicalInsertTime:    2017-08-24T18:55:13-07:00
             physicalSerial:        PHDW708201G0150A
             physicalSize:          139.73558807373047G
             slotNumber:            "M.2 Slot: 0"
             status:                normal

             name:                  M2_SYS_1
             deviceName:            /dev/sdn
             diskType:              M2Disk
             makeModel:             "INTEL SSDSCKJB150G7"
             physicalFirmware:      N2010112
             physicalInsertTime:    2017-08-24T18:55:13-07:00
             physicalSerial:        PHDW708200SZ150A
             physicalSize:          139.73558807373047G
             slotNumber:            "M.2 Slot: 1"
             status:                normal
    
  8. Confirm that the system disk arrays have an active sync status, or are being rebuilt.
# mdadm --detail /dev/md[2-3][4-5]
/dev/md24:
      Container : /dev/md/imsm0, member 0
     Raid Level : raid1
     Array Size : 104857600 (100.00 GiB 107.37 GB)
  Used Dev Size : 104857600 (100.00 GiB 107.37 GB)
   Raid Devices : 2
  Total Devices : 2

          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 152f728a:6d294098:5177b2e5:8e0d8c6c
    Number   Major   Minor   RaidDevice State
       1       8       16        0      active sync   /dev/sdb
       0       8        0        1      active sync   /dev/sda
/dev/md25:
      Container : /dev/md/imsm0, member 1
     Raid Level : raid1
     Array Size : 41660426 (39.73 GiB 42.66 GB)
  Used Dev Size : 41660524 (39.73 GiB 42.66 GB)
   Raid Devices : 2
  Total Devices : 2

          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 466173ba:507008c7:6d65ed89:3c40cf23
    Number   Major   Minor   RaidDevice State
       1       8       16        0      active sync   /dev/sdb
       0       8        0        1      active sync   /dev/sda
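
The check in the last step can also be scripted. The following Python sketch is an illustration under stated assumptions, not part of the Oracle procedure. The helper names parse_mdadm_detail and array_status are hypothetical, the sample is shortened from the md24 listing above, and the sketch assumes that a member under rebuild reports a state containing the word "rebuilding" (as in mdadm's "spare rebuilding").

```python
import re

# Sample shortened from the mdadm listing in this post.
SAMPLE = """\
/dev/md24:
      Container : /dev/md/imsm0, member 0
     Raid Level : raid1
          State : active
    Number   Major   Minor   RaidDevice State
       1       8       16        0      active sync   /dev/sdb
       0       8        0        1      active sync   /dev/sda
"""


def parse_mdadm_detail(text):
    """Map each array device to its State line and (device, state) members."""
    arrays, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        # A line such as "/dev/md24:" opens a new array section.
        header = re.fullmatch(r"(/dev/md\S*):", stripped)
        if header:
            current = {"state": None, "members": []}
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        state = re.match(r"State\s*:\s*(.+)", stripped)
        if state:
            current["state"] = state.group(1).strip()
            continue
        # Member rows: Number, Major, Minor, RaidDevice, state text, device path.
        member = re.match(r"\d+\s+\d+\s+\d+\s+\d+\s+(.+?)\s+(/dev/\S+)$", stripped)
        if member:
            current["members"].append((member.group(2), member.group(1)))
    return arrays


def array_status(array):
    """Classify one parsed array as 'in sync', 'rebuilding', or 'needs attention'."""
    states = [state for _, state in array["members"]]
    if states and all(state == "active sync" for state in states):
        return "in sync"
    # Assumption: a rebuilding member's state contains the word "rebuilding".
    if any("rebuilding" in state for state in states):
        return "rebuilding"
    return "needs attention"


for device, array in parse_mdadm_detail(SAMPLE).items():
    print(device, array["state"], array_status(array))
```

Either "in sync" or "rebuilding" matches what the step expects after a replacement. Any other result suggests looking at the full mdadm output by hand.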
