[colug-432] md (linux drive mirroring)

R P Herrold herrold at owlriver.com
Wed Aug 21 15:10:27 EDT 2019


On Wed, 21 Aug 2019, Jeff Frontz wrote:

> Does anyone have any success stories from using md?   How were you able to
> determine that it actually allowed the system to successfully ride out a
> hard failure -- are there log messages or anything that gave some
> indication?

We have used it in commercial production since before the
	RHEL / CentOS 5 days, 

so: /me looks:
	at least back to 2006

and find it indispensable.  Much easier to manage than 
'hardware'-assisted and, worse yet, 'fakeraid' RAID solutions


~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf1[14] sdq1[4] sdm1[7] sdp1[2] sdo1[0] 
sdn1[1] sdl1[13] sdk1[3] sdj1[15](F) sdi1[6] sdh1[5] sdg1[11] 
sde1[12] sdc1[10] sdb1[9] sda1[8]
      25395655168 blocks level 6, 64k chunk, algorithm 2 
[15/15] [UUUUUUUUUUUUUUU]


---------------
When a drive 'falls out' of the array, its 'U' in that status 
line changes to '_'.  I have a 'cron' process which watches 
for that and sends a notification
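The check itself is trivial; a minimal sketch of such a cron job (not my actual script -- the pattern and the mail(1) call are illustrative):

```shell
#!/bin/sh
# sketch: watch /proc/mdstat for a degraded array from cron.
# A healthy member shows as 'U' in the status line; a failed or
# missing one shows as '_' (e.g. [UUUUU_UUUUUUUUU]).
MDSTAT=${1:-/proc/mdstat}
if grep -qE '\[[U_]*_[U_]*\]' "$MDSTAT"; then
    # a member has dropped out -- mail the full status to root
    mail -s "md array degraded on $(hostname)" root < "$MDSTAT"
fi
```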


smartd works fine as well and will, depending on how you set 
its config file, drop notifications into the syslog, where 
they are detectable by 'logwatch' and such.  Belt and 
suspenders


~]# grep -v "^#" /etc/smartd.conf  | grep -v "^$"
/dev/sda -a -o on -S on -s (S/../.././10|L/../../6/08) -m root
/dev/sdb -a -o on -S on -s (S/../.././11|L/../../6/09) -m root
/dev/sdc -a -o on -S on -s (S/../.././12|L/../../6/10) -m root
/dev/sde -a -o on -S on -s (S/../.././13|L/../../6/11) -m root
/dev/sdf -a -o on -S on -s (S/../.././14|L/../../6/12) -m root
/dev/sdg -a -o on -S on -s (S/../.././15|L/../../6/13) -m root
/dev/sdh -a -o on -S on -s (S/../.././16|L/../../6/14) -m root
/dev/sdi -a -o on -S on -s (S/../.././17|L/../../6/15) -m root
/dev/sdk -a -o on -S on -s (S/../.././19|L/../../6/17) -m root
/dev/sdl -a -o on -S on -s (S/../.././20|L/../../6/18) -m root
/dev/sdm -a -o on -S on -s (S/../.././21|L/../../6/19) -m root
/dev/sdn -a -o on -S on -s (S/../.././22|L/../../6/20) -m root
/dev/sdo -a -o on -S on -s (S/../.././23|L/../../6/21) -m root
/dev/sdp -a -o on -S on -s (S/../.././00|L/../../6/22) -m root
/dev/sdq -a -o on -S on -s (S/../.././18|L/../../6/16) -m root
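For anyone decoding those lines, my reading of the fields (see smartd.conf(5); the staggered hours keep the self-tests from all running at once):

```shell
# one smartd.conf line, unpacked:
#
#   /dev/sda -a -o on -S on -s (S/../.././10|L/../../6/08) -m root
#
#   -a        monitor all SMART attributes and log changes
#   -o on     enable the drive's automatic offline data collection
#   -S on     enable SMART attribute autosave
#   -s REGEX  self-test schedule, fields T/MM/DD/d/HH:
#               S/../.././10  -> short self-test daily at 10:00
#               L/../../6/08  -> long self-test Saturdays (day 6) at 08:00
#   -m root   mail warnings to root
```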

There has not been any data loss, despite cycling at least 6 
replacement drives through that chassis

-----

from the 'dmesg' on a recent failure:

end_request: I/O error, dev sdf, sector 2563365089
sd 0:0:5:0: SCSI error: return code = 0x08000002
sdf: Current: sense key: Medium Error
    Add. Sense: Unrecovered read error

end_request: I/O error, dev sdf, sector 2563365033
raid5:md0: read error NOT corrected!! (sector 2563365032 on 
sdf1).
raid5: Disk failure on sdf1, disabling device. Operation 
continuing on 14 devices
raid5:md0: read error not correctable (sector 2563365040 on 
sdf1).
raid5:md0: read error not correctable (sector 2563365048 on 
sdf1).

( as you can see, the device failed, was 'disabled', and 
dropped out of the array )
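For reference, the replacement dance at the CLI is short; a sketch, using the device names from the log above (the new drive must first be partitioned to match its siblings):

```shell
~]# mdadm --manage /dev/md0 --remove /dev/sdf1
# ... hot-swap the drive, partition it to match the others, then:
~]# mdadm --manage /dev/md0 --add /dev/sdf1
~]# cat /proc/mdstat     # and watch the rebuild run
```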


A new drive was put in place, hot, and was seen by the system:

EDAC k8 MC1: general bus error: participating processor(local 
node origin), time-out(no timeout) memory transaction 
type(generic read), mem or i/o(mem access), cache 
level(generic)
EDAC MC1: CE page 0x149dc4, offset 0x5d8, grain 8, syndrome 
0xf1, row 2, channel 1, label "": k8_edac
EDAC k8 MC1: extended error code: ECC error
md: unbind<sdf1>
md: export_rdev(sdf1)
mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, 
Code={Reset}, SubCode(0x0d00) cb_idx mptbase_reply
mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO 
Device Missing Delay Retry}, SubCode(0x0000) cb_idx 
mptbase_reply
 end_device-0:5: mptsas: ioc0: removing sata device: 
fw_channel 0, fw_id 7, phy 6,sas_addr 0x1221000006000000
 phy-0:6: mptsas: ioc0: delete phy 6, phy-obj 
(0xffff81007fe5d800)
 port-0:5: mptsas: ioc0: delete port 5, sas_addr 
(0x1221000006000000)
 target0:0:5: mptsas: ioc0: delete device: fw_channel 0, fw_id 
7, phy 6, sas_addr 0x1221000006000000
mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 7, 
phy 6, sas_addr 0x1221000006000000
  Vendor: ATA       Model: ST2000VN000-1HJ1  Rev: SC60
  Type:   Direct-Access                      ANSI SCSI 
revision: 05
SCSI device sdf: 3907029168 512-byte hdwr sectors (2000399 MB)
sdf: Write Protect is off
sdf: Mode Sense: 73 00 00 08
SCSI device sdf: drive cache: write through
SCSI device sdf: 3907029168 512-byte hdwr sectors (2000399 MB)
sdf: Write Protect is off


and the array rebuilt:

 disk 14, o:1, dev:sdf1
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 60000 
KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more 
than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 1953511936 blocks.
md: md0: sync done.
RAID5 conf printout:
 --- rd:15 wd:15 fd:0


Those rebuilds can take a while, but * shrug * they are rare
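The floor and ceiling figures in that log are the usual md tunables; a quick look, and an example of raising the floor for a maintenance window (the values shown are the ones from the log, not a recommendation):

```shell
~]# cat /proc/sys/dev/raid/speed_limit_min   # per-disk floor, KB/sec
~]# cat /proc/sys/dev/raid/speed_limit_max   # idle-bandwidth ceiling
~]# echo 60000 > /proc/sys/dev/raid/speed_limit_min
```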


On the QNAP appliances (several) and a Synology one, locally, 
'md' RAID is also used, and I personally have had no data 
loss due to media failure for years and years.

From the log for the Synology (the QNAPs use 3.5" drives, 
which seem not to fail as quickly as 2.5" ones):

A. 2018 12 19
repl 4 w/
HGST mdl HTS721010A9E630
931.5 G
sn JR1020BNGWMDJE
disk 1 2 and 3 are OK -- 4 is not

B. 2018 04
repl 3 w/
Samsung ST1500LM006 sn S34QJ9CG106486 fw: 2BC10007
bot AMZN 2018 04 DOM: 1/2015  Momentus-D brand
AMZN order: Order # 114-7251086-6597039
Order placed  January 8, 2018
vendor: Hard Drive Geeks
disk 1 2 and 4 are OK -- 3 is not

C. repl 3 w/
Seagate ST1000LM014 sn W771JQ8F
bot AMZN 2018 01 Reconditioned 04 May 2017
verify.seagate.com

D. disk 1 3 and 4 are OK -- 2 is not
ST1000LM014-1EJ164
        ok are SN W381A9LJ
                  W381AEGC
                  W381BEDC
all at firmware: SM14


E. ordered repl 2018 01 08 AMZN
repl needs to be added to the array
        Repl SN is: W771JQ8F
        at f/w: SM361


That is, five disks replaced in a four-drive chassis over the 
years ... each chassis position has had failures


QNAP 'nas7' was recently deployed, and setting up the file 
systems and scrubbing the drives is still in the dmesg:

[   12.877996] device-mapper: ioctl: 4.33.0-ioctl (2015-8-18) 
initialised: dm-devel at redhat.com
[   12.878141] device-mapper: multipath: version 1.9.0 loaded
[   12.878145] device-mapper: multipath round-robin: version 
1.0.0 loaded
[   12.878147] device-mapper: multipath queue-length: version 
0.1.0 loaded
[   12.878149] device-mapper: multipath service-time: version 
0.2.0 loaded
[   12.878189] usbcore: registered new interface driver btusb
... 
[   13.029861] scsi 2:0:0:0: Direct-Access     ATA      
CT120BX500SSD1    M6C PQ: 0 ANSI: 5
[   13.029865] ata3.00: set queue depth = 31
[   13.029999] Check proc_name[ahci].
[   13.030005] Check proc_name[ahci].
[   13.030005] Check proc_name[ahci].
[   13.030037] Check proc_name[ahci].
[   13.030057] Check proc_name[ahci].
[   13.030178] sd 2:0:0:0: [sdc] 234441648 512-byte logical 
blocks: (120 GB/111 GiB)
[   13.030260] sd 2:0:0:0: [sdc] Write Protect is off
[   13.030263] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   13.030288] sd 2:0:0:0: [sdc] Write cache: enabled, read 
cache: enabled, doesn't support DPO or FUA
[   13.030584] sd 2:0:0:0: Attached scsi generic sg2 type 0
[   13.149166]  sdc: sdc1 sdc2 sdc3 sdc4 sdc5
[   13.149678] sd 2:0:0:0: [sdc] Attached SCSI disk

...
[   13.345217] ata4: SATA link up 6.0 Gbps (SStatus 133 
SControl 330)
[   13.355362] ata4.00: ATA-9: CT120BX500SSD1,  M6CR013, max 
UDMA/133
[   13.361501] ata4.00: 234441648 sectors, multi 1: LBA48 NCQ 
(depth 31/32), AA
[   13.377369] ata4.00: configured for UDMA/133
...
[   13.533981]  mmcblk0: p1 p2 p3 p5 p6
[   13.687338] ata4: SATA link up 6.0 Gbps (SStatus 133 
SControl 330)
[   13.707088] ata4.00: configured for UDMA/133
[   13.712242] Check proc_name[ahci].
[   13.715632] scsi 3:0:0:0: Direct-Access     ATA      
CT120BX500SSD1    M6C PQ: 0 ANSI: 5
...
[   13.738030] sd 3:0:0:0: [sdd] 234441648 512-byte logical 
blocks: (120 GB/111 GiB)
[   13.738165] sd 3:0:0:0: [sdd] Write Protect is off
[   13.738168] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[   13.738208] sd 3:0:0:0: [sdd] Write cache: enabled, read 
cache: enabled, doesn't support DPO or FUA
[   13.739630]  sdd: sdd1 sdd2 sdd3 sdd4 sdd5
[   13.740195] sd 3:0:0:0: [sdd] Attached SCSI disk

[   45.108480]  disk 1, wo:1, o:1, dev:sdd5
[   45.108544] md: recovery of RAID array md321
[   45.112814] md: minimum _guaranteed_  speed: 1000 
KB/sec/disk.
[   45.118601] md: using maximum available idle IO bandwidth 
(but not more than 500000 KB/sec) for recovery.
[   45.128108] md: Recovering started: md321
[   45.132102] md/raid:md321: report qnap hal event: type = 
HAL_EVENT_RAID, action = REBUILDING_START
[   45.141038] md/raid:md321: report qnap hal event: 
raid_id=321, pd_name=/dev/(null), spare=/dev/(null), 
pd_repair_sector=0
[   45.151911] md: using 128k window, over a total of 
8283712k.
[   45.157540] md: resuming recovery of md321 from checkpoint.
[   46.297731] md: md1 stopped.
[   46.309379] md: bind<sdd3>
[   46.312294] md: bind<sda3>
[   46.315170] md: bind<sdb3>
[   46.318029] md: bind<sdc3>
[   46.322936] md/raid:md1: device sdc3 operational as raid 
disk 0
[   46.328843] md/raid:md1: device sdb3 operational as raid 
disk 3
[   46.334725] md/raid:md1: device sda3 operational as raid 
disk 2
[   46.340635] md/raid:md1: device sdd3 operational as raid 
disk 1
[   46.352804] md/raid:md1: allocated 70240kB
[   46.356944] md/raid:md1: raid level 5 active with 4 out of 
4 devices, algorithm 2
[   46.364385] RAID conf printout:
[   46.364386]  --- level:5 rd:4 wd:4
[   46.364389]  disk 0, o:1, dev:sdc3
[   46.364391]  disk 1, o:1, dev:sdd3
[   46.364393]  disk 2, o:1, dev:sda3
[   46.364395]  disk 3, o:1, dev:sdb3
[   46.364424] md/raid:md1: /dev/sdc3 does not support SSD 
DZAT(Deterministic Read Zero after TRIM).
[   46.373229] md/raid:md1: /dev/sdb3 does not support SSD 
DZAT(Deterministic Read Zero after TRIM).
[   46.382027] md/raid:md1: /dev/sda3 does not support SSD 
DZAT(Deterministic Read Zero after TRIM).
[   46.390824] md/raid:md1: /dev/sdd3 does not support SSD 
DZAT(Deterministic Read Zero after TRIM).
[   46.399683] md1: detected capacity change from 0 to 
313043976192

...

 = RESYNCING_START
[454555.048224] md/raid:md1: report qnap hal event: raid_id=1, 
pd_name=/dev/(null), spare=/dev/(null), pd_repair_sector=0
[454555.058857] md: using 2048k window, over a total of 
101902336k.
[455174.425473] md: md1: requested-resync done.
[455174.429736] md: Resyncing done: md1
[455174.433302] md/raid:md1: report qnap hal event: type = 
HAL_EVENT_RAID, action = RESYNCING_COMPLETE
[455174.442301] md/raid:md1: report qnap hal event: raid_id=1, 
pd_name=/dev/(null), spare=/dev/(null), pd_repair_sector=0
[455174.454833] md: qnap_md_badblock_final_check: 
(md:md1,dev:sdc3): id= 0, flags = 0x2, bb count = 0
[455174.463757] md: qnap_md_badblock_final_check: 
(md:md1,dev:sdb3): id= 3, flags = 0x2, bb count = 0
[455174.472657] md: qnap_md_badblock_final_check: 
(md:md1,dev:sda3): id= 2, flags = 0x2, bb count = 0
[455174.481547] md: qnap_md_badblock_final_check: 
(md:md1,dev:sdd3): id= 1, flags = 0x2, bb count = 0
[458156.403805] flush_memory.sh (13795): drop_caches: 3


Both QNAP and Synology have web interfaces for driving this 
maintenance: removing a sick drive, slotting a new one in, 
adding it to the LVM, then adding it to the 'md' pools 
(several with these appliances), and, once the images are 
coherent, scrubbing out the slack space.

Our work a decade ago was done at the CLI, of course.


-- Russ herrold

