[colug-432] md (linux drive mirroring)
R P Herrold
herrold at owlriver.com
Wed Aug 21 15:10:27 EDT 2019
On Wed, 21 Aug 2019, Jeff Frontz wrote:
> Does anyone have any success stories from using md? How were you able to
> determine that it actually allowed the system to successfully ride out a
> hard failure -- are there log messages or anything that gave some
> indication?
We have used it in commercial production since before the
RHEL / CentOS 5 days,
so: /me looks:
at least back to 2006
and find it indispensable. It is much easier to manage than
'hardware'-assisted RAID solutions, and worse yet, 'fakeraid'
ones.
~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf1[14] sdq1[4] sdm1[7] sdp1[2] sdo1[0]
sdn1[1] sdl1[13] sdk1[3] sdj1[15](F) sdi1[6] sdh1[5] sdg1[11]
sde1[12] sdc1[10] sdb1[9] sda1[8]
25395655168 blocks level 6, 64k chunk, algorithm 2
[15/15] [UUUUUUUUUUUUUUU]
---------------
When a drive 'falls out' of the array, the corresponding 'U'
changes to '_'. I have a 'cron' process which watches for
that and provides notification.
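That cron check amounts to a grep over /proc/mdstat; a minimal
sketch (the path is a parameter here only so it can be pointed
at a saved copy for testing; a real job would mail root rather
than echo):

```shell
#!/bin/sh
# Sketch of the cron check: a degraded md array shows '_' in
# place of 'U' in the [UUUU...] status string in /proc/mdstat.
MDSTAT="${MDSTAT:-/proc/mdstat}"

check_mdstat() {
    # Match a status bracket such as [UU_U]; any '_' means a
    # member has dropped out of an array.
    if grep -q '\[[U_]*_[U_]*\]' "$MDSTAT"; then
        echo "DEGRADED: $(grep '\[[U_]*_[U_]*\]' "$MDSTAT")"
        return 1
    fi
    echo "OK"
}
```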
smartd works fine as well and will, depending on how you set
its config file, drop notifications into the syslog, where
they are 'detectable' by 'logwatch' and such. Belt and
suspenders.
~]# grep -v "^#" /etc/smartd.conf | grep -v "^$"
/dev/sda -a -o on -S on -s (S/../.././10|L/../../6/08) -m root
/dev/sdb -a -o on -S on -s (S/../.././11|L/../../6/09) -m root
/dev/sdc -a -o on -S on -s (S/../.././12|L/../../6/10) -m root
/dev/sde -a -o on -S on -s (S/../.././13|L/../../6/11) -m root
/dev/sdf -a -o on -S on -s (S/../.././14|L/../../6/12) -m root
/dev/sdg -a -o on -S on -s (S/../.././15|L/../../6/13) -m root
/dev/sdh -a -o on -S on -s (S/../.././16|L/../../6/14) -m root
/dev/sdi -a -o on -S on -s (S/../.././17|L/../../6/15) -m root
/dev/sdk -a -o on -S on -s (S/../.././19|L/../../6/17) -m root
/dev/sdl -a -o on -S on -s (S/../.././20|L/../../6/18) -m root
/dev/sdm -a -o on -S on -s (S/../.././21|L/../../6/19) -m root
/dev/sdn -a -o on -S on -s (S/../.././22|L/../../6/20) -m root
/dev/sdo -a -o on -S on -s (S/../.././23|L/../../6/21) -m root
/dev/sdp -a -o on -S on -s (S/../.././00|L/../../6/22) -m root
/dev/sdq -a -o on -S on -s (S/../.././18|L/../../6/16) -m root
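For anyone decoding those lines: '-s' takes a regex matched
against the date template T/MM/DD/d/HH (standard smartd.conf
syntax, where d is day-of-week, 1=Monday through 7=Sunday),
which is how the per-drive schedules above are staggered.
Annotated for one drive:

```
# /etc/smartd.conf entry, annotated (standard smartd directives):
#   -a        monitor all SMART attributes
#   -o on     enable the drive's automatic offline testing
#   -S on     enable attribute autosave
#   -m root   mail warnings to root
#   -s REGEX  self-test schedule, matched against T/MM/DD/d/HH,
#             where T is the test type (S=short, L=long)
# So this line runs a short self-test daily at 10:00 and a
# long self-test every Saturday at 08:00:
/dev/sda -a -o on -S on -s (S/../.././10|L/../../6/08) -m root
```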
There has not been any data loss, despite cycling at least 6
replacement drives through that chassis.
-----
from the 'dmesg' on a recent failure:
end_request: I/O error, dev sdf, sector 2563365089
sd 0:0:5:0: SCSI error: return code = 0x08000002
sdf: Current: sense key: Medium Error
Add. Sense: Unrecovered read error
end_request: I/O error, dev sdf, sector 2563365033
raid5:md0: read error NOT corrected!! (sector 2563365032 on
sdf1).
raid5: Disk failure on sdf1, disabling device. Operation
continuing on 14 devices
raid5:md0: read error not correctable (sector 2563365040 on
sdf1).
raid5:md0: read error not correctable (sector 2563365048 on
sdf1).
( as you can see, the device failed, was 'disabled', and
dropped out of the array )
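Pulling the failed member and re-adding the replacement is a
pair of mdadm calls; a sketch, with the device names as
examples only, and a DRYRUN hook so the commands can be
previewed before running them as root:

```shell
#!/bin/sh
# Sketch of the hot-replace sequence; device names are examples.
# Set DRYRUN=echo to preview the commands instead of running them.
DRYRUN="${DRYRUN:-}"

replace_md_member() {
    array="$1"; failed="$2"; replacement="$3"
    # Drop the failed (already auto-disabled) member from the array ...
    $DRYRUN mdadm "$array" --remove "$failed"
    # ... and add the replacement; md starts the rebuild on its own.
    $DRYRUN mdadm "$array" --add "$replacement"
}
```

e.g. DRYRUN=echo replace_md_member /dev/md0 /dev/sdf1 /dev/sdf1
(a replacement in the same slot usually comes back under the
same device name, as the dmesg below shows).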
A new drive was put in place, hot, and seen by the system:
EDAC k8 MC1: general bus error: participating processor(local
node origin), time-out(no timeout) memory transaction
type(generic read), mem or i/o(mem access), cache
level(generic)
EDAC MC1: CE page 0x149dc4, offset 0x5d8, grain 8, syndrome
0xf1, row 2, channel 1, label "": k8_edac
EDAC k8 MC1: extended error code: ECC error
md: unbind<sdf1>
md: export_rdev(sdf1)
mptbase: ioc0: LogInfo(0x31110d00): Originator={PL},
Code={Reset}, SubCode(0x0d00) cb_idx mptbase_reply
mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO
Device Missing Delay Retry}, SubCode(0x0000) cb_idx
mptbase_reply
end_device-0:5: mptsas: ioc0: removing sata device:
fw_channel 0, fw_id 7, phy 6,sas_addr 0x1221000006000000
phy-0:6: mptsas: ioc0: delete phy 6, phy-obj
(0xffff81007fe5d800)
port-0:5: mptsas: ioc0: delete port 5, sas_addr
(0x1221000006000000)
target0:0:5: mptsas: ioc0: delete device: fw_channel 0, fw_id
7, phy 6, sas_addr 0x1221000006000000
mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 7,
phy 6, sas_addr 0x1221000006000000
Vendor: ATA Model: ST2000VN000-1HJ1 Rev: SC60
Type: Direct-Access ANSI SCSI
revision: 05
SCSI device sdf: 3907029168 512-byte hdwr sectors (2000399 MB)
sdf: Write Protect is off
sdf: Mode Sense: 73 00 00 08
SCSI device sdf: drive cache: write through
SCSI device sdf: 3907029168 512-byte hdwr sectors (2000399 MB)
sdf: Write Protect is off
and the array rebuilt:
disk 14, o:1, dev:sdf1
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 60000
KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more
than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 1953511936 blocks.
md: md0: sync done.
RAID5 conf printout:
--- rd:15 wd:15 fd:0
Those rebuilds can take a while, but * shrug * they are rare.
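Progress is visible in /proc/mdstat while a rebuild runs; a
small sketch to pull out just the percentage (the path is
parameterized only so it can be tested against a saved copy):

```shell
#!/bin/sh
# During a rebuild /proc/mdstat carries a line like:
#   [==>......]  recovery = 12.6% (246001234/1953511936) finish=187.1min
# This pulls out just the 'recovery = NN.N%' (or resync) portion.
MDSTAT="${MDSTAT:-/proc/mdstat}"

rebuild_progress() {
    grep -Eo '(recovery|resync) = *[0-9.]+%' "$MDSTAT" \
        || echo "no rebuild in progress"
}
```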
On the QNAP appliances (several) and a Synology one, locally,
'md' raid is also used, and I personally have had no data
loss due to media failure for years and years
From the log for the Synology (the QNAPs use 3.5" drives,
which seem not to fail as quickly as the 2.5" ones):
A. 2018 12 19
repl 4 w/
HGST mdl HTS721010A9E630
931.5 G
sn JR1020BNGWMDJE
disk 1 2 and 3 are OK -- 4 is not
B. 2018 04
repl 3 w/
Samsung ST1500LM006 sn S34QJ9CG106486 fw: 2BC10007
bot AMZN 2018 04 DOM: 1/2015 Momentus-D brand
AMZN order: Order # 114-7251086-6597039
Order placed January 8, 2018
vendor: Hard Drive Geeks
disk 1 2 and 4 are OK -- 3 is not
C. repl 3 w/
Seagate ST1000LM014 sn W771JQ8F
bot AMZN 2018 01 Reconditioned 04 May 2017
verify.seagate.com
D. disk 1 3 and 4 are OK -- 2 is not
ST1000LM014-1EJ164
ok are SN W381A9LJ
W381AEGC
W381BEDC
all at firmware: SM14
E. ordered repl 2018 01 08 AMZN
repl needs to be added to the array
Repl SN is: W771JQ8F
at f/w: SM361
that is, five disks replaced in a four-drive chassis, over the
years ... each chassis position has had failures
QNAP 'nas7' was recently deployed, and setting up the file
systems and scrubbing the drives is still in the dmesg:
[ 12.877996] device-mapper: ioctl: 4.33.0-ioctl (2015-8-18)
initialised: dm-devel at redhat.com
[ 12.878141] device-mapper: multipath: version 1.9.0 loaded
[ 12.878145] device-mapper: multipath round-robin: version
1.0.0 loaded
[ 12.878147] device-mapper: multipath queue-length: version
0.1.0 loaded
[ 12.878149] device-mapper: multipath service-time: version
0.2.0 loaded
[ 12.878189] usbcore: registered new interface driver btusb
...
[ 13.029861] scsi 2:0:0:0: Direct-Access ATA
CT120BX500SSD1 M6C PQ: 0 ANSI: 5
[ 13.029865] ata3.00: set queue depth = 31
[ 13.029999] Check proc_name[ahci].
[ 13.030005] Check proc_name[ahci].
[ 13.030005] Check proc_name[ahci].
[ 13.030037] Check proc_name[ahci].
[ 13.030057] Check proc_name[ahci].
[ 13.030178] sd 2:0:0:0: [sdc] 234441648 512-byte logical
blocks: (120 GB/111 GiB)
[ 13.030260] sd 2:0:0:0: [sdc] Write Protect is off
[ 13.030263] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 13.030288] sd 2:0:0:0: [sdc] Write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
[ 13.030584] sd 2:0:0:0: Attached scsi generic sg2 type 0
[ 13.149166] sdc: sdc1 sdc2 sdc3 sdc4 sdc5
[ 13.149678] sd 2:0:0:0: [sdc] Attached SCSI disk
...
[ 13.345217] ata4: SATA link up 6.0 Gbps (SStatus 133
SControl 330)
[ 13.355362] ata4.00: ATA-9: CT120BX500SSD1, M6CR013, max
UDMA/133
[ 13.361501] ata4.00: 234441648 sectors, multi 1: LBA48 NCQ
(depth 31/32), AA
[ 13.377369] ata4.00: configured for UDMA/133
...
[ 13.533981] mmcblk0: p1 p2 p3 p5 p6
[ 13.687338] ata4: SATA link up 6.0 Gbps (SStatus 133
SControl 330)
[ 13.707088] ata4.00: configured for UDMA/133
[ 13.712242] Check proc_name[ahci].
[ 13.715632] scsi 3:0:0:0: Direct-Access ATA
CT120BX500SSD1 M6C PQ: 0 ANSI: 5
...
[ 13.738030] sd 3:0:0:0: [sdd] 234441648 512-byte logical
blocks: (120 GB/111 GiB)
[ 13.738165] sd 3:0:0:0: [sdd] Write Protect is off
[ 13.738168] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 13.738208] sd 3:0:0:0: [sdd] Write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
[ 13.739630] sdd: sdd1 sdd2 sdd3 sdd4 sdd5
[ 13.740195] sd 3:0:0:0: [sdd] Attached SCSI disk
[ 45.108480] disk 1, wo:1, o:1, dev:sdd5
[ 45.108544] md: recovery of RAID array md321
[ 45.112814] md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
[ 45.118601] md: using maximum available idle IO bandwidth
(but not more than 500000 KB/sec) for recovery.
[ 45.128108] md: Recovering started: md321
[ 45.132102] md/raid:md321: report qnap hal event: type =
HAL_EVENT_RAID, action = REBUILDING_START
[ 45.141038] md/raid:md321: report qnap hal event:
raid_id=321, pd_name=/dev/(null), spare=/dev/(null),
pd_repair_sector=0
[ 45.151911] md: using 128k window, over a total of
8283712k.
[ 45.157540] md: resuming recovery of md321 from checkpoint.
[ 46.297731] md: md1 stopped.
[ 46.309379] md: bind<sdd3>
[ 46.312294] md: bind<sda3>
[ 46.315170] md: bind<sdb3>
[ 46.318029] md: bind<sdc3>
[ 46.322936] md/raid:md1: device sdc3 operational as raid
disk 0
[ 46.328843] md/raid:md1: device sdb3 operational as raid
disk 3
[ 46.334725] md/raid:md1: device sda3 operational as raid
disk 2
[ 46.340635] md/raid:md1: device sdd3 operational as raid
disk 1
[ 46.352804] md/raid:md1: allocated 70240kB
[ 46.356944] md/raid:md1: raid level 5 active with 4 out of
4 devices, algorithm 2
[ 46.364385] RAID conf printout:
[ 46.364386] --- level:5 rd:4 wd:4
[ 46.364389] disk 0, o:1, dev:sdc3
[ 46.364391] disk 1, o:1, dev:sdd3
[ 46.364393] disk 2, o:1, dev:sda3
[ 46.364395] disk 3, o:1, dev:sdb3
[ 46.364424] md/raid:md1: /dev/sdc3 does not support SSD
DZAT(Deterministic Read Zero after TRIM).
[ 46.373229] md/raid:md1: /dev/sdb3 does not support SSD
DZAT(Deterministic Read Zero after TRIM).
[ 46.382027] md/raid:md1: /dev/sda3 does not support SSD
DZAT(Deterministic Read Zero after TRIM).
[ 46.390824] md/raid:md1: /dev/sdd3 does not support SSD
DZAT(Deterministic Read Zero after TRIM).
[ 46.399683] md1: detected capacity change from 0 to
313043976192
...
= RESYNCING_START
[454555.048224] md/raid:md1: report qnap hal event: raid_id=1,
pd_name=/dev/(null), spare=/dev/(null), pd_repair_sector=0
[454555.058857] md: using 2048k window, over a total of
101902336k.
[455174.425473] md: md1: requested-resync done.
[455174.429736] md: Resyncing done: md1
[455174.433302] md/raid:md1: report qnap hal event: type =
HAL_EVENT_RAID, action = RESYNCING_COMPLETE
[455174.442301] md/raid:md1: report qnap hal event: raid_id=1,
pd_name=/dev/(null), spare=/dev/(null), pd_repair_sector=0
[455174.454833] md: qnap_md_badblock_final_check:
(md:md1,dev:sdc3): id= 0, flags = 0x2, bb count = 0
[455174.463757] md: qnap_md_badblock_final_check:
(md:md1,dev:sdb3): id= 3, flags = 0x2, bb count = 0
[455174.472657] md: qnap_md_badblock_final_check:
(md:md1,dev:sda3): id= 2, flags = 0x2, bb count = 0
[455174.481547] md: qnap_md_badblock_final_check:
(md:md1,dev:sdd3): id= 1, flags = 0x2, bb count = 0
[458156.403805] flush_memory.sh (13795): drop_caches: 3
Both QNAP and Synology have web interfaces for driving this
maintenance: removing a sick drive, slotting a new one in,
adding it to the LVM, then adding it to the 'md' pools
(several, with these appliances), and, once the images are
coherent, scrubbing out the slack space.
Our work a decade ago was done at the CLI, of course.
-- Russ herrold