[colug-432] Interpret smartctl
Bill Schwanitz
bilsch at bilsch.org
Tue Jan 29 13:37:51 EST 2013
On Jan 28, 2013, at 7:37 PM, Thomas W. cranston <thomas.w.cranston at gmail.com> wrote:
> looked at dmesg last night, finding no error messages.
>
> I next ran smartctl, but need help deciphering the readings.
>
> For example:
>
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
>
> Can somebody explain the significance of the values under these headers?
>
> I See nothing under WHEN_FAILED. That means nothing wrong with the HDD?
> No bad sectors?
These are the lines you should probably look at (copied and pasted from your email):
1 Raw_Read_Error_Rate 0x000f 100 095 046 Pre-fail Always - 151324
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 573972
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 20244550189059
198 Offline_Uncorrectable 0x0032 100 100 000 Old_age Always - 126979380084739
The Current_Pending_Sector and Offline_Uncorrectable raw values seem implausibly high to me, so I'm not sure I trust them.
I wrapped the important part below in ***.
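On the question about the column headers: as far as I know from the smartctl man page, VALUE is the current normalized value, WORST is the lowest it has ever been, and THRESH is the vendor's failure threshold. An attribute counts as failed once VALUE drops to THRESH or below, and WHEN_FAILED stays at - until that happens. RAW_VALUE is vendor-specific. Here is a minimal Python sketch of that rule applied to the four lines above. The names Attr, parse_attr and is_failing are made up for illustration, and the parser assumes the raw value is a single integer, which is true for these lines but not for every attribute.

```python
from typing import NamedTuple

class Attr(NamedTuple):
    id: int
    name: str
    flag: str
    value: int        # current normalized value
    worst: int        # lowest normalized value seen so far
    thresh: int       # vendor failure threshold
    type: str         # Pre-fail or Old_age
    updated: str
    when_failed: str  # "-" means it has never failed
    raw: int          # vendor-specific raw value

def parse_attr(line: str) -> Attr:
    # Assumes whitespace-separated fields and a plain integer raw value.
    f = line.split()
    return Attr(int(f[0]), f[1], f[2], int(f[3]), int(f[4]), int(f[5]),
                f[6], f[7], f[8], int(f[9]))

def is_failing(a: Attr) -> bool:
    # SMART rule of thumb: failed when the normalized value is at or below the threshold.
    return a.value <= a.thresh

LINES = [
    "1 Raw_Read_Error_Rate 0x000f 100 095 046 Pre-fail Always - 151324",
    "196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 573972",
    "197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 20244550189059",
    "198 Offline_Uncorrectable 0x0032 100 100 000 Old_age Always - 126979380084739",
]

attrs = [parse_attr(line) for line in LINES]
for a in attrs:
    # Printing the raw value in hex makes oddly large numbers easier to eyeball.
    print(f"{a.name:24} value={a.value} worst={a.worst} thresh={a.thresh} "
          f"failing={is_failing(a)} raw={a.raw:#x}")
```

None of the four attributes is at or below its threshold, which is why WHEN_FAILED shows -. For what it's worth, both suspicious raw values end in 0003 when printed in hex. Whether this drive packs several counters into the raw field is only a guess on my part; the output does not confirm it.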
Error 113 occurred at disk power-on lifetime: 9989 hours (416 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 bf 91 3e e0 Error: ***UNC 8 sectors at LBA = 0x003e91bf = 4100543***
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 40 87 91 3e e0 00 01:27:16.122 READ DMA EXT
25 00 20 47 8f 3e e0 00 01:27:16.114 READ DMA EXT
25 00 20 7f 8b 3e e0 00 01:27:16.107 READ DMA EXT
25 00 20 e7 89 3e e0 00 01:27:16.105 READ DMA EXT
25 00 08 7f 89 3e e0 00 01:27:16.104 READ DMA EXT
So, what that means is that the drive returned an uncorrectable read error (UNC) while trying to read 8 sectors starting at LBA 0x003e91bf (a sector address on the drive).
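If you are wondering where that LBA comes from, it can be rebuilt from the register dump. With 28-bit addressing, SN, CL and CH hold the low, middle and high bytes of the address, and the low four bits of DH hold the top bits. Here is a small Python sketch of that arithmetic. The function name decode_lba28 is made up for illustration, and the 512-byte sector size used for the byte offset is an assumption about your drive.

```python
def decode_lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    # LBA bits 0-7 in SN, 8-15 in CL, 16-23 in CH, 24-27 in the low nibble of DH.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

ER_UNC = 0x40  # bit 6 of the error register: uncorrectable data error

# Register values copied from the error log line: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "40 51 08 bf 91 3e e0".split())

lba = decode_lba28(sn, cl, ch, dh)
print(f"LBA = {lba:#010x} = {lba}")
print("UNC bit set:", bool(er & ER_UNC))
print("sector count:", sc)
# Assumption: 512-byte logical sectors.
print("byte offset:", lba * 512)
```

This reproduces the 0x003e91bf = 4100543 that smartctl printed, and the sector count of 8 matches the "UNC 8 sectors" in the same line.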
These are your early warnings that the drive is going south. What I would recommend you do is this:
sudo smartctl -t long /dev/sda
The response should be something like this:
$ sudo smartctl -t long /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 48 minutes for test to complete.
Test will complete after Tue Jan 29 14:18:05 2013
Use smartctl -X to abort test.
So, my drive will take 48 minutes to run; yours will likely take longer (this is from a 60 GB SSD). Note that you need to leave the system running during this period, so don't suspend or power it off.
When the time has passed, run smartctl -a /dev/sda and see what it shows:
(Note: this output is from a different drive, and it is all clear.)
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 27542 -
# 2 Extended offline Completed without error 00% 27373 -
If you see anything other than - in the LBA_of_first_error column, that indicates your drive has a failed sector.
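If you would rather not eyeball the log every time, the check is easy to script. This is a rough Python sketch that pulls the last column out of each self-test row and reports anything other than -. The function name failed_lbas is made up, the regular expression assumes rows shaped like the ones above, and the failing row in the example is hypothetical, written by me to show what a hit looks like.

```python
import re
from typing import List

# Row shape assumed: "# N  description/status  NN%  hours  LBA-or-dash"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)$")

def failed_lbas(log_text: str) -> List[str]:
    """Return every LBA_of_first_error entry that is not '-'."""
    bad = []
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group(5) != "-":
            bad.append(m.group(5))
    return bad

CLEAN_LOG = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 27542 -
# 2 Extended offline Completed without error 00% 27373 -
"""

# Hypothetical row, not taken from any real drive.
HYPOTHETICAL_BAD_ROW = "# 3 Extended offline Completed: read failure 90% 9989 4100543"

print(failed_lbas(CLEAN_LOG))
print(failed_lbas(CLEAN_LOG + HYPOTHETICAL_BAD_ROW))
```

To use it on a real drive, feed it the text that smartctl -a /dev/sda prints.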
If you are doing this against a fresh drive, it is possible you simply got a bad drive; it happens. Also note that if you want to RMA the drive or something, it's highly likely that all of the above will need to be repeated using the vendor's tool of choice (typically Windows or DOS based, in my experience).
I also highly recommend you run smartd, which is a daemonized monitor. Run man smartd.conf to look over the options. Set the smartd daemon to run at boot. You should have smartd configured to run full tests of your drives at least monthly, or more aggressively if you want.
(Note: smartd will send an email when it finds errors, which is why I like and recommend it.)
hope this helps!
Bill
Here are the relevant lines from smartd.conf on my fileserver box at home:
/dev/sda -d ata -s L/../../3/18
/dev/sdb -d ata -s L/../../3/18
/dev/sdc -d ata -s L/../../3/18
/dev/sdd -d ata -s L/../../3/18
/dev/sde -d ata -s L/../../3/18
read the man page to see what that means ;)
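For the impatient: as I understand smartd.conf, the argument to -s is a regular expression that smartd matches against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with Monday as 1, and hour). Here is a Python sketch of that matching. The function name schedule_matches is made up, and using a full match is my simplification of whatever smartd does internally.

```python
import re
from datetime import datetime

def schedule_matches(regex: str, when: datetime, test_type: str = "L") -> bool:
    # Build the T/MM/DD/d/HH string for this moment; isoweekday() gives Monday=1 .. Sunday=7.
    stamp = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
    return re.fullmatch(regex, stamp) is not None

SCHEDULE = "L/../../3/18"

# Wednesday 2013-01-30 at 18:00 matches; Tuesday 2013-01-29 at 18:00 does not.
print(schedule_matches(SCHEDULE, datetime(2013, 1, 30, 18, 0)))
print(schedule_matches(SCHEDULE, datetime(2013, 1, 29, 18, 0)))
```

Read that way, L/../../3/18 asks for a long self-test on any month and any day of the month, as long as it is a Wednesday in the 18:00 hour, so once a week.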