[colug-432] I/O Error?

Bill Schwanitz bilsch at bilsch.org
Thu Dec 6 11:15:39 EST 2012


On Dec 6, 2012, at 10:26 AM, Thomas W. cranston <thomas.w.cranston at gmail.com> wrote:

> I don't know. Perhaps I do not know how to interpret the data.
> 
> Looking at the man page right now.
> 
> What are you getting at?
> 
> Tom

Tom,

If you want, email the output of smartctl -a <dev> to the list.

I put *** in front of the attributes you really want to look at in the output below. Note that not all drives report the same attributes.

I will also echo Rob Funk's comment: if you have important data on that drive, back it up or move it off the disk while you still can. Toward the end of this email I have included logs from really dead drives for comparison.

╰─○ sudo smartctl -a /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-71.29.1.el6.x86_64] (local build)
(snip)

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   060   052   006    Pre-fail  Always       -       4891588
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always       -       4758
*** 5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       103678017425
  9 Power_On_Hours          0x0032   050   050   000    Old_age   Always       -       44500
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       445
194 Temperature_Celsius     0x0022   035   058   000    Old_age   Always       -       35 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   060   051   000    Old_age   Always       -       4891588
*** 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
*** 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
*** 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
*** 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     42335         -
# 2  Extended offline    Completed without error       00%     30238         -
# 3  Extended offline    Completed without error       00%     21211         -
# 4  Extended offline    Completed without error       00%     19939         -
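
For anyone who would rather not eyeball the table, here is a minimal Python sketch that pulls the attribute rows out of saved smartctl -a output and reports the starred attributes whose raw value is non-zero. The set of IDs is simply the ones starred in the healthy-drive table above, and the names parse_attributes, suspicious and WATCHED_IDS are made up for this example. Treat it as a rough aid, not a health verdict: raw values are vendor-specific, and a non-zero count is a reason to look closer rather than proof of failure.

```python
import re

# IDs starred in the healthy-drive table above (an assumption taken from
# the post, not an official smartmontools rule).
WATCHED_IDS = {5, 197, 198, 199, 200}

# One attribute row: optional "***" marker, ID, name, flag, value, worst,
# thresh, type, updated, when_failed, then the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(?:\*\*\*\s*)?(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

def parse_attributes(text):
    """Return {id: {...}} for every attribute row found in the text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "raw": int(m.group(9)),
            }
    return attrs

def suspicious(attrs, watched=WATCHED_IDS):
    """List (id, name, raw) for watched attributes with a non-zero raw value."""
    return [(i, a["name"], a["raw"])
            for i, a in sorted(attrs.items())
            if i in watched and a["raw"] > 0]

# Two rows copied from this email: one from the healthy drive and one
# from a dead drive, mixed here only to show both outcomes.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
*** 197 Current_Pending_Sector 0x0032 002 002 000 Old_age Always - 8143
"""
print(suspicious(parse_attributes(sample)))
# prints [(197, 'Current_Pending_Sector', 8143)]
```

The regular expression only takes the leading integer of RAW_VALUE, so a row such as Temperature_Celsius with "35 (0 14 0 0 0)" parses as 35.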

-- really dead drives --

(Note: these come from different hosts and drives. They are only examples, but they are real.)

smartctl output

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
*** 1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       2218
  2 Throughput_Performance  0x0026   055   055   000    Old_age   Always       -       8480
  3 Spin_Up_Time            0x0023   071   070   025    Pre-fail  Always       -       8984
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8360
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
 13 Read_Soft_Error_Rate    0x003a   100   100   000    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       25
194 Temperature_Celsius     0x0002   064   064   000    Old_age   Always       -       24 (Lifetime Min/Max 16/33)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
*** 197 Current_Pending_Sector  0x0032   002   002   000    Old_age   Always       -       8143
*** 198 Offline_Uncorrectable   0x0030   252   093   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0032   100   100   000    Old_age   Always       -       8360
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       16766
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       1913
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       1

$ dmesg

sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 90 00 00 08 00
end_request: I/O error, dev sde, sector 1853481872
sd 4:0:0:0: [sde] Unhandled error code
sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 90 00 00 08 00
end_request: I/O error, dev sde, sector 1853481872
sd 4:0:0:0: [sde] Unhandled error code
sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 98 00 00 08 00
end_request: I/O error, dev sde, sector 1853481880
sd 4:0:0:0: [sde] Unhandled error code
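
The CDB bytes in those kernel messages tie directly to the failing sector numbers. In a SCSI READ(10) command block, byte 0 is the opcode (0x28), bytes 2 to 5 hold the starting logical block address in big-endian order, and bytes 7 to 8 hold the number of blocks to read. Here is a small Python sketch that decodes it. The function name decode_read10 is made up for this example, and the match with the kernel's "sector" number assumes a drive with 512-byte logical blocks.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command block")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks

# The two distinct CDBs from the dmesg output above.
print(decode_read10("28 00 6e 79 e3 90 00 00 08 00"))  # prints (1853481872, 8)
print(decode_read10("28 00 6e 79 e3 98 00 00 08 00"))  # prints (1853481880, 8)
```

The decoded addresses match the "I/O error, dev sde, sector ..." lines, and the second read starts exactly 8 blocks after the first, so the kernel was failing on consecutive 8-block reads.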

email from smartd

The following warning/error was logged by the smartd daemon:

Device: /dev/sdc [SAT], FAILED SMART self-check. BACK UP DATA NOW!

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
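
If you follow up with smartctl from a script, its exit status is a bit mask that summarizes what it found. The sketch below decodes that mask in Python. The bit meanings are paraphrased from the return-values section of the smartctl man page as I remember it, so check them against the man page for your version. The names SMARTCTL_EXIT_BITS and decode_exit_status are made up for this example.

```python
# Bit number -> meaning, paraphrased from the smartctl man page
# (an assumption to verify against your installed version).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device is in a low-power mode",
    2: "a SMART or ATA command failed, or a checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}

def decode_exit_status(status):
    """Return the list of meanings for every bit set in the exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items()
            if status & (1 << bit)]

# Example: a status of 24 has bits 3 and 4 set.
for meaning in decode_exit_status(24):
    print(meaning)
```

A status of 0 decodes to an empty list, which means smartctl found nothing to report.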




