[colug-432] I/O Error?

Thomas W. cranston thomas.w.cranston at gmail.com
Thu Dec 6 11:51:06 EST 2012


On 12/06/2012 10:15 AM, Bill Schwanitz wrote:
>
> On Dec 6, 2012, at 10:26 AM, Thomas W. cranston <thomas.w.cranston at gmail.com> wrote:
>
>> I don't know. Perhaps I do not know how to interpret the data.
>>
>> Looking at the man right now.
>>
>> What are you getting at?
>>
>> Tom
>
> Tom,
>
> If you want, email the output of smartctl -a <dev> to the list.
>
> I put *** in front of the ones you really want to look at. Note that not all drives have the same attributes.
>
> I will also echo Rob Funk's comment: if you have important stuff on that drive, I'd back up that data or get it off that disk while you still can. Look toward the end of this email for logs from some really dead drives, for comparison.
>
> ╰─○ sudo smartctl -a /dev/sdb<<<
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-71.29.1.el6.x86_64] (local build)
> (snip)
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>    1 Raw_Read_Error_Rate     0x000f   060   052   006    Pre-fail  Always       -       4891588
>    3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
>    4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always       -       4758
> *** 5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>    7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       103678017425
>    9 Power_On_Hours          0x0032   050   050   000    Old_age   Always       -       44500
>   10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>   12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       445
> 194 Temperature_Celsius     0x0022   035   058   000    Old_age   Always       -       35 (0 14 0 0 0)
> 195 Hardware_ECC_Recovered  0x001a   060   051   000    Old_age   Always       -       4891588
> *** 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> *** 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> *** 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> *** 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed without error       00%     42335         -
> # 2  Extended offline    Completed without error       00%     30238         -
> # 3  Extended offline    Completed without error       00%     21211         -
> # 4  Extended offline    Completed without error       00%     19939         -
>
> -- really dead drives --
>
> (Note: these are different hosts/drives; examples only, but real.)
>
> smartctl output
>
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> *** 1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       2218
>   2 Throughput_Performance  0x0026   055   055   000    Old_age   Always       -       8480
>   3 Spin_Up_Time            0x0023   071   070   025    Pre-fail  Always       -       8984
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
>   5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
>   8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8360
>  10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
>  13 Read_Soft_Error_Rate    0x003a   100   100   000    Old_age   Always       -       0
> 191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
> 192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       25
> 194 Temperature_Celsius     0x0002   064   064   000    Old_age   Always       -       24 (Lifetime Min/Max 16/33)
> 195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
> 196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
> *** 197 Current_Pending_Sector  0x0032   002   002   000    Old_age   Always       -       8143
> *** 198 Offline_Uncorrectable   0x0030   252   093   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
> 240 Head_Flying_Hours       0x0032   100   100   000    Old_age   Always       -       8360
> 241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       16766
> 242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       1913
> 254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       1
>
> $ dmesg
>
> sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 90 00 00 08 00
> end_request: I/O error, dev sde, sector 1853481872
> sd 4:0:0:0: [sde] Unhandled error code
> sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 90 00 00 08 00
> end_request: I/O error, dev sde, sector 1853481872
> sd 4:0:0:0: [sde] Unhandled error code
> sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 98 00 00 08 00
> end_request: I/O error, dev sde, sector 1853481880
> sd 4:0:0:0: [sde] Unhandled error code
>
> email from smartd
>
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sdc [SAT], FAILED SMART self-check. BACK UP DATA NOW!
>
> For details see host's SYSLOG (default: /var/log/messages).
>
> You can also use the smartctl utility for further investigation.
> No additional email messages about this problem will be sent.
>
>
>
> _______________________________________________
> colug-432 mailing list
> colug-432 at colug.net
> http://lists.colug.net/mailman/listinfo/colug-432
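Bill's starred attributes (5, 197, 198, 199 and 200) can also be checked mechanically. The sketch below is a hypothetical helper, not part of smartmontools: it assumes the attribute-table layout shown in the smartctl 5.4x output in this thread, flags a starred attribute whose RAW_VALUE is nonzero, and flags any attribute whose normalized VALUE has fallen to or below a nonzero THRESH. Raw values are vendor-specific, so treat a hit as a reason to look closer rather than a verdict; on a drive that smartctl reports as "Not in smartctl database", such as Tom's Fujitsu below, the raw numbers may not be simple counts.

```python
import re
from typing import Dict, List, NamedTuple

# Attribute IDs Bill marked with *** in his reply.
WATCHED_IDS = {5, 197, 198, 199, 200}

# Matches one attribute row, e.g.
# "197 Current_Pending_Sector  0x0032   002   002   000    Old_age   Always       -       8143"
# Columns: ID, name, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
ROW_RE = re.compile(
    r"^\s*(\d{1,3})\s+(\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(\d{3})\s+(\d{3})\s+(\d{3})\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)


class Attribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: int  # leading integer of RAW_VALUE; any "(...)" suffix is ignored


def parse_attributes(smartctl_text: str) -> Dict[int, Attribute]:
    """Collect attribute rows from `smartctl -a` output, keyed by ID."""
    attrs: Dict[int, Attribute] = {}
    for line in smartctl_text.splitlines():
        m = ROW_RE.match(line)
        if m:
            aid, name, value, worst, thresh, failed, raw = m.groups()
            attrs[int(aid)] = Attribute(int(aid), name, int(value), int(worst),
                                        int(thresh), failed, int(raw))
    return attrs


def warnings(attrs: Dict[int, Attribute]) -> List[str]:
    """List watched attributes with a nonzero raw count, and any attribute
    whose normalized VALUE is at or below a nonzero THRESH."""
    out: List[str] = []
    for aid in sorted(attrs):
        a = attrs[aid]
        if aid in WATCHED_IDS and a.raw != 0:
            out.append(f"{aid} {a.name}: raw={a.raw}")
        if a.thresh > 0 and a.value <= a.thresh:
            out.append(f"{aid} {a.name}: value {a.value} <= thresh {a.thresh}")
    return out
```

To feed it a live drive, `subprocess.run(["smartctl", "-a", "/dev/sdb"], capture_output=True, text=True).stdout` supplies the text (run as root). Avoid `check=True`: smartctl's exit status is a bit mask and can be nonzero even when the command itself worked.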

I have moved the data I want to another storage device. Shopping for new 
HDD now.

10:48:36 tom at 1520: ~$ sudo smartctl -a /dev/sda | more
[sudo] password for tom:
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     FUJITSU MHY2120BH
Serial Number:    K430T832NAD0
Firmware Version: 0085000B
User Capacity:    120,034,123,776 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3f
Local Time is:    Thu Dec  6 10:48:49 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		 ( 487) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  69) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   095   046    Pre-fail  Always       -       182587
  3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       50417
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       8589934592000
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       599649
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4173
191 G-Sense_Error_Rate      0x0012   100   100   000    Old_age   Always       -       747
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       154
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       36 (Lifetime Min/Max 5/49)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       562860
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       20052602454019
198 Offline_Uncorrectable   0x0032   100   100   000    Old_age   Always       -       126270966595587
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       15481508
200 Multi_Zone_Error_Rate   0x0032   100   100   000    Old_age   Always       -       45407574

SMART Error Log Version: 1
ATA Error Count: 113 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 113 occurred at disk power-on lifetime: 9989 hours (416 days + 5 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 08 bf 91 3e e0  Error: UNC 8 sectors at LBA = 0x003e91bf = 4100543

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   25 00 40 87 91 3e e0 00      01:27:16.122  READ DMA EXT
   25 00 20 47 8f 3e e0 00      01:27:16.114  READ DMA EXT
   25 00 20 7f 8b 3e e0 00      01:27:16.107  READ DMA EXT
   25 00 20 e7 89 3e e0 00      01:27:16.105  READ DMA EXT
   25 00 08 7f 89 3e e0 00      01:27:16.104  READ DMA EXT

Error 112 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 08 bf 91 3e ed  Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 08 bf 91 3e ed 08      02:27:08.962  READ DMA
   ec 00 00 00 00 00 a0 08      02:27:08.953  IDENTIFY DEVICE
   ef 03 45 00 00 00 a0 08      02:27:08.946  SET FEATURES [Set transfer mode]
   ec 00 00 00 00 00 a0 08      02:27:08.940  IDENTIFY DEVICE
   c8 00 08 bf 91 3e ed 08      02:27:04.242  READ DMA

Error 111 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 08 bf 91 3e ed  Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 08 bf 91 3e ed 08      02:27:04.242  READ DMA
   ec 00 00 00 00 00 a0 08      02:27:04.233  IDENTIFY DEVICE
   ef 03 45 00 00 00 a0 08      02:27:04.226  SET FEATURES [Set transfer mode]
   ec 00 00 00 00 00 a0 08      02:27:04.217  IDENTIFY DEVICE
   c8 00 08 bf 91 3e ed 08      02:26:59.438  READ DMA

Error 110 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 08 bf 91 3e ed  Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 08 bf 91 3e ed 08      02:26:59.438  READ DMA
   ec 00 00 00 00 00 a0 08      02:26:59.430  IDENTIFY DEVICE
   ef 03 45 00 00 00 a0 08      02:26:59.422  SET FEATURES [Set transfer mode]
   ec 00 00 00 00 00 a0 08      02:26:59.417  IDENTIFY DEVICE
   c8 00 08 bf 91 3e ed 08      02:26:54.718  READ DMA

Error 109 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 08 bf 91 3e ed  Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 08 bf 91 3e ed 08      02:26:54.718  READ DMA
   ec 00 00 00 00 00 a0 08      02:26:54.710  IDENTIFY DEVICE
   ef 03 45 00 00 00 a0 08      02:26:54.702  SET FEATURES [Set transfer mode]
   ec 00 00 00 00 00 a0 08      02:26:54.694  IDENTIFY DEVICE
   c8 00 08 bf 91 3e ed 08      02:26:49.994  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2246         -
# 2  Short offline       Completed without error       00%      1621         -
# 3  Short offline       Completed without error       00%      1605         -
# 4  Short offline       Completed without error       00%      1604         -
# 5  Short offline       Completed without error       00%         2         -
# 6  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

10:48:58 tom at 1520: ~$
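Each entry in Tom's error log, like the dmesg excerpt in Bill's message, names the failing sector in hex. Below is a minimal Python sketch (the function names are hypothetical) of how those numbers are put together. It assumes the 28-bit LBA register layout (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27) and the standard SCSI READ(10) CDB layout (bytes 2-5 = LBA, bytes 7-8 = transfer length, both big-endian). It reproduces the LBAs that smartctl and the kernel print, and shows that errors 109 through 112 all hit the same sector, 222204351.

```python
from typing import Tuple


def ata_lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN/CL/CH/DH registers of a
    `smartctl -a` error-log entry: SN holds bits 0-7, CL bits 8-15,
    CH bits 16-23, and the low nibble of DH bits 24-27."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def read10_cdb(cdb_hex: str) -> Tuple[int, int]:
    """Decode a SCSI READ(10) CDB as printed in dmesg, e.g.
    '28 00 6e 79 e3 90 00 00 08 00', into (LBA, number of blocks)."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    # Error 113: ER ST SC SN CL CH DH = 40 51 08 bf 91 3e e0
    print(ata_lba28(0xbf, 0x91, 0x3e, 0xe0))             # 4100543
    # Errors 109-112: same registers except DH = ed
    print(ata_lba28(0xbf, 0x91, 0x3e, 0xed))             # 222204351
    # First CDB in Bill's dmesg excerpt
    print(read10_cdb("28 00 6e 79 e3 90 00 00 08 00"))   # (1853481872, 8)
```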


Tom

