[colug-432] I/O Error?
Thomas W. cranston
thomas.w.cranston at gmail.com
Thu Dec 6 11:51:06 EST 2012
On 12/06/2012 10:15 AM, Bill Schwanitz wrote:
>
> On Dec 6, 2012, at 10:26 AM, Thomas W. cranston <thomas.w.cranston at gmail.com> wrote:
>
>> I don't know. Perhaps I do not know how to interpret the data.
>>
>> Looking at the man page right now.
>>
>> What are you getting at?
>>
>> Tom
>
> Tom,
>
> If you want, email the output of smartctl -a <dev> to the list.
>
> I put *** in front of the ones you really want to look at. Note that not all drives have the same attributes.
>
> I will also echo the comment from Rob Funk: if you have important stuff on that drive, I'd make a backup of that data and get it off of that disk while you still can. Look towards the end of this email to see logs from a really dead drive for comparison.
>
> $ sudo smartctl -a /dev/sdb
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-71.29.1.el6.x86_64] (local build)
> (snip)
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 060 052 006 Pre-fail Always - 4891588
> 3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
> 4 Start_Stop_Count 0x0032 096 096 020 Old_age Always - 4758
> *** 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 103678017425
> 9 Power_On_Hours 0x0032 050 050 000 Old_age Always - 44500
> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 445
> 194 Temperature_Celsius 0x0022 035 058 000 Old_age Always - 35 (0 14 0 0 0)
> 195 Hardware_ECC_Recovered 0x001a 060 051 000 Old_age Always - 4891588
> *** 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> *** 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> *** 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
> *** 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed without error 00% 42335 -
> # 2 Extended offline Completed without error 00% 30238 -
> # 3 Extended offline Completed without error 00% 21211 -
> # 4 Extended offline Completed without error 00% 19939 -
>
> -- really dead drives --
>
> (Note: these are from different hosts/drives. They are examples only, but real.)
>
> smartctl output
>
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> *** 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 2218
> 2 Throughput_Performance 0x0026 055 055 000 Old_age Always - 8480
> 3 Spin_Up_Time 0x0023 071 070 025 Pre-fail Always - 8984
> 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 24
> 5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
> 8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
> 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 8360
> 10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
> 11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 24
> 13 Read_Soft_Error_Rate 0x003a 100 100 000 Old_age Always - 0
> 191 G-Sense_Error_Rate 0x0022 252 252 000 Old_age Always - 0
> 192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
> 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 25
> 194 Temperature_Celsius 0x0002 064 064 000 Old_age Always - 24 (Lifetime Min/Max 16/33)
> 195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
> 196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
> *** 197 Current_Pending_Sector 0x0032 002 002 000 Old_age Always - 8143
> *** 198 Offline_Uncorrectable 0x0030 252 093 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 0
> 240 Head_Flying_Hours 0x0032 100 100 000 Old_age Always - 8360
> 241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 16766
> 242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 1913
> 254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 1
>
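The starred rows in the two listings above can be checked mechanically. The following is only a minimal sketch, assuming the attribute rows look exactly like the ones pasted here (ten whitespace-separated columns, raw value in the tenth, and a plain integer there). The names WATCHED_IDS, parse_attribute_line and flag_nonzero are made up for illustration and are not part of smartmontools. ID 1 is starred on the dead drive as well, but it is left out of the watched set because the healthy drive above also shows a large raw value for it.

```python
# IDs starred in the healthy-drive listing; which ones matter can vary by drive.
WATCHED_IDS = {5, 197, 198, 199, 200}

def parse_attribute_line(line):
    """Split one attribute row into (id, name, raw_value), or None if it is not a row."""
    # Drop the "***" marker and any leading email quote characters.
    fields = line.replace("***", " ").lstrip("> ").split()
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    # RAW_VALUE is the 10th column; anything after it (e.g. "(Lifetime ...)") is a note.
    return int(fields[0]), fields[1], int(fields[9])

def flag_nonzero(lines, watched=WATCHED_IDS):
    """Return (id, name, raw) for watched attributes whose raw value is not zero."""
    hits = []
    for line in lines:
        row = parse_attribute_line(line)
        if row and row[0] in watched and row[2] != 0:
            hits.append(row)
    return hits

# Rows copied from the "really dead" drive listing above.
sample = [
    "> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE",
    "> 5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0",
    "> 194 Temperature_Celsius 0x0002 064 064 000 Old_age Always - 24 (Lifetime Min/Max 16/33)",
    "> *** 197 Current_Pending_Sector 0x0032 002 002 000 Old_age Always - 8143",
    "> *** 198 Offline_Uncorrectable 0x0030 252 093 000 Old_age Offline - 0",
]
print(flag_nonzero(sample))
```

On the dead drive this reports only Current_Pending_Sector, with a raw value of 8143. A nonzero raw value is a prompt to look closer, not a verdict, since vendors encode raw values differently.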
> $ dmesg
>
> sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 90 00 00 08 00
> end_request: I/O error, dev sde, sector 1853481872
> sd 4:0:0:0: [sde] Unhandled error code
> sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 90 00 00 08 00
> end_request: I/O error, dev sde, sector 1853481872
> sd 4:0:0:0: [sde] Unhandled error code
> sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 4:0:0:0: [sde] CDB: Read(10): 28 00 6e 79 e3 98 00 00 08 00
> end_request: I/O error, dev sde, sector 1853481880
> sd 4:0:0:0: [sde] Unhandled error code
>
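The sector number in each "end_request" line can be read straight out of the CDB bytes printed just before it. A rough sketch of that arithmetic, assuming the standard 10-byte READ(10) layout (opcode 0x28, big-endian block address in bytes 2-5, transfer length in bytes 7-8); decode_read10 is a hypothetical helper name:

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a 10-byte READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count

# CDBs copied from the dmesg lines above.
print(decode_read10("28 00 6e 79 e3 90 00 00 08 00"))  # (1853481872, 8)
print(decode_read10("28 00 6e 79 e3 98 00 00 08 00"))  # (1853481880, 8)
```

The decoded addresses match the sectors in the I/O error lines, so the kernel is failing 8-block reads at consecutive locations on sde.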
> email from smartd
>
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sdc [SAT], FAILED SMART self-check. BACK UP DATA NOW!
>
> For details see host's SYSLOG (default: /var/log/messages).
>
> You can also use the smartctl utility for further investigation.
> No additional email messages about this problem will be sent.
>
>
>
I have moved the data I want to another storage device. Shopping for a new
HDD now.
10:48:36 tom at 1520: ~$ sudo smartctl -a /dev/sda | more
[sudo] password for tom:
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: FUJITSU MHY2120BH
Serial Number: K430T832NAD0
Firmware Version: 0085000B
User Capacity: 120,034,123,776 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 3f
Local Time is: Thu Dec 6 10:48:49 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever been run.
Total time to complete Offline
data collection: ( 487) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 69) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 095 046 Pre-fail Always - 182587
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 50417
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 8589934592000
9 Power_On_Hours 0x0032 081 081 000 Old_age Always - 599649
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 4173
191 G-Sense_Error_Rate 0x0012 100 100 000 Old_age Always - 747
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 154
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 36 (Lifetime Min/Max 5/49)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 562860
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 20052602454019
198 Offline_Uncorrectable 0x0032 100 100 000 Old_age Always - 126270966595587
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 15481508
200 Multi_Zone_Error_Rate 0x0032 100 100 000 Old_age Always - 45407574
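Some of the raw values in this table are far too large to be sector counts on a 120 GB disk, and the drive is reported as not in the smartctl database. One possible reading is that the 48-bit raw field holds several smaller numbers packed together. The sketch below only does the arithmetic of splitting a raw value into three 16-bit words; split_raw48 is a made-up helper name, and whether any one word is the real count for this Fujitsu drive is a guess that nothing in this thread confirms.

```python
def split_raw48(raw):
    """Split a 48-bit raw value into three 16-bit words, most significant first."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF

# Raw values copied from the attribute table above.
for name, raw in [
    ("Reallocated_Sector_Ct", 8589934592000),
    ("Current_Pending_Sector", 20052602454019),
    ("Offline_Uncorrectable", 126270966595587),
]:
    print(name, split_raw48(raw))
```

For example, 8589934592000 is exactly 2000 * 2**32, so it splits into (2000, 0, 0), and the low word of both Current_Pending_Sector and Offline_Uncorrectable comes out as 3.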
SMART Error Log Version: 1
ATA Error Count: 113 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 113 occurred at disk power-on lifetime: 9989 hours (416 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 bf 91 3e e0 Error: UNC 8 sectors at LBA = 0x003e91bf = 4100543
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 40 87 91 3e e0 00 01:27:16.122 READ DMA EXT
25 00 20 47 8f 3e e0 00 01:27:16.114 READ DMA EXT
25 00 20 7f 8b 3e e0 00 01:27:16.107 READ DMA EXT
25 00 20 e7 89 3e e0 00 01:27:16.105 READ DMA EXT
25 00 08 7f 89 3e e0 00 01:27:16.104 READ DMA EXT
Error 112 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 bf 91 3e ed Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 bf 91 3e ed 08 02:27:08.962 READ DMA
ec 00 00 00 00 00 a0 08 02:27:08.953 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 08 02:27:08.946 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 02:27:08.940 IDENTIFY DEVICE
c8 00 08 bf 91 3e ed 08 02:27:04.242 READ DMA
Error 111 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 bf 91 3e ed Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 bf 91 3e ed 08 02:27:04.242 READ DMA
ec 00 00 00 00 00 a0 08 02:27:04.233 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 08 02:27:04.226 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 02:27:04.217 IDENTIFY DEVICE
c8 00 08 bf 91 3e ed 08 02:26:59.438 READ DMA
Error 110 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 bf 91 3e ed Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 bf 91 3e ed 08 02:26:59.438 READ DMA
ec 00 00 00 00 00 a0 08 02:26:59.430 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 08 02:26:59.422 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 02:26:59.417 IDENTIFY DEVICE
c8 00 08 bf 91 3e ed 08 02:26:54.718 READ DMA
Error 109 occurred at disk power-on lifetime: 9939 hours (414 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 bf 91 3e ed Error: UNC 8 sectors at LBA = 0x0d3e91bf = 222204351
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 bf 91 3e ed 08 02:26:54.718 READ DMA
ec 00 00 00 00 00 a0 08 02:26:54.710 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 08 02:26:54.702 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 02:26:54.694 IDENTIFY DEVICE
c8 00 08 bf 91 3e ed 08 02:26:49.994 READ DMA
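The LBA that smartctl prints next to each "Error: UNC" line can be rebuilt from the register bytes on the same row. A small sketch of that, assuming 28-bit LBA addressing where the low nibble of DH carries bits 24-27 and CH, CL, SN carry the rest; the function names are made up for this example:

```python
def lba_from_registers(sn, cl, ch, dh):
    """Combine the SN, CL, CH and DH register bytes into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

def lba_from_row(row):
    """Take an 'ER ST SC SN CL CH DH' row of hex bytes and return the LBA."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    return lba_from_registers(sn, cl, ch, dh)

# Register rows copied from errors 113 and 112 above.
print(lba_from_row("40 51 08 bf 91 3e e0"))  # 4100543
print(lba_from_row("40 51 08 bf 91 3e ed"))  # 222204351
```

Errors 109 through 112 all decode to the same address, 222204351, which fits the command history: the same READ DMA is retried about every five seconds and fails each time.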
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2246 -
# 2 Short offline Completed without error 00% 1621 -
# 3 Short offline Completed without error 00% 1605 -
# 4 Short offline Completed without error 00% 1604 -
# 5 Short offline Completed without error 00% 2 -
# 6 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
10:48:58 tom at 1520: ~$
Tom