<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Top posting. Yeah, I know I'm a heathen.<br>
<br>
My initial thoughts are:<br>
<br>
1) hard links don't work across filesystem boundaries (link() fails
with EXDEV), and<br>
2) does the program check the result of the link step (the return
value of link(), or the exit code of ln) and give appropriate error
messages?<br>
<br>
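To make both points concrete, here is a minimal sketch in Python
rather than C. It is only an illustration under my own assumptions:
link_checked and same_filesystem are hypothetical helper names, not
anything from the vendor's application or from the test program. It
shows how a failed link can be reported instead of ignored, and how
st_dev can be compared to see whether two directories sit on the
same filesystem.<br>
<pre wrap="">
```python
import errno
import os
import tempfile


def link_checked(src, dst):
    """Hard-link src to dst; return None on success or a message on failure."""
    try:
        os.link(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            # EXDEV is the error for a link that would cross filesystems.
            return "link failed: %s and %s are on different filesystems" % (src, dst)
        return "link failed: %s" % os.strerror(exc.errno)
    return None


def same_filesystem(dir_a, dir_b):
    """Compare st_dev, which identifies the device holding each directory."""
    return os.stat(dir_a).st_dev == os.stat(dir_b).st_dev


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "example.pre")
        dst = os.path.join(workdir, "example")
        with open(src, "w") as handle:
            handle.write("abc")
        print(same_filesystem(os.path.dirname(src), os.path.dirname(dst)))
        print(link_checked(src, dst))
        print(link_checked(src, dst))  # second attempt fails because dst exists
```
</pre>
If the application discards the result of that step, a failed link
would go unnoticed until the later stat.<br>
<br>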
I think I understand the symptom okay, but I don't have enough
visibility into the application and the underlying system
configuration. It may be something simple or it may be something
complex, and there's really not enough data here to diagnose a
complex problem.<br>
<br>
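One way to gather more data is to run the six steps from the quoted
message with every call checked. The following is a rough sketch in
Python, not a port of the C test program: write_link_stat and
count_mismatches are names I made up, and removing the real file
between iterations is my assumption about how a loop would be
arranged. Any failing step raises OSError rather than passing
silently, so a zero size after a clean run would point away from an
unchecked error in the sequence itself.<br>
<pre wrap="">
```python
import os
import tempfile


def write_link_stat(tmp_name, real_name, payload):
    """Run the six steps once and return (expected_size, observed_size)."""
    fd = os.open(tmp_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)  # 1. open
    try:
        written = 0
        while written != len(payload):  # 2. write, allowing for short writes
            written += os.write(fd, payload[written:])
    finally:
        os.close(fd)  # 3. close
    os.link(tmp_name, real_name)  # 4. link to the real name
    os.unlink(tmp_name)  # 5. remove the original name
    return len(payload), os.stat(real_name).st_size  # 6. stat


def count_mismatches(directory, payload, iterations):
    """Repeat the sequence and count iterations where the size is wrong."""
    pid = os.getpid()  # per-process names, like $$ in the shell
    tmp_name = os.path.join(directory, "%d.pre.tsidx" % pid)
    real_name = os.path.join(directory, "%d.tsidx" % pid)
    mismatches = 0
    for _ in range(iterations):
        expected, observed = write_link_stat(tmp_name, real_name, payload)
        if observed != expected:
            mismatches += 1
        os.unlink(real_name)  # assumption: start each iteration clean
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(count_mismatches(workdir, b"abc", 1000))
```
</pre>
To test the affected volume, pass a directory on it instead of the
temporary directory, and run several copies in parallel.<br>
<br>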
<div class="moz-cite-prefix">On 2016-01-09 23:13, William E. T.
wrote:<br>
</div>
<blockquote
cite="mid:CAOsF7x+fyjg=6MOtM6cCmDJx28jaf13bPfLLrmtKTNabdUhP=g@mail.gmail.com"
type="cite">
<div dir="ltr">At work, I rebooted the servers for one tier of an
application after which we started seeing errors.
<div><br>
</div>
<div>Based on a trace file(1), the vendor is saying it's a
problem with the OS. They found where the application:</div>
<div><br>
</div>
<div>1. Opens a new file</div>
<div>2. Writes to the file</div>
<div>3. Closes the file</div>
<div>4. Creates a link to its real name</div>
<div>5. Removes the original file</div>
<div>6. Stats the file and finds a size of zero</div>
<div><br>
</div>
<div>We would expect the file to have varying sizes based upon
the data that was written to it.</div>
<div><br>
</div>
<div>Since it is not practical for others to run this
application, I wrote an ugly C program(2) to mimic this
behavior. (It is run via ./truncationbug $$.pre.tsidx $$.tsidx
sample.txt 1000, where sample.txt can be as simple as a file
containing a three-letter word.) We have validated that this
triggers the bug, although it can take anywhere from 2,000
iterations to millions, and I have only triggered it when running
multiple instances in parallel (which is why I'm using $$ to grab
the pid; I can use the same command in multiple shells).</div>
<div><br>
</div>
<div>Things we have tried:</div>
<div>1. Booting an older kernel, since the reboot brought the
servers up on a newer kernel</div>
<div>2. Re-kickstarting the servers; we switched
distributions to CentOS 7 this spring and they had run fine
from then until this reboot, so the hope was to very quickly
revert to a known good configuration</div>
<div>3. Upgrading the firmware on the RAID controllers; they
were variously running 3.04, 3.42, and 3.52, and are now at 6.68</div>
<div>4. Running xfs_repair on the file system</div>
<div><br>
</div>
<div>Background Information</div>
<div>We have 30 servers at this tier; they are split evenly
between two different manufacturers. The 15 HP ProLiant 380p
servers with Smart Array P420i RAID controllers are experiencing
this issue. They were effectively running CentOS 7.2; we've
reverted (via kickstarted reinstall) to 7.0. The volume is a
RAID 10 volume over 20 disks. I haven't been able to reproduce
the issue on the RAID 1 volume with the OS partitions, but I
haven't tested it extensively either. I've been able to
reproduce the problem as other users, including root.</div>
<div><br>
</div>
<div>At this point HP support says they see no indication
of a hardware issue and that they only support hardware.</div>
<div><br>
</div>
<div>I really appreciate any thoughts or suggestions.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Bill</div>
<div><br>
</div>
<div>1. <a moz-do-not-send="true"
href="http://pastebin.com/mFmZzYEL" target="_blank">http://pastebin.com/mFmZzYEL</a></div>
<div>2. <a moz-do-not-send="true"
href="https://gist.github.com/w3ttr3y/167b349d2ab67e3aa9d2"
target="_blank">https://gist.github.com/w3ttr3y/167b349d2ab67e3aa9d2</a></div>
</div>
<br>
<pre wrap="">_______________________________________________
colug-432 mailing list
<a class="moz-txt-link-abbreviated" href="mailto:colug-432@colug.net">colug-432@colug.net</a>
<a class="moz-txt-link-freetext" href="http://lists.colug.net/mailman/listinfo/colug-432">http://lists.colug.net/mailman/listinfo/colug-432</a>
</pre>
</blockquote>
<br>
</body>
</html>