<div dir="ltr">I'm not sure about the latest implementations of tmpfs, but at least some of them still have the possibility of actual physical I/O (see <a href="https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt">https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt</a>). In any case, you can reduce the likelihood, but the underlying problem still exists -- trying to enforce/rely on a temporal ordering across the driver(s) and kernel that generic Unix/Linux offers no guarantee of.<div><div><br></div><div><div>File I/O in the kernel involves buffer management, write/read scheduling/optimization, journaling, etc., and can lag behind the progress of the client process -- so much so that, as in this case, the client can sometimes "complete" its I/O and ask for the file's status before the kernel has actually finished processing the original I/O. The stat() request asks "what is the true state of the file?" (potentially at a time when the I/O is still pending in the kernel); the kernel replies only with what it knows for sure at the time of the request. For a pathological case, imagine that the filesystem journaling repeatedly encounters a retryable disk write error -- it will eventually complete, but until it does, the state of the file is whatever the kernel has already successfully written/journaled; that is, while the retries are happening, the state of the file[system] is still what it was before the still-being-journaled operation started.</div><div><br></div></div></div><div>To put it in transaction-processing terms -- the file I/O is one transaction; the request for file size (via stat()) is a second transaction. There is no guarantee/feedback from the kernel about when the first transaction is complete; the only feedback is that the transaction has been queued. 
The probability of completion of the first transaction quickly approaches one as time passes, but there is never a 100% guarantee that the first transaction has completed by the time the second transaction is processed.<br></div><div><br></div><div>Note that this problem would be more likely to manifest itself in a multi-processor environment (where the kernel I/O was happening on one core and the user/client process was running on another core -- thus allowing the client to make progress while the kernel was still working).</div><div><br></div><div>Jeff</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 11, 2016 at 8:14 PM, Brian <span dir="ltr"><<a href="mailto:bnmille@gmail.com" target="_blank">bnmille@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">If Jeff is correct, you might get around the issue by doing the writes to a tmpfs RAM disk, rather than to a physical disk.</p>
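To make the transaction analogy concrete, here is a minimal Python sketch (the path and data are made up for illustration): fsync(2) on the still-open descriptor forces the first "transaction" to completion before the second transaction (the stat) is issued.

```python
import os
import tempfile

def write_then_stat(path, data):
    """Write data to path, force the write 'transaction' to completion
    with fsync(2), then issue the second 'transaction' -- stat(2)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # After write() the first transaction is merely queued; fsync()
        # blocks until the kernel reports the data stored.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Second transaction: the reported size can no longer be a stale zero.
    return os.stat(path).st_size

path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # made-up demo path
print(write_then_stat(path, b"hello, disk\n"))  # -> 12
```

Without the fsync() the code still works the vast majority of the time -- which is exactly the "probability quickly approaches one" behavior described above.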
<div class="gmail_quote"><div><div class="h5">On Jan 10, 2016 10:32 AM, "Jeff Frontz" <<a href="mailto:jeff.frontz@gmail.com" target="_blank">jeff.frontz@gmail.com</a>> wrote:<br type="attribution"></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jan 9, 2016 at 11:13 PM, William E. T. <span dir="ltr"><<a href="mailto:linux.hacker@gmail.com" target="_blank">linux.hacker@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>1. Opens a new file</div><div>2. Writes to the file</div><div>3. Closes the file</div><div>4. Creates a link to its real name</div><div>5. Removes the original file</div><div>6. stats the file to find a zero size</div></blockquote></div><br>Without knowing the details of the particular kernel or filesystem/RAID drivers involved, I'd say there is the potential for a race condition between steps 2/3 and step 6.</div><div class="gmail_extra"><br></div><div class="gmail_extra">On Unix/Linux, there is no guarantee that data has actually made it to the "disk" (which could include the various levels of drivers/caches in between your program and the actual hardware).</div><div class="gmail_extra"><br></div><div class="gmail_extra">From the close(2) man page notes:</div><div class="gmail_extra"><br></div>" A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). 
(It will depend on the disk hardware at this point.)"<div class="gmail_extra"><br></div><div class="gmail_extra">The only guarantee of a non-zero size would be if you called fstat (note the "f") on the still-open file descriptor (used in steps 1/2/3 above).</div><div class="gmail_extra"><br></div><div class="gmail_extra">You can throw in a call to sync(2) after the close(), though depending on how your I/O stack is composed, even that might not be perfect.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Jeff</div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div></div>
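For reference, the six steps quoted above, with the fsync-before-close and fstat-on-the-open-descriptor advice folded in, might look like the following Python sketch. The ".tmp" naming convention and paths are made up for illustration, and os.link() here assumes the real name does not already exist (os.rename() would be the atomic-replace alternative):

```python
import os
import tempfile

def publish(path, data):
    """Steps 1-6 from above, hardened: fsync before close, and check the
    size via fstat on the still-open descriptor (the only sure check)."""
    tmp = path + ".tmp"  # made-up temp-name convention
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # 1. open
    try:
        os.write(fd, data)            # 2. write
        os.fsync(fd)                  # don't rely on close() to flush
        size = os.fstat(fd).st_size   # fstat on the open fd is reliable
    finally:
        os.close(fd)                  # 3. close
    os.link(tmp, path)                # 4. create a link to its real name
    os.unlink(tmp)                    # 5. remove the original file
    return size                       # 6's answer, known before the stat

real = os.path.join(tempfile.mkdtemp(), "real.dat")  # made-up demo path
print(publish(real, b"payload"))  # -> 7
```

With this ordering, a stat() of the real name after publish() returns can still, in principle, race with other kernel activity, but the size returned by fstat() on the open descriptor cannot be stale.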
<br></div></div><span class="">_______________________________________________<br>
colug-432 mailing list<br>
<a href="mailto:colug-432@colug.net" target="_blank">colug-432@colug.net</a><br>
<a href="http://lists.colug.net/mailman/listinfo/colug-432" rel="noreferrer" target="_blank">http://lists.colug.net/mailman/listinfo/colug-432</a><br>
<br></span></blockquote></div>
<br></blockquote></div><br></div>