| Summary: | tptpFileTransferAgent - Deletion and transfer problems | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | z_Archived | Reporter: | Pavel Pravda <pavel.pravda> | ||||||
| Component: | TPTP | Assignee: | Jonathan West <jgwest> | ||||||
| Status: | CLOSED WONTFIX | QA Contact: | |||||||
| Severity: | major | ||||||||
| Priority: | P2 | CC: | igor.alelekov, jgwest, jiri.bjalek, kathy, kiryl.kazakevich, pjkrief, samwai | ||||||
| Version: | unspecified | Keywords: | plan | ||||||
| Target Milestone: | --- | ||||||||
| Hardware: | PC | ||||||||
| OS: | Windows XP | ||||||||
| Whiteboard: | housecleaned461 closed471 | ||||||||
| Attachments: |
|
||||||||
|
Description
Pavel Pravda
Because of this issue I have run some tests of the standard TPTP FileTransferAgent, and unfortunately it behaves non-deterministically for us. It requires tuning the sleeps between commands, which is not a real solution. I have discussed this issue in the TPTP newsgroups. Here is some sample code, based on examples I found:
http://www.eclipse.org/newsportal/article.php?id=5130&group=eclipse.tptp#5130
http://www.eclipse.org/newsportal/article.php?id=5174&group=eclipse.tptp#5174
We are developing a custom testing application, and we had to implement a dirty workaround that currently restricts our functionality to localhost use only.
We would really appreciate rescheduling of this bug for the earliest version possible, because it heavily affects our business plans.
Jonathan, would you please see if you can reproduce this defect with 4.4.1? Thanks.
>>> After some experimenting with the code I added Thread.sleep(1000) statements to get this running properly. Without these sleeps one or both files were not deleted.
>>> My problem is that I don't understand why it doesn't work without sleeping the thread because I didn't find any info in docs or in examples about it.
The reason for this is that from the client's perspective, the file transfer is complete when it sends its last bit of data to the file transfer agent. Once the last bit of data is sent, the control flow is returned to the client's main program, and the client will then run the delete file code, which sends the delete command to the file transfer agent.
However, just because the client believes the transfer is complete, that does not mean that the agent is finished writing the file: There is a space of time in between when the client sends the last bit of data, and when the file transfer agent has finished writing it. In this window, the file transfer agent is waiting to receive the data, write it to the file, request and receive a lock on the file list, close the file, and end the thread, before it can properly handle additional commands to use that file.
Thus, the client will send the delete command while the agent is still writing/closing the file. The agent _will_ process the delete command, but its attempt to delete the file will fail, because the file is still in use (by another thread).
The short answer to your question is that whether or not the deletion succeeds is decided by a race condition, with the outcome strongly favouring a failure to delete (unless there is some kind of sleep statement after the transfer).
(See deletion diagram attachment)
This registers as a "FileTransferAgent deleteFile system error" in the serviceconfig.log file, which is the direct result of the failure of the C function 'remove(...)' to delete the file. Unfortunately, the agent does not communicate the failure to the client.
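The race described above can be modelled in a few lines of Java. This is only a sketch under stated assumptions: `ModelAgent`, `beginTransfer`, and `delete` are hypothetical names invented for illustration and are not part of the TPTP API, and the "file" is just an entry in an in-memory set. It shows that a delete issued while the writer thread still holds the file fails, while a delete issued after a completion signal succeeds.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical in-memory model of the behaviour described in this bug.
// None of these names come from TPTP.
class DeleteRaceSketch {

    static class ModelAgent {
        private final Set<String> inUse = ConcurrentHashMap.newKeySet();
        private final Set<String> stored = ConcurrentHashMap.newKeySet();

        // Starts a transfer and returns a latch that opens once the agent
        // has "closed" the file. The 'release' latch stands in for the time
        // the agent needs to write the data and close the file.
        CountDownLatch beginTransfer(String name, CountDownLatch release) {
            inUse.add(name);
            stored.add(name);
            CountDownLatch done = new CountDownLatch(1);
            Thread writer = new Thread(() -> {
                try {
                    release.await(); // agent is still writing/closing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inUse.remove(name);
                done.countDown();
            });
            writer.setDaemon(true);
            writer.start();
            return done;
        }

        // Mirrors the described outcome: the delete fails while another
        // thread still holds the file.
        boolean delete(String name) {
            if (inUse.contains(name)) {
                return false;
            }
            return stored.remove(name);
        }
    }

    // Client deletes as soon as it has sent its last bit of data.
    static boolean deleteImmediately() {
        ModelAgent agent = new ModelAgent();
        CountDownLatch release = new CountDownLatch(1);
        agent.beginTransfer("a.txt", release);
        boolean deleted = agent.delete("a.txt"); // file is still in use
        release.countDown();
        return deleted;
    }

    // Client waits for a completion signal before deleting.
    static boolean deleteAfterCompletion() {
        ModelAgent agent = new ModelAgent();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = agent.beginTransfer("a.txt", release);
        release.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return agent.delete("a.txt");
    }
}
```

In this model a fixed `Thread.sleep(1000)` only works when the sleep happens to outlast the writer thread, whereas waiting on an explicit completion signal does not depend on timing.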
Created attachment 83788
Deletion diagram
Able to reproduce with 4.4.1.
Created attachment 83815
Deletion Diagram - Updated
Updated diagram terminology
Igor, I've got a problem with the shared memory code in solving this bug; let me know if you think you can solve it.
My solution to this bug is that, when the client sends a request to the server, the client should wait for the server to reply that the file has transferred successfully before it returns execution to the client code. However, upon implementing this, the time to transfer my test suite of files increased _six-fold_. This was caused by a one-second delay between when the client sent the last bit of data and when the file transfer agent acknowledged that it had received it.
I tracked the problem down to the shared memory code. The processData(...) function of DataProviderImpl is the function that is called with the data. That function is called by sharedMemDataPathProcessorFunc(...), which is called by ossRamboFlushToFunc(...) in ossramboflush.cpp. It would seem that the data in shared memory is not flushed to the function that handles it (by the ossRamboFlushToFunc function) for a full second after it has been received. This is because the code doesn't flush the data until either the chunk is filled (a chunk is 128k) or a certain amount of time has elapsed (10 msec x 100 = 1 second). Thus, because the files being transferred are less than 128k, the completion of the file transfer will not be observed by the agent for a full 1000 msec.
While this may not seem like a lot, it is in fact a delay of one second PER FILE. I think it is unacceptable that 100 small files will take more than a minute and a half to transfer. In my Windows directory (c:\windows\), for instance, there are 338 files, ALL of them under 128k, with a combined total of 30 MB. This transfer would take 338 seconds, or about 33.8 times longer than a file copy in Windows (10 seconds).
Have you seen this data transfer latency elsewhere, Igor, and would you be able to make a change to the shared memory code to fix the problem? The shared memory code is somewhat impenetrable...
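The arithmetic in the comment above can be checked with a small Java sketch of the flush policy as it is described there: data is handed on when a 128k chunk fills, or otherwise after 100 polls of 10 msec. The class, constants, and method names are hypothetical and chosen for illustration; this is not the ossramboflush.cpp code, and it assumes each file's data sits alone in the buffer.

```java
// Hypothetical model of the flush policy described in the comment above.
class FlushLatencySketch {
    static final int CHUNK_BYTES = 128 * 1024; // "a chunk is 128k"
    static final int POLL_INTERVAL_MS = 10;    // "10 msec"
    static final int MAX_POLLS = 100;          // "x 100"

    // Delay before buffered data is handed to the processing function:
    // none if the chunk is full, otherwise the full timeout.
    static long flushDelayMs(long bufferedBytes) {
        if (bufferedBytes >= CHUNK_BYTES) {
            return 0;
        }
        return (long) POLL_INTERVAL_MS * MAX_POLLS;
    }

    // Added latency for a batch of equally sized files, assuming each
    // file is acknowledged before the next one is sent.
    static long totalDelayMs(int fileCount, long bytesPerFile) {
        return fileCount * flushDelayMs(bytesPerFile);
    }
}
```

Under these assumptions, `totalDelayMs(100, 50_000)` gives 100000 msec and `totalDelayMs(338, 50_000)` gives 338000 msec, which matches the 338 seconds quoted for the c:\windows\ example.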
Reassigning to Stas
Deferral to future with PMC approval
Mass update of P1 enhancements and defects targeted to future to P2.
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. Since this defect is more than 2 years old, it may be no longer relevant. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this defect is resolved as WONTFIX. If this defect is still relevant and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open. |