Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 185509

Summary: Improve performance of the symbolic link recursion detection in UnifiedTree
Product: [Eclipse Project] Platform Reporter: Martin Oberhuber <mober.at+eclipse>
Component: ResourcesAssignee: Platform-Resources-Inbox <platform-resources-inbox>
Status: RESOLVED DUPLICATE QA Contact:
Severity: normal    
Priority: P3 CC: aleherb+eclipse, hashproduct+eclipse
Version: 3.3   
Target Milestone: ---   
Hardware: All   
OS: Unix All   
Whiteboard:
Attachments:
Description Flags
Patch to avoid unnecessary getCanonicalPath() on same parent none

Description Martin Oberhuber CLA 2007-05-04 05:48:16 EDT
+++ This bug was initially created as a clone of Bug #105554 +++

According to bug 105554 comment 53, the fix for symbolic link recursion detection in UnifiedTree is sensitive to slow operation, since the RefreshJob calling into it has a timeout of 2 seconds. If the task cannot be completed in that time, another RefreshJob is scheduled, losing contextual information about symbolic links and canonical paths that has been collected before.

According to bug 105554 comment 55, there are several options for addressing this:
  * Reducing the number of times getCanonicalPath() is called, by resolving
    the link target against the EFS URI internally
  * A UI Preference for suppressLinkedDuplicates (patch in attachment 65778 [details])
  * Deferring the handling of symbolic links to the very end of processing;
    unclear whether out-of-order handling would be OK for Visitors
  * Representing symbolic links as distinct items in the Resource model, 
    rather than blindly following them (bug 105554 comment 58, and bug 44107)

In the long run, the last option will be the best one since it gives user feedback for what is a symbolic link; avoids unnecessary file system operations for duplicate resources introduced by symbolic links; and makes any sophisticated recursion check algorithms unnecessary, since the link is just represented as a link.

In the shorter run for Eclipse 3.3, we should investigate reducing the number of times that getCanonicalPath() is called, since this also helps for remote EFS file systems and does not change the well-tested principal operation of the algorithm.

For measuring performance, see the test data in attachment 65204 [details] (attention: test4_complex_udev is the best one but not under EPL), and the instrumentation code in attachment 65775 [details].
Comment 1 Martin Oberhuber CLA 2007-05-04 07:22:42 EDT
Created attachment 65906 [details]
Patch to avoid unnecessary getCanonicalPath() on same parent

Attached patch is a very simple improvement to remember the canonical path of the previous parent if it has been computed.
This helps reducing the number of getCanonicalPath() calls in the case where multiple symbolic links are in the same parent directory.

For the test4_complex_udev from attachment 65204 [details], this gets rid of 34 unnecessary getCanonicalPath() calls, and reduces the calls on my machine from

Time: 125 msec, levels: 14 nodes: 664, Canonicals: 184

to

Time: 121 msec, levels: 14 nodes: 664, Canonicals: 150

measuring the third of three consecutive Refresh operations for warmup.
Comment 2 Martin Oberhuber CLA 2008-08-27 05:51:32 EDT
As per the resolution in bug 232426 comment 25, it looks like the actual performance issues that have been seen were due to the way how RefreshJob called into UnifiedTree multiple times.

There are still some known shortcomings:
  * On remote EFS-shared file systems, where getCanonicalPath() is not available,
    the algorithm does not work.
  * As per bug 105554 comment 41, there is a specific situation where a 
    non-recursive symbolic link is detected as a false positive.
  * Depending on the node on which a refresh operation is initiated, recursive
    symbolic link structures can lead to different resource structures being
    detected.
  * As per bug 209190, symoblic links are still not visually represented as
    symbolic links.

I think, though, that the solution we have now is sufficient in terms of Performance. Further improvements will likley need to make symbolic links first class citizens in the Resource model, and algorithms specifically designed for them. We are going to look at this in the context of
http://wiki.eclipse.org/E4/Resources (See also bug 44107, and bug 105554 comment 58).

Since the original reason for looking at Performance has been resolved with bug 232426, I'm marking this one as a duplicate.


*** This bug has been marked as a duplicate of bug 232426 ***