Community
Participate
Working Groups
An Eclipse project contains symbolic links as its child resources. If such symbolic link points to its ancestor directory, when the workspace refreshing is invoked (by pressing F5 key in the resource navigator, for example), Eclipse is killed by 'java.lang.OutOfMemoryError: Java heap space'. This is caused by the infinite loop execution of FileSystemResourceManager.refreshResource() --> UnifiedTree.accept()
See also bug 42438.
This is even worse when having "automatic refresh" in action. A real showstopper, IMHO.
Having cycles in your file system is a show-stopper - if it hurts, don't do it.
Hmm, various other applications deal quite happily with it. A symlink that points to the containing directory is fine with all *nix filesystems and that's why one can do it. If an application does not deal with all normal situations a file system can be in, then the application is buggy. From my endusers point of view eclipse lacks appropriate deadlock prevention when dealing with the filesystem.
(In reply to comment #3) > Having cycles in your file system is a show-stopper - if it hurts, don't do it. having cycles in your file system is a perfectly normal and acceptable practice. Applications should use realpath to canonicalize directories and ensure there is no recursion before enumerating thier contents. I found this report because I tried adding a project containing a cyclic link, it was unsuccessful while other IDEs and editors have no issue with cyclicity.
(In reply to comment #5) > (In reply to comment #3) > > Having cycles in your file system is a show-stopper - if it hurts, don't do it. > > having cycles in your file system is a perfectly normal and acceptable practice. > Applications should use realpath to canonicalize directories and ensure there > is no recursion before enumerating thier contents. I can only agree! You will find such symlinks on nearly any linux system e.g. in /usr/include. Without starting a discussion if this practice is good or not the fact is that eclipse will have to handle such situations. BTW: it is not an "infinite loop" it will stop after 40 iterations but this also too much :-) We will use a fix by changing the native stat impl for linux systems in libcore_3_1_0.so. There we check if the path is a link to "." and if yes this path will be skipped. If you are interessted we can provide the patch. But of course the real fix would be to check the path for such situations in the java part with calls like getCannonicalPath.
Sorry, my comment #3 was a bit harsh - I think I was just reacting to the comment that the bug was a show-stopper (comment #2), which seemed odd because after four years of Eclipse being around this was the first bug on the subject. I agree it is a bug, and it should be fixed. The suggested fix in comment #6 seems insufficient, since a sym-link to the parent would presumably also create such a loop.
(In reply to comment #7) > I agree it is a bug, and it should be fixed. The suggested fix in comment #6 > seems insufficient, since a sym-link to the parent would presumably also create > such a loop. Sure it does only fix a special problem that eclipse currently has. But "real" symlink loops are not the normal case and are normally the result of an error of the user or the system setup. As where links to "." are pretty "normal" and are IMHO not real "loops". But the best fix would be of course in the resource layer :-)
This does not really belong to this bug but the reason is the same The "import from file system" wizard has also some problems when importing sources which contain symlinks to "." It will create 40 iterations of the folder structure as real folders before stopping. Should I create a separate bugzille entry for this?
(In reply to comment #9) > This does not really belong to this bug but the reason is the same > The "import from file system" wizard has also some problems > when importing sources which contain symlinks to "." > > It will create 40 iterations of the folder structure > as real folders before stopping. A fix would be possible in FileSystemStructureProvider getChildren() with getCannonicalFile a link to the current folder could be detected and ignored. But this is also just symptom fixing. The best solution for any recursive file system structure traversal would be to collect the possible folders in front, check them for loops etc. and iterate just over the "cleanuped" list.
I'd like to reiterate that this is a real showstopper if you are trying to import filesystems that contain such links
This problem isn't only restricted to UNIX. For example I'm using Eclipse CDT on Windows against a project in a ClearCase dynamic view hosted on Solaris. It's a serious obstacle to the take up of Eclipse on our team. Mainly because it occurs when people first try Eclipse, giving them a very bad first impression. I'm aware that this is a more complex problem. I don't think getCanonicalFile will help us here as the information about link being a link is only available at the ClearCase level. I don't think it's even exposed at the SMB level. A great compromise would be if I could get the resource manager to just exclude certain names when it does the refresh. It isn't enough to just exclude at the C/C++ Project Paths->Source Tab->Exclusion Filter level.
*** Bug 131461 has been marked as a duplicate of this bug. ***
I would very much like it if Eclipse's resources layer would handle all symlinks in an intelligent fashion. I'm writing a plugin, and I wanted to make it remember a path in the workspace in the form of a symlink, but I realized this would confuse the resource system if the path went up. When Eclipse's filesystem scanner finds a symlink, it could report the symlink as a linked resource; that would fix the bug if Eclipse already has logic that avoids cycles of linked resources.
Created attachment 40924 [details] One way to solve the endless loop during refresh for recursive symlinks This is a patch against 3.2RC3 It is a change in RefreshLocalVisitor which remembers folders that have already been visited in the refresh operation. File.getCanonicalPath is used to get the unique path. The rest of the change is adding a checkbox in the workspace preferences to enable this check. The default value is false. This does NOT fix the same problem in the file system import wizard but a similair approach can be used there too
*** Bug 145200 has been marked as a duplicate of this bug. ***
This bug also manifests itself in apparent infinite loops running batch audits of codebases with symbolic links in them. Instantiation's CodePro AnalytiX uses the Eclipse compiler to parse the code. Symbolic links get expanded and reexpanded in the bin directory of the temporary audit workspace. For example, if your source folder has org/eclipse and that folder has a symlink resources -> ../.. then you will see org/eclipse/orgeclipse/org/eclipse/org/eclipse ... created in the bin directory. I did not verify if this stops after N levels, however, but the operation did appear to be looping.
Will this patch be considered for 3.2.1?
Created attachment 48520 [details] Updated patch This is an update on the patch in comment #15. I haven't tested it yet - just posting here as a work in progress. This patch moves the recursive check out of RefreshLocalVisitor and into UnifiedTree. If you fix only the refresh local case, you'll still get the infinite loop for other clients of the UnifiedTree (this is a tree that simutaneously does a breadth-first traversal of the local file system and the resource tree). I also believe the previous patch would have scaled poorly to workspaces with thousands of folders (not uncommon).
Great that there is progress on this problem. I am really not an expert with this aerea of the code so I am glad to learn from you :-) If you like I can test it as soon you say it makes sense
Created attachment 48580 [details] Updated patch The previous patch didn't catch all cases. I have been testing this latest patch and it seems to work well. If anyone else wants to give it a try it would be good to hear your feedback.
Tested it today: Applied patch against 3.2 -Use "Import archive wizard" to import a tgz which contains recursive symlinks in an existing project -> works (recursive symlinks are ignored) -Extract same tgz on cmdline in an existing simple project and use refresh -> hangs until out of memory as before
I just looked again at your patch. It is not enough just to check the parent<->child relation because recursive symlinks can spawn more then one directory level. And that will not be discovered from this patch
Can you give a concrete example where it doesn't work? If you have a link at /a/b/c that links to /a, then the prefix check should still work. A simple failing example to test against would be helpful.
Created attachment 50494 [details] recursive symlink example
Of course you must extract the archive on a unix/linux host :-)
*** Bug 161015 has been marked as a duplicate of this bug. ***
*** Bug 161014 has been marked as a duplicate of this bug. ***
Created attachment 52979 [details] Updated my way to handle recursive symlinks Bugfix to included already existing folders in the visited list
This is certainly a hard problem, especially when the recursive link is in a ClearCase view accessed on Windows (Comment #12). File.getCanonicalPath() fails in this case. The only possible solution I can see is when the getChildren() call computes a hash code of the child list (including child basenames, length, access time, attributes from EFS/IFileInfo), and in case exactly the same children are found again break the recursion (with a visited list as in Comment #29). Such an approach would be a bit similar to the way how cddb identifies audio CD's by computing a hash code from the TOC, length of the songs etc and it seems to work reasonably well. For the more common case where symbolic links can be really identified as symbolic links by the underlying file system, EFS now has API to identify them when the native file library is loaded (bug 170317: EFS.ATTRIBUTE_SYMLINK, currently implemented on Linux only). I think that if a solution is desired that avoids cyclic links in all possible cases, there is no way around a visited list since the recursion may involve any number of symbolic links. A->. is just the simplest case, A.b->B with B.a->A is the mutually recursive case, A.b->B, B.c->C, C.a->A involves three folders and so on -- one can dream up recursion involving any number of links / folders involved. I think the solution from comment #29 is close to perfect, just given that it fails for the ClearCase case (and java.io.File.getCanonicalPath() may be expensive), a solution including a hashCode of child infos might be worth a thought. A simple solution could involve an ArrayList as follows: static class UniqueResourceId { private String names; private long modTime; private int type; public UniqueResourceId (String name, long modTime, int type) { this.name=name; this.modTime=modTime; this.type=type; } public boolean equals(Object other) { if (other instanceof UniqueResourceId) { UniqueResourceId o = (UniqueResourceId)other; return modTime==o.modTime && type == o.type && name.equals(o.name); } return false; } public int hashCode() { return name.hashCode(); //just hash the name } } static class FreezableArrayList extends ArrayList { private boolean frozen=false; private int hashCode; public void freeze() { this.hashCode=super.hashCode(); frozen=true; } public int hashCode() { if(frozen) return this.hashCode; return super.hashCode(); } } Object computeUniqueFolderId(IContainer folder) { IResource[] children = folder.members(); FreezableArrayList uuid = new FreezableArrayList(children.length); for(int i=0; i<children.length; i++) { IResource c=children[i]; uuid.add(new UniqueResourceId(c.getName(), c.getLocalTimeStamp(), c.getType()); } uuid.freeze(); return uuid; }
I update a little base the attachement of Comment #29 in order to resolve the relative link RefreshLocalVisitor.java checkFolder method String s = p.toFile().getCanonicalPath(); String s1 = p.toFile().getPath(); if (s1.length() > s.length() && s1.startsWith(s)) { return false; } else if(!fVisitedList.contains(s)){ fVisitedList.add(s); } else { return false; }
(In reply to comment #31) Ah, you are taking a shortcut in case the symbolic link just points up in the folder hierarchy e.g. myself->. or parent->.. Was your intention just performance optimization or something else? I think your code doesn't correctly cover the prefix case where you sit in a folder /path/to/folder and it contains a symbolic link parallel -> ../fol which points to the folder /path/to/fol Your code wouldn't refresh /path/to/fol. I believe this could be fixed by appending File.separatorChar to the canonicalPath before doing your prefix check. With respect to my previous proposal in comment #30, an even better return value for the method computing the unique identifier would be (since direct access to elements is no more needed for the UID): static class UIDFromList { private int hashCode; private List list; public UIDFromList(List l) { this.list=l; this.hashCode=l.hashCode(); } public int hashCode() { return this.hashCode; } public boolean equals(Object other) { if (other instanceof UIDFromList) { return this.list.equals(((UIDFromList)other).list); } return false; } }
I have two criteria for a solution to this bug: 1) Must have trivial or no performance impact when no symbolic links are present. 2) Must be solved in UnifiedTree, rather than in RefreshLocalVisitor. As mentioned earlier, all the current patches put the fix in RefreshLocalVisitor. However, if some other visitor walks over the UnifiedTree in this case, they will hit the same infinite recursion (DeleteVisitor and CopyVisitor being prominent examples). I think now that we have native symlink support in EFS, there is an opportunity to satisfy 1). I.e., we can detect symlinks at trivial cost since the attribute will be fetched anyway as part of the IFileStore#fetchInfo call. Off the top of my head here's a possible solution: - UnifiedTreeNode has a "visited" field, which is a Set of IFileStores visited already. By default this field is "null", so it adds trivial extra cost for non-symlink case. - When we go to fetch children of a node in the tree (UnifiedTree.addChildren), check if it is a sym link. If it is, we initialize the visited field with a new Set. If the visited field is non-null, add the current node's file store to the visited set before proceeding to add children. - As we create child nodes, check if it is already in the visited set. If so, don't create a node for that child. - For each child node that is created, pass along the visited set (or null) from the parent. With such a solution I believe we could throw away the UI portion of this patch - no need to add a preference if the performance is good in the normal case. The only problem this doesn't solve is the case where the symbolic link is not detectable via the IFileStore native (ClearCase on Windows mounting a Linux drive, for example). I don't think we should be hung up on this solving problem right now.. I would much rather solve the majority case of plain symlinks on *nix/Vista machines. It would then be possible to solve the CC problem via a specialized IFileStore implementation, for example, that knows how to detect symlinks for that case. If someone wants to take a crack this, be my guest. If not, I'll try to find time to code this up in the 3.3 M6 period.
Do you know of an existing unit test to check the performance impact (and probably also memory consumption) of this? Unchanged M5, and the existing patch could be the cornerstones of a performance check.
Created attachment 58388 [details] Performance test This patch adds a performance test for refreshing a large project. Tweak the suite() method to enable only the single refresh test. I'll release this after M5 so we have a baseline.
*** Bug 172160 has been marked as a duplicate of this bug. ***
(In reply to comment #33) > If someone wants to take a crack this, be my guest. If not, I'll try to find > time to code this up in the 3.3 M6 period. With M7 close, is it realistic that you find time for this? Or until what time would you accept a good patch? - Max Weninger has submitted two patches already which are actually part of our product based on Eclipse, but have not been accepted into the Platform... > I think now that we have native symlink support in EFS, there is an > opportunity to satisfy 1). I.e., we can detect symlinks at trivial cost yes. > If the visited field is non-null, add the current node's file store to > the visited set before proceeding to add children. > - As we create child nodes, check if it is already in the visited set. I think we need to put java.io.File#getCanonicalPath() into the "visited" Set, and also check for the canonical path of each folder when drilling further down, since the path we get from the IFileStore will be deeper and deeper even if the symbolic link points to itself. Since getting the canonical path is typically expensive (and needs to be done for every folder drilling down, as soon as we stepped over a symbolic link), I still think it's a better idea to compute a hash code from each folder's children and attributes - as proposed in comment #32. Then, we need to get the canonical path only if the hashcodes are the same. > The only problem this doesn't solve is the case where the symbolic link is not > detectable via the IFileStore native Yes - but in such a situation also getCanonicalPath() does not help. I agree that it is OK to live with this limitation for now.
Yes, it's unlikely I'll get to this in 3.3.
I'm working on this now and will try to come up with a good patch.
Created attachment 65204 [details] Test projects for symbolic links. These break Eclipse 3.3M6!! Attached archive contains test projects for all possible cases of symbolic links I could think of. It also includes the "udev" real-world example from attachment 50494 [details] (and obsoletes it). These are the projects I've tested my implementation against. There are 6 test projects, which can be imported into Eclipse (extract the archive on a UNIX platform, then choose File > Import > Existing Projects into Workspace). Two other folder hierarchies are also needed to test recursive symbolic links pointing outside the projects known to the workspace. Attention: DO NOT IMPORT test1..test5 INTO ECLIPSE 3.3M6 or earlier. It will fail with an OutOfMemoryError during refresh, since these test cases all contain recursive symbolic links. Also, DO NOT import these projects on a platform that supports symbolic links but where Eclipse does not know about them (currently MacOS X, HP-UX, AIX, Linux x86-64, or an SMB mounted share on Windows). test6 is a special project: it does not contain recursive symbolic links and should thus work on any Eclipse installation. This test is there to check for a specific small shortcoming of the implementation (to be attached next). Legal Message: I, Martin Oberhuber, declare that I developed attached code from scratch, without referencing any 3rd party materials except material licensed under the EPL. I am authorized by my employer, Wind River Systems Inc., to make this contribution under the EPL.
Created attachment 65205 [details] Patch fixing all recursive symbolic link issues in UnifiedTree Attached patch fixes all problems shown by all known symbolic link test cases as of the attachment to comment #40. It follows the rules established by John in comment #33: * Entire fix is in UnifiedTree * Special handling ONLY for symlinks pointing to folders (as detected by EFS) Thus there is guaranteed trivial extra cost, and no change of behavior in the non symlink case. Because EFS symlink detection is used, this fix only works when a liblocalfile native library fragment is installed that supports detecting symbolic links. Currently, this is the case for Linux-x86, Linux-PPC, and Solaris-SPARC. Other platforms are not affected by this fix and will still break. Re-compiling the liblocalfile for other UNIX platforms is trivial, though; MaxOS X may be harder since it has a different code base. The fix detects also mutually recursive symbolic links (a->b, b->c, c->a) by maintaining a history of symbolic link parents during traversal of the tree. Because this history is not stored after traversal is done, and because traversal is done in a breadth-first-search manner, the fix has two "strange" effects: 1.) In case of mutually recursive a symbolic links, refreshing on level 0 has a different result than refreshing on lower levels - lower levels reveal one level more of the recursive link loop, while refreshing on level 0 reveals only the first one. This can be shown by refreshing on various levels in project test1. 2.) There is one specific case where even non-recursive symbolic links show fewer duplicate resources now, than it was before the fix. Look at test6 from the attachment to comment #40 for details. Both these "strange" results appear in very rare scenarios, so I think these are acceptable taking the benefit of the fix into account. Fixing them would require a lot more effort, most probably including things like keeping track of previous locations while following links and keeping this history in sync with workspace changes; resource tree lookups during tree traversal; deferring symbolic link evaluation in the BFS queue until all non-symbolic-links are evaluated. Summing up, the patch has the following properties: * it works with a symlink-enabled EFS provider only. * with EFS enabled on localstore, it fixes all known recursion issues. * recursive symbolic links are completely hidden from the resource tree. * it NEVER leads to missing unique resources that would otherwise be shown. * it MAY suppress duplicate resources under very unlikely conditions only. * in case of recursive links only, it MAY lead to different results during refresh, depending on where a query was started, with the difference being that dupliates are either shown or hidden. When a recursive symbolic link is detected, it is completely hidden (as if it would not exist). I think that this is a potential enhancement for the future -- displaying symbolic links like linked resources in the resource tree, and handling them accordingly. The patch includes a testcase for the new utility class (PrefixPool) but no automated test for the various symbolic link scenarios. I think that converting the tgz archive of the test projects into automated tests should not be too hard, the test cases could simply count the number of resources after a full refresh; I was just not sure how to best add this. Of course, the test could only run on EFS-symlink-enabled-platforms. Because the code apparently does no work in the non-symlink-case, I also did not check the performance test.
(In reply to comment #41) Legal Message: I, Martin Oberhuber, declare that I developed attached code from scratch, without referencing any 3rd party materials except material licensed under the EPL. I am authorized by my employer, Wind River Systems Inc., to make this contribution under the EPL.
Thanks Martin. Targetting to review and release for M7 unless I find problems. Also marking this as a "greatbug" because of comment #40 and comment #41, which is an exemplary example of a contribution, including detailed documentation and test cases.
Some questions: 1) Why are the parentPath and childPath both inserted into pathPrefixHistory on each iteration? Won't we eventually find all loops if we only insert the childPath? I must be missing something, but this seems like we have double the necessary number of calls to File.getCanonicalPath. 2) suppressLinkedDuplicates field is always false. Can we just remove this field and leave in the TODO comment about potential future changes? 3) My regex understanding is probably rusty, but the pattern "\\.[./]*" seems incorrect. To me this reads "dot, followed by any sequence of either characters or front slashes". Won't this match a folder name that starts with a dot, such as ".metadata"? Is this intended to be parent references (../), or current references (./) ? 4) In initLinkHistoriesIfNeeded, it adds both the root resource's location, and the location of the root resource's project. I think this doesn't account for linked resources in the workspace. Example: - Project /P1 has location /home/johna/p1 - Linked folder /P1/Folder has location /home/johna/folder - Child folder /P1/Folder/SymLink is a symbolic link /home/johna/folder/SymLink to /home/johna/p1. -> This isn't a cyclic symbolic link chain, since the symlink does not point to a parent of the symlink. I think rather than adding the project location, this code should add the location of the nearest linked resource, or the project location if not nested within a linked resource. If you agree, I can fix this up because I have code elsewhere the computes the "file system root" for a given resource that can be used here. Or even simpler, it might just work with adding the root resource's location, and eventually via traversal we will catch any loops that lead back to the root resource.
By the way, despite my comments I am optimistic we can solve this in M7. I agree this has no impact on non-symlink cases, and even if we don't catch all possibilities right away, we'll be better off than the existing behaviour.
(In reply to comment #44) Let me answer the simple ones first: > 2) suppressLinkedDuplicates field is always false. Can we just remove this > field and leave in the TODO comment about potential future changes? Sure, you are right. Initially, I thought the intention of this were clearer with the extra field, and would allow for programmtic modification of the behavior. But now I think that suppressing duplicate resources is typically problematic since not backward compatible so the field should be removed keeping only a short comment. > 3) the pattern "\\.[./]*" seems incorrect. No, this is correct. Inside the character class [] every character stands for itself and you don't need to quote. [\\./] would also match a String like "..\../.." which may be interesting when considering symbolic links on Vista, though so if you feel better I'm absolutely fine with such a change (anticipating that some 3rd party would write an EFS provider that supports symlinks on vista). Now for the harder ones: > 1) Why are the parentPath and childPath both inserted into pathPrefixHistory? > Won't we eventually find all loops if we only insert the childPath? This is a tricky question, and I'd like to sleep over it :-) My initial concern with this is, that the childPath is currently NOT inserted into the pathPrefixHistory since a link is only recursive if it walks back to where it came from at some point. I think you'd * get more false positives by adding the child path, since when two folders a->/link and b->/link point to the same resource (which is not recursive per se) the second occurrence would necessarily be treated as "recursive" and thus suppressed. Right now, this happens much more rarely (only if a b1/b2->/link points to a link that is a prefix of a path that has a link parent at a higher level). I guess your thinking is that if it's really a loop, it doesn't matter whether the loop is detected sooner or later and the loop must come across the same place at some time. That's certainly true, but the problem is that we are doing a breadth-first search and not a depth-first search... and I cannot see how we could remember the history of previous links on a resource "tree" (though this would be a good thing). If we could switch from BFS to DFS when a symbolic link is encountered, and clear the link history when this happens, this would be the best solution (and also fix all the other "strange" issues!!). Can we do that? I think I'm not familiar enough with the queue and how inserting works, but this might be an option... At any rate, remember that the getCanonicalPath() only happens for symbolic links so it does not impact the non-symlink-case, AND the solution is also better than all previously proposed patches which did getCanonicalPath() for every folder (and not just symlinks). Your suggestion would only be a linear (*2) improvement for the symlink case and I think that's not worth the risk mentioned above. If we cannot switch from BFS to DFS, I suggest keeping it as-is for now and probably shooting for other improvements in the future. I see room for improvements by trying to do more interpretation on the link target (trying to resolve it internally and do without any getCanonicalPath() at all) in the future, with getCanonicalPath() as a very last resort only later. This would also work better on remote file systems. But it seems harder and less robust so I'd prefer thinking about this later. > 4) In initLinkHistoriesIfNeeded, it adds both the root resource's location, > and the location of the root resource's project. I think this doesn't account > for linked resources in the workspace. Example: > -> This isn't a cyclic symbolic link chain, since the symlink does not point > to a parent of the symlink. right, and I think this is essentially the BFS versus DFS issue again. Adding better or more "file system roots" as you suggest is certainly a good think and cannot hurt, I think. I'm afraid though, that because of the BFS the positive effect is probably smaller than you expect. Currently, the "spanning filesystem locations" are remembered in separate PrefixPool just in order to ensure that a link pointing to a prefix of a previously found link root is NOT suppressed if it is outside a file system "root". So, adding more file system "roots" would lead to suppressing MORE symbolic links and thus producing more false positives, I think. The whole "spanning filesystem location" chack would be unnecessary if we were doing DFS, so let me reiterate again that if we could switch this would be the best solution. > Or even simpler, it might just work with adding the root resource's > location, and eventually via traversal we will catch any loops that lead back > to the root resource. I think you are right, following my arguments above it is better to not add the project root to the spanning filesystem check. It means probably creating more duplicate resources, but also certainly suppressing less false positives. Thanks for your thorough review. It's good knowing that 4 eyes are watching this! I left out linked resources since I don't know enough how to deal with them. Your example above (and probably more) should definitely be added to the test cases. Also, I did not test what happens with the DeleteVisitor and CopyVisitor you mentioned in comment #33 since I did not know how to invoke them.
This bug doesn't apply to DeleteVisitor because DeleteVisitor deletes symlinks rather than following them; see bug 44106. However, it might still be good to check that any fix for this bug doesn't break DeleteVisitor.
(In reply to comment #47) > This bug doesn't apply to DeleteVisitor because DeleteVisitor deletes symlinks Interesting... how does it know it's a symlink and not just a folder?
(In reply to comment #48) > Interesting... how does it know it's a symlink and not just a folder? DeleteVisitor calls on the filesystem implementation (in the case of the local filesystem, LocalFile) to do the delete. LocalFile.internalDelete attempts a nonrecursive java.io.File.delete, which can delete anything except a real nonempty directory. internalDelete only recurses if that first attempt fails, so it avoids deleting through symlinks. There's a separate issue when local history is being kept; see bug 174492. (In reply to comment #46) > Also, I did not test what happens with the DeleteVisitor and > CopyVisitor you mentioned in comment #33 since I did not know how to invoke > them. You can invoke them simply by deleting and copying folders using the Navigator.
Ah, that's smart... thanks for the info! Does the DeleteVisitor then need to use the UnifiedTree after all? Or does it delete using the FileSystem implementation and then use the UnifiedTree to see whether any resources remain that could not be deleted?
(In reply to comment #46) > (In reply to comment #44) > > 1) Why are the parentPath and childPath both inserted into > > pathPrefixHistory? Won't we eventually find all loops if we only > > insert the childPath? Based on your suggestion of reducing the number of canonical path queries, I made some experiments with a modified algorithm that was more optimistic: - follow any symbolic links until getting to a place seen before - only check the "seen before" at junctions where links fork off It turned out that this algorithm worked well for the synthetic cases, but failed for the real world test4_udev test case. In the udev case, the link cycle is very long (up to 10 nodes), and while following the cycle the tree grows very fast in breadth because many children fork off. Because of that, detecting the link cycle just one level later leads to exponential growth in the number of nodes -- leading to 30 levels, 84000 nodes and 14000 canonical path queries in 25 seconds execution time on my PC. The current patch refreshes the same tree in < 2 seconds because it detects cycles more eagerly. What I conclude from this experiment, is that it is essential to detect the link cycle as early as possible, i.e. at the time the symbolic link forks off. This is exactly what the current patch does. It seems, that "eventually get to... again later" is just not good enough. I think a really good solution to the symbolic link refresh problem requires correctly modeling symbolic links as links in the resource tree. This does not only give better visual feedback to users, but also always suppress duplicate resources during traversal (since clients will be able to see those resources by following the link in the resource model). I'm not sure if explicit new API for symbolic links is required (as requested by bug 44107) or the existing "linked resource" notion could be used. Anyways, it looks like adding symbolic links to the resource model is out of scope for Eclipse 3.3, so I think the algorithm of the current patch is the best we can do for now. There is room for minor improvements: - remember the previous parent path so the canonical path for the parent is not computed again unnecessarily; - compute the parent canonical path ourselves by looking at the non-symlinked path segment between the current node and a previously computed canonical path for a parent higher up in the tree - defer computation of symlink children until all non-symlinks are computed But none of these would change the structure of the algorithm, or change its efficiency by orders of magnitude. > > 4) In initLinkHistoriesIfNeeded, it adds both the root resource's location, > > and the location of the root resource's project. I think this doesn't > > account for linked resources in the workspace. Please disregard my previous comment about more roots adding more false positives. I think this is not true. The more "spanning file system tree roots" are known upfront, the better - so adding tree roots brought in through linked resources would be a good thing provided that they are in the same EFS file system. But I still think that the positive effect of this would be very small, since the current patch computes the spanning tree root lazily when the first symbolic link points to it. So I'd recommend not adding linked resource roots for now, but keeping the project root (since this is the most common known spanning tree root). ONE LAST IMPORTANT NOTE: On second thought, I'm afraid that we cannot contribute the test4_udev files under the EPL, since this file system comes from an embedded linux product (most probably GPL but not sure). Please keep this test on bugzilla only for now and do not commit it as automated test case. All other tests are written by myself and contributed under the EPL.
Created attachment 65390 [details] Minor tweaks, code formatting, etc This is an entirely cosmetic change from Martin's patch 65205. Just attaching it here so I can test it out on other machines.
In my testing on a slow Linux box, I found that test4 doesn't complete. It just seems to refresh forever. This is because the job that performs background refresh (RefreshJob), will time-out after two seconds, and switch to only refreshing two levels of the tree at a time. See bug 74392 comment 22 for summary, and UnifiedTree.accept method with int argument. Since it only refreshes two levels at once, it doesn't catch recursive link structures that span more than two levels. It looks like solving this will require persisting the prefix history in RefreshJob until the refresh request queue is empty. I'm going to release the current patch anyway because this is no worse than the previous behaviour.
Patch released for 1pm EDT integration build. I'll leave this open for now to sort out remaining details.
(In reply to comment #53) > In my testing on a slow Linux box, I found that test4 doesn't complete. I guess the simplest fix for this would be to set suppressLinkedDuplicates = true. Perhaps this should be a UI preference after all. Code for a UI Preference is still attached to comment #21, I think this could be re-used. The other option would be if there were no Preference, but isRecursiveLink would always act as if suppressLinkedDuplicates were true; but the suppressed links would be put into a queue for later checking after all other resources from the queue are handled (at that time we can see more easily if they are recursive). Or, always act as if suppressLinkedDuplicates were true, but inside isRecursiveLink register the link pointing to a duplicate resource as a linked resouces. This would be the best solution from a user's perspective - but I don't understand the nature and lifecycle of linked resources enough to implement it...
> I guess the simplest fix for this would be to set suppressLinkedDuplicates = > true. Unless I'm missing something, I don't think suppressLinkedDuplicates is relevant here. The problem is that certain calls to UnifiedTree.accept only perform refresh of a fixed number of levels in the tree. If the cycle is not detected within that span, then it will go undetected because we throw away the path/prefix history across invocations of the accept method.
(In reply to comment #56) > I don't think suppressLinkedDuplicates is relevant here. It is relevant. Becuase if it is set, any symbolic link that points to a resource inside the same project is suppressed. So you need much less contextual information - knowing the "spanning file system roots" is sufficient, and these are set before the rest of the algorithm runs. Even when looking at a SINGLE level, all symbolic links that remin in the same project (which is all of them in the test4_udev case) will be suppressed. Only in the test cases that produce symbolic link loops by pointing to resource trees outside the project, it will take more than 1 level to detect the loop.
Excuse me if this is ignorant, but what if symbolic links were treated as just another form of persistence for linked Resources? Linked resources have the important property that they mask the underlying file system resource, so this would shield us from potentially recursive symbolic links. Much code deals with linked resources already in exactly the manner needed for symbolic links. Because they look the same for the user, it should be possible to keep the changes needed all inside the resources plugin: - a boolean bit with each linked resource indicating whether it has persistence in .project or is backed by the file system (as symlink) - Read&write of .project file to ignore the links backed by the file system - Different handling of delete for links backed by the file system - UnifiedTree to create/update links backed by the file system This would mean no API change, correct visual feedback for symbolic links, simple handling, no more problems with any recursive links ever, correct representation of the file system even if links introduce duplicates. Or am I missing any important obstacles?
Re comment #58: It's an interesting thought exercise, but linked resources are fairly different from sym-links. Linked resources are a pointer from the resource tree to the file system tree - it spans across two different "layers". Thus, there is no such thing as a cycle introduced by a linked resource - there is no link from the file system back up to the resource tree in order to incur a cycle. A given linked resource is only associated with one file system location. A resource corresponding to a symlink has two applicable resources - the location of the link, and the location of the target of the link. It's not clear how this would be treated in the API, for example, what the return value of IResource.getLocation() would be. Also, I don't see how the property of linked resources masking filesystem resources would help here. A linked resource represented as a symbolic link could not mask the symbolic link. If the symbolic link was modified to point to a different file, we would have to update the linked resource as well.
Marking fixed for M7. I have opened bug 185247 for adding JUnit tests to cover the scenarios described in comment #40.
Created attachment 65775 [details] Patch adding performance counters (development-time only) I'm still slightly concerned about the problems on a slow Linux box that you described in comment #53. Could these have been due to the unexpectedly missing symlink-enabled library for x86_64 (bug 184433 comment 19)? I tested with some performance counters (attached) and got the following results for the test4_complex_udev testcase: * #nodes in file system according to "find": 440 (13 levels) * #nodes found by RefreshLocalVisitor: 664 (14 levels) in 391 msec it's hard to imagine that your machine is 5 times slower than mine, going from 391 msec to 2 seconds thus hitting the timeout. FYI, regarding the previous discussion about suppressLinkedDuplicates - when suppressLinkedDuplicates is enabled, the numbers are * #nodes found with suppressLinkedDuplicates: 440 (13 levels) in 132 msec so in this case, not a single symbolic link is followed, giving exactly the number of nodes found by "find" and being 3 times faster; because getCanonicalPath() is called only 74 times instead of 184 times.
Created attachment 65778 [details] Patch adding a UI Preference for suppressLinkedDuplicates Attached patch adds a UI Preference for suppressing duplicate resources introduced by symbolic links in all cases, if anybody is interested. This is derived from John's patch in attachment 48580 [details], so I have not updated the Copyright headers.
I'm quite sure I had the correct library, because all of the other test cases worked fine. I'll try with this instrumented version and see what the times look like.
Verified the fix with I20070503-1400 on Linux x86_64 and Solaris 10 gtk. Created new bug 185509 for following up with additional improvements, for addressing the issue found in comment #53 on a slow Linux box. I did not carry over the (long) CC list to the new bug, so anybody interested in the follow-up please put yourself on CC of the new bug yourself.
The folk watching this bug may be interested to hear that another issue with cyclic symbolic links has been found in the "Import Existing Project" wizard. See bug #187318, put yourself on CC and vote for it if you are interested. A patch has been provided for that issue, let's see if it gets accepted.
*** Bug 215047 has been marked as a duplicate of this bug. ***
(In reply to Martin Oberhuber from comment #41) > Because this history is not stored after traversal is done, and because > traversal is done in a breadth-first-search manner, the fix has two > "strange" effects: > > 1.) In case of mutually recursive a symbolic links, refreshing on level 0 has > a different result than refreshing on lower levels - lower levels reveal > one level more of the recursive link loop, while refreshing on level 0 > reveals only the first one. This can be shown by refreshing on various > levels in project test1. > > 2.) There is one specific case where even non-recursive symbolic links show > fewer duplicate resources now, than it was before the fix. Look at test6 > from the attachment to comment #40 for details. > > Both these "strange" results appear in very rare scenarios, so I think these > are acceptable taking the benefit of the fix into account. Unfortunately we run into this rare scenario in bug 537449 :-( It is really hard to explain customer *without* recursive links in file system that Eclipse does not show random directories just because customer makes use of links. > Fixing them would > require a lot more effort, most probably including things like keeping track > of previous locations while following links and keeping this history in sync > with workspace changes; resource tree lookups during tree traversal; > deferring symbolic link evaluation in the BFS queue until all > non-symbolic-links are evaluated. We plan to do *something* in context of bug 537449, and especially we plan to finally add known test cases via bug 185247.