| Summary: | [Import/Export] File -> Import filesystem w/cyclical symbolic folder links causes OOME | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | John Camelon <john.camelon> | ||||||||||||
| Component: | IDE | Assignee: | Prakash Rangaraj <prakash> | ||||||||||||
| Status: | VERIFIED FIXED | QA Contact: | Prakash Rangaraj <prakash> | ||||||||||||
| Severity: | normal | ||||||||||||||
| Priority: | P3 | CC: | amanji, bokowski, celek, daniel_megert, Dmitry_Karasik, john.arthorne, krzysztof.daniel, makandre, pwebster, stevenyj, tomasz.zarna | ||||||||||||
| Version: | 3.4.2 | ||||||||||||||
| Target Milestone: | 3.5.1 | ||||||||||||||
| Hardware: | PC | ||||||||||||||
| OS: | Linux | ||||||||||||||
| Whiteboard: | |||||||||||||||
| Bug Depends on: | |||||||||||||||
| Bug Blocks: | 253941, 293159, 296456 | ||||||||||||||
| Attachments: |
|
||||||||||||||
|
Description
John Camelon
John, if it makes any difference, what kind of Import did the customer try? Was it Archive File, Existing Projects into Workspace or File System? It is the Import -> File System that the customer was trying. They were trying to import 46,000+ files from RHEL 5.3 64bit. The option "Create complete folder structure" was used during the import. This looks like it is caused by the model of the local file system created by the import wizard (MinimizedFileSystemElement, etc). At this point it doesn't look like it has even started the import yet - it's just gathering the recursive list of files to import. Please also see bug 216901. Prakash, is this something we can look at in 3.5.1? PW (In reply to comment #5) > Prakash, is this something we can look at in 3.5.1? Will give a try. The root cause is because the import wizard tries to create an instance of File for every file/folder that is about to be imported. If you have 55k files, that would translate to at least the same amount of File instances - each of them stores the path in a string. This essentially leads to a whole lot of String and File instances and eventually eats up the whole memory. A cursory glance on the problem doesn't give me a simple solution that can be implemented and tested for 3.5.1. Moving the target. Probably will look into 3.5.2 Was this introduced in 3.4? I do not seem to recall a problem when our product was based upon Eclipse 3.3. Why would you need to keep all those files in memory? Other than working outside of Eclipse and refreshing, can anyone provide a suggestion as to clean, easy to follow ways to get larger (non Eclipse) source code bases into 3.4 based products? (In reply to comment #8) > Was this introduced in 3.4? I do not seem to recall a problem when our product > was based upon Eclipse 3.3. Not sure of when was this introduced. > Other than working outside of Eclipse and refreshing, can anyone provide a > suggestion as to clean, easy to follow ways to get larger (non Eclipse) source > code bases into 3.4 based products? Drag the folder from Windows Explorer and drop it on the project/folder in Project Explorer helps? There are some simple things that can be done on FileSystemElement to reduce its size:
1) Initialize FileSystemElement.workbenchAdapter lazily, so it is never created if getAdapter is never called. Then override getAdapter on MinimizedFileSystemElement to return "this", and make MinimizedFileSystemElement implement IWorkbenchAdapter directly. This saves an extra object per instance.
2) Don't initialize "files" and "folders" to a new empty list in the getFiles() and getFolders() methods. Have a static singleton representing an empty list and return that instead. This saves six objects per file (which never have children) (two AdaptableList, two ArrayList, two arrays). In fact getFiles/getFolders could look like this:
private static final AdaptableList EMPTY_LIST = new AdaptableList(0);
public AdaptableList getFiles() {
if (!isDirectory || files == null)
return EMPTY_LIST;
return files;
}
3) Possibly store files and folders as FileSystemElement[] rather than an AdaptableList. This would be a speed/space trade-off as the list would need to be created when getFiles/getFolders is called.
These are all relatively minor but would offer some improvement that adds up in very large cases.
I have been trying to reproduce the exact scenario (albeit on a different linux distribution). One thing about this customer's environment is that the file tree had an abundance of symbolic links ... some valid and contained within the file tree , others pointing to directories and files that did not exist. There were approximately 1000 symbolic links in the tree of 50K items. While I have not been to reproduce this yet on my system, I did see that the memory burn rate (via the Heap trim) spiked and dropped continually the more soft links that I involved in my test case. Is it possible the symbolic link structure contains cycles? We handle that in the resource model but perhaps the import wizard does not account for that. That is exactly what is happening. I created a project w/2 cyclical folder hierarchies using symbolic links, and I hit OOM within 10 seconds of hitting Finish on the wizard. I have updated the summary. See bug 187318 for a fix for the same problem in a different import wizard. Prakash, can you work on a fix based on the new information? Please also provide the fix based upon 3.4.2. I am starting the IES process in parallel. Created attachment 144799 [details]
Patch v01
If a symbolic link is found to be cyclic, it will ignore it
(In reply to comment #17) > Created an attachment (id=144799) [details] > Patch v01 > > If a symbolic link is found to be cyclic, it will ignore it Paul, I've tested it in Mac. Can you check this works fine in Linux? Note that File.getCanonicalPath() is an expensive call because it needs to interact with the file system. This patch calls getCanonicalPath() on every parent of each resource below a link, which is O(n^2) in the worst case. Also, I don't think it will handle cases like this:
+ A
+ B
+ link1 --> /A/C
+ C
+ link2 --> /A/B
Traversing this will also loop, but each link is not a prefix of its parent. The only effective solution I have seen is to remember the canonical paths that have been seen, and skip files with paths that have been visited already.
See org.eclipse.core.internal.localstore.PrefixPool and org.eclipse.core.internal.localstore.UnifiedTree.isRecursiveLink(IFileStore, IFileInfo) for an example of how this is implemented in o.e.core.resources. PW Created attachment 145000 [details] After patch (In reply to comment #19) > The only effective solution I have seen is to remember the canonical paths that > have been seen, and skip files with paths that have been visited already. Remembering the visited files means storing it (or its path) somewhere. Since the original problem is OOM, I'm trying not to create any new objects. > I don't think it will handle cases like this: > > + A > + B > + link1 --> /A/C > + C > + link2 --> /A/B Nope. It does works well in this scenario. See the attached image Created attachment 145001 [details]
Before applying the patch
(In reply to comment #21) > > Remembering the visited files means storing it (or its path) somewhere. > Since the original problem is OOM, I'm trying not to create any new objects. But the premise of this bug is it is an OOM problem because cyclic directory links are causing a directory of "infinite size" to be read in. A hash of 100000 strings couldn't cause an OOM problem (at least not unless you had other problems :-) PW Created attachment 145458 [details]
Patch v02
Patch using PrefixPool
To consider for 3.5.1 PW Boris, could I get a +1 for 3.5.1? PW +1 (In reply to comment #24) > Created an attachment (id=145458) [details] > Patch v02 Released for M20090826-0800 also tested with the scenarios from bug 105554 comment #40 PW Created attachment 145674 [details]
3.4.x version of patch v02
If necessary to patch back to 3.4.x
PW
Verified in M20090826-1100 |