Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 285804

Summary: [Import/Export] File -> Import filesystem w/cyclical symbolic folder links causes OOME
Product: [Eclipse Project] Platform Reporter: John Camelon <john.camelon>
Component: IDEAssignee: Prakash Rangaraj <prakash>
Status: VERIFIED FIXED QA Contact: Prakash Rangaraj <prakash>
Severity: normal    
Priority: P3 CC: amanji, bokowski, celek, daniel_megert, Dmitry_Karasik, john.arthorne, krzysztof.daniel, makandre, pwebster, stevenyj, tomasz.zarna
Version: 3.4.2   
Target Milestone: 3.5.1   
Hardware: PC   
OS: Linux   
Whiteboard:
Bug Depends on:    
Bug Blocks: 253941, 293159, 296456    
Attachments:
Description Flags
Patch v01
none
After patch
none
Before applying the patch
none
Patch v02
none
3.4.x version of patch v02 none

Description John Camelon CLA 2009-08-05 20:29:10 EDT
Build ID: M20090211-1700

Steps To Reproduce:
With a their source code base (55K files spread across the directory structure), the customer tried to import the File tree into a new (and empty) eclipse workspace.  

The OOM gets hit even with -Xmx1024M. 

Closing Eclipse and copying the files directly into the workspace root and restarting Eclipse works, so the issue seems isolated to the memory complexity of the Resources UI dialog. 

We encourage this workflow in Jazz SCM, is there something else we should be doing?

java.lang.OutOfMemoryError
	at java.lang.StringBuilder.ensureCapacityImpl(StringBuilder.java:342)
	at java.lang.StringBuilder.append(StringBuilder.java:132)
	at java.io.UnixFileSystem.resolve(UnixFileSystem.java:123)
	at java.io.File.<init>(File.java:313)
	at org.eclipse.ui.wizards.datatransfer.FileSystemStructureProvider.getChildren(FileSystemStructureProvider.java:50)
	at org.eclipse.ui.internal.wizards.datatransfer.MinimizedFileSystemElement.populate(MinimizedFileSystemElement.java:78)
	at org.eclipse.ui.internal.wizards.datatransfer.MinimizedFileSystemElement.getFiles(MinimizedFileSystemElement.java:46)
	at org.eclipse.ui.internal.wizards.datatransfer.WizardFileSystemResourceImportPage1$7.getChildren(WizardFileSystemResourceImportPage1.java:480)
	at org.eclipse.ui.model.BaseWorkbenchContentProvider.getElements(BaseWorkbenchContentProvider.java:73)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:410)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGrou.pfindAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
	at org.eclipse.ui.internal.ide.dialogs.ResourceTreeAndListGroup.findAllSelectedListElements(ResourceTreeAndListGroup.java:423)
Comment 1 Tomasz Zarna CLA 2009-08-06 05:54:01 EDT
John, if it makes any difference, what kind of Import did the customer try? Was it Archive File, Existing Projects into Workspace or File System?
Comment 2 Steven Jin CLA 2009-08-06 09:12:50 EDT
It is the Import -> File System that the customer was trying.

They were trying to import 46,000+ files from RHEL 5.3 64bit. The option "Create complete folder structure" was used during the import.
Comment 3 John Arthorne CLA 2009-08-06 13:27:39 EDT
This looks like it is caused by the model of the local file system created by the import wizard (MinimizedFileSystemElement, etc). At this point it doesn't look like it has even started the import yet - it's just gathering the recursive list of files to import.
Comment 4 Dani Megert CLA 2009-08-10 04:43:38 EDT
Please also see bug 216901.
Comment 5 Paul Webster CLA 2009-08-10 11:51:30 EDT
Prakash, is this something we can look at in 3.5.1?

PW
Comment 6 Prakash Rangaraj CLA 2009-08-10 13:04:21 EDT
(In reply to comment #5)
> Prakash, is this something we can look at in 3.5.1?


   Will give a try.
Comment 7 Prakash Rangaraj CLA 2009-08-11 04:53:10 EDT
The root cause is because the import wizard tries to create an instance of File for every file/folder that is about to be imported. If you have 55k files, that would translate to at least the same amount of File instances - each of them stores the path in a string. This essentially leads to a whole lot of String and File instances and eventually eats up the whole memory. 

A cursory glance on the problem doesn't give me a simple solution that can be implemented and tested for 3.5.1. Moving the target. Probably will look into 3.5.2
Comment 8 John Camelon CLA 2009-08-11 07:53:10 EDT
Was this introduced in 3.4? I do not seem to recall a problem when our product was based upon Eclipse 3.3.  

Why would you need to keep all those files in memory?

Other than working outside of Eclipse and refreshing, can anyone provide a suggestion as to clean, easy to follow ways to get larger (non Eclipse) source code bases into 3.4 based products? 
Comment 9 Prakash Rangaraj CLA 2009-08-11 08:17:22 EDT
(In reply to comment #8)
> Was this introduced in 3.4? I do not seem to recall a problem when our product
> was based upon Eclipse 3.3.  

   Not sure of when was this introduced.

> Other than working outside of Eclipse and refreshing, can anyone provide a
> suggestion as to clean, easy to follow ways to get larger (non Eclipse) source
> code bases into 3.4 based products? 

   Drag the folder from Windows Explorer and drop it on the project/folder in Project Explorer helps?

    

Comment 10 John Arthorne CLA 2009-08-13 12:23:00 EDT
There are some simple things that can be done on FileSystemElement to reduce its size:

1) Initialize FileSystemElement.workbenchAdapter lazily, so it is never created if getAdapter is never called. Then override getAdapter on MinimizedFileSystemElement to return "this", and make MinimizedFileSystemElement implement IWorkbenchAdapter directly. This saves an extra object per instance.

2) Don't initialize "files" and "folders" to a new empty list in the getFiles() and getFolders() methods. Have a static singleton representing an empty list and return that instead. This saves six objects per file (which never have children) (two AdaptableList, two ArrayList, two arrays). In fact getFiles/getFolders could look like this:

  private static final AdaptableList EMPTY_LIST = new AdaptableList(0);

  public AdaptableList getFiles() {
    if (!isDirectory || files == null)
      return EMPTY_LIST;
    return files;
  }

3) Possibly store files and folders as FileSystemElement[] rather than an AdaptableList. This would be a speed/space trade-off as the list would need to be created when getFiles/getFolders is called.

These are all relatively minor but would offer some improvement that adds up in very large cases.
Comment 11 John Camelon CLA 2009-08-13 14:48:04 EDT
I have been trying to reproduce the exact scenario (albeit on a different linux distribution).  One thing about this customer's environment is that the file tree had an abundance of symbolic links ... some valid and contained within the file tree , others pointing to directories and files that did not exist.  There were approximately 1000 symbolic links in the tree of 50K items.  

While I have not been to reproduce this yet on my system, I did see that the memory burn rate (via the Heap trim) spiked and dropped continually the more soft links that I involved in my test case.  
Comment 12 John Arthorne CLA 2009-08-13 15:10:20 EDT
Is it possible the symbolic link structure contains cycles? We handle that in the resource model but perhaps the import wizard does not account for that.
Comment 13 John Camelon CLA 2009-08-14 08:07:48 EDT
That is exactly what is happening.  I created a project w/2 cyclical folder hierarchies using symbolic links, and I hit OOM within 10 seconds of hitting Finish on the wizard.  I have updated the summary.  
Comment 14 John Arthorne CLA 2009-08-14 10:57:52 EDT
See bug 187318 for a fix for the same problem in a different import wizard.
Comment 15 Boris Bokowski CLA 2009-08-17 06:12:09 EDT
Prakash, can you work on a fix based on the new information?
Comment 16 John Camelon CLA 2009-08-17 08:39:09 EDT
Please also provide the fix based upon 3.4.2.  I am starting the IES process in parallel. 
Comment 17 Prakash Rangaraj CLA 2009-08-18 05:13:15 EDT
Created attachment 144799 [details]
Patch v01

If a symbolic link is found to be cyclic, it will ignore it
Comment 18 Prakash Rangaraj CLA 2009-08-18 05:16:33 EDT
(In reply to comment #17)
> Created an attachment (id=144799) [details]
> Patch v01
> 
> If a symbolic link is found to be cyclic, it will ignore it


Paul,

    I've tested it in Mac. Can you check this works fine in Linux?
Comment 19 John Arthorne CLA 2009-08-19 10:03:07 EDT
Note that File.getCanonicalPath() is an expensive call because it needs to interact with the file system. This patch calls getCanonicalPath() on every parent of each resource below a link, which is O(n^2) in the worst case. Also, I don't think it will handle cases like this:

+ A
  + B
    + link1 --> /A/C
  + C
    + link2 --> /A/B

Traversing this will also loop, but each link is not a prefix of its parent. The only effective solution I have seen is to remember the canonical paths that have been seen, and skip files with paths that have been visited already.
Comment 20 Paul Webster CLA 2009-08-19 12:16:55 EDT
See org.eclipse.core.internal.localstore.PrefixPool and org.eclipse.core.internal.localstore.UnifiedTree.isRecursiveLink(IFileStore, IFileInfo) for an example of how this is implemented in o.e.core.resources.

PW
Comment 21 Prakash Rangaraj CLA 2009-08-19 13:13:53 EDT
Created attachment 145000 [details]
After patch 

(In reply to comment #19)

> The only effective solution I have seen is to remember the canonical paths that
> have been seen, and skip files with paths that have been visited already.

    Remembering the visited files means storing it (or its path) somewhere. Since the original problem is OOM, I'm trying not to create any new objects.

> I don't think it will handle cases like this:
> 
> + A
>   + B
>     + link1 --> /A/C
>   + C
>     + link2 --> /A/B

    Nope. It does works well in this scenario. See the attached image
Comment 22 Prakash Rangaraj CLA 2009-08-19 13:14:29 EDT
Created attachment 145001 [details]
Before applying the patch
Comment 23 Paul Webster CLA 2009-08-19 13:43:49 EDT
(In reply to comment #21)
> 
>     Remembering the visited files means storing it (or its path) somewhere.
> Since the original problem is OOM, I'm trying not to create any new objects.

But the premise of this bug is it is an OOM problem because cyclic directory links are causing a directory of "infinite size" to be read in.  A hash of 100000 strings couldn't cause an OOM problem (at least not unless you had other problems :-)

PW
Comment 24 Prakash Rangaraj CLA 2009-08-24 13:36:26 EDT
Created attachment 145458 [details]
Patch v02

Patch using PrefixPool
Comment 25 Paul Webster CLA 2009-08-24 14:33:22 EDT
To consider for 3.5.1
PW
Comment 26 Paul Webster CLA 2009-08-25 10:38:36 EDT
Boris, could I get a +1 for 3.5.1?

PW
Comment 27 Boris Bokowski CLA 2009-08-25 10:54:08 EDT
+1
Comment 28 Paul Webster CLA 2009-08-25 14:58:55 EDT
(In reply to comment #24)
> Created an attachment (id=145458) [details]
> Patch v02

Released for M20090826-0800

also tested with the scenarios from bug 105554 comment #40

PW
Comment 29 Paul Webster CLA 2009-08-26 09:48:02 EDT
Created attachment 145674 [details]
3.4.x version of patch v02

If necessary to patch back to 3.4.x
PW
Comment 30 Prakash Rangaraj CLA 2009-08-28 00:56:47 EDT
Verified in M20090826-1100