Bug 334930

Summary: Provide a way to refresh that finds new children without doing attribute checks on other resources
Product: [Eclipse Project] Platform Reporter: Chris Recoskie <recoskie>
Component: Resources Assignee: Platform-Resources-Inbox <platform-resources-inbox>
Status: CLOSED WONTFIX QA Contact:
Severity: enhancement    
Priority: P3 CC: angvoz.dev, jamesblackburn+eclipse, malaperle, overholt, pwebster, Szymon.Brandys, yevshif
Version: 3.7   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard: stalebug
Bug Depends on:    
Bug Blocks: 278257    
Attachments:
Description Flags
work in progress none

Description Chris Recoskie CLA 2011-01-20 12:49:20 EST
For the full motivation behind this, see Bug 133881 and the design document (work in progress) attached to it.

If we wish to selectively refresh some resources and not others, we are stuck right now because the platform doesn't provide a way to discover new children without refreshing all children of the resource that is the root of the refresh.

Ultimately, refreshing of resources results in some code somewhere calling org.eclipse.core.resources.IResource.refreshLocal(int, IProgressMonitor).  The first parameter is an integer flag indicating the depth to which the resource tree should be refreshed.  The possible values are:

•	DEPTH_ZERO – refresh only the resource itself; do not refresh any children.
•	DEPTH_ONE – refresh the resource and its immediate children, but not the subtrees of those children.  Any new children of the resource that is at the root of the refresh will be discovered.
•	DEPTH_INFINITE – refresh the resource, all immediate children, and recursively all subtrees of those children, discovering new resources where appropriate.

So, let us imagine that we have a C++ project, and we have specified somehow that we wish to never refresh *.cpp files when we do a build.  Let us further imagine that our source files and the object files generated from them are intermingled in the same directories – probably the most common pattern of makefile.  As we visit each folder in our project, we have the option of the three different depths of refreshing.  However, none of the options follow our model.

•	DEPTH_ZERO will not discover any new resources at all.
•	DEPTH_ONE will discover new resources (newly generated .o files), but will refresh all the children of the folder (both the *.cpp files and the *.o files).  This results in refreshing the entire project (albeit folder by folder), which is completely at odds with our exclusion semantics.  We end up refreshing all kinds of *.cpp files that we didn’t want to;  in effect we refresh roughly twice as many files as we should.
•	DEPTH_INFINITE – similar to DEPTH_ONE, in that new children are discovered, but all children of the folder will be refreshed, along with their subtrees.

As you can see, the platform’s refresh model is at odds with what we want to do.  We cannot selectively refresh any files unless they are already in the resource tree, in which case we could refresh them with DEPTH_ZERO.  The only way we can discover new resources is to effectively refresh everything.  We can selectively refresh only certain folders that already exist in the tree, which would be a start compared to the current behaviour of refreshing absolutely everything, but this doesn’t handle the most common use case, which is that developers tend to intermingle their source files and object files.  Thus, we cannot fully serve all the requirements without getting some changes into the Eclipse Platform.

One way of accomplishing this would be for the Platform to provide a new way of refreshing, that discovers new children but does not refresh any previously existing children.  Thus new children could be discovered via this new method, and then once the set of children is known, selective refreshes could be performed on individual resources using DEPTH_ZERO.

This could be accomplished in one of several ways.  One possibility would be to add another depth flag to refreshLocal(...), such as DEPTH_CHILDREN or similar, that only looks for new children but doesn't do attribute checks on any existing resources.  Another possible solution would be to create a new API altogether, such as refreshNewChildrenOnly(...), with similar behaviour.
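
To illustrate the intended semantics only, here is a minimal sketch in plain Java (java.nio, not the Resources API). The class NewChildDiscovery and the method findNewChildren are hypothetical names made up for this example; the set of known member names stands in for the workspace's existing children of a folder. The sketch reads the directory listing and reports names not yet known, without requesting attributes for any entry.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

public class NewChildDiscovery {

    /**
     * Returns the names present in the folder on disk that are missing from
     * knownMembers. Only the directory listing is read; this code does not
     * request size, timestamp, or type for any entry.
     */
    public static Set<String> findNewChildren(Path folder, Set<String> knownMembers)
            throws IOException {
        Set<String> added = new TreeSet<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(folder)) {
            for (Path entry : listing) {
                String name = entry.getFileName().toString();
                if (!knownMembers.contains(name)) {
                    added.add(name);
                }
            }
        }
        return added;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a folder whose sources are already known to the workspace.
        Path folder = Files.createTempDirectory("refresh-demo");
        Files.createFile(folder.resolve("main.cpp"));
        Files.createFile(folder.resolve("util.cpp"));
        Set<String> known = Set.of("main.cpp", "util.cpp");

        // A build drops object files next to the sources.
        Files.createFile(folder.resolve("main.o"));
        Files.createFile(folder.resolve("util.o"));

        // Only the object files are reported; the .cpp files are never checked.
        System.out.println(findNewChildren(folder, known));
    }
}
```

Once the new names are known, the caller could refresh just those resources individually, which corresponds to the DEPTH_ZERO step described above.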
Comment 1 James Blackburn CLA 2011-01-20 13:07:14 EST
I guess the main assumption here is that there are many fewer listFiles than  stats. 
Existing directories would need to be listed to discover new resources, and newly discovered resources would need to be queried to discover whether or not they're files.   If files greatly outnumber directories (which I guess is very likely) this would be an order of magnitude improvement.  

Taking one of my slow refresh projects (I'm on NFS):
bash:jamesb:xl-cbga-20:32892> find . -type f |wc
  50412   51240 4979196
bash:jamesb:xl-cbga-20:32893> find . -type d |wc
   2863    2887  177557

So we might get a 20x time saving.  What do your projects look like Chris?
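
As a rough check of that estimate, the counts above can be plugged into a deliberately crude model. The assumptions are mine, not measurements: a full refresh costs one attribute check per file and per directory, a listing-only refresh costs one listing per existing directory, a listing costs about the same as a stat, and the stats for newly discovered resources are ignored. The class and method names are hypothetical.

```java
public class RefreshCostEstimate {

    /**
     * Ratio of filesystem calls under the crude model described above:
     * (files + directories) attribute checks versus one listing per directory.
     */
    public static double callRatio(long files, long directories) {
        long fullRefreshCalls = files + directories; // one attribute check per resource
        long listOnlyCalls = directories;            // one listing per existing directory
        return (double) fullRefreshCalls / listOnlyCalls;
    }

    public static void main(String[] args) {
        long files = 50412;       // from: find . -type f | wc
        long directories = 2863;  // from: find . -type d | wc
        System.out.println("files per directory: " + (double) files / directories);
        System.out.println("call ratio: " + callRatio(files, directories));
    }
}
```

Under these assumptions the project averages about 17.6 files per directory and the call ratio comes out near 18.6, which is the same ballpark as the 20x figure.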

The other point to note is that we really don't want the Workspace locked while the refresh is happening - or at least it shouldn't be locked for long periods.
Comment 2 Andrew Gvozdev CLA 2011-01-20 13:35:33 EST
> DEPTH_ZERO – refresh only the resource itself; do not refresh any children.
Shouldn't DEPTH_ZERO applied to a folder discover (but not refresh) any children? A file system commonly implements a folder as a special kind of file whose content is the list of files in the folder. If you refresh that, you should already possess the list of files.
Comment 3 Chris Recoskie CLA 2011-02-25 16:35:58 EST
Created attachment 189862 [details]
work in progress

Attaching work in progress for discussion purposes.

The new type of refresh is now ever so slightly faster than a regular refresh, but not really fast enough.  On a remote project hosted with RSE that has 250 folders each with 100 C++ files (for a total of 25,000 source files) that gets compiled into another 25,000 additional object files, doing a DEPTH_INFINITE refresh on the project is about one second slower than updating the folder contents with the new DEPTH_FOLDER_CONTENTS flag.

The problem is that doing individual calls to IFileStore.fetchInfo(...) is really slow;  if you refresh that way, fetching info only for newly discovered children, you actually end up refreshing slower overall than if you just did a full refresh.  The existing refresh functionality avoids this by fetching all child infos in one call via IFileStore.childInfos(...).  Doing it that way in the new method got things down to one second faster, but essentially defeats the purpose of what we're trying to do, because for one second's difference you might as well just do a full refresh at DEPTH_ONE since you are fetching info for all the children anyway.

In order to make this worthwhile, we'd need a way to ask an IFileStore to get the child infos for a specific list of children in one operation.  That way we could retrieve the list of filesystem children, reconcile them against the workspace members, and then only fetchInfo() for any newly discovered resources.
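
The shape of that idea can be sketched in plain Java as follows. Everything here is hypothetical: BatchStore is a made-up stand-in and not the EFS IFileStore interface, fetchInfos(List) is the batched call that does not exist today, and the "info" is reduced to a timestamp. The in-memory store counts calls so the number of round trips is visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SelectiveChildInfos {

    /** Hypothetical stand-in for a file store; this is NOT the EFS IFileStore interface. */
    public interface BatchStore {
        /** Lists child names only, without attributes. */
        List<String> childNames();

        /** Hypothetical batched call: info (here just a timestamp) for the named children. */
        Map<String, Long> fetchInfos(List<String> names);
    }

    /** In-memory store that counts calls, standing in for a remote file system. */
    public static class InMemoryStore implements BatchStore {
        public final Map<String, Long> files = new LinkedHashMap<>();
        public int roundTrips = 0;

        @Override
        public List<String> childNames() {
            roundTrips++;
            return new ArrayList<>(files.keySet());
        }

        @Override
        public Map<String, Long> fetchInfos(List<String> names) {
            roundTrips++;
            Map<String, Long> infos = new LinkedHashMap<>();
            for (String name : names) {
                Long timestamp = files.get(name);
                if (timestamp != null) {
                    infos.put(name, timestamp);
                }
            }
            return infos;
        }
    }

    /** Lists children, reconciles against workspace members, fetches info only for new names. */
    public static Map<String, Long> refreshNewChildren(BatchStore store, Set<String> workspaceMembers) {
        List<String> added = new ArrayList<>();
        for (String name : store.childNames()) {
            if (!workspaceMembers.contains(name)) {
                added.add(name);
            }
        }
        if (added.isEmpty()) {
            return new LinkedHashMap<>(); // nothing new, so no second call is made
        }
        return store.fetchInfos(added);   // one batched call for all new children
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.files.put("main.cpp", 100L);
        store.files.put("main.o", 200L);
        Map<String, Long> infos = refreshNewChildren(store, Set.of("main.cpp"));
        System.out.println(infos + " in " + store.roundTrips + " round trips");
    }
}
```

In this sketch a folder costs at most two calls regardless of how many new children appear, whereas per-child fetchInfo() would cost one call per new child. Whether a real EFS provider could serve such a batched request efficiently is exactly the open question.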
Comment 4 Szymon Brandys CLA 2011-03-17 12:29:06 EDT
(In reply to comment #3)
I talked to James B. today about the patch. It seems that we all are busy at the end of M7. Since James has been working in the refresh area recently, I asked him to look at the patch to speed the work up. His comment will be valuable. I will find time next week to look and comment.
Comment 5 Chris Recoskie CLA 2011-03-23 11:23:49 EDT
(In reply to comment #4)
> (In reply to comment #3)
> I talked to James B. today about the patch. It seems that we all are busy at
> the end of M7. Since James has been working in the refresh area recently, I
> asked him to look at the patch to speed the work up. His comment will be
> valuable. I will find time next week to look and comment.

Well basically the feature doesn't seem to be worth doing without adding some API to EFS to go along with it, and in order to validate the performance, I'd have to implement the API for some EFS provider (RSE would make most sense).  I have had other things to work on so I was not planning to look at this again until Juno timeframe.
Comment 6 Lars Vogel CLA 2019-11-14 03:17:48 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

If the bug is still relevant, please remove the "stalebug" whiteboard tag.