Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 293863 - [performance] Use FileSet with caution
Summary: [performance] Use FileSet with caution
Status: RESOLVED WONTFIX
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Dash Athena (show other bugs)
Version: unspecified   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-31 22:51 EDT by David Carver CLA
Modified: 2012-01-30 11:32 EST (History)
2 users (show)

See Also:
nboldt: iplog+


Attachments
Moves fileset delete to after directory deletion. (1.20 KB, patch)
2009-11-02 09:08 EST, David Carver CLA
no flags Details | Diff
Tweaks for additonal performance (1.06 KB, patch)
2009-11-02 19:05 EST, David Carver CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Carver CLA 2009-10-31 22:51:22 EDT
It is very tempting to use the ANT fileset option in various tasks, but use of this option can affect the overall build performance, especially on large files.  If you have a list of directories that you want to delete, it is actually quicker to use multiple delete tasks that specify the directories specifically than to use a fileset with a delete and includes/excludes items.

As an example:

<delete>
   <exclude dir="${buildDir}" name="3rdPartyJars/*"/>
   <exclude dir="${buildDir}" name="downloads/*"/>
</delete>

Will be much slower than specifically saying what directory you want deleted.

<delete dir="${buildDir}/N200910312135"/>

The former has to check all the files and directories in the directory specified by ${buildDir} the later just deletes the specific directory.

If you know the names of the files or directories to be deleted always be specific.  There are some portions of Athena that could use some of this performance tuning.   On cleanup, with wst.xsl and wst.xsl-psychopath I have one plugin that has several hundred nested and with some of the nested fileset includes and exclusions for deletes it can take up to 20 minutes just to delete the directories.   With direct deletions of specific directories i.e. deleting all of buildDir directly, takes less than 10 seconds.
Comment 1 David Carver CLA 2009-11-02 09:08:58 EST
Created attachment 151068 [details]
Moves fileset delete to after directory deletion.

This moves the fileset delete to take place after the directory completions.  FileSet is getting all the files and includes instead of just gathering the includes.  It's a design "feature" of the ant fileset command, and on large builddir and directories it can be painlfully slow.  Solution is to delete the extraneous directories that we don't care about first, and then clean up the remaining files.
Comment 2 Nick Boldt CLA 2009-11-02 17:19:31 EST
Patch accepted, committed & rolled out to build.eclipse.
Comment 3 David Carver CLA 2009-11-02 19:04:12 EST
Reopening...there seems to be some additional tweaks that need to be done.  The following code in my custom CleanUp script runs in about a minute.

	<target name="cleanUp">
		<delete dir="${buildDir}/eclipse" failonerror="false" />
		<delete dir="${buildDir}/testing" failonerror="false" />
		<delete dir="${buildDir}/compilelogs" failonerror="false"/>
		<delete dir="${buildDir}/testResults/consolelogs" failonerror="false"/>
		<delete dir="${buildDir}/testResults/html" failonerror="false"/>
		<delete>
			<fileset dir="${writableBuildRoot}">
				<include name="*AllFeatures*.zip, *Master*.zip "/>
			</fileset>
		</delete>
	</target>

The performance difference I believe has to do with the includeEmpty dirs which is causing it to navigate through the tree directory structures checking for empty dirs.  Since we aren't using a fileset to select particular directories, we are just asking to delete the whole thing I think something like the following may help in buildAllHelper:

 <delete dir="${buildDir}/eclipse" failonerror="false"/>
 <delete dir="${buildDir}/testing" failonerror="false"/>
Comment 4 David Carver CLA 2009-11-02 19:05:23 EST
Created attachment 151141 [details]
Tweaks for additonal performance

additional tweaks for performance.
Comment 5 Nick Boldt CLA 2009-11-04 20:39:59 EST
(In reply to comment #4)
> Created an attachment (id=151141) [details]
> Tweaks for additonal performance
> 
> additional tweaks for performance.

followsymlinks="false" 
: required in case there's symlinks in there; otherwise the actual dir is deleted rather than the link.

includeemptydirs="true" 
: iirc, this was needed so that the result of the delete doesn't leave lots of empty dirs, but deletes them too

defaultexcludes="false"
: again, required to perform a complete cleanup, including .svn and CVS/ folders & files should they be present after a local source checkout & copy

Patch rejected.
Comment 6 David Carver CLA 2009-11-04 20:52:33 EST
(In reply to comment #5)
> (In reply to comment #4)
> > Created an attachment (id=151141) [details] [details]
> > Tweaks for additonal performance
> > 
> > additional tweaks for performance.
> 
> followsymlinks="false" 
> : required in case there's symlinks in there; otherwise the actual dir is
> deleted rather than the link.

I could see this as being a problem if Athena was creating symbolic links during the build.  However, the eclipse directory and testing directory do not appear to contain symbolics links in them.

> 
> includeemptydirs="true" 
> : iirc, this was needed so that the result of the delete doesn't leave lots of
> empty dirs, but deletes them too

Right...but if you are deleting the main directory i.e. eclipse, you aren't navigating and deleting the individual directories.  I've been using:

<delete dir="${buildDir}/eclipse"/>

Directly for a couple of weeks with out incident, it is the same as doing an rmdir or deleting the directory from the command prompt.  Will just wipe it out without going into it.,

> 
> defaultexcludes="false"
> : again, required to perform a complete cleanup, including .svn and CVS/
> folders & files should they be present after a local source checkout & copy

Not if you delete the entire directory.  If you only delete pieces, then yes, I agree these need to be there.

So if you get time, feel free to revisit, otherwise, I'll just use my own custom clean and remove that step from my athena builds until I hit a problem.
Comment 7 David Carver CLA 2009-11-07 22:37:54 EST
I did a bit more research on this in particular the use of followsymlinks, includeemptydirs, and defaultexcludes, here is what I found.

All of these should only appear on the delete task when there is a ResourceCollection (i.e. fileset) based delete being done.  As they provide additional information on what should be included in the deletion.  If you are doing a whole directory delete, they should not be included and will infact cause performance issues.   The reason being is that they automatically cause the system to have to recursively search all the directories and files to see if they match the rules.

There was some concern about symlinks, and honestly if you are making Athena able to be run on multiple platforms with out symlinks, we should be avoiding their use in the standard athena build for files created and delete by the build itself. So with this said, the directories that I still think can benefit greatly from the last patch I provided are the "eclipse" and "testing" working directories.  These are all created by the build themselves, and should not have any symlinks in them.   Even if they did, the original directories would not be deleted because symlinks by default is set to false so it won't follow them anyways.

It is a fact that the psychopath build time dropped by 20 minutes just by not using the default cleanup routine provided by athena, but using the custom routine.

Information for Delete and these particular tasks can be found:

Resource Collections - http://ant.apache.org/manual/CoreTypes/resources.html#collection

http://ant.apache.org/manual/CoreTasks/delete.html

includeemptydirs is only relevant for using filesets it has no effect on a straight directory delete but it's presence can cause the task to search all subdirectories anyway).

defaultexcludes is deprecated at the main task level see ResourceCollection on when and how it is to be used.

includesfile, excludesfile, includes, and excludes are deprecated attributes on the main delete task and only should appear with the use of ResourceCollections.

So...re-opening for further discussion based on the above information.
Comment 8 Wayne Beaton CLA 2011-12-22 23:14:53 EST
WONTFIX; Athena has been terminated.