This Bugzilla instance is deprecated, and most Eclipse projects now use GitHub or Eclipse GitLab. Please see the deprecation plan for details.
Bug 160389 - [api] change how offline task data is cached to disk and refactor attribute factory
Summary: [api] change how offline task data is cached to disk and refactor attribute f...
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: unspecified   Edit
Hardware: PC All
: P1 enhancement (vote)
Target Milestone: 2.0   Edit
Assignee: Robert Elves CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 171722
  Show dependency tree
 
Reported: 2006-10-10 17:46 EDT by Robert Elves CLA
Modified: 2007-06-18 20:49 EDT (History)
5 users (show)

See Also:


Attachments
naive performance test (5.36 KB, text/plain)
2007-04-22 15:08 EDT, Eugene Kuleshov CLA
no flags Details
Updated version of PersistenceTest with a few changes to improve db4o's performance. (4.56 KB, text/plain)
2007-04-29 19:33 EDT, Ismael Juma CLA
no flags Details
Offline refactoring take 1 (250.62 KB, text/plain)
2007-06-17 19:13 EDT, Robert Elves CLA
no flags Details
mylar/context/zip (191.53 KB, application/octet-stream)
2007-06-17 19:13 EDT, Robert Elves CLA
no flags Details
offline refactoring take 2 (225.65 KB, text/plain)
2007-06-18 13:49 EDT, Robert Elves CLA
no flags Details
Offline take 3 (231.72 KB, text/plain)
2007-06-18 14:11 EDT, Robert Elves CLA
no flags Details
Take 4 (231.90 KB, text/plain)
2007-06-18 14:48 EDT, Robert Elves CLA
no flags Details
Take 5 (231.94 KB, text/plain)
2007-06-18 15:41 EDT, Robert Elves CLA
no flags Details
Take 6 (272.87 KB, text/plain)
2007-06-18 19:21 EDT, Robert Elves CLA
no flags Details
Resolves merg conflicts (270.60 KB, text/plain)
2007-06-18 19:44 EDT, Robert Elves CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Elves CLA 2006-10-10 17:46:37 EDT
Currently offline task data is serialized to disk in a single file. This mechanism can easily break with changes to the participating classes and is slow when the cache file is large. Alternate methods need to be considered such as using xml and multiple files.
Comment 1 Robert Elves CLA 2006-10-17 14:47:42 EDT
Regarding removing repository configuration data from offline task data....what if we kept the old Repository configuration data? If both were being cached in the .mylar folder along with all the other data then we'd have this available to us (assuming it didn't get deleted along with repository deletion). Currently if you try to open a task who's repository has been deleted nothing happens (created bug#161297).  
Comment 2 Mik Kersten CLA 2006-10-17 22:16:55 EDT
That sounds like a good idea.  So the then the .mylar folder gets two new kinds of files, one for offline data, one for repository config?  Perhaps we should make a folder per-repository so that the structure looks something like this?
- .mylar/
    - repositories.xml.zip
    - repositories-data/
      - https%3A%2F%2Fbugs.eclipse.org-config.xml.zip
      - https%3A%2F%2Fbugs.eclipse.org-offline.data 
    
Comment 3 Robert Elves CLA 2006-10-18 12:56:50 EDT
As a first pass I can export task data as XML (via IMemento) to produce an -offline.data file which holds task data AND repository configuration options (as it is now serialized).  Then once we've determined how to externalize configuration data in a generic way (bug#150680) we can pull that data out into its own -config.xml.zip file (which will greatly simplify the task data xml).  Or do we try to do the complete separation all in one shot?
Comment 4 Mik Kersten CLA 2006-10-18 15:54:33 EDT
That does sound like a good baby step.  The cost is that if we do two releases that both change formats people lose their offline stuff twice.  So I think this is better to do this in one go.  If it's too involved to get finished by early tomorrow I suggest you postpone to 0.9.
Comment 5 Robert Elves CLA 2006-10-18 17:38:32 EDT
Okay, I'll push to 0.9 since (generic) repository configuration persistence may be a bit of a trick. I'll update as I proceed.
Comment 6 Robert Elves CLA 2006-11-14 12:35:53 EST
Overhaul of bugzilla http com took precedence, re-scheduling for 1.0.
Comment 7 Eugene Kuleshov CLA 2006-11-14 14:13:34 EST
Is this the issue related to alternative storage format? Perhaps based on Lucene...
Comment 8 Robert Elves CLA 2006-11-14 14:21:24 EST
Yes, Lucene is being considered... any more thoughts on this Eugene?
Comment 9 Eugene Kuleshov CLA 2006-11-14 16:00:15 EST
Regardless of using Lucene it maybe a good idea to break task data into multiple chunks (i.e. per category/query). With Lucene you can search trough them trough MultiReader and still able to update individual indexes.

Another question is do we want to keep feature for allowing tasks in the root of the task list or create some placeholder by default (i.e. "Default", "Local" or "My Tasks" category).
Comment 10 Robert Elves CLA 2006-11-15 17:45:05 EST
Pushing to post 1.0 as this will be too disruptive considering RC1 is in just over a week.
Comment 11 Steffen Pingel CLA 2006-11-15 22:17:07 EST
We should discuss what to store in the attributes at some point. I noticed that some of the Trac attributes such as status are currently displayed twice in the editor. These are now flagged as "hidden" to not display them in the attributes section but this only takes effect when the task is actually synchronized. 

I think these UI specific attributes should not be serialized but retrieved from the factory. This is related to bug 150680.
Comment 12 Robert Elves CLA 2006-11-15 22:24:33 EST
Yes, one of the major goals of this refactoring will be extraction of the ui related details from the task data such as visibility, operations and their descriptions etc... The attribute factory does seem like the right place for most of this.
Comment 13 Robert Elves CLA 2007-02-21 18:32:39 EST
Also need to consider revising attribute api to return type  getStringValue, getDateValue, etc
Comment 14 Robert Elves CLA 2007-02-21 22:26:48 EST
 (In reply to comment #13)
> Also need to consider revising attribute api to return type  getStringValue,
> getDateValue, etc
As per conversation on bug#170568
Comment 15 Steffen Pingel CLA 2007-03-14 15:05:11 EDT
Please also take a look at http://jackrabbit.apache.org/. It may have more than we need but it might be possible to use a subset of the functionality only.
Comment 16 Robert Elves CLA 2007-03-14 16:53:06 EDT
I agree. It may be overkill but worth looking into. I see that Corona is using it.
Comment 17 Mik Kersten CLA 2007-04-03 16:43:07 EDT
Here are the requirements that we outlined on today's call.
* Robustness to format change (e.g. can add attributes without breaking)
* Incremental read/write (e.g. separate files)
* In-memory cache, lazy access to non-cached data (e.g. 20MB max)
* Connectors only have to provide at most 2 mappings (repo -> RepositoryTaskData -> repo update)
* Figure out whether format should encode data types, not just Strings (probably not)
* External attributes should remain separate (i.e. attachments, could improve lifecycle)
Comment 18 Robert Elves CLA 2007-04-20 21:40:55 EDT
Another storage alternative: http://www.db4o.com/ (used by RssOwl see comments on bug#151997)
Comment 19 Eugene Kuleshov CLA 2007-04-22 15:08:51 EDT
Created attachment 64548 [details]
naive performance test

Here is a naive performance test comparing performance of db4o 6.1, Lucene 2.1 and Oracle Berkley for Java 3.2.23

It is adding/updating 1000 simple records and then run query by "group" field. Probably there is something wrong with the way I implemented code for db4o, but in my test it is about 10x slover then Lucene. There results are here:

bdb add/update: 0.97 sec
bdb query: need to implement searching by group field

lucene add/update: 2.63 sec
lucene query: 0.03 sec

db4o add/update: 20.72 sec
db4o query: 0.125 sec
Comment 20 Eugene Kuleshov CLA 2007-04-29 10:05:34 EDT
Ben, can you please look at my perf test (the db4o part)? Maybe you can suggest for to make it perform better?

Also, here are some notes about using Lucene 
http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf
Comment 21 Benjamin Pasero CLA 2007-04-29 17:20:30 EDT
(In reply to comment #20)
> Ben, can you please look at my perf test (the db4o part)? Maybe you can suggest
> for to make it perform better?
> 
> Also, here are some notes about using Lucene 
> http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf
> 

The scores for db4o look indeed weird. Maybe Ismael could comment as well. The first thing I tried was adding an index to the fields in Simple. This improved the add/update speed from 7 seconds to 1 second on my machine. Usually you would add all your object's ids to the index:

private static final class Simple implements Serializable {
  @Indexed
  String group;

  @Indexed
  String foo;

  @Indexed
  int n;

  public Simple(String group, String foo, int n) {
    this.group = group;
    this.foo = foo;
    this.n = n;
  }
}
Comment 22 Benjamin Pasero CLA 2007-04-29 17:33:55 EDT
Btw to be fair, I think you should configure Lucene to flush changes to the disk immediately. If the application crashes, the buffered documents in Lucene will be lost. I think by default, Lucene buffers up to 10 documents before flushing them.
Comment 23 Ismael Juma CLA 2007-04-29 17:49:55 EDT
The scores with the attached file for db4o in my machine were:

db4o: 3.505
db4o: 0.092

I changed it to use S.O.D.A queries, indexed the two fields that are used in the queries, set Db4o.configure().flushFileBuffers(false) and changed the query during the add/update to match the lucene one (just a single field) and I get the following:

db4o: 0.32
db4o: 0.045

If you need any additional information, feel free to ask.
Comment 24 Eugene Kuleshov CLA 2007-04-29 19:12:38 EDT
 (In reply to comment #23)
> I changed it to use S.O.D.A queries, indexed the two fields that are used in the
> queries, set Db4o.configure().flushFileBuffers(false) and changed the query
> during the add/update to match the lucene one (just a single field) and I get
> the following:
> db4o: 0.32
> db4o: 0.045
> If you need any additional information, feel free to ask.

Can you please attach your changes? Thanks
Comment 25 Ismael Juma CLA 2007-04-29 19:33:15 EDT
Created attachment 65355 [details]
Updated version of PersistenceTest with a few changes to improve db4o's performance.
Comment 26 Ismael Juma CLA 2007-04-29 19:34:51 EDT
I've attached the full file instead of a diff because I had reformatted it and the diff would be quite useless.

I've also added comments explaining the reason for the changes.
Comment 27 Robert Elves CLA 2007-05-08 02:50:53 EDT
Offline task data store needs to be move to .mylar folder and included in a regular backup schedule (sent to backup folder). 
Comment 28 Robert Elves CLA 2007-06-17 19:13:44 EDT
Created attachment 71577 [details]
Offline refactoring take 1

This patch includes two major changes:

1) Relocates the .mylar data folder to .metadat/.mylyn 
2) Includes new offline storage which writes all task data to zipped xml files within a folder named for the repository. The data is held in .metadata/.mylyn/storage. There are still plenty of improvements to be made including removal of the AttributeFactory but this will need to be done at a later date.
Comment 29 Robert Elves CLA 2007-06-17 19:13:51 EDT
Created attachment 71578 [details]
mylar/context/zip
Comment 30 Mik Kersten CLA 2007-06-17 20:42:31 EDT
Excellent!  

"storage" seems a bit weird to me because everything we have in that folder is storage.  How about calling that folder "offline"?
Comment 31 Steffen Pingel CLA 2007-06-17 21:16:25 EDT
I'll take a closer look at the patch tonight and apply it to my bootstrapped workspace tomorrow.
Comment 32 Steffen Pingel CLA 2007-06-18 01:54:14 EDT
From a first glance:
 
 * There are potential race conditions in OfflineCachingStorage, consider synchronizing all methods
 * What happens if a repository URL is changed? Is the task data deleted or migrated?
Comment 33 Robert Elves CLA 2007-06-18 02:20:59 EDT
 (In reply to comment #32)
> * There are potential race conditions in OfflineCachingStorage, consider
> synchronizing all methods
Yup, done.
> * What happens if a repository URL is changed? Is the task data deleted or
> migrated?
Data is currently manually migrated by TaskListManager. We could consider adding this functionality directly to the storage though.
Comment 34 Steffen Pingel CLA 2007-06-18 12:19:16 EDT
Rob, could you please provide a new patch against the current CVS head? I get merge conflicts with the current patch.
Comment 35 Robert Elves CLA 2007-06-18 13:49:59 EDT
Created attachment 71639 [details]
offline refactoring take 2

Concurrency fixes etc.
Comment 36 Steffen Pingel CLA 2007-06-18 14:00:22 EDT
Caught that during synchronization of my JIRA queries:

java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.remove(HashMap.java:860)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.persistToStorage(OfflineCachingStorage.java:201)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.access$0(OfflineCachingStorage.java:195)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage$CacheFlushJob.run(OfflineCachingStorage.java:224)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 37 Robert Elves CLA 2007-06-18 14:11:42 EDT
Created attachment 71646 [details]
Offline take 3

Fixes ConcurrentModificationException and npe in context retrieval
Comment 38 Steffen Pingel CLA 2007-06-18 14:15:52 EDT
BTW, the exception above killed the CacheFlushJob which prevented Eclipse from shutting down due to CacheFlushJob.waitSaveCompleted() waiting for a notification that would never get send. That should be more robust.
Comment 39 Robert Elves CLA 2007-06-18 14:48:53 EDT
Created attachment 71653 [details]
Take 4

Good catch thanks Steffen.  Catching exception now in job so things don't blow up.
Comment 40 Steffen Pingel CLA 2007-06-18 14:56:17 EDT
java.lang.NullPointerException
	at org.eclipse.ui.XMLMemento$DOMWriter.getEscaped(XMLMemento.java:540)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:480)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
	at org.eclipse.ui.XMLMemento.save(XMLMemento.java:426)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineFileStorage.put(OfflineFileStorage.java:276)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.persistToStorage(OfflineCachingStorage.java:204)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.access$0(OfflineCachingStorage.java:197)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage$CacheFlushJob.run(OfflineCachingStorage.java:232)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 41 Robert Elves CLA 2007-06-18 15:41:56 EDT
Created attachment 71660 [details]
Take 5

Should resolve npe...
Comment 42 Steffen Pingel CLA 2007-06-18 17:04:09 EDT
Just found this exception in the error log after creating a new task:

java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:157)
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:730)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.retrieveFromCache(OfflineCachingStorage.java:92)
	at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.get(OfflineCachingStorage.java:78)
	at org.eclipse.mylyn.internal.tasks.core.TaskDataManager.retrieveState(TaskDataManager.java:312)
	at org.eclipse.mylyn.internal.tasks.core.TaskDataManager.getEdits(TaskDataManager.java:195)
	at org.eclipse.mylyn.tasks.ui.editors.AbstractRepositoryTaskEditorInput.refreshInput(AbstractRepositoryTaskEditorInput.java:121)
	at org.eclipse.mylyn.tasks.ui.editors.AbstractRepositoryTaskEditorInput.<init>(AbstractRepositoryTaskEditorInput.java:46)
	at org.eclipse.mylyn.tasks.ui.editors.RepositoryTaskEditorInput.<init>(RepositoryTaskEditorInput.java:30)
	at org.eclipse.mylyn.tasks.ui.editors.NewTaskEditorInput.<init>(NewTaskEditorInput.java:24)
	at org.eclipse.mylyn.internal.jira.ui.wizards.NewJiraTaskWizard.performFinish(NewJiraTaskWizard.java:87)
	at org.eclipse.jface.wizard.WizardDialog.finishPressed(WizardDialog.java:742)
	at org.eclipse.jface.wizard.WizardDialog.buttonPressed(WizardDialog.java:373)
	at org.eclipse.jface.dialogs.Dialog$2.widgetSelected(Dialog.java:616)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:227)
	at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1101)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3319)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
	at org.eclipse.jface.window.Window.runEventLoop(Window.java:820)
	at org.eclipse.jface.window.Window.open(Window.java:796)
	at org.eclipse.mylyn.internal.tasks.ui.actions.NewTaskAction.run(NewTaskAction.java:62)
	at org.eclipse.mylyn.internal.tasks.ui.actions.NewTaskAction.run(NewTaskAction.java:70)
	at org.eclipse.ui.internal.PluginAction.runWithEvent(PluginAction.java:256)
	at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:545)
	at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:490)
	at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:402)
	at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1101)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3319)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
	at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2389)
	at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2353)
	at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2219)
	at org.eclipse.ui.internal.Workbench$4.run(Workbench.java:466)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:289)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:461)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
	at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:106)
	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:153)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:106)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:76)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:363)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:176)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:504)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:443)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1169)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1144)
Comment 43 Robert Elves CLA 2007-06-18 19:21:33 EDT
Created attachment 71670 [details]
Take 6

Fix for editor input issue.
Comment 44 Robert Elves CLA 2007-06-18 19:44:13 EDT
Created attachment 71673 [details]
Resolves merg conflicts
Comment 45 Robert Elves CLA 2007-06-18 20:05:04 EDT
Thanks for all your help testing this Steffen. Committed to head. 

AttributeContainer/Factory refactoring tracked on bug#193225
Comment 46 Mik Kersten CLA 2007-06-18 20:49:51 EDT
Wowwwww