Currently, offline task data is serialized to disk in a single file. This mechanism can easily break when the participating classes change, and it is slow when the cache file is large. Alternative methods need to be considered, such as using XML and multiple files.
Regarding removing repository configuration data from offline task data: what if we kept the old repository configuration data? If both were cached in the .mylar folder along with all the other data, then we'd have this available to us (assuming it didn't get deleted along with repository deletion). Currently, if you try to open a task whose repository has been deleted, nothing happens (created bug#161297).
That sounds like a good idea. So then the .mylar folder gets two new kinds of files, one for offline data and one for repository config? Perhaps we should make a folder per repository so that the structure looks something like this?
- .mylar/
  - repositories.xml.zip
  - repositories-data/
    - https%3A%2F%2Fbugs.eclipse.org-config.xml.zip
    - https%3A%2F%2Fbugs.eclipse.org-offline.data
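The file names in the proposed layout look like the repository URL in URL-encoded form plus a suffix. A minimal sketch of how such names could be derived follows; the class and method names are hypothetical and not taken from the Mylar code base, and only the two suffixes come from the comment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: derives per-repository file names from a repository URL.
final class RepositoryFileNames {

    // URL-encodes the repository URL so it is safe to use as a file name.
    static String encode(String repositoryUrl) {
        try {
            return URLEncoder.encode(repositoryUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Name of the file holding the repository configuration.
    static String configFile(String repositoryUrl) {
        return encode(repositoryUrl) + "-config.xml.zip";
    }

    // Name of the file holding the offline task data.
    static String offlineFile(String repositoryUrl) {
        return encode(repositoryUrl) + "-offline.data";
    }
}
```

For the URL https://bugs.eclipse.org this yields the two file names shown in the layout above.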
As a first pass I can export task data as XML (via IMemento) to produce an -offline.data file which holds task data AND repository configuration options (as it is now serialized). Then once we've determined how to externalize configuration data in a generic way (bug#150680) we can pull that data out into its own -config.xml.zip file (which will greatly simplify the task data xml). Or do we try to do the complete separation all in one shot?
That does sound like a good baby step. The cost is that if we do two releases that both change formats, people lose their offline stuff twice. So I think it is better to do this in one go. If it's too involved to get finished by early tomorrow, I suggest you postpone to 0.9.
Okay, I'll push to 0.9 since (generic) repository configuration persistence may be a bit of a trick. I'll update as I proceed.
Overhaul of bugzilla http com took precedence, re-scheduling for 1.0.
Is this the issue related to alternative storage format? Perhaps based on Lucene...
Yes, Lucene is being considered... any more thoughts on this Eugene?
Regardless of using Lucene, it may be a good idea to break task data into multiple chunks (i.e. per category/query). With Lucene you can search through them using MultiReader and still be able to update individual indexes. Another question is whether we want to keep the feature allowing tasks in the root of the task list, or create some placeholder by default (i.e. a "Default", "Local" or "My Tasks" category).
Pushing to post 1.0 as this will be too disruptive considering RC1 is in just over a week.
We should discuss what to store in the attributes at some point. I noticed that some of the Trac attributes such as status are currently displayed twice in the editor. These are now flagged as "hidden" to not display them in the attributes section but this only takes effect when the task is actually synchronized. I think these UI specific attributes should not be serialized but retrieved from the factory. This is related to bug 150680.
Yes, one of the major goals of this refactoring will be extraction of the ui related details from the task data such as visibility, operations and their descriptions etc... The attribute factory does seem like the right place for most of this.
Also need to consider revising attribute api to return type getStringValue, getDateValue, etc
(In reply to comment #13)
> Also need to consider revising attribute api to return type getStringValue,
> getDateValue, etc

As per conversation on bug#170568
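To illustrate the typed accessors mentioned above, here is a rough sketch of an attribute that stores a raw String and exposes getStringValue and getDateValue. The class name, the date pattern parameter and the null-on-failure behaviour are assumptions for illustration only, not the actual attribute API.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical attribute wrapper with typed getters over a raw String value.
final class TypedAttribute {
    private final String rawValue;

    TypedAttribute(String rawValue) {
        this.rawValue = rawValue;
    }

    // Returns the value exactly as stored.
    String getStringValue() {
        return rawValue;
    }

    // Parses the stored value with the given pattern; returns null if it does not parse.
    Date getDateValue(String pattern) {
        if (rawValue == null) {
            return null;
        }
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        try {
            return format.parse(rawValue);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

With accessors like these, the stored format can stay String-only while callers still get typed values.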
Please also take a look at http://jackrabbit.apache.org/. It may have more than we need but it might be possible to use a subset of the functionality only.
I agree. It may be overkill but worth looking into. I see that Corona is using it.
Here are the requirements that we outlined on today's call.
* Robustness to format change (e.g. can add attributes without breaking)
* Incremental read/write (e.g. separate files)
* In-memory cache, lazy access to non-cached data (e.g. 20MB max)
* Connectors only have to provide at most 2 mappings (repo -> RepositoryTaskData -> repo update)
* Figure out whether format should encode data types, not just Strings (probably not)
* External attributes should remain separate (i.e. attachments, could improve lifecycle)
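The "in-memory cache, lazy access to non-cached data" requirement could be sketched roughly as below. This is only an illustration: it bounds the cache by entry count rather than by megabytes, represents task data as a plain String, and uses hypothetical names throughout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: keeps recently used task data in memory and
// loads anything else on demand through the supplied loader.
final class LazyTaskDataCache {
    private final Function<String, String> loader;
    private final Map<String, String> cache;
    private int loads;

    LazyTaskDataCache(final int maxEntries, Function<String, String> loader) {
        this.loader = loader;
        // Access-ordered map that evicts the least recently used entry when full.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns cached data, or loads it (e.g. from a per-task file) on a miss.
    synchronized String get(String taskHandle) {
        String data = cache.get(taskHandle);
        if (data == null) {
            data = loader.apply(taskHandle);
            loads++;
            if (data != null) {
                cache.put(taskHandle, data);
            }
        }
        return data;
    }

    // Number of times the loader was invoked.
    synchronized int loadCount() {
        return loads;
    }

    // Number of entries currently held in memory.
    synchronized int size() {
        return cache.size();
    }
}
```

A size limit in bytes, as suggested on the call, would need an estimate of each entry's footprint; the entry-count bound here is just the simplest stand-in.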
Another storage alternative: http://www.db4o.com/ (used by RssOwl see comments on bug#151997)
Created attachment 64548 [details] naive performance test
Here is a naive performance test comparing the performance of db4o 6.1, Lucene 2.1 and Oracle Berkeley DB for Java 3.2.23. It adds/updates 1000 simple records and then runs a query by the "group" field. Probably there is something wrong with the way I implemented the code for db4o, but in my test it is about 10x slower than Lucene. The results are here:
bdb add/update: 0.97 sec
bdb query: need to implement searching by group field
lucene add/update: 2.63 sec
lucene query: 0.03 sec
db4o add/update: 20.72 sec
db4o query: 0.125 sec
Ben, can you please look at my perf test (the db4o part)? Maybe you can suggest how to make it perform better? Also, here are some notes about using Lucene: http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf
(In reply to comment #20)
> Ben, can you please look at my perf test (the db4o part)? Maybe you can suggest
> how to make it perform better?
>
> Also, here are some notes about using Lucene
> http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf
>

The scores for db4o do indeed look weird. Maybe Ismael could comment as well. The first thing I tried was adding an index to the fields in Simple. This improved the add/update speed from 7 seconds to 1 second on my machine. Usually you would add all your object's ids to the index:

    private static final class Simple implements Serializable {
        @Indexed String group;
        @Indexed String foo;
        @Indexed int n;

        public Simple(String group, String foo, int n) {
            this.group = group;
            this.foo = foo;
            this.n = n;
        }
    }
Btw to be fair, I think you should configure Lucene to flush changes to the disk immediately. If the application crashes, the buffered documents in Lucene will be lost. I think by default, Lucene buffers up to 10 documents before flushing them.
The scores with the attached file for db4o on my machine were:
db4o: 3.505
db4o: 0.092
I changed it to use S.O.D.A queries, indexed the two fields that are used in the queries, set Db4o.configure().flushFileBuffers(false) and changed the query during the add/update to match the Lucene one (just a single field), and I get the following:
db4o: 0.32
db4o: 0.045
If you need any additional information, feel free to ask.
(In reply to comment #23)
> I changed it to use S.O.D.A queries, indexed the two fields that are used in the
> queries, set Db4o.configure().flushFileBuffers(false) and changed the query
> during the add/update to match the lucene one (just a single field) and I get
> the following:
> db4o: 0.32
> db4o: 0.045
> If you need any additional information, feel free to ask.

Can you please attach your changes? Thanks
Created attachment 65355 [details] Updated version of PersistenceTest with a few changes to improve db4o's performance.
I've attached the full file instead of a diff because I had reformatted it and the diff would be quite useless. I've also added comments explaining the reason for the changes.
Offline task data store needs to be moved to the .mylar folder and included in a regular backup schedule (sent to backup folder).
Created attachment 71577 [details] Offline refactoring take 1
This patch includes two major changes:
1) Relocates the .mylar data folder to .metadata/.mylyn
2) Includes new offline storage which writes all task data to zipped XML files within a folder named for the repository. The data is held in .metadata/.mylyn/storage.
There are still plenty of improvements to be made, including removal of the AttributeFactory, but this will need to be done at a later date.
Created attachment 71578 [details] mylar/context/zip
Excellent! "storage" seems a bit weird to me because everything we have in that folder is storage. How about calling that folder "offline"?
I'll take a closer look at the patch tonight and apply it to my bootstrapped workspace tomorrow.
From a first glance:
* There are potential race conditions in OfflineCachingStorage, consider synchronizing all methods
* What happens if a repository URL is changed? Is the task data deleted or migrated?
(In reply to comment #32)
> * There are potential race conditions in OfflineCachingStorage, consider
> synchronizing all methods

Yup, done.

> * What happens if a repository URL is changed? Is the task data deleted or
> migrated?

Data is currently manually migrated by TaskListManager. We could consider adding this functionality directly to the storage though.
Rob, could you please provide a new patch against the current CVS head? I get merge conflicts with the current patch.
Created attachment 71639 [details] offline refactoring take 2 Concurrency fixes etc.
Caught that during synchronization of my JIRA queries:
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.remove(HashMap.java:860)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.persistToStorage(OfflineCachingStorage.java:201)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.access$0(OfflineCachingStorage.java:195)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage$CacheFlushJob.run(OfflineCachingStorage.java:224)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Created attachment 71646 [details] Offline take 3 Fixes ConcurrentModificationException and npe in context retrieval
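For context, a ConcurrentModificationException like the one above is typical when a HashMap is modified outside the iterator while it is being iterated. One common way to avoid it, sketched below, is to swap the pending map under a lock and persist the swapped-out copy afterwards. This is only an illustration with hypothetical names; it is not the change made in the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical write queue: writers add entries while a flush job drains them.
final class PendingWrites {
    private final Object lock = new Object();
    private Map<String, String> pending = new HashMap<String, String>();

    // Queues task data for the next flush.
    void put(String handle, String data) {
        synchronized (lock) {
            pending.put(handle, data);
        }
    }

    // Swaps the queue under the lock, then works on the private snapshot,
    // so concurrent put() calls never touch the map being iterated.
    List<String> flush() {
        Map<String, String> snapshot;
        synchronized (lock) {
            snapshot = pending;
            pending = new HashMap<String, String>();
        }
        List<String> written = new ArrayList<String>();
        for (Map.Entry<String, String> entry : snapshot.entrySet()) {
            // A real implementation would write entry.getValue() to disk here.
            written.add(entry.getKey());
        }
        return written;
    }

    // Number of entries waiting for the next flush.
    int pendingCount() {
        synchronized (lock) {
            return pending.size();
        }
    }
}
```

Because the slow disk writes happen outside the lock, writers are only blocked for the duration of the swap.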
BTW, the exception above killed the CacheFlushJob, which prevented Eclipse from shutting down because CacheFlushJob.waitSaveCompleted() was waiting for a notification that would never get sent. That should be more robust.
Created attachment 71653 [details] Take 4
Good catch, thanks Steffen. Catching the exception in the job now so things don't blow up.
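The shutdown hang described above can be illustrated with a small sketch: if the completion notification is sent from a finally block, a waiter is released even when the flush itself throws. The names below (other than waitSaveCompleted, which appears in the comment above) are hypothetical, and the timeout parameter is an added assumption; this is not the code from the attached patch.

```java
// Hypothetical flush job that always signals completion, even on failure.
final class SafeFlushJob {
    private final Object saveLock = new Object();
    private boolean saveCompleted;
    private RuntimeException lastFailure;

    // Runs the flush work; any runtime exception is recorded instead of
    // killing the job, and waiters are notified in all cases.
    void run(Runnable flushWork) {
        RuntimeException failure = null;
        try {
            flushWork.run();
        } catch (RuntimeException e) {
            failure = e;
        } finally {
            synchronized (saveLock) {
                lastFailure = failure;
                saveCompleted = true;
                saveLock.notifyAll();
            }
        }
    }

    // Waits until the flush has finished or the timeout elapses.
    boolean waitSaveCompleted(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (saveLock) {
            while (!saveCompleted) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    saveLock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    // The exception thrown by the last flush, or null if it succeeded.
    RuntimeException getLastFailure() {
        synchronized (saveLock) {
            return lastFailure;
        }
    }
}
```

The timeout is a second line of defence: even if a notification were missed, shutdown would not block indefinitely.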
java.lang.NullPointerException
    at org.eclipse.ui.XMLMemento$DOMWriter.getEscaped(XMLMemento.java:540)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:480)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento$DOMWriter.print(XMLMemento.java:476)
    at org.eclipse.ui.XMLMemento.save(XMLMemento.java:426)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineFileStorage.put(OfflineFileStorage.java:276)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.persistToStorage(OfflineCachingStorage.java:204)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.access$0(OfflineCachingStorage.java:197)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage$CacheFlushJob.run(OfflineCachingStorage.java:232)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Created attachment 71660 [details] Take 5 Should resolve npe...
Just found this exception in the error log after creating a new task:
java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:157)
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:730)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.retrieveFromCache(OfflineCachingStorage.java:92)
    at org.eclipse.mylyn.internal.tasks.ui.OfflineCachingStorage.get(OfflineCachingStorage.java:78)
    at org.eclipse.mylyn.internal.tasks.core.TaskDataManager.retrieveState(TaskDataManager.java:312)
    at org.eclipse.mylyn.internal.tasks.core.TaskDataManager.getEdits(TaskDataManager.java:195)
    at org.eclipse.mylyn.tasks.ui.editors.AbstractRepositoryTaskEditorInput.refreshInput(AbstractRepositoryTaskEditorInput.java:121)
    at org.eclipse.mylyn.tasks.ui.editors.AbstractRepositoryTaskEditorInput.<init>(AbstractRepositoryTaskEditorInput.java:46)
    at org.eclipse.mylyn.tasks.ui.editors.RepositoryTaskEditorInput.<init>(RepositoryTaskEditorInput.java:30)
    at org.eclipse.mylyn.tasks.ui.editors.NewTaskEditorInput.<init>(NewTaskEditorInput.java:24)
    at org.eclipse.mylyn.internal.jira.ui.wizards.NewJiraTaskWizard.performFinish(NewJiraTaskWizard.java:87)
    at org.eclipse.jface.wizard.WizardDialog.finishPressed(WizardDialog.java:742)
    at org.eclipse.jface.wizard.WizardDialog.buttonPressed(WizardDialog.java:373)
    at org.eclipse.jface.dialogs.Dialog$2.widgetSelected(Dialog.java:616)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:227)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1101)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3319)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
    at org.eclipse.jface.window.Window.runEventLoop(Window.java:820)
    at org.eclipse.jface.window.Window.open(Window.java:796)
    at org.eclipse.mylyn.internal.tasks.ui.actions.NewTaskAction.run(NewTaskAction.java:62)
    at org.eclipse.mylyn.internal.tasks.ui.actions.NewTaskAction.run(NewTaskAction.java:70)
    at org.eclipse.ui.internal.PluginAction.runWithEvent(PluginAction.java:256)
    at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:545)
    at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:490)
    at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:402)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:66)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1101)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3319)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
    at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2389)
    at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2353)
    at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2219)
    at org.eclipse.ui.internal.Workbench$4.run(Workbench.java:466)
    at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:289)
    at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:461)
    at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
    at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:106)
    at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:153)
    at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:106)
    at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:76)
    at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:363)
    at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:176)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:504)
    at org.eclipse.equinox.launcher.Main.basicRun(Main.java:443)
    at org.eclipse.equinox.launcher.Main.run(Main.java:1169)
    at org.eclipse.equinox.launcher.Main.main(Main.java:1144)
Created attachment 71670 [details] Take 6 Fix for editor input issue.
Created attachment 71673 [details] Resolves merge conflicts
Thanks for all your help testing this Steffen. Committed to head. AttributeContainer/Factory refactoring tracked on bug#193225
Wowwwww