Bug 167402 - Error: "An internal error occurred during: "Task List Saver"." in org.eclipse.core.jobs
Summary: Error: "An internal error occurred during: "Task List Saver"." in org.eclipse...
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn
Version: unspecified
Hardware: PC Windows XP
Importance: P2 minor
Target Milestone: 1.0.1
Assignee: Mik Kersten CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 167308
Depends on:
Blocks:
 
Reported: 2006-12-11 05:43 EST by Andreas Deimer CLA
Modified: 2006-12-22 16:00 EST (History)
3 users

See Also:


Attachments
prototype patch for concurrency issues (4.30 KB, patch)
2006-12-13 18:19 EST, Eugene Kuleshov CLA
mylar/context/zip (6.45 KB, application/octet-stream)
2006-12-13 18:20 EST, Eugene Kuleshov CLA
Proposal patch using synchronized (16.84 KB, patch)
2006-12-17 22:28 EST, Willian Mitsuda CLA
mylar/context/zip (11.30 KB, application/octet-stream)
2006-12-17 22:29 EST, Willian Mitsuda CLA
mylar/context/zip (22.03 KB, application/octet-stream)
2006-12-20 19:14 EST, Mik Kersten CLA

Description Andreas Deimer CLA 2006-12-11 05:43:37 EST
I clicked my way around in the task list for a bit, then this error happened. Connected to a Jira repository. Shouldn't be that hard to find...

I am reporting this through the Mylar interface; the version 1.0.0 does not appear in the Attributes/Version list.

Mylar Version 1.0.0
Jira Connector 1.0.0

BRgds
Andreas

-- Error Log --
Date: Mon Dec 11 10:32:13 CET 2006
Message: An internal error occurred during: "Task List Saver".
Severity: Error
Plugin ID: org.eclipse.core.jobs
Stack Trace:
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
	at java.util.HashMap$KeyIterator.next(HashMap.java:823)
	at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
	at org.eclipse.mylar.internal.tasks.ui.util.TaskListWriter.writeTaskList(TaskListWriter.java:120)
	at org.eclipse.mylar.internal.tasks.ui.util.TaskListSaveManager.internalSaveTaskList(TaskListSaveManager.java:131)
	at org.eclipse.mylar.internal.tasks.ui.util.TaskListSaveManager.access$0(TaskListSaveManager.java:129)
	at org.eclipse.mylar.internal.tasks.ui.util.TaskListSaveManager$TaskListSaverJob.run(TaskListSaveManager.java:306)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:58)
Comment 1 Mik Kersten CLA 2006-12-11 10:54:13 EST
Yes, on rare occasions we can reproduce this in our test suite.  There should be no bad behavior, since the next Task List save will succeed, and saves happen on every modification.  If you do see any bad behavior, please let us know.
Comment 2 Willian Mitsuda CLA 2006-12-11 13:53:27 EST
Related to bug#165745?
Comment 3 Mik Kersten CLA 2006-12-11 16:14:50 EST
Yes, partly.  What happened is that Eugene's enhancement to make Task List saves asynchronous (bug 166607) can sometimes trigger thread-safety problems between the Task List and the way we externalize it (in this case the list is modified while it is being written).  Previously we relied on modifications and writes never happening concurrently, and I applied that patch too late in the 1.0 endgame for us to recover entirely from the change (it introduced other race conditions that we had to recover from).  So for 1.0 we have the benefit of faster saves, but introduced this concurrency problem.  As I said, we were aware of it before releasing and it won't cause any bad behavior.  The worst part about it right now is that it will cause the test suite to fail every so often, so I'm scheduling it for 1.0.1.
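For anyone wanting to see the failure mode in isolation, here is a minimal standalone sketch (hypothetical code, not the actual Mylar classes) of what goes wrong when the save job iterates the task map while another thread modifies it:

    // Hypothetical sketch, not Mylar code: a "saver" thread iterates a plain
    // HashMap while the main thread adds an entry. HashMap's fail-fast
    // iterator will typically throw ConcurrentModificationException, which is
    // the exception in the stack trace above.
    import java.util.HashMap;
    import java.util.Map;

    public class SaveRaceSketch {
        public static void main(String[] args) throws InterruptedException {
            final Map<String, String> tasks = new HashMap<String, String>();
            for (int i = 0; i < 100000; i++) {
                tasks.put("handle-" + i, "task " + i);
            }
            Thread saver = new Thread() {       // stands in for TaskListSaverJob
                public void run() {
                    for (String handle : tasks.keySet()) {
                        handle.length();        // "externalize" each task
                    }
                }
            };
            saver.start();
            tasks.put("new-handle", "added while the save is running");
            saver.join();
        }
    }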
Comment 4 Eugene Kuleshov CLA 2006-12-11 16:24:08 EST
(In reply to comment #3)
Mik, in retrospect, do you think it was a bad idea to apply my patch?

By the way, we should probably always return a copy of any children collection instead of wrapping that collection on the callee side...
Comment 5 Mik Kersten CLA 2006-12-11 16:41:56 EST
> (In reply to comment #3)
> Mik, in a retrospective, do you think it was a bad idea to apply my patch?

It's a real borderline case; it probably was a mistake, but not because of what your patch did (other than the fact that you should have run AllTests to catch some of the errors sooner).  The mistake was for me to apply it knowing that we probably didn't have enough time to recover fully from any implementation bugs your patch could uncover, since performance optimizations of this sort tend to uncover exactly that kind of bug.  That said, other than the test suite regression, the net effect of applying your patch is a positive one for Mylar users, since this bug is totally minor and the reduction in Task List writing will be noticeable on slow machines or with very large Task Lists.

> By the way, we should probably always return a copy of any children collection
> instead of wrapping that collection on the callee side...

Yes, as part of doing some fixes for bug 165745 I made all the relevant collections return unmodifiable lists, so we'll actually get the exception that we want to see.  But the bug here is access to a map while it is changing.
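A tiny hypothetical example of that distinction (plain java.util classes with the usual imports, not the Mylar collections):

    // An unmodifiable wrapper rejects mutation through the returned view
    // (the exception we want callers to see), but it does not protect the
    // backing collection from being changed while someone iterates the view.
    Set<String> backing = new HashSet<String>(Arrays.asList("task-1", "task-2"));
    Set<String> view = Collections.unmodifiableSet(backing);
    // view.add("task-3");              // would throw UnsupportedOperationException
    for (String handle : view) {
        backing.add("added-" + handle); // fail-fast iterator still throws
    }                                   // ConcurrentModificationException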
Comment 6 Eugene Kuleshov CLA 2006-12-11 17:03:10 EST
(In reply to comment #5)
> That said, other than the problem of the
> test suite regression, the net effect of us having applied your patch is a
> positive one for Mylar users since this bug is totally minor, and the reduction
> in Task List writing will be noticeable on slow machines or with very large
> Task Lists.

Actually, the patch was addressing issues with group operations. Those operations were incredibly slow even on fast machines, because their time was a linear function of the number of affected tasks, with a fixed cost for writing the task list. So schedule, delete, mark read, etc. executed on 100 tasks required 100 times the time needed to write the task list (1..3 seconds for me).

> > By the way, we should probably always return a copy of any children collection
> > instead of wrapping that collection on the callee side...
> 
> Yes, as part of doing some fixes bug 165745 I made all relevant collections
> return unmodifiable lists, so we'll actually get the exception that we want to
> see.  But the bug here is access to a map while it is changing.

NO. Unmodifiable lists do not help. We should return an explicit copy instead. I was also thinking about using CopyOnWriteArrayList, but it has a stupid limitation that defeats the benefits of using it.
Comment 7 Robert Elves CLA 2006-12-12 14:32:36 EST
Just had this same error happen in my workspace. No adverse effects to report yet...but we need to get this resolved.
Comment 8 Willian Mitsuda CLA 2006-12-12 21:41:30 EST
 (In reply to comment #6)
> NO. Unmodifiable lists does not help. We should return an explicit copy instead.
> I was also thinking about using CopyOnWriteArrayList, but it has stupid
> limitation that defeat the benefits of using it.

CopyOnWriteArrayList must be used only in situations where modifications are rare.

Just returning an explicit copy won't fix the problem either, because a concurrent operation can happen while copying the internal collection.

We need to protect all modifying operations and collection getters of TaskList, either using synchronized or using lightweight Locks (the preferred/optimized way).
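To make the synchronized option concrete, a stripped-down hypothetical sketch (not the real TaskList; class and method names are made up):

    // Every mutator and every getter takes the same monitor; getters hand out
    // a snapshot copy made while holding the lock, so the save job never
    // iterates a live, changing collection.
    import java.util.HashSet;
    import java.util.Set;

    public class SynchronizedTaskStore {
        private final Set<String> taskHandles = new HashSet<String>();

        public synchronized void add(String handle) {
            taskHandles.add(handle);
        }

        public synchronized void remove(String handle) {
            taskHandles.remove(handle);
        }

        public synchronized Set<String> getAll() {
            return new HashSet<String>(taskHandles);   // snapshot under the lock
        }
    }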
Comment 9 Willian Mitsuda CLA 2006-12-12 21:47:01 EST
BTW, Mik, what was implemented for bug 165745? I just checked the TaskList class and there is no concurrency handling.
Comment 10 Eugene Kuleshov CLA 2006-12-12 22:52:32 EST
(In reply to comment #8)
> CopyOnWriteArrayList must be used only on situations where modifications are
> rare.

Right. But do we know how often writing happens compared to iteration?

The limitation I was referring to is that a copy-on-write collection could take into account that a copy is only needed after an iterator instance has been requested. If there has been no new iterator instance since the last write, there is most likely no need to copy the list.

> Just return an explicit copy won't fix the problem too, because a concurrent
> operation can happen while copying the internal collection.

Sure, the copying method needs to be synchronized.

> We need to protected all modifying operations + collection getters from
> TaskList either using synchronize or using lightweight Locks (this is the
> preferred/optimized way).

Hmm. What do you call lightweight locks? Isn't HotSpot optimizing synchronized blocks already?
Comment 11 Willian Mitsuda CLA 2006-12-12 23:15:20 EST
(In reply to comment #10)
> Right. But do we know how often write is happening comparing to iteration?

Humm... every time a task or query is added to the task list?

> Hmm. What do you call lightweight locks? Isn't hotspot optimizing synchonized
> blocks already?
> 

Yes, Sun is making a lot of optimizations to synchronized blocks with each release of HotSpot.

The lightweight lock I was referring to is the Lock interface and its ReentrantLock implementation, which is semantically equivalent to a synchronized block but more flexible and efficient; however, it adds more complexity.
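For comparison, the same hypothetical store written against the Lock API (again just a sketch, not the real TaskList):

    // Semantically equivalent to the synchronized version, but the lock is an
    // explicit object: it can be tried with a timeout or split into read/write
    // locks later. The lock/unlock pairing has to be managed by hand.
    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.locks.ReentrantLock;

    public class LockedTaskStore {
        private final ReentrantLock lock = new ReentrantLock();
        private final Set<String> taskHandles = new HashSet<String>();

        public void add(String handle) {
            lock.lock();
            try {
                taskHandles.add(handle);
            } finally {
                lock.unlock();              // always release, even on exception
            }
        }

        public Set<String> getAll() {
            lock.lock();
            try {
                return new HashSet<String>(taskHandles);   // snapshot under the lock
            } finally {
                lock.unlock();
            }
        }
    }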
Comment 12 Eugene Kuleshov CLA 2006-12-12 23:58:07 EST
(In reply to comment #11)
> > Right. But do we know how often write is happening comparing to iteration?
> Humm... Every time a task or query is added to tasklist?

So that isn't really as often as iteration, which currently happens on any change to the attributes of those tasks.

> The lightweight lock I was referring to is the Lock interface and its
> ReentrantLock implementation, which is semantically equivalent to a
> synchronized block, but it is more flexible and efficient, however it adds more
> complexity.

My impression was that these locks are only needed if you have to lock/unlock across method calls. Otherwise there should be more room for optimization when locking is isolated within one method...
Comment 13 Willian Mitsuda CLA 2006-12-13 09:31:44 EST
(In reply to comment #12)
> So, that isn't really as often as iteration which is currently happens on any
> change of attributes of these tasks.

I think we need to examine each case. E.g., copying the entire list on each add operation on a huge task list would be very bad.

However, for query/category lists it would perform well.

But all of this introduces a new degree of complexity. Perhaps plain synchronized would be fine for us.

> My impression was that these locks only needed if you need to do lock/unlock
> between method calls. Otherwise there should be more space for optimization
> when locking is isolated within one method...
> 

Yes, it is only a matter of optimization. There is a trade-off between complexity and performance.
Comment 14 Mik Kersten CLA 2006-12-13 12:31:57 EST
Willian, Eugene: I haven't been able to take a close look yet.  Have you converged on a proposal for handling this concurrency?
Comment 15 Eugene Kuleshov CLA 2006-12-13 18:19:53 EST
Created attachment 55620 [details]
prototype patch for concurrency issues

I think the current interaction with these collections is mostly iteration. So we should use CopyOnWriteArrayList/CopyOnWriteArraySet and ConcurrentHashMap for all of them.

This patch replaces the collections directly involved in the task list data structures, but I could have missed something. Also, I haven't wrapped the collections returned from these classes in unmodifiable wrappers...
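Roughly the flavor of the change, with made-up field names and assuming the usual java.util and java.util.concurrent imports (the real fields are in the attached patch):

    // Same interfaces as before, but iteration is now safe: ConcurrentHashMap
    // iterators are weakly consistent, and the copy-on-write collections
    // iterate over an immutable snapshot, so the save job can run while the
    // UI thread keeps adding and removing elements.
    private final Map<String, ITask> tasks =
            new ConcurrentHashMap<String, ITask>();
    private final Set<AbstractRepositoryQuery> queries =
            new CopyOnWriteArraySet<AbstractRepositoryQuery>();
    private final List<ITask> activeTasks =
            new CopyOnWriteArrayList<ITask>();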
Comment 16 Eugene Kuleshov CLA 2006-12-13 18:20:02 EST
Created attachment 55621 [details]
mylar/context/zip
Comment 17 Eugene Kuleshov CLA 2006-12-14 17:10:08 EST
Mik, please note that I haven't really tested this patch. It is more to outline the idea...
Comment 18 Willian Mitsuda CLA 2006-12-17 22:23:44 EST
-1 on the CopyOnWrite collections and on simply using ConcurrentHashMap.

1 - I think CopyOnWrite collections would be overkill for this situation. They make each add operation expensive. If I'm not wrong, CopyOnWrite collections are only for situations where changes are rare and read operations are many times more frequent, so that the price is worth not having any synchronization/concurrent collection. In this case we don't have data to justify it.

2 - About the concurrent collections, I'm afraid to use them because they can mask some concurrency bugs, e.g., there are methods that touch the internal task list AND the internal category list. I wonder if there can be some situation where another thread touches one of the structures in between. In that case, I think it is better to synchronize the whole block (or method).

But all of this is just speculation. I haven't analyzed all possible scenarios.
Comment 19 Willian Mitsuda CLA 2006-12-17 22:28:58 EST
Created attachment 55841 [details]
Proposal patch using synchronized

Like Eugene said, this is only to outline my idea too :)

I'm not sure if it is safe to apply this type of change for 1.0.1.
Comment 20 Willian Mitsuda CLA 2006-12-17 22:29:02 EST
Created attachment 55842 [details]
mylar/context/zip
Comment 21 Eugene Kuleshov CLA 2006-12-17 22:37:12 EST
(In reply to comment #18)
> -1 about the CopyOnWrite-collections and simply use the ConcurrentHashMap.

There is no concurrent hash map for ordered collections, so copy-on-write is the only option besides synchronization. Note that you'd have to synchronize/block over the entire iteration cycle through the collection elements, and that is much worse than creating a copy.

> 1 - I think CopyOnWrite collections would be an overkill for this situation. 

If I recall correctly there is some copy-on-read happening already. So my proposal should only make things better. :-)

> It will turn each add operation expensive. If I'm not wrong, CopyOnWrite are only
> for situations where changes are rare and read operations many times more
> frequent, so the price of not using any synchronization/concurrent collection
> is very worth. On this case we don't have data to justify it.

That is the thing. Tasks and hits aren't added that often. So we do have a case for copy-on-write.

> 2 - About the ConcurrentCollections, I'm afraid to use them, because they can
> mask some concurrency bug, i.e., there are methods that touches the internal
> task list AND the internal category list. I wonder if there can be some
> situation when another process touch one of the structures. On this case, I
> think the better is to synchronize the whole block (or method).

It won't be any worse. The idea is that concurrent collections give you a slice of time. So if things are added afterwards, they will all be processed on the next iteration.
Comment 22 Willian Mitsuda CLA 2006-12-17 22:59:10 EST
(In reply to comment #21)
> If I recall correctly there are some copy-on-read happening already. So, my
> proposal should only make things better. :-)

*sigh* :)

I think the copy-on-read cases can (must?) be optimized in the future by analyzing each one; e.g., move the method that is iterating into the task list and synchronize it...

> That is the thing. Tasks and hits aren't added that often. So, we do have case
> for copy on write.

Humm... no? :-)

http://www.jroller.com/page/eu?entry=mylar_has_turned_one

I presume there are at least 400+ bugs on your task list (based on your estimate of 1 bug/day). Every time you add one, or open some new query hit, it will create a new array list and copy all the elements into it.

It is not a scalable solution... like I said, the CopyOnWrite collections are for rare changes.

> It won't be any worse. The idea is that concurrent collections give you a slice
> of time. So, if things are added afterwards, they will be all processed at the
> next iteration.
> 

Like I said, I haven't analyzed all the use cases, and that is why I'm afraid of simply turning the internal collections into concurrent collections.
Comment 23 Eugene Kuleshov CLA 2006-12-17 23:07:24 EST
(In reply to comment #22)
> I think the CopyOnRead case can (must?) be optimized in future by analyzing
> each case; e.g., move the method that is iterating into the tasklist and
> synchronized it...

I doubt you can move just any iteration block into the task list, unless you are willing to use the command/closure pattern, which is less natural to code around.

> > That is the thing. Tasks and hits aren't added that often. So, we do have case
> > for copy on write.
> I presume there are at least 400+ (based on your estimative of 1 bug/day) bugs
> on your tasklist. Every time you add one, or open some new query hit, it will
> create a new arraylist and copy all elements into it.

You know what they say about statistics? Believe it or not, 1 task added a day is nothing compared to the hundreds of times the task list is being iterated through.

> It is not a scalable solution... like I said, the CopyOnWrite are for rare
> changes.

Aren't those rare enough? Do you want them to be added no more often than once a year, or what?
Comment 24 Mik Kersten CLA 2006-12-18 22:02:00 EST
*** Bug 167308 has been marked as a duplicate of this bug. ***
Comment 25 Mik Kersten CLA 2006-12-19 18:19:53 EST
Willian, Eugene: thanks for the analysis and dialog.  For 1.0.1 I have tried to pick the least disruptive path, so what I have done is make the four stores of task list data ConcurrentHashMaps.  All were either sets or maps, so this was a straightforward change.  I believe that for this case the concurrent collections actually give us the expected behavior Eugene describes: when you iterate over the TaskList, you get the collection that corresponds to the "slice of time" in which you asked for it (i.e., additions made while you are iterating will be ignored).
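For the record, a small hypothetical example of that "slice of time" behavior (plain ConcurrentHashMap with the usual imports, not the actual Mylar fields):

    // ConcurrentHashMap's iterators are weakly consistent: modifying the map
    // while iterating does not throw ConcurrentModificationException, and an
    // entry added during the iteration may simply not be seen until the next
    // pass, which is the behavior the Task List save needs.
    Map<String, String> tasks = new ConcurrentHashMap<String, String>();
    tasks.put("handle-1", "task 1");
    tasks.put("handle-2", "task 2");
    for (String handle : tasks.keySet()) {
        tasks.put("added-mid-iteration", "no exception is thrown");  // no CME
    }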

The only testing that I have done of this is to run the AllTasksTests suite 10x in a row.  Previously a few runs would reliably cause a concurrency problem, and I can no longer reproduce it.

Eugene: on today's conference call you mentioned looking for places that made copies of task list contents, and that this should no longer be necessary.  Were those places in your patch, or did you mean that we should look through the existing callers of some of the accessor methods?
Comment 26 Eugene Kuleshov CLA 2006-12-19 18:39:10 EST
(In reply to comment #25)
> The only testing that I have done of this is to run the AllTasksTests suite 10x
> in a row.  Previously a few runs would reliably cause a concurrency problem,
> and I can no longer reproduce.

Cool stuff!

> Eugene: on today's conference call you mentioned looking for places that made
> copies of task list contents, and that this should no longer be necessary. 
> Were those places in your patch, or did you mean that we should look through
> the existing accessors of some of the methods?

No, those weren't included in my patch. For example, TaskListWriter has: new ArrayList<ITask>(taskList.getAllTasks())

You may also want to look at the type of the AbstractRepositoryQuery.hitHandles field and the other collections in the ITaskListElement object hierarchy.

Also, there is no need to wrap the value set returned from the concurrent hash map the way you did in TaskList.getQueryHits() and a few other methods. That set will have the same semantics as the originating map.
Comment 27 Mik Kersten CLA 2006-12-20 19:14:11 EST
Thanks for all your input on this, guys; it really helped.  It's ironic how simple the solution was in the end, but I'm really glad that thanks to your analysis we now have a good understanding of the options and tradeoffs.  We should still keep an eye out for any concurrency problems since we do not currently have a direct test for this.

Eugene, regarding those wrapped value sets, yes, they don't do anything, but I had to temporarily add them because ConcurrentHashMap.values() returns a Collection, and our current API says we return a Set.  I've added TODOs to remove that once we can change the API post 1.0.1.

    // TODO: remove wrapping once API can change
    return Collections.unmodifiableSet(new HashSet<AbstractRepositoryQuery>(queries.values()));
    
I searched for all accesses of these four maps, and the only non-test class I found still making copies was TaskListWriter, so I removed that copy.
Comment 28 Mik Kersten CLA 2006-12-20 19:14:15 EST
Created attachment 56013 [details]
mylar/context/zip
Comment 29 Eugene Kuleshov CLA 2006-12-20 21:43:00 EST
Mik, how about the AbstractRepositoryQuery and AbstractTaskContainer classes? Maybe there are a few others in this hierarchy. Those may also have the same concurrency issues for their children...
Comment 30 Mik Kersten CLA 2006-12-21 13:04:19 EST
Yes, that had crossed my mind, but we can't do it in a reasonable way without changing that API to return Collection<ITask> instead of a Set.  We should have always had it return a Collection, going by Bloch's API guideline of using the most general interface you can, because that would have made this change possible.  For post 1.0.1 I have created bug 168878.

I just saw a very odd, albeit minor, problem: I deactivated a task via Ctrl+Shift+F9 and it did deactivate, but the Task List still showed it as active.  I thought I had seen something like this a week or two ago.  I will investigate, and if anyone else sees something like this please comment.
Comment 31 Mik Kersten CLA 2006-12-21 13:17:39 EST
(In reply to comment #30)
> I just saw a very odd, albeit minor, problem: I deactivate a task via
> Ctrl+Shift+F9 and it did deactivate, but the Task List still showed it as
> active.  I thought I had seen something like this a week or two ago.  Will
> investigate, and if anyone alse sees something like this please comment.

I think I found the problem: we're using a List for all active tasks, and clients were using that bogus idiom of making copies of the list to avoid concurrency problems.  What I have done instead is make TaskList.activeTasks a CopyOnWriteArrayList, as per Eugene's suggestion.  With the current UI this list only has 0 or 1 items in it, so there is no performance issue.
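Illustration of why the client-side copies become unnecessary (hypothetical snippet, not the actual client code):

    // A CopyOnWriteArrayList iterator works on an immutable snapshot of the
    // backing array, so clients can iterate directly; the old defensive-copy
    // idiom (new ArrayList<ITask>(activeTasks)) is no longer needed, and a
    // concurrent activate/deactivate can never cause
    // ConcurrentModificationException.
    List<ITask> activeTasks = new CopyOnWriteArrayList<ITask>();
    for (ITask task : activeTasks) {
        // refresh decoration for the (at most one) active task
    }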
Comment 32 Eugene Kuleshov CLA 2006-12-21 14:31:11 EST
 (In reply to comment #30)
> Yes, that had crossed my mind, but we can't do it in a reasonable way without
> changing that API to return Collection<ITask> instead of a set.  We should have
> always had it return a Collection going by Bloch's API guidelines of using the
> most top-level interface you can, because that would have made this change
> possible.  For post 1.0.1 I have created bug 168878.

Yes, but for sets you can actually use a ConcurrentHashMap with a single bogus instance as the value. Then map.keySet() will give you an instance of a concurrent set.

The same applies to the TaskList.tasks, TaskList.categories and TaskList.queries collections. The TaskList.queryHits collection seems to actually need a get-by-key operation...
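In code, the trick looks roughly like this (made-up field and method names, assuming the usual java.util and java.util.concurrent imports):

    // Back the Set with a ConcurrentHashMap and a shared dummy value; the
    // map's keySet() is a live, weakly consistent Set view, so the existing
    // Set-returning API signature can stay as it is.
    private final ConcurrentHashMap<ITask, Boolean> tasks =
            new ConcurrentHashMap<ITask, Boolean>();

    public void addTask(ITask task) {
        tasks.put(task, Boolean.TRUE);       // the value is never used
    }

    public Set<ITask> getAllTasks() {
        return Collections.unmodifiableSet(tasks.keySet());
    }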

Also note this bogus copy in the TaskList class:

	public void refactorRepositoryUrl(Object oldUrl, String newUrl) {
		for (ITask task : new ArrayList<ITask>(tasks.values())) {

and this strange code in the getQueryHits() method (it should not call queryHits.get(handle) twice):

			AbstractQueryHit hit = queryHits.get(handle);
			if(hit != null) {
				result.add(queryHits.get(handle));
			}
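
i.e., it can just reuse the result of the first lookup, something along these lines:

			AbstractQueryHit hit = queryHits.get(handle);
			if (hit != null) {
				result.add(hit);   // reuse the value from the single lookup
			}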

Comment 33 Mik Kersten CLA 2006-12-22 15:32:43 EST
Fixed both cases.
Comment 34 Eugene Kuleshov CLA 2006-12-22 15:35:01 EST
(In reply to comment #33)
> Fixed both cases.

What about ConcurrentHashMap fields?
Comment 35 Mik Kersten CLA 2006-12-22 15:40:54 EST
We can do that post 1.0.1 as part of bug 168878, please comment there if you haven't yet.
Comment 36 Eugene Kuleshov CLA 2006-12-22 15:42:41 EST
(In reply to comment #35)
> We can do that post 1.0.1 as part of bug 168878, please comment there if you
> haven't yet.

I did. But the change I'm suggesting does not require API changes.

Comment 37 Mik Kersten CLA 2006-12-22 16:00:31 EST
I know, but I need to focus my attention on other bugs and am not sure I'll have time to do that for 1.0.1.