I was clicking my way around in the task list when this error happened. Connected to a Jira repository. Shouldn't be that hard to find... I would have reported this through the Mylar interface, but version 1.0.0 does not appear in the Attributes/Version list.

Mylar Version 1.0.0
Jira Connector 1.0.0

Best regards,
Andreas

-- Error Log --
Date: Mon Dec 11 10:32:13 CET 2006
Message: An internal error occurred during: "Task List Saver".
Severity: Error
Plugin ID: org.eclipse.core.jobs
Stack Trace:
java.util.ConcurrentModificationException
  at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
  at java.util.HashMap$KeyIterator.next(HashMap.java:823)
  at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
  at org.eclipse.mylar.internal.tasks.ui.util.TaskListWriter.writeTaskList(TaskListWriter.java:120)
  at org.eclipse.mylar.internal.tasks.ui.util.TaskListSaveManager.internalSaveTaskList(TaskListSaveManager.java:131)
  at org.eclipse.mylar.internal.tasks.ui.util.TaskListSaveManager.access$0(TaskListSaveManager.java:129)
  at org.eclipse.mylar.internal.tasks.ui.util.TaskListSaveManager$TaskListSaverJob.run(TaskListSaveManager.java:306)
  at org.eclipse.core.internal.jobs.Worker.run(Worker.java:58)
Yes, on rare occasions we can reproduce this in our test suite. There should be no bad behavior, since the next Task List save will succeed, and saves happen on every modification. If there is any bad behavior, please let us know.
Related to bug#165745?
Yes, partly. What happened is that Eugene's enhancement to make Task List saves asynchronous (bug 166607) can sometimes trigger thread-safety problems between the Task List and the way we externalize it (in this case the list is modified while it is being written). Previously we relied on modification and writing being synchronous, and I applied that patch too late in the 1.0 endgame for us to recover entirely from the change (it introduced other race conditions that we had to recover from). So for 1.0 we have the benefit of faster saves, but introduced this concurrency problem. As I said, we were aware of it before releasing, and it won't cause any bad behavior. The worst part about it right now is that it causes the test suite to fail every so often, so I'm scheduling it for 1.0.1.
(In reply to comment #3)

Mik, in retrospect, do you think it was a bad idea to apply my patch?

By the way, we should probably always return a copy of any children collection instead of wrapping that collection on the callee side...
(In reply to comment #3)
> Mik, in retrospect, do you think it was a bad idea to apply my patch?

It's a real borderline case. It probably was a mistake, but not because of what your patch did (other than the fact that you should have run AllTests to catch some of the errors sooner). The mistake was for me to apply it knowing that we probably didn't have enough time to recover fully from the implementation bugs that your patch could have uncovered, since performance optimizations of this sort tend to uncover bugs of that sort. That said, other than the problem of the test suite regression, the net effect of applying your patch is a positive one for Mylar users, since this bug is totally minor and the reduction in Task List writing will be noticeable on slow machines or with very large Task Lists.

> By the way, we should probably always return a copy of any children collection
> instead of wrapping that collection on the callee side...

Yes, as part of doing some fixes for bug 165745 I made all relevant collections return unmodifiable lists, so we'll actually get the exception that we want to see. But the bug here is access to a map while it is changing.
(In reply to comment #5)
> That said, other than the problem of the test suite regression, the net effect
> of us having applied your patch is a positive one for Mylar users since this
> bug is totally minor, and the reduction in Task List writing will be noticeable
> on slow machines or with very large Task Lists.

Actually, the patch was addressing issues with group operations. Those operations were incredibly slow even on fast machines, because their duration was a linear function of the number of affected tasks, with a fixed cost for writing the task list. So schedule, delete, mark read, etc. executed on 100 tasks required 100 times the time needed to write the task list (1-3 seconds for me).

> > By the way, we should probably always return a copy of any children collection
> > instead of wrapping that collection on the callee side...
>
> Yes, as part of doing some fixes for bug 165745 I made all relevant collections
> return unmodifiable lists, so we'll actually get the exception that we want to
> see. But the bug here is access to a map while it is changing.

NO. Unmodifiable lists do not help. We should return an explicit copy instead. I was also thinking about using CopyOnWriteArrayList, but it has a limitation that defeats the benefits of using it.
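To illustrate the point about unmodifiable wrappers, here is a minimal sketch (not Mylar code; the class and task names are made up). The wrapper shares the backing list, so a fail-fast iterator still throws the same ConcurrentModificationException seen in the report, while an explicit copy is unaffected by later changes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CopyVsWrapperDemo {

    // An unmodifiable view shares the backing list: modifying the original
    // during iteration still triggers the fail-fast check.
    public static boolean wrapperFailsFast() {
        List<String> tasks = new ArrayList<String>();
        tasks.add("task-1");
        tasks.add("task-2");
        List<String> view = Collections.unmodifiableList(tasks);
        try {
            for (String t : view) {
                tasks.add("added-during-iteration"); // mutate the backing list
            }
            return false; // no exception: the wrapper would have protected us
        } catch (ConcurrentModificationException expected) {
            return true; // the wrapper did NOT protect the iteration
        }
    }

    // An explicit snapshot copy can be iterated safely no matter what
    // happens to the original list afterwards.
    public static boolean copyIsSafe() {
        List<String> tasks = new ArrayList<String>();
        tasks.add("task-1");
        tasks.add("task-2");
        List<String> copy = new ArrayList<String>(tasks); // explicit snapshot
        for (String t : copy) {
            tasks.add("added-during-iteration"); // original changes freely
        }
        return copy.size() == 2; // snapshot is unaffected
    }
}
```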
Just had this same error happen in my workspace. No adverse effects to report yet...but we need to get this resolved.
(In reply to comment #6)
> NO. Unmodifiable lists do not help. We should return an explicit copy instead.
> I was also thinking about using CopyOnWriteArrayList, but it has a limitation
> that defeats the benefits of using it.

CopyOnWriteArrayList should only be used in situations where modifications are rare. Just returning an explicit copy won't fix the problem either, because a concurrent operation can happen while the internal collection is being copied. We need to protect all modifying operations plus the collection getters in TaskList, either using synchronized or using lightweight locks (the preferred/optimized way).
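A rough sketch of the synchronized approach described above (the class and method names here are hypothetical, not the actual Mylar TaskList API): every mutator and getter locks the same monitor, and getters hand out snapshots taken under the lock so callers can iterate without holding it:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SynchronizedTaskStore {
    private final Set<String> tasks = new HashSet<String>();

    public synchronized void addTask(String handle) {
        tasks.add(handle);
    }

    public synchronized boolean removeTask(String handle) {
        return tasks.remove(handle);
    }

    // Returns a copy taken while holding the lock; the caller's iteration
    // can never observe a concurrent modification of the internal set.
    public synchronized List<String> getAllTasks() {
        return new ArrayList<String>(tasks);
    }
}
```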
BTW, Mik, what was implemented for bug#165745? I just checked the TaskList class and there is no concurrency handling.
(In reply to comment #8)
> CopyOnWriteArrayList should only be used in situations where modifications are
> rare.

Right. But do we know how often writing happens compared to iteration? The limitation I was referring to is that a copy-on-write collection could take into account that a copy only needs to happen after an iterator instance has been requested. If there has been no new iterator instance since the last write, there is most likely no need to copy the list.

> Just returning an explicit copy won't fix the problem either, because a
> concurrent operation can happen while the internal collection is being copied.

Sure, the copying method needs to be synchronized.

> We need to protect all modifying operations plus the collection getters in
> TaskList, either using synchronized or using lightweight locks (the
> preferred/optimized way).

Hmm. What do you call lightweight locks? Isn't HotSpot optimizing synchronized blocks already?
(In reply to comment #10)
> Right. But do we know how often writing happens compared to iteration?

Humm... every time a task or query is added to the task list?

> Hmm. What do you call lightweight locks? Isn't HotSpot optimizing synchronized
> blocks already?

Yes, Sun is making a lot of optimizations to synchronized blocks in each release of HotSpot. The lightweight lock I was referring to is the Lock interface and its ReentrantLock implementation, which is semantically equivalent to a synchronized block but more flexible and efficient; however, it adds more complexity.
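For reference, a minimal sketch of the Lock/ReentrantLock idiom mentioned above (illustrative names only, not Mylar code). Semantically it matches a synchronized method, but the lock object is explicit, which allows tryLock(), timed waits, and locking across method boundaries at the cost of extra ceremony:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockedTaskStore {
    private final Lock lock = new ReentrantLock();
    private final Set<String> tasks = new HashSet<String>();

    public void addTask(String handle) {
        lock.lock();
        try {
            tasks.add(handle);
        } finally {
            lock.unlock(); // always release, even if add() throws
        }
    }

    public int size() {
        lock.lock();
        try {
            return tasks.size();
        } finally {
            lock.unlock();
        }
    }
}
```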
(In reply to comment #11)
> > Right. But do we know how often writing happens compared to iteration?
>
> Humm... every time a task or query is added to the task list?

So that isn't really as often as iteration, which currently happens on any change of attributes of those tasks.

> The lightweight lock I was referring to is the Lock interface and its
> ReentrantLock implementation, which is semantically equivalent to a
> synchronized block but more flexible and efficient; however, it adds more
> complexity.

My impression was that these locks are only needed if you have to lock/unlock across method calls. Otherwise there should be more room for optimization when locking is isolated within one method...
(In reply to comment #12)
> So that isn't really as often as iteration, which currently happens on any
> change of attributes of those tasks.

I think we need to examine each case. E.g., copying the entire task list on each add operation in a huge task list would be very bad. However, for the query/category lists it would perform well. But all of this introduces a new degree of complexity. Perhaps plain synchronized would be fine for us.

> My impression was that these locks are only needed if you have to lock/unlock
> across method calls. Otherwise there should be more room for optimization when
> locking is isolated within one method...

Yes, it is only a matter of optimization. There is a trade-off between complexity and optimization.
Willian, Eugene: I haven't been able to take a close look yet. Have you converged on a proposal for handling this concurrency?
Created attachment 55620 [details]
prototype patch for concurrency issues

I think that the current interaction pattern is mostly iteration, so we should use CopyOnWriteArrayList/CopyOnWriteArraySet and ConcurrentHashMap for all collections. This patch replaces the collections directly involved in the task list data structures, but I could have missed something. Also, I haven't wrapped the collections returned from these classes in unmodifiable wrappers...
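The snapshot-iterator behavior this proposal relies on can be sketched in isolation (illustrative code, not part of the patch): an iterator obtained from a CopyOnWriteArrayList sees a fixed snapshot, so additions made while iterating never disturb that pass and never throw:

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterationDemo {

    // Iterate over a copy-on-write list while adding to it; the iterator
    // traverses the snapshot taken when it was created.
    public static int iterateWhileAdding() {
        CopyOnWriteArrayList<String> tasks = new CopyOnWriteArrayList<String>();
        tasks.add("task-1");
        tasks.add("task-2");
        int seen = 0;
        for (Iterator<String> it = tasks.iterator(); it.hasNext(); it.next()) {
            tasks.add("late-task"); // safe: does not affect the live iterator
            seen++;
        }
        return seen; // the snapshot held only the original two elements
    }
}
```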
Created attachment 55621 [details] mylar/context/zip
Mik, please note that I haven't really tested this patch. It is more to outline the idea...
-1 on the CopyOnWrite collections; simply use ConcurrentHashMap.

1 - I think CopyOnWrite collections would be overkill for this situation. They would make each add operation expensive. If I'm not wrong, CopyOnWrite collections are only for situations where changes are rare and read operations are many times more frequent, so that the price of not using any synchronization/concurrent collection is well worth it. In this case we don't have data to justify it.

2 - About the concurrent collections, I'm afraid to use them because they can mask some concurrency bug; i.e., there are methods that touch the internal task list AND the internal category list. I wonder if there can be some situation where another thread touches one of the structures. In that case, I think it is better to synchronize the whole block (or method).

But all of this is just speculation. I haven't analyzed all possible scenarios.
Created attachment 55841 [details]
proposal patch using synchronized

Like Eugene said, this is only to outline my idea too :) I'm not sure it is safe to apply this type of change for 1.0.1.
Created attachment 55842 [details] mylar/context/zip
(In reply to comment #18)
> -1 on the CopyOnWrite collections; simply use ConcurrentHashMap.

There is no concurrent hash map for ordered collections, so copy-on-write is the only option besides synchronization. Note that you would have to synchronize/block on the entire iteration cycle through the collection elements, and that is much worse than creating the copy.

> 1 - I think CopyOnWrite collections would be overkill for this situation.

If I recall correctly, there is some copy-on-read happening already, so my proposal should only make things better. :-)

> They would make each add operation expensive. If I'm not wrong, CopyOnWrite
> collections are only for situations where changes are rare and read operations
> are many times more frequent, so that the price of not using any
> synchronization/concurrent collection is well worth it. In this case we don't
> have data to justify it.

That is the thing. Tasks and hits aren't added that often, so we do have a case for copy-on-write.

> 2 - About the concurrent collections, I'm afraid to use them because they can
> mask some concurrency bug; i.e., there are methods that touch the internal
> task list AND the internal category list. I wonder if there can be some
> situation where another thread touches one of the structures. In that case, I
> think it is better to synchronize the whole block (or method).

It won't be any worse. The idea is that concurrent collections give you a slice of time, so if things are added afterwards they will all be processed on the next iteration.
(In reply to comment #21) > If I recall correctly there are some copy-on-read happening already. So, my > proposal should only make things better. :-) *sigh* :) I think the CopyOnRead case can (must?) be optimized in future by analyzing each case; e.g., move the method that is iterating into the tasklist and synchronized it... > That is the thing. Tasks and hits aren't added that often. So, we do have case > for copy on write. Humm... no? :-) http://www.jroller.com/page/eu?entry=mylar_has_turned_one I presume there are at least 400+ (based on your estimative of 1 bug/day) bugs on your tasklist. Every time you add one, or open some new query hit, it will create a new arraylist and copy all elements into it. It is not a scalable solution... like I said, the CopyOnWrite are for rare changes. > It won't be any worse. The idea is that concurrent collections give you a slice > of time. So, if things are added afterwards, they will be all processed at the > next iteration. > Like I said, I hadn't analyzed all use cases, so that is why I'm afraid on simply turning the internal collections on concurrent collections.
(In reply to comment #22)
> I think the copy-on-read cases can (must?) be optimized in the future by
> analyzing each one; e.g., move the method that is iterating into the task list
> and synchronize it...

I doubt you can move every iteration block into the task list, unless you are willing to use the command/closure pattern, which is less natural to code around.

> > That is the thing. Tasks and hits aren't added that often, so we do have a
> > case for copy-on-write.
>
> I presume there are at least 400+ bugs on your task list (based on your
> estimate of 1 bug/day). Every time you add one, or open some new query hit, it
> will create a new ArrayList and copy all elements into it.

You know what they say about statistics? Believe it or not, 1 task added per day is nothing compared to the hundreds of times the task list is iterated through.

> That is not a scalable solution... like I said, CopyOnWrite collections are
> for rare changes.

Aren't those rare enough? Do you want them to be added no more often than once a year, or what?
*** Bug 167308 has been marked as a duplicate of this bug. ***
Willian, Eugene: thanks for the analysis and dialog. For 1.0.1 I have tried to pick the least disruptive path, so what I have done is made the four stores of task list data ConcurrentHashMaps. All were either sets or maps, so this was a straightforward change. I believe that for this case the concurrent collections actually give us the expected behavior that Eugene describes: when you iterate over the TaskList, you get the collection that corresponds to the "slice of time" in which you asked for it (i.e. additions made while you are iterating will be ignored).

The only testing that I have done of this is to run the AllTasksTests suite 10x in a row. Previously a few runs would reliably cause a concurrency problem, and I can no longer reproduce it.

Eugene: on today's conference call you mentioned looking for places that made copies of task list contents, and that this should no longer be necessary. Were those places in your patch, or did you mean that we should look through the existing accessors of some of the methods?
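The "slice of time" behavior described above can be demonstrated in isolation (a sketch with made-up names, not the actual fix): ConcurrentHashMap iterators are weakly consistent, so modifying the map while a save job walks it never throws ConcurrentModificationException:

```java
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {

    // Iterate over a ConcurrentHashMap's key set while putting and
    // removing entries; the weakly consistent iterator never fails fast.
    public static boolean iterateWhileModifying() {
        ConcurrentHashMap<String, String> tasks =
                new ConcurrentHashMap<String, String>();
        for (int i = 0; i < 10; i++) {
            tasks.put("task-" + i, "data");
        }
        for (String handle : tasks.keySet()) {
            tasks.put("added-during-save", "data"); // modify while iterating
            tasks.remove(handle);                   // shrink while iterating
        }
        return true; // reaching here means no ConcurrentModificationException
    }
}
```

With the plain HashMap that TaskListWriter was iterating over, the same interleaving is exactly what produced the stack trace in the original report.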
(In reply to comment #25)
> The only testing that I have done of this is to run the AllTasksTests suite
> 10x in a row. Previously a few runs would reliably cause a concurrency
> problem, and I can no longer reproduce it.

Cool stuff!

> Eugene: on today's conference call you mentioned looking for places that made
> copies of task list contents, and that this should no longer be necessary.
> Were those places in your patch, or did you mean that we should look through
> the existing accessors of some of the methods?

No, those weren't included in my patch. For example, TaskListWriter has:

  new ArrayList<ITask>(taskList.getAllTasks())

You may also want to look at the type of the AbstractRepositoryQuery.hitHandles field and other collections in the ITaskListElement object hierarchy.

Also, there is no need to wrap the value set returned from a concurrent hash map as you did in TaskList.getQueryHits() and a few other methods. That set will have the same semantics as the originating map.
Thanks for all your input on this, guys, it really helped. It's ironic how simple the solution was in the end, but I'm really glad that thanks to your analysis we now have a good understanding of the options and trade-offs. We should still keep an eye out for any concurrency problems, since we do not currently have a direct test for this.

Eugene, regarding those wrapped value sets: yes, they don't do anything, but I had to temporarily add them because ConcurrentHashMap.values() returns a Collection, and our current API says we return a Set. I've added TODOs to remove the wrapping once we can change the API post 1.0.1:

  // TODO: remove wrapping once API can change
  return Collections.unmodifiableSet(new HashSet<AbstractRepositoryQuery>(queries.values()));

I searched for all accesses of these 4 maps, and the only non-test class I found still copying was TaskListWriter; I removed that copy.
Created attachment 56013 [details] mylar/context/zip
Mik, how about the AbstractRepositoryQuery and AbstractTaskContainer classes? Maybe there are a few others in this hierarchy. Those may also have the same concurrency issues for their children...
Yes, that had crossed my mind, but we can't do it in a reasonable way without changing that API to return Collection<ITask> instead of a Set. Going by Bloch's API guideline of using the most general interface you can, it should always have returned a Collection, because that would have made this change possible. For post-1.0.1 I have created bug 168878.

I just saw a very odd, albeit minor, problem: I deactivated a task via Ctrl+Shift+F9 and it did deactivate, but the Task List still showed it as active. I thought I had seen something like this a week or two ago. Will investigate; if anyone else sees something like this, please comment.
(In reply to comment #30)
> I just saw a very odd, albeit minor, problem: I deactivated a task via
> Ctrl+Shift+F9 and it did deactivate, but the Task List still showed it as
> active.

I think I found the problem: we're using a List for all active tasks, and clients were using that bogus idiom of making copies of the list to avoid concurrency problems. What I have done instead is made TaskList.activeTasks a CopyOnWriteArrayList, as per Eugene's suggestion. With the current UI this list only has 0 or 1 items in it, so there is no performance issue.
(In reply to comment #30)
> Yes, that had crossed my mind, but we can't do it in a reasonable way without
> changing that API to return Collection<ITask> instead of a Set. We should have
> always had it return a Collection, because that would have made this change
> possible. For post-1.0.1 I have created bug 168878.

Yes, but for sets you can actually use a ConcurrentHashMap with a single bogus instance as the value. Then map.keySet() will give you an instance of a concurrent set. The same applies to the TaskList.tasks, TaskList.categories, and TaskList.queries collections. The TaskList.queryHits collection seems to actually need a get-by-key operation...

Also note this bogus copy in the TaskList class:

  public void refactorRepositoryUrl(Object oldUrl, String newUrl) {
      for (ITask task : new ArrayList<ITask>(tasks.values())) {

and this strange code in the getQueryHits() method (it should not call queryHits.get(handle) twice):

  AbstractQueryHit hit = queryHits.get(handle);
  if (hit != null) {
      result.add(queryHits.get(handle));
  }
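The keySet() trick described above can be sketched like this (an illustrative class, not proposed Mylar code): since there is no ConcurrentHashSet, back the set with a ConcurrentHashMap whose values are a single shared dummy object; keySet() then behaves as a concurrent, weakly consistent set view:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // Single bogus value shared by every entry; only the keys matter.
    private static final Object PRESENT = new Object();

    private final ConcurrentHashMap<String, Object> map =
            new ConcurrentHashMap<String, Object>();

    public void add(String taskHandle) {
        map.put(taskHandle, PRESENT);
    }

    public boolean contains(String taskHandle) {
        return map.containsKey(taskHandle);
    }

    // A live, concurrent set view of the keys. It supports removal
    // (which writes through to the map) but not add().
    public Set<String> asSet() {
        return map.keySet();
    }
}
```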
Fixed both cases.
(In reply to comment #33)
> Fixed both cases.

What about the ConcurrentHashMap fields?
We can do that post-1.0.1 as part of bug 168878; please comment there if you haven't yet.
(In reply to comment #35)
> We can do that post-1.0.1 as part of bug 168878; please comment there if you
> haven't yet.

I did. But the change I'm suggesting does not require API changes.
I know, but I need to focus on other bugs and am not sure I'll have time to do that for 1.0.1.