| Summary: | NatureManager is NOT threadsafe causing incorrect responses to isNatureEnabled() (and others) | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Min Idzelis <min123> | ||||||||||||||||||||||
| Component: | Resources | Assignee: | Szymon Brandys <Szymon.Brandys> | ||||||||||||||||||||||
| Status: | VERIFIED FIXED | QA Contact: | |||||||||||||||||||||||
| Severity: | critical | ||||||||||||||||||||||||
| Priority: | P3 | CC: | browe, john.arthorne, kholdaway, malgorzata.tomczyk, rtaniwa, sptaszkiewicz | ||||||||||||||||||||||
| Version: | 3.6.1 | ||||||||||||||||||||||||
| Target Milestone: | 3.7 M2 | ||||||||||||||||||||||||
| Hardware: | PC | ||||||||||||||||||||||||
| OS: | Windows XP | ||||||||||||||||||||||||
| Whiteboard: | |||||||||||||||||||||||||
| Bug Depends on: | |||||||||||||||||||||||||
| Bug Blocks: | 325224, 325226, 325227 | ||||||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||||||
|
Description
Min Idzelis
Created attachment 177513 [details]
Proposal v01
I don't think the patch goes far enough. There are still a few places where unsynchronized access to variables is made. The read of buildersToNatures in #findNatureForBuilder. The read of descriptors in #lazyInitialize. It also seems to me that #configureNatures should be synchronized, since it performs several operations that should be atomic (configureNature/deconfigureNature pairs) Possibly more, but that's at first glance. Created attachment 177601 [details]
Proposal v02
Created attachment 177612 [details]
Proposal v03
Created attachment 177643 [details]
Fix v04
The previous patch synchronized a bit too much. Nature configure/deconfigure is already protected by the workspace lock. All we need to do here is protected the data structures in NatureManager so that API methods that don't acquire the workspace lock don't run into trouble (such as IWorkspace#getNatureDescriptor(s) and the nature methods on IProject. Also, since nature configure/deconfigure calls client code we need to avoid unnecessary extra locks while calling them.
Thanks John. I released the patch to HEAD. I made #lazyInitialize private. natureEnablements could be perhaps lazily initialized as other collections in NatureManger. Reviewed by code inspection. The fix should work fine, I've only added a minor bug 325425 for synchronization of one of the maps. Will this also get released to the 3.6.2 maintenance stream when it becomes available? (In reply to comment #8) > Will this also get released to the 3.6.2 maintenance stream when it becomes > available? Yes. (In reply to comment #8) > Will this also get released to the 3.6.2 maintenance stream when it becomes > available? See bug 325227. We are still seeing a problem in the 3.4.2 stream. Either this fix doesn't work there or this fix didn't completely fix the thread safety. (In reply to comment #11) > We are still seeing a problem in the 3.4.2 stream. Either this fix doesn't > work there or this fix didn't completely fix the thread safety. Are symptoms on 3.4.2 the same now as they were before the patch was ported back? (In reply to comment #12) > (In reply to comment #11) > > We are still seeing a problem in the 3.4.2 stream. Either this fix doesn't > > work there or this fix didn't completely fix the thread safety. > > Are symptoms on 3.4.2 the same now as they were before the patch was ported > back? Yes basically. If you import a faceted project this assertion will fail: Assert.assertTrue(message, project.isNatureEnabled(FacetedProjectNature.NATURE_ID)); (In reply to comment #13) > Yes basically. If you import a faceted project this assertion will fail: > Assert.assertTrue(message, > project.isNatureEnabled(FacetedProjectNature.NATURE_ID)); FacetedProjectNature is an WST class, see org.eclipse.wst.common.project.facet.core.internal.FacetedProjectNature According to Kevan the test looks this way: - During setUp() a project with FacetedProjectNature is imported to the worksapce - the test calls project.isNatureEnabled(FacetedProjectNature.NATURE_ID)); - tearDown() deletes the project It happens sometimes during the test invocation. Moreover I think we could write a test for the Min's case and see it it causes problems. Created attachment 185251 [details]
Concurrent isNatureEnabled test
Kevin,
I tried to reflect your case in a jUnit test, but I failed to reproduce the problem.
This is how it works:
1. in setUp it imports a project that has some nature enabled (in my case JavaNature)
2. The test starts a large amount of threads (in my case 500)
3. Threads wait random time and ask for isNatureEnabled
The result was always correct. Can you suggest me what can I modify in my test to come closer to your situation?
How do you import your project? I used the same method that import wizard does. Is it possible that import of your project did not finish before you called isNatureEnabled?
No actually, once the problem with natures occurs, it stays until you restart eclipse. In other words, its a timing issue to get into the bad state, but once you are there, it lasts the duration of the session with eclipse. Let me continue to watch our tests to see if it occurs again. Perhaps that will narrow it down a bit more. Created attachment 185303 [details]
Test of parallel importing project and checking isNatureEnabled.
This is an updated version of previous test that would be more likely reflect your situation. I modified it so there are multiple imports of the project and straight after it's checking isNatureEnabled. I also used the FacetedProjectNature. It still failed to reproduce your problem, even with 500 parallel threads, so I suppose there are some more conditions that cause the failure. Do you have any other guesses what else is going on in your workspace that may have influence on isNatureEnabled?
(In reply to comment #17) > Created attachment 185303 [details] > Test of parallel importing project and checking isNatureEnabled. > > This is an updated version of previous test that would be more likely reflect > your situation. I modified it so there are multiple imports of the project and > straight after it's checking isNatureEnabled. I also used the > FacetedProjectNature. It still failed to reproduce your problem, even with 500 > parallel threads, so I suppose there are some more conditions that cause the > failure. Do you have any other guesses what else is going on in your workspace > that may have influence on isNatureEnabled? Kevan, any news about the issue? Yes, we have seen the issue after the fix. Its showed up in our automated tests. I have provided all the information we know as to how to reproduce it. It occurs in new workspaces. (In reply to comment #20) > I have provided all the information we know as to how to reproduce it. It > occurs in new workspaces. Not sure if you read the Gosia's comment. She created a test according to your description and this test does not fail. We need more details or a suggestion how to modify it, so it could fail like your tests. My question about "news" was related to your comment 16, which was: (In reply to comment #16) > Let me continue to watch our tests to see if it occurs again. Perhaps that > will narrow it down a bit more. I saw it and chatted with them. I suggested using larger projects (like a user would actually use). Other than, its very difficult to reproduce. We have only seen it a couple of times in hundreds of automated tests. (In reply to comment #22) > I saw it and chatted with them. I suggested using larger projects (like a user > would actually use). Could we record such information in the bug report next time? Gosia did you applied the Kevan's suggestion in the test somehow? (In reply to comment #22) > I saw it and chatted with them. I suggested using larger projects (like a user > would actually use). Other than, its very difficult to reproduce. We have only > seen it a couple of times in hundreds of automated tests. Kevan, could you attach a conclusion of this chat to this bug? I cannot find any record of it in my chat history. I will create another version of patch containing larger projects, did you make any other suggestions? (In reply to comment #22) > I suggested using larger projects (like a user would actually use). I tried the same test with a fairly large project (>3.5MB) and even with 2000 simultaneous threads project.isNatureEnabled(nature) always returned true. I suppose there has to be some additional condition that makes your tests fail and we won't get anywhere before we discover it. (In reply to comment #25) > (In reply to comment #22) > > I suggested using larger projects (like a user would actually use). > I tried the same test with a fairly large project (>3.5MB) and even with 2000 > simultaneous threads project.isNatureEnabled(nature) always returned true. I > suppose there has to be some additional condition that makes your tests fail > and we won't get anywhere before we discover it. I don't have any additional information. As with most thread safety issues, they are difficult to reproduce due to timing. Perhaps the best we can do is review the code to make sure state is properly protected. When you have a fix that you feel is safe based on code inspection, we can use that. If you feel that your current fix is thread-safe (although it seems there have been exceptions found), we can use that until we find a better way to reproduce it. (In reply to comment #26) > I don't have any additional information. As with most thread safety issues, > they are difficult to reproduce due to timing. Perhaps the best we can do is > review the code to make sure state is properly protected. The added synchronization was already reviewed by two committers. However we could miss something. Gosia, please review the code and see if you can find anything suspected. > When you have a fix > that you feel is safe based on code inspection, we can use that. If you feel > that your current fix is thread-safe (although it seems there have been > exceptions found), we can use that until we find a better way to reproduce it. Of course. Gosia and I will review the current code again. If we can't find any problem with it, we will wait for this better way to reproduce the problem. Created attachment 188503 [details]
Same test but in patch
We can add this test to our nightly builds. It will be tested every day in different conditions, maybe it will somewhere in our test environment.
Chiming in on the test. I don't think the test is very representative of an actual import that happens when using the import wizards, or of what the flow is in our product. Here are some modifications that should be made. Importing a project does not use the project.create() method that takes the project description. Rather, a plain project is created using the method that only takes a monitor. Then, (in case of .zip) the ZipLeveledStructureProvider imports the rest of the project. As part of this process, the project is refreshed at the end. This includes the newly import .project file that contains all information about natures. This notification is sent on a different thread than the one importing it. The problem lies that in our product, a thread sometimes starts checking the project before the project is fully imported, which implies that the .project may or may not have been processed/refreshed yet. Perhaps the value of the enabled natures was cached incorrectly. Modifications that should be made: 1) Use project.create(monitor), then use the ZipLeveledStructuredProvider to import a project. (attached is a utility) 2) Create a thread that solely checks isNatureEnabled on the project that is being imported. The value should not be asserted until the import is actually complete, but the early calls would be there to ensure thread-safety and cache consistency. -- Note on utility. Intended to run in a PDE UI plugin test like this: IRunnableWithProgress wait = ZipImportUtil.createImportRunnable(PROJECT_NAME, BUNDLE_ID); PlatformUI.getWorkbench().getProgressService().busyCursorWhile(wait); This will spin off the file import from the main UI thread. Note: You do not need to delete the project when running it 2 times in a row. It automatically deletes the project if it is present. -- Created attachment 188516 [details]
Importer code
Created attachment 189502 [details]
Fixed concurrent natures
Min, thank you for your input. I managed to reproduce the problem. It occurred in 3.4 but in 3.7 as well.
We were flushing the natures cache on every product description change, that was correct. The problem was that the event was send before the description update have been finished. If isNatureEnabled() check was made after sending the event but before the description import was finished the previous natures set was cached again.
Attached patch contains fix for 3.7.
Would it be possible to get a fix in earlier streams, e.g. 3.6x? Created attachment 189520 [details]
Fixed concurrent natures
Minor correction to previous patch
My request above asking for patches in earlier streams is related to these earlier bugzillas: 3.6.2: https://bugs.eclipse.org/bugs/show_bug.cgi?id=325227 3.4.2: https://bugs.eclipse.org/bugs/show_bug.cgi?id=325224 We could use a patch for the latest 3.4.+ and 3.6.+ streams Thanks Billy Gosia, please open a new bug for this and attach the patch. This bug is about NatureManager threadsafety which was indeed fixed. The new issue is a separate problem. Created Bug 338055 for the seperate problem. |