Bug 264930 - [performance] Avoid duplicate task updates for non-disjunct queries in scheduled synchronizations
Summary: [performance] Avoid duplicate task updates for non-disjunct queries in schedu...
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn
Version: 3.1
Hardware: All All
Importance: P3 enhancement
Target Milestone: 3.2
Assignee: Steffen Pingel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-14 06:34 EST by Jörg Thönnes CLA
Modified: 2009-06-14 17:57 EDT
CC List: 1 user

See Also:


Attachments
mylyn/context/zip (4.13 KB, application/octet-stream)
2009-03-08 17:00 EDT, Steffen Pingel CLA
patch (7.97 KB, patch)
2009-06-13 02:25 EDT, Steffen Pingel CLA
mylyn/context/zip (3.75 KB, application/octet-stream)
2009-06-13 03:07 EDT, Steffen Pingel CLA

Description Jörg Thönnes CLA 2009-02-14 06:34:50 EST
As described in bug 263127: Configure synchronization schedule and notifications per repository, query or working set
I have a lot of queries running on my Trac task repository.

The query results are not disjoint since I query both by milestone (planning) and by component (software structure):
* ALL for component
* ALL for milestone
* NOW for component
* NEXT for component
* NOW for ALL components

etc.

So one task is matched by 2-3 queries on average. This also means that it is checked for updates 2-3 times.

This behaviour could be optimized by introducing a minimum time between updates, e.g. 10 seconds.
So if the last update is only 5 seconds old, the update is skipped.

Maybe we could differentiate between automatic and manually triggered updates.
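A minimal Java sketch of the proposed throttle, assuming the client keeps a per-task timestamp of the last update. The class `UpdateThrottle`, its method name, and the 10-second interval are hypothetical illustrations, not Mylyn API. Automatic updates within the interval are skipped, while manually triggered updates always run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical throttle: not part of Mylyn, only an illustration of the idea.
class UpdateThrottle {

    private final long minIntervalMillis;
    // Time of the last performed update, keyed by task id.
    private final Map<String, Long> lastUpdate = new HashMap<>();

    UpdateThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if the task should be updated at time nowMillis. */
    synchronized boolean shouldUpdate(String taskId, long nowMillis, boolean manual) {
        Long last = lastUpdate.get(taskId);
        if (!manual && last != null && nowMillis - last < minIntervalMillis) {
            return false; // another query updated this task moments ago: skip
        }
        lastUpdate.put(taskId, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        UpdateThrottle throttle = new UpdateThrottle(10_000); // 10 seconds
        System.out.println(throttle.shouldUpdate("42", 0, false));      // true
        System.out.println(throttle.shouldUpdate("42", 5_000, false));  // false
        System.out.println(throttle.shouldUpdate("42", 5_000, true));   // true
        System.out.println(throttle.shouldUpdate("42", 16_000, false)); // true
    }
}
```

A skipped update deliberately does not refresh the timestamp, so a task is still checked again once the interval has elapsed since the last real update.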
Comment 1 Steffen Pingel CLA 2009-02-14 14:38:27 EST
That's a good suggestion. The Trac connector could avoid retrieving tasks when running queries by leveraging enhancements in Mylyn 3.0.
Comment 2 Jörg Thönnes CLA 2009-03-03 15:59:25 EST
(In reply to comment #1)
> That's a good suggestion. The Trac connector could avoid retrieving tasks when
> running queries by leveraging enhancements in Mylyn 3.0.

Any idea when this could be implemented? Today our sysadmin reported that I had made about 58,000 XML-RPC requests in a single day.
Many of them are probably obsolete / duplicate.

This is a major issue for us, which we are currently addressing with a server upgrade...
Comment 3 Steffen Pingel CLA 2009-03-03 16:05:16 EST
Thanks for this data point, that's a lot of requests. Most of these requests should be fairly cheap and small in size, though. We have not yet planned the upcoming release, but I'll keep this and the other limitations of the Trac connector in mind for the 3.2 planning.
Comment 4 Jörg Thönnes CLA 2009-03-05 05:41:33 EST
(In reply to comment #3)
> Thanks for this data point, that's a lot of requests. Most of these requests
> should be fairly cheap and small in size though. We have not yet planned the
> upcoming release but I'll keep this and the other limitations of the Trac
> connector in mind for the 3.2 planning.

Actually, my figures were the lowest since I run my queries only every 2 hours. Other team members had more than 200,000 requests a day.

Could I contribute something here, now that I have a running bootstrap config?

Please attach a context and some suggestions.

Thanks, Jörg
Comment 5 Steffen Pingel CLA 2009-03-08 17:00:14 EDT
Great to hear that you are running bootstrapped!

Generally, the Trac connector checks for changed tasks in the repository during synchronization and only reruns queries if the repository has changed. This check is very cheap. If there is a lot of activity in the repository, however, queries are still run frequently.
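The change check can be sketched roughly as follows, assuming the repository can report a single "last changed" timestamp with one cheap call. `ChangeCheckingSynchronizer` is a hypothetical name, and the sketch only models the skip-or-run decision, not the connector's actual code.

```java
import java.util.List;

// Hypothetical model of "only rerun queries if the repository has changed".
class ChangeCheckingSynchronizer {

    // Repository change timestamp covered by the previous query run.
    private long lastSeenChange = Long.MIN_VALUE;

    /** Returns the number of queries that would be sent to the server. */
    int synchronize(long repositoryLastChanged, List<String> queries) {
        if (repositoryLastChanged <= lastSeenChange) {
            return 0; // the cheap check found no changes: skip every query
        }
        lastSeenChange = repositoryLastChanged;
        // Placeholder: a real connector would run each query against the server here.
        return queries.size();
    }

    public static void main(String[] args) {
        ChangeCheckingSynchronizer sync = new ChangeCheckingSynchronizer();
        List<String> queries = List.of("ALL for component", "NOW for component", "NEXT for component");
        System.out.println(sync.synchronize(100, queries)); // 3
        System.out.println(sync.synchronize(100, queries)); // 0
        System.out.println(sync.synchronize(250, queries)); // 3
    }
}
```

This also shows the limitation mentioned above: in a busy repository the timestamp advances between almost every run, so all queries are rerun each time.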

I think the biggest potential for optimization is in TracXmlRpcClient.search(): queries only return a list of task ids, and search() currently retrieves the full task details for every returned task. As you have noticed, this means a task is retrieved multiple times if it is returned by multiple queries.
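To illustrate how overlapping queries multiply the work, this hypothetical sketch counts full-detail retrievals when each query fetches every task it returns, compared with the number of distinct tasks actually involved. The task ids are made-up sample data, not measurements.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of duplicate retrieval across non-disjoint queries.
class DuplicateFetchCount {

    // Behavior as described: every id returned by every query is fetched in full.
    static int fetchesPerQuery(List<List<Integer>> queryResults) {
        int total = 0;
        for (List<Integer> ids : queryResults) {
            total += ids.size();
        }
        return total;
    }

    // Distinct tasks across all queries: the minimum number of full fetches.
    static int distinctTasks(List<List<Integer>> queryResults) {
        Set<Integer> seen = new HashSet<>();
        for (List<Integer> ids : queryResults) {
            seen.addAll(ids);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> results = List.of(
                List.of(1, 2, 3, 4), // e.g. ALL for component
                List.of(2, 3, 5),    // e.g. ALL for milestone
                List.of(2, 3));      // e.g. NOW for component
        System.out.println(fetchesPerQuery(results)); // 9
        System.out.println(distinctTasks(results));   // 5
    }
}
```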

In theory, retrieving the full task details is not necessary when querying, since TracRepositoryConnector.preSynchronization() already detects changed tasks and updates them separately. This means you could try adding another search method that only returns a list of ids and constructs a partial taskData object for each result. Although this only reduces the number of XML-RPC calls per synchronization by the number of queries in your task list, the call that retrieves the full details for all query results is the most expensive one.
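A rough sketch of the suggested ids-only search, assuming a simplified stand-in for taskData that carries only an id and a partial flag. `IdsOnlySearch` and `PartialTaskData` are hypothetical and do not reproduce Mylyn's classes. Query results become partial entries without any detail calls, and full details are requested only for ids that the pre-synchronization step reported as changed, each at most once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the Trac connector's actual implementation.
class IdsOnlySearch {

    // Stand-in for a partial taskData object: only the id is known.
    static final class PartialTaskData {
        final String taskId;
        final boolean partial = true; // details have not been retrieved

        PartialTaskData(String taskId) {
            this.taskId = taskId;
        }
    }

    // Maps the id list returned by a query to partial entries, with no extra calls.
    static List<PartialTaskData> toPartialResults(List<String> ids) {
        List<PartialTaskData> result = new ArrayList<>();
        for (String id : ids) {
            result.add(new PartialTaskData(id));
        }
        return result;
    }

    // Only changed tasks need full details; the set removes cross-query duplicates.
    static List<String> idsNeedingFullDetails(List<PartialTaskData> hits, Set<String> changedIds) {
        Set<String> ids = new LinkedHashSet<>();
        for (PartialTaskData hit : hits) {
            if (changedIds.contains(hit.taskId)) {
                ids.add(hit.taskId);
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        List<PartialTaskData> hits = new ArrayList<>();
        hits.addAll(toPartialResults(List.of("1", "2", "3"))); // first query
        hits.addAll(toPartialResults(List.of("2", "3")));      // overlapping second query
        System.out.println(idsNeedingFullDetails(hits, Set.of("2"))); // [2]
    }
}
```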

It could also help to enable logging in the XML-RPC library to trace what calls are actually made. I haven't yet verified what I said above and could be missing other spots for optimization.
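As a stand-in for the XML-RPC library's own logging (whose configuration is not shown here), the same kind of trace can be sketched with a plain `java.lang.reflect.Proxy` that records every call made through a client interface. `TicketClient` and `traced` are hypothetical names, not the Trac connector's API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CallTracing {

    // Hypothetical client interface standing in for the real XML-RPC client.
    interface TicketClient {
        int[] query(String queryString);
        String getTicket(int id);
    }

    // Wraps target so each interface call is appended to log, then forwarded.
    @SuppressWarnings("unchecked")
    static <T> T traced(Class<T> type, T target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName() + Arrays.toString(args));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's original exception
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TicketClient real = new TicketClient() {
            public int[] query(String queryString) { return new int[] {1, 2}; }
            public String getTicket(int id) { return "ticket " + id; }
        };
        TicketClient client = traced(TicketClient.class, real, log);
        for (int id : client.query("milestone=3.2")) {
            client.getTicket(id);
        }
        System.out.println(log); // [query[milestone=3.2], getTicket[1], getTicket[2]]
    }
}
```

Running the same set of queries through such a wrapper would make repeated `getTicket` calls for the same id directly visible.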
Comment 6 Steffen Pingel CLA 2009-03-08 17:00:22 EDT
Created attachment 127966 [details]
mylyn/context/zip
Comment 7 Steffen Pingel CLA 2009-06-13 02:25:37 EDT
Created attachment 139090 [details]
patch
Comment 8 Steffen Pingel CLA 2009-06-13 03:07:16 EDT
I have committed the patch. The change should dramatically reduce the overhead for background synchronizations. 

I'll make a new weekly build available tomorrow. Joerg, it would be great if you could verify whether the change makes a difference in your environment.
Comment 9 Steffen Pingel CLA 2009-06-13 03:07:19 EDT
Created attachment 139092 [details]
mylyn/context/zip
Comment 10 Steffen Pingel CLA 2009-06-14 17:57:03 EDT
Joerg, I have posted instructions on how to enable tracing of XML-RPC calls for Trac: http://wiki.eclipse.org/Mylyn/FAQ#How_do_I_enable_debugging_output_for_plug-ins.3F . It would be helpful if you could try that and open a new bug in case you notice any oddities.