Bug 281893 - [theme] improvements to task list synchronization to reduce load on servers
Status: CLOSED MOVED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn
Version: 3.2
Hardware: All All
Importance: P1 major with 1 vote
Target Milestone: ---
Assignee: Project Inbox CLA
Keywords: performance, plan
Duplicates: 281894
Depends on: 213242 238064 291240 291247 291252 329637 353458
Reported: 2009-06-29 15:18 EDT by Mark Phippard CLA
Modified: 2012-02-16 12:35 EST

Description Mark Phippard CLA 2009-06-29 15:18:49 EDT
Imagine a corporate setting where a lot of developers might all start Eclipse and nail the server at the same time.  I know Mylyn synchs on startup, but does it start the "20 minutes" at the point that the first synchronization stops?  If it does, then that would at least build a little variation into everyone's clocks so that every 20 minutes the server does not get nailed again.  Likewise, have you considered adding a small amount of fudge to the interval?  Like some smallish random interval added to the 20 minutes.  I'd suggest some number between 1 and 120 seconds to create some variation?  It might be useful to also do this at startup, although personally I wish Mylyn just waited the 20 minutes before it did the first synchronization.
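A minimal sketch of the jitter idea, written in Python purely for illustration (Mylyn itself is Java, and nothing here is taken from its code base). The 20-minute base and the 1 to 120 second range come from the paragraph above; `next_sync_delay` and the constant names are hypothetical.

```python
import random

BASE_INTERVAL_S = 20 * 60   # the 20-minute interval mentioned in the description
MIN_JITTER_S = 1            # suggested lower bound of the random offset
MAX_JITTER_S = 120          # suggested upper bound of the random offset


def next_sync_delay(rng=random, base=BASE_INTERVAL_S):
    """Return the number of seconds to wait before the next synchronization.

    A fresh random offset is drawn for every cycle, so clients that started
    at the same moment drift apart instead of hitting the server in lockstep.
    """
    return base + rng.randint(MIN_JITTER_S, MAX_JITTER_S)
```

Applying the same offset before the first synchronization after startup would also spread out the case where many developers launch Eclipse at once.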

Finally, can a repository provider specify a minimum interval?  So that users cannot specify something like 1 minute as their interval and stress the server?  Since Mylyn runs in the background, a user might not even realize how much they are hurting the server performance.
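A minimal sketch of a provider-defined minimum, in Python for illustration only. How a repository would advertise its minimum is not specified in this report, so it is simply passed in as a parameter; `effective_interval` is a hypothetical helper, not a Mylyn API.

```python
def effective_interval(user_interval_s, repository_minimum_s=None):
    """Clamp the user's chosen interval to the repository's minimum, if any."""
    if user_interval_s <= 0:
        raise ValueError("interval must be positive")
    if repository_minimum_s is None:
        # The repository did not state a minimum; keep the user's choice.
        return user_interval_s
    return max(user_interval_s, repository_minimum_s)
```

With a repository minimum of 600 seconds, a user setting of 60 seconds would be raised to 600, while a setting of 1200 seconds would be left alone.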
Comment 1 Steffen Pingel CLA 2009-06-29 15:33:02 EDT
*** Bug 281894 has been marked as a duplicate of this bug. ***
Comment 2 Jörg Thönnes CLA 2009-06-30 03:19:28 EDT
Good idea, Mark!

With regard to the minimum interval, a good way would be to mark every task
with a last-updated timestamp and avoid refreshing the task if the timestamp is
less than some minimum age. This would also cope nicely with overlapping
query results.

Example:

Query A returns tasks 1, 2, 3, whose details are refreshed.
Query B returns tasks 2, 4, 6.

If both queries are run shortly after one another, task 2 would be refreshed
twice in a short cycle. A timestamp would prevent the second refresh.

An exception could be made if the user explicitly selects task 2 and refreshes it.
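The timestamp check described in this comment could look roughly like the following Python sketch. `RefreshGuard`, its parameters, and the injectable clock are all hypothetical names chosen for illustration; this is not how Mylyn stores task state.

```python
import time


class RefreshGuard:
    """Skip automatic refreshes of tasks that were refreshed very recently."""

    def __init__(self, min_age_s, clock=time.monotonic):
        self.min_age_s = min_age_s
        self.clock = clock
        self._last_refresh = {}   # task id -> time of the last refresh

    def should_refresh(self, task_id, forced=False):
        """Return True and record the refresh if the task is due.

        `forced` models the exception for a refresh the user asked for.
        """
        now = self.clock()
        last = self._last_refresh.get(task_id)
        if not forced and last is not None and now - last < self.min_age_s:
            return False
        self._last_refresh[task_id] = now
        return True
```

With queries A (tasks 1, 2, 3) and B (tasks 2, 4, 6) run a few seconds apart, the guard would let task 2 through only once.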
Comment 3 Steffen Pingel CLA 2009-06-30 15:45:22 EDT
> With regard to the minimum interval, a good way would be to mark every task
> with a last updated timestamp and avoid refreshing the task if the timestamp is
> less than some minimum amount old. This would also cope nicely with overlapping
> query results.

Bugzilla keeps a time-stamp for the last modification date of every task.

> Query A returns tasks 1, 2, 3, whose details are refreshed.
> Query B returns tasks 2, 4, 6.
> 
> If both queries are run shortly one after each other, task 2 would be refreshed
> in a short cycle. A timestamp would prevent the second refresh.

That optimization is already in place. Only changed tasks are refreshed in a second pass after all Bugzilla queries have been updated.
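As a rough illustration of the two-pass behaviour described here (not Mylyn's actual implementation), the Python sketch below first merges the hits of all queries and then selects only the tasks whose server time-stamp differs from the local one. The function name and the data shapes are assumptions.

```python
def tasks_to_refresh(query_results, local_modified):
    """Return the ids that need a detail refresh, each at most once.

    query_results: list of dicts, one per query, mapping task id to the
        modification time-stamp reported by the server.
    local_modified: dict mapping task id to the time-stamp stored locally.
    """
    # First pass: merge the hits of all queries; overlapping ids collapse.
    seen = {}
    for hits in query_results:
        seen.update(hits)
    # Second pass: keep only tasks that are new or changed on the server.
    return sorted(
        task_id
        for task_id, remote_stamp in seen.items()
        if local_modified.get(task_id) != remote_stamp
    )
```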
Comment 4 Jörg Thönnes CLA 2009-08-12 06:24:02 EDT
I would suggest changing the summary to:

"Randomize task list synchronization to reduce load on servers"

This is related to the specific suggestions in the description.
Comment 5 Denis Roy CLA 2009-08-18 09:42:21 EDT
One thing I'd like to avoid is this:

[18/Aug/2009:09:29:17 -0400] "POST /bugs/buglist.cgi HTTP/1.1" 200 4319 "-" "Mylyn/3.2.0 BugzillaConnector Eclipse/3.5.0 (org.eclipse.epp.package.rcp.product) HttpClient/3.1 Java/1.6.0_14 (Sun) Linux/2.6.28-14-generic (i386; en_US)"

[18/Aug/2009:09:29:18 -0400] "POST /bugs/buglist.cgi HTTP/1.1" 200 2760 "-" "Mylyn/3.2.0 BugzillaConnector Eclipse/3.5.0 (org.eclipse.epp.package.rcp.product) HttpClient/3.1 Java/1.6.0_14 (Sun) Linux/2.6.28-14-generic (i386; en_US)"


The IP is removed to protect the innocent, but here the same Mylyn client submitted two different search queries (the logged sizes differ) at the same time.  For all we know, this person could be standing at the water cooler chatting with coworkers. It's unfair for this client to 'hog' the database with two 'low-priority' searches while the person using the Web UI is actually sitting at the computer waiting for their search results.
Comment 6 Mark Phippard CLA 2009-08-18 09:53:04 EDT
Denis,

Just to be clear ... your complaint is that a single client submitted more than one query at essentially the exact same time?  You would want to be sure Mylyn queues these up to run one at a time?

I actually thought it did that already.  I did not know it could make more than one hit to the server at the same time.
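Queuing queries so that a client never has more than one in flight could be sketched as below, in Python for illustration. `SerialQueryRunner` is a hypothetical class; it relies only on the standard-library `ThreadPoolExecutor` with a single worker thread.

```python
from concurrent.futures import ThreadPoolExecutor


class SerialQueryRunner:
    """Run submitted queries strictly one after another."""

    def __init__(self):
        # A single worker thread means at most one query is in flight.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, query_fn, *args):
        """Queue a query; returns a Future holding its result."""
        return self._executor.submit(query_fn, *args)

    def shutdown(self):
        """Wait for the queued queries to finish and release the worker."""
        self._executor.shutdown(wait=True)
```

Queries submitted this way run in submission order, so a background synchronization would look to the server more like a single web user issuing one search at a time.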
Comment 7 Denis Roy CLA 2009-08-18 10:09:15 EDT
> Your complaint is that a single client submitted more than
> one query at essentially the exact same time?  You would want to be sure Mylyn
> queues these up to run one at a time?

Well, the typical web user doesn't, in all practicality, issue multiple searches simultaneously, and since most Mylyn searches are 'wasted' (i.e., the client is not waiting for them, and the results yielded are mostly not used), it would be fair for Mylyn to behave more like the Web UI in this respect.

Have a look at this behaviour.  This is all from the same IP, and it has the exact same Mylyn/browser signature, using the Pulsar package.  IMHO this is a fair amount of traffic for one client to be generating every 10 minutes, and eight searches from a single client is a large enough spike to affect performance for other users.

09:34:11 POST /bugs/buglist.cgi HTTP/1.1" 200 1061
09:34:12 POST /bugs/buglist.cgi HTTP/1.1" 200 3170
09:34:15 POST /bugs/buglist.cgi HTTP/1.1" 200 934
09:34:23 POST /bugs/buglist.cgi HTTP/1.1" 200 3075
09:34:25 POST /bugs/buglist.cgi HTTP/1.1" 200 1029

09:44:34 POST /bugs/buglist.cgi HTTP/1.1" 200 1137
09:44:35 POST /bugs/buglist.cgi HTTP/1.1" 200 3128
09:44:36 POST /bugs/buglist.cgi HTTP/1.1" 200 1260
09:44:37 POST /bugs/buglist.cgi HTTP/1.1" 200 3044
09:44:38 POST /bugs/buglist.cgi HTTP/1.1" 200 934
09:44:39 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
09:44:40 POST /bugs/buglist.cgi HTTP/1.1" 200 1649
09:44:41 POST /bugs/show_bug.cgi HTTP/1.1" 200 2178

09:54:44 POST /bugs/buglist.cgi HTTP/1.1" 200 1070
09:54:45 POST /bugs/buglist.cgi HTTP/1.1" 200 3128
09:54:46 POST /bugs/buglist.cgi HTTP/1.1" 200 1260
09:54:47 POST /bugs/buglist.cgi HTTP/1.1" 200 3044
09:54:48 POST /bugs/buglist.cgi HTTP/1.1" 200 934
09:54:49 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
09:54:50 POST /bugs/buglist.cgi HTTP/1.1" 200 1649

09:58:53 POST /bugs/buglist.cgi HTTP/1.1" 200 1138
09:58:54 POST /bugs/buglist.cgi HTTP/1.1" 200 3128
09:59:02 POST /bugs/buglist.cgi HTTP/1.1" 200 934
09:59:03 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
09:59:05 POST /bugs/show_bug.cgi HTTP/1.1" 200 2465
09:59:15 POST /bugs/index.cgi HTTP/1.1" 200 3466  <-- not sure why posting to index?
09:59:16 POST /bugs/show_bug.cgi HTTP/1.1" 200 2050
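To make the burst pattern in the excerpt above easier to see, the Python sketch below groups the logged time-stamps into bursts. The 60-second gap used to separate bursts is an assumption chosen for this excerpt, and `group_bursts` is a hypothetical helper.

```python
from datetime import datetime


def group_bursts(timestamps, max_gap_s=60):
    """Split sorted HH:MM:SS strings into bursts separated by long gaps."""
    bursts = []
    previous = None
    for stamp in timestamps:
        current = datetime.strptime(stamp, "%H:%M:%S")
        if previous is None or (current - previous).total_seconds() > max_gap_s:
            bursts.append([])          # a long pause starts a new burst
        bursts[-1].append(stamp)
        previous = current
    return bursts


# Time-stamps copied from the log excerpt above.
LOG_TIMES = [
    "09:34:11", "09:34:12", "09:34:15", "09:34:23", "09:34:25",
    "09:44:34", "09:44:35", "09:44:36", "09:44:37", "09:44:38",
    "09:44:39", "09:44:40", "09:44:41",
    "09:54:44", "09:54:45", "09:54:46", "09:54:47", "09:54:48",
    "09:54:49", "09:54:50",
    "09:58:53", "09:58:54", "09:59:02", "09:59:03", "09:59:05",
    "09:59:15", "09:59:16",
]

print([len(burst) for burst in group_bursts(LOG_TIMES)])  # prints [5, 8, 7, 7]
```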
Comment 8 Dani Megert CLA 2009-08-18 10:35:56 EDT
At the annual Eclipse project release retrospective, our committers mentioned that Bugzilla performance is slowing them down in their daily work, and hence anything that makes Bugzilla faster will be highly appreciated.
Comment 9 Denis Roy CLA 2009-09-02 10:39:27 EDT
Sorry for increasing this to major, but as people are returning from holidays, our database servers are getting slammed with Mylyn/Bugzilla search queries.

Take a look at these sobering facts.  Our logs rotate at midnight, and it's 10:30am.

bugs-vm1:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | wc -l
18241
bugs-vm1:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | grep -c "Mylyn"
16160

bugs-vm2:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | wc -l
12207
bugs-vm2:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | grep -c "Mylyn"
10304


26464/30448 (87%) searches were from Mylyn clients.
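The percentage can be re-derived from the four counts printed above; the short Python snippet below only repeats that arithmetic.

```python
# Request counts reported above for each server (all searches, Mylyn only).
counts = {
    "bugs-vm1": {"total": 18241, "mylyn": 16160},
    "bugs-vm2": {"total": 12207, "mylyn": 10304},
}

total = sum(c["total"] for c in counts.values())
mylyn = sum(c["mylyn"] for c in counts.values())
share = round(100 * mylyn / total)
print(f"{mylyn}/{total} ({share}%)")   # prints "26464/30448 (87%)"
```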

Fortunately, since the upgrade to 3.4 I can send search queries to both our DB servers, but our CPU load is too high (372/400 in use).

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14504 mysql     15   0 3228m 2.7g 6148 S  372 18.0   9039:08 /usr/sbin/mysqld

In the meanwhile, I'll have no choice but to write some Apache rewrites to limit/throttle the amount of buglist requests that Mylyn-signed clients can request.  I hate to do this, but as you can see in comment 7, I think most of these queries are wasteful.  Which error code would likely cause the least problems for Mylyn clients?  404 Not Found?  403 Forbidden?  Gone?  500 Server Error?
Comment 10 Mik Kersten CLA 2009-09-17 12:55:15 EDT
Denis: We will review this on our call today and respond.
Comment 11 Denis Roy CLA 2009-09-18 11:26:56 EDT
> In the meanwhile, I'll have no choice but to write some Apache rewrites to
> limit/throttle the amount of buglist requests that Mylyn-signed clients can
> request.

The September rush back to work seems to have tapered off, so I don't need to investigate this now.  I do appreciate your making this a P1 though, because the problem will surface again at some point.
Comment 12 Robert Elves CLA 2009-09-22 17:05:36 EDT
(In reply to comment #11)
> The September rush back to work seems to have tapered off, so I don't need to
> investigate this now.  I do appreciate your making this a P1 though, because
> the problem will surface again at some point.

Yes, it's a concern and we'll be eliminating those redundant synchronizations for 3.3 (bug 239182).
Comment 13 Shawn Minto CLA 2009-09-30 15:40:04 EDT
Something else interesting that we could consider is using the IUserAttentionListener from the ActivityContextManager to reduce the synchronizations when the user is away from their computer (i.e. they leave Eclipse open over the weekend).  E.g. if 2 scheduled synchronizations go by while there has been no attention (attentionLost), we suspend the task list synchronization job until the user comes back (attentionGained).  This will reduce load on the server when the user isn't working and this also means that the user is more likely to see the task change notifications.
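A minimal Python sketch of this suspension logic, for illustration only. The method names echo the attentionLost and attentionGained callbacks mentioned above, but the class is hypothetical and does not reproduce the real IUserAttentionListener interface.

```python
class AttentionAwareSync:
    """Suspend scheduled synchronization while the user is away."""

    def __init__(self, max_idle_cycles=2):
        self.max_idle_cycles = max_idle_cycles
        self._user_present = True
        self._idle_cycles = 0

    @property
    def suspended(self):
        """True once enough scheduled syncs have passed without attention."""
        return (not self._user_present
                and self._idle_cycles >= self.max_idle_cycles)

    def attention_lost(self):
        self._user_present = False

    def attention_gained(self):
        """Resume scheduling; returns True if a catch-up sync is due."""
        was_suspended = self.suspended
        self._user_present = True
        self._idle_cycles = 0
        return was_suspended

    def on_scheduled_tick(self):
        """Called at every sync interval; returns True if a sync should run."""
        if self.suspended:
            return False
        if not self._user_present:
            self._idle_cycles += 1
        return True
```

With the default of two idle cycles, the first two scheduled synchronizations after the user leaves still run, and later ones are skipped until attention returns.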
Comment 14 Denis Roy CLA 2009-10-01 10:16:27 EDT
> to reduce the synchronizations when the user is away

I love the idea. No sense in tossing 6 queries at us every 10 minutes if you're out having lunch.  I think the UDC works this way too ... I leave Eclipse open all week and sometimes I don't use it for days. Then, when I bring it up to do something, I suddenly get the prompt to upload.
Comment 15 Andrew Gvozdev CLA 2009-10-01 10:30:32 EDT
I wonder if it is possible to implement using "push" method instead of "pull" - with Mylyn using email notifications somehow.
Comment 16 Mik Kersten CLA 2009-10-02 14:50:15 EDT
(In reply to comment #15)
> I wonder if it is possible to implement using "push" method instead of "pull" -
> with Mylyn using email notifications somehow.

Andrew: We definitely want to move towards this, but need the corresponding WS API to exist in Bugzilla or an extension that can push changes.  Could you please create a bug against us on this? We've been exploring some ideas around that.

Since this is only an issue on very large, typically OSS, repositories, we were also considering making a white list of those repositories in order to alter their sync interval.  However, if we can find a better solution we'd prefer not to degrade the performance for OSS repositories.

The direction of this thread, i.e., throttling the sync interval accordingly when the connection is idle, seems like the best approach for Mylyn 3.3 (Oct. 28th).  I created bug 291237 for that.
Comment 17 Mark Phippard CLA 2009-10-02 15:01:03 EDT
In the user forum today, someone said that Mylyn queries the Bugzilla metadata configuration once per hour.  If that is true, that seems way too often to me.  Especially given that it can be manually refreshed.  I do not know if it is true, but if so I would recommend changing that to something like once per Eclipse session.
Comment 18 Andrew Gvozdev CLA 2009-10-02 16:40:24 EDT
(In reply to comment #16)
> (In reply to comment #15)
> > I wonder if it is possible to implement using "push" method instead of "pull"
> > with Mylyn using email notifications somehow.
> Andrew: We definitely want to move towards this, but need the corresponding WS
> API to exist in Bugzilla or an extension that can push changes.  Could you
> please create a bug against us on this? We've been exploring some ideas around
> that.

Alright, since you insist, I created bug 291252 and a bunch of other nasty ones. Thanks for listening.
Comment 19 Robert Elves CLA 2009-10-05 13:22:38 EDT
(In reply to comment #17)
> In the user forum today, someone said that Mylyn queries the Bugzilla metadata
> configuration once per hour.  If that is true, that seems way too often to me.
> Especially given that it can be manually refreshed.  I do not know if it is
> true, but if so I would recommend changing that to something like once per
> Eclipse session.

For a short write-up on how frequently the configuration is retrieved, see
 
   http://wiki.eclipse.org/index.php/Mylyn/FAQ#Retrieval_of_repository_configuration
Comment 20 maarten meijer CLA 2009-10-26 05:08:15 EDT
As I work with Mylyn I notice an increasing set of tasks accumulating inside the Mylyn Unassigned category.
These are tasks that are not part of an existing query, so apparently are currently out of scope.
Maybe it is a good idea to disable automatic synch for these.
Comment 21 Denis Roy CLA 2010-01-05 14:03:01 EST
Happy New Year!  :-D
Comment 22 Denis Roy CLA 2010-01-06 11:38:02 EST
bugs.eclipse.org has been getting lots of buglist requests from Mylyn clients since Monday, so as per comment 9 I put in place a way of refusing buglist requests from Mylyn clients should our database servers be hosed.  Do note that the thresholds are quite high, so the impact of this should be very low -- or, lower than the impact of having web users wait for the DB servers.


If anyone is interested, these are the Apache configurations:

    # If servers are too busy, prevent Mylyn buglists
    RewriteCond /path/to/a/file/called/mylyndisabled -f
    RewriteCond %{HTTP_USER_AGENT} Mylyn
    RewriteRule buglist.cgi / [L,R=400]


I simply have a monitoring script that creates (and removes) the mylyndisabled file depending on the load on both DB servers.
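The monitoring script itself is not shown in this comment. A minimal Python sketch of the idea might look like the following; the threshold, the way the load figures are obtained, and the choice to react when any one server is overloaded are all assumptions.

```python
from pathlib import Path


def update_flag(db_loads, threshold, flag_path):
    """Create the flag file when any DB server is overloaded, else remove it.

    Returns True if Mylyn buglist requests are currently being refused.
    """
    flag = Path(flag_path)
    overloaded = any(load >= threshold for load in db_loads)
    if overloaded:
        flag.touch(exist_ok=True)       # the RewriteCond -f test now matches
    else:
        flag.unlink(missing_ok=True)    # requires Python 3.8 or later
    return overloaded
```

Run periodically (for example from cron), such a function keeps the flag file in step with the measured load, and the rewrite rules above do the rest.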
Comment 23 Mik Kersten CLA 2010-01-07 13:10:41 EST
Sounds like a reasonable workaround for now, Denis.  We'll watch for weird behavior.
Comment 24 maarten meijer CLA 2011-06-29 06:58:24 EDT
To add a new tack on a possible resolution to this problem, I suggest that repositories be allowed to define their own preferred refresh policies using a mechanism akin to the robots.txt used by web crawlers. In effect Mylyn is a special kind of web crawler, a bug crawler.
The policy mechanism should allow settings based on user role (we are logged in as a known user most of the time) so a reporter or assignee can refresh more often than a cc or even a bug lurker. The mechanism should also allow different crawl intervals based on bug status, with new and open bugs being refreshed more often than verified and closed ones.
This policy engine should go into the general tasks framework, so it applies to all connectors.
I like working on making Mylyn well behaved (bug 205708), so I would be willing to work on this and supply a patch.
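One way to picture such a policy is the Python sketch below. No policy format is defined in this report, so the structure of `EXAMPLE_POLICY`, every number in it, and the `refresh_interval` helper are invented for illustration.

```python
# Hypothetical policy a repository might publish; every value is made up.
EXAMPLE_POLICY = {
    "default_interval_s": 3600,
    "role_interval_s": {"assignee": 600, "reporter": 600, "cc": 1800},
    "status_multiplier": {"NEW": 1, "ASSIGNED": 1, "VERIFIED": 4, "CLOSED": 8},
}


def refresh_interval(policy, role, status):
    """Return the refresh interval in seconds for one task.

    Unknown roles fall back to the default interval, and unknown
    statuses leave the interval unchanged.
    """
    base = policy["role_interval_s"].get(role, policy["default_interval_s"])
    factor = policy["status_multiplier"].get(status, 1)
    return base * factor
```

Under this example policy an assignee's new bug would be refreshed every 10 minutes, while a closed bug that the user is only cc'd on would be refreshed every 4 hours.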
Comment 25 Mik Kersten CLA 2011-07-07 13:15:31 EDT
I like this idea for the large publicly hosted repositories.
Comment 26 Eclipse Webmaster CLA 2022-11-15 11:45:08 EST
Mylyn has been restructured, and our issue tracking has moved to GitHub [1].

We are closing ~14K Bugzilla issues to give the new team a fresh start. If you feel that this issue is still relevant, please create a new one on GitHub.

[1] https://github.com/orgs/eclipse-mylyn