Bug 433366

Summary: RESTful API to access download statistics
Product: Community
Component: CommitterTools
Reporter: Andreas Sewe <sewe>
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED WONTFIX
QA Contact:
Severity: enhancement
Priority: P3
CC: denis.roy, johannes.dorn
Version: unspecified
Target Milestone: ---
Hardware: All
OS: All
Whiteboard:

Description Andreas Sewe CLA 2014-04-24 04:26:25 EDT
Hi,

As a follow-up to Bug 427772, we at the Code Recommenders project are trying to plot model download statistics over time, to answer the question of how often each kind of model (call recommendations, override recommendations) is downloaded for each framework.

All this information is available through the Download Statistics page [1], but unfortunately not in a way that a script (running weekly) can easily access. The two main obstacles are the way authentication is currently done (big obstacle) and the necessary screen-scraping (smaller obstacle). Would it be possible to expose this information through a more RESTful interface?

[1] <https://dev.eclipse.org/committers/committertools/stats.php>
Comment 1 Denis Roy CLA 2014-05-15 15:19:28 EDT
The databases are quite large, I'd be afraid too much data mining would kill something.  We'd definitely need to throttle requests.
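
A minimal sketch of what request throttling could look like, using a token bucket. This is an illustration only, not anything from the dev.eclipse.org code base; the class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`.

    Hypothetical helper for illustration; not part of any Eclipse code.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed now, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A server would typically keep one bucket per client and answer with an error status whenever `allow()` returns False.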
Comment 2 Andreas Sewe CLA 2014-05-16 03:08:34 EDT
(In reply to Denis Roy from comment #1)
> The databases are quite large, I'd be afraid too much data mining would kill
> something.  We'd definitely need to throttle requests.

Fair enough.

But at least for what we have in mind, we would just issue a single request with a path of "/recommenders/models" or similar and a fixed time span (start/end date); all of these capabilities are already available through [1]. We would then parse the single result page (which is not even particularly large) and process it further locally. So, to summarize: no new, expensive query capabilities are needed; the existing ones are sufficient. They are just hard to work with from a script (which will issue at most one query per day).

[1] <https://dev.eclipse.org/committers/committertools/stats.php>
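
A rough sketch of how such a single request could be assembled from a script. The query parameter names `path`, `start`, and `end` are assumptions made for illustration; the real field names of the stats.php form are not given in this report and would have to be taken from the page itself.

```python
from datetime import date
from urllib.parse import urlencode

STATS_URL = "https://dev.eclipse.org/committers/committertools/stats.php"


def build_stats_url(path, start, end):
    """Build a query URL for one path and a fixed time span.

    "path", "start" and "end" are hypothetical parameter names,
    not the ones the real form uses.
    """
    query = urlencode({
        "path": path,
        "start": start.isoformat(),
        "end": end.isoformat(),
    })
    return f"{STATS_URL}?{query}"


if __name__ == "__main__":
    print(build_stats_url("/recommenders/models", date(2014, 5, 1), date(2014, 5, 8)))
```

Authentication is left out on purpose, since that is the obstacle under discussion.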
Comment 3 Denis Roy CLA 2014-05-16 10:02:02 EDT
It's not *you* I am worried about  :)  Enabling an API without authentication that can query almost 100,000,000 database rows can potentially open a can of worms once the world discovers they can query our downloads database.

We'll see about getting this done, but we want to do it right.
Comment 4 Andreas Sewe CLA 2014-05-16 10:16:25 EDT
(In reply to Denis Roy from comment #3)
> It's not *you* I am worried about  :)  Enabling an API without
> authentication that can query almost 100,000,000 database rows can
> potentially open a can of worms once the world discovers they can query our
> downloads database.
> 
> We'll see about getting this done, but we want to do it right.

Would it be possible to limit access to only *.eclipse.org machines? Committers can always SSH-tunnel to build.eclipse.org. To me, that seems the simplest way to restrict access to people you implicitly trust with your resources (= committers).
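
A minimal sketch of the kind of hostname check this suggestion implies. The function name `is_eclipse_host` is hypothetical, and how the server would obtain a trustworthy hostname for the client (for example, an IP allowlist or a forward-confirmed reverse DNS lookup) is deliberately left out.

```python
def is_eclipse_host(hostname):
    """Return True for eclipse.org itself and for any of its subdomains.

    Hypothetical helper: it only compares names and does not verify
    that the hostname really belongs to the connecting client.
    """
    host = hostname.strip().rstrip(".").lower()
    # Require a dot before the suffix so "noteclipse.org" does not match.
    return host == "eclipse.org" or host.endswith(".eclipse.org")
```

Matching on the dotted suffix rather than on a plain substring is what keeps names such as eclipse.org.evil.example out.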
Comment 5 Denis Roy CLA 2014-05-16 10:25:52 EDT
It doesn't matter to me... The dev.eclipse.org website is in Gerrit; feel free to have a look and submit a change if you have a moment.

git clone https://id@git.eclipse.org/r/websites/dev.eclipse.org

The file you're looking for is:

http://git.eclipse.org/c/websites/dev.eclipse.org.git/tree/committers/committertools/stats.php

Line 18: $Session = $App->useSession(true);  is what requires a login.  

We can:
a) remove that to allow anonymous calls
b) perhaps use a parameter to flag the request for a plaintext output
c) introduce a new $inc_file, based on http://git.eclipse.org/c/websites/dev.eclipse.org.git/tree/committers/committertools/inc/en_stats_daily.php
d) wrap line 359 in an if() to render only the plaintext $inc_file
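
The control flow behind options b) to d) can be sketched as follows. This is Python for illustration only, not the PHP in stats.php; the function `render_stats`, the `output_format` flag, and the row layout are all assumptions.

```python
import html


def render_stats(rows, output_format="html"):
    """Render (file name, download count) pairs as plain text or as an HTML table.

    Hypothetical stand-in for choosing between the regular $inc_file
    and a plaintext one based on a request parameter.
    """
    if output_format == "plain":
        # One tab-separated record per line, easy for a script to parse.
        return "\n".join(f"{name}\t{count}" for name, count in rows)
    cells = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{count}</td></tr>"
        for name, count in rows
    )
    return f"<table>{cells}</table>"
```

The query runs the same way in both cases; only the final rendering step is switched.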
Comment 6 Andreas Sewe CLA 2014-07-17 09:15:13 EDT
(In reply to Andreas Sewe from comment #0)
> All this information is available through the Download Statistics page [1],
> but unfortunately not in a way that a script (running weekly) can easily
> access. The two main obstacles are the way authentication is currently done
> (big obstacle) and the necessary screen-scraping (smaller obstacle). Would
> it be possible to expose this information through a more RESTful interface?
> 
> [1] <https://dev.eclipse.org/committers/committertools/stats.php>

We have settled for screen-scraping for now, which may be a bit more fragile than a RESTful interface but was quite easy to accomplish in the end.

I hope about 15 date-based queries per day are not a problem.
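
For illustration, a minimal sketch of table screen-scraping with Python's standard `html.parser`. It is not the Code Recommenders script; the markup of the real stats page is not shown in this report, so the sample below assumes a plain HTML table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None   # drop any cell left open by malformed markup


def parse_table(html_text):
    """Return all table rows in html_text as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Assumed sample markup; the real page layout may differ.
SAMPLE = (
    "<table><tr><th>File</th><th>Count</th></tr>"
    "<tr><td>/recommenders/models/a.zip</td><td> 12 </td></tr></table>"
)
```

Any change to the page layout would break a parser like this, which is the fragility mentioned above.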