| Summary: | RESTful API to access download statistics | ||
|---|---|---|---|
| Product: | Community | Reporter: | Andreas Sewe <sewe> |
| Component: | CommitterTools | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | enhancement | ||
| Priority: | P3 | CC: | denis.roy, johannes.dorn |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
Description
Andreas Sewe
Comment 1
Denis Roy

The databases are quite large, I'd be afraid too much data mining would kill something. We'd definitely need to throttle requests.

(In reply to Denis Roy from comment #1)
> The databases are quite large, I'd be afraid too much data mining would kill
> something. We'd definitely need to throttle requests.

Fair enough. But at least for what we have in mind, we would just do a single request with a path of "/recommenders/models" or similar and a fixed time span (start/end date), all of which capabilities are available through [1] already. We would then parse the single result page (which is not even particularly large) and process it further locally.

So, to summarize: no new, expensive query capabilities are needed; the existing ones are sufficient. They are just hard to work with from a script (that will issue at most one query per day).

[1] <https://dev.eclipse.org/committers/committertools/stats.php>

Comment 3
Denis Roy

It's not *you* I am worried about :) Enabling an API without authentication that can query almost 100,000,000 database rows can potentially open a can of worms once the world discovers they can query our downloads database.

We'll see about getting this done, but we want to do it right.

(In reply to Denis Roy from comment #3)
> It's not *you* I am worried about :) Enabling an API without
> authentication that can query almost 100,000,000 database rows can
> potentially open a can of worms once the world discovers they can query our
> downloads database.
>
> We'll see about getting this done, but we want to do it right.

Would it be possible to limit access to only *.eclipse.org machines? Committers can always SSH-tunnel to build.eclipse.org. To me, that seems the simplest way to restrict access to people you implicitly trust with your resources (= committers).

It doesn't matter to me... The dev.eclipse.org website is in Gerrit; feel free to have a look and submit a change if you have a moment.
`git clone https://id@git.eclipse.org/r/websites/dev.eclipse.org`

The file you're looking for is:
http://git.eclipse.org/c/websites/dev.eclipse.org.git/tree/committers/committertools/stats.php

Line 18, `$Session = $App->useSession(true);`, is what requires a login. We can:

a) remove that to allow anonymous calls
b) perhaps use a parameter to flag the request for plaintext output
c) introduce a new $inc_file, based on http://git.eclipse.org/c/websites/dev.eclipse.org.git/tree/committers/committertools/inc/en_stats_daily.php
d) wrap line 359 in an if() to only render the plaintext $inc_file

(In reply to Andreas Sewe from comment #0)
> All this information is available through the Download Statistics page [1],
> but unfortunately not in a way that a script (running weekly) can easily
> access. The two main obstacles are the way authentication is currently done
> (big obstacle) and the necessary screen-scraping (smaller obstacle). Would
> it be possible to expose this information through a more RESTful interface?
>
> [1] <https://dev.eclipse.org/committers/committertools/stats.php>

We settled for screen-scraping for now, which may be a bit more fragile than a RESTful interface, but was quite easy to accomplish in the end. I hope about 15 date-based queries per day are not a problem.
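The approach discussed in this report (one request per run for a path and a fixed start/end date, then parsing the result page locally) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the script that was actually used: the query parameter names `path`, `start` and `end` and the two-column `<tr><td>FILE</td><td>COUNT</td></tr>` row layout are hypothetical, since neither is documented here, and the committer login that stats.php currently requires is not handled.

```python
"""Sketch: query the download statistics page once and scrape the results.

Assumptions (hypothetical, not taken from stats.php itself):
  * the page accepts the query parameters "path", "start" and "end";
  * results are rendered as closed HTML rows <tr><td>FILE</td><td>COUNT</td></tr>.
Authentication is not handled; the real page requires a committer login.
"""
from __future__ import annotations

from datetime import date
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

STATS_URL = "https://dev.eclipse.org/committers/committertools/stats.php"


def build_query_url(path: str, start: date, end: date) -> str:
    """Build one date-bounded query URL (parameter names are hypothetical)."""
    params = {"path": path, "start": start.isoformat(), "end": end.isoformat()}
    return f"{STATS_URL}?{urlencode(params)}"


class CountTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a <td> is kept; header (<th>) text is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_counts(html: str) -> dict[str, int]:
    """Map file name -> download count for rows shaped like (name, count)."""
    parser = CountTableParser()
    parser.feed(html)
    counts: dict[str, int] = {}
    for row in parser.rows:
        if len(row) == 2:
            number = row[1].replace(",", "")  # tolerate "1,234"-style counts
            if number.isdigit():
                counts[row[0]] = int(number)
    return counts


def fetch_counts(path: str, start: date, end: date) -> dict[str, int]:
    """Issue exactly one request, as a once-a-day script would."""
    with urlopen(build_query_url(path, start, end)) as response:
        return parse_counts(response.read().decode("utf-8"))
```

Keeping `parse_counts` separate from the single network call in `fetch_counts` means the fragile part (the page's HTML layout) can be checked against a saved copy of the page, while each scheduled run still issues only one query.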