Bug 455070 - Understand, document, (and re-architect?) use of .dat files in analysis
Summary: Understand, document, (and re-architect?) use of .dat files in analysis
Status: CLOSED WONTFIX
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.5
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Platform-Releng-Inbox CLA
QA Contact:
URL:
Whiteboard: stalebug
Keywords:
Depends on:
Blocks: 454921
Reported: 2014-12-12 11:50 EST by David Williams CLA
Modified: 2020-01-23 15:44 EST
CC List: 1 user

See Also:


Description David Williams CLA 2014-12-12 11:50:49 EST
From my experience with bug 455035, it seems the "dataDir" (where the .dat files are stored) is at best undocumented, and may not be working as intended. 

In that bug, we first saw that letting all results accumulate there did not produce the right analysis. 

Then I found that removing them after each "partial analysis" (such as after the "short set" of performance tests, but before the "long running" test analysis) did not produce results that were even close to "correct". 

But I found that if I let them accumulate per "buildId", then the results produced were correct (at least on the surface, meaning it at least produced fingerprint graphs, included the expected scenarios, etc.). 

I'll have to investigate the code to find out whether that's the "intended use", or whether some other issue just happens to make it appear correct (for example, perhaps the data has to be accumulated per "build type": M, I, N, etc.).
Comment 1 David Williams CLA 2014-12-15 10:35:21 EST
I'll document some observations and suspicions, though no concrete conclusions yet. 

First, I "re-ran" the analysis (not the tests, just the analysis of what had already been collected) for M4, but this time made sure to re-create all the ".dat" files. 

The way things are set up, this involves "two runs": one for the "short set" and one for the "long running" tests. By chance, I noticed that if only one of them had run, the results did not show the "no baseline found" issue described in bug 454923. But if both of them ran, the problem in bug 454923 was back.  

Applying some guesswork, I then found that if the "short running" and "long running" tests used different "data directories", the problem in bug 454923 was gone. 

This makes some conceptual sense, in that the two sets *do* use different "baseline runs", so something about "combining them" in one "data directory" confuses the analysis program, and it does not know how to "find" the baseline. Whereas if they are kept "separate", I assume there is only one choice for each part of the run, so all "looks well" by the time it finishes. 
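One way to make that separation explicit is to derive a distinct data directory from both the build ID and the test set. The Python sketch below assumes a hypothetical `<root>/<buildId>/<testSet>/` layout, hypothetical set names ("short", "long") and a hypothetical helper `data_dir_for`; it is not how the current scripts compute "dataDir", only an illustration of keeping the short and long running .dat files apart.

```python
from pathlib import Path

# Hypothetical names for the two performance test sets; the real
# harness may label them differently.
TEST_SETS = ("short", "long")


def data_dir_for(root: Path, build_id: str, test_set: str) -> Path:
    """Return a per-build, per-test-set directory for .dat files.

    Assumed layout (hypothetical): <root>/<build_id>/<test_set>/
    so .dat files from the short and long running sets, which use
    different baseline runs, never share a directory.
    """
    if test_set not in TEST_SETS:
        raise ValueError(f"unknown test set: {test_set!r}")
    return root / build_id / test_set


if __name__ == "__main__":
    root = Path("perf-data")
    for test_set in TEST_SETS:
        # The directory path is only computed here, not created.
        print(data_dir_for(root, "I20141210-2000", test_set))
```

Because each directory only ever receives one set's .dat files, each analysis pass should see only one baseline run, which matches what the experiment suggests; whether the analysis program is meant to be used with such a layout is still an open question.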

In fact, I'd like to replace the milestone "performance results" with the "new run" ... BUT ... it does change the "results": some in little ways, but some in big ways, since there is no longer an "ant fingerprint" graph. There are fingerprint graphs for the rest. I have no idea why that would be the case. 

The "new run" can be seen in the "I-build" that the milestone was promoted from: 
http://build.eclipse.org/eclipse/builds/4I/siteDir/eclipse/downloads/drops4/I20141210-2000/performance/performance.php

If nothing else, it does have the "long running" test results added, which were run after the formal promotion on Friday and were not available until much later that night. 

My suspicion is that the process of taking data out of the database and putting it in .dat files actually adds some data to the .dat files that really should be in the database to begin with. 

For example, if you "drill down" in the results of M4, you can see the "baseline" listed as 

R-4.4-201406061215_201406061215

That is, a "suffix" is being added to the actual build ID of the baseline. 
I suspect the analysis program adds this. 

In the re-run, though, no "baseline" is listed on that "details" page. 

There may be an interacting aspect: note that in the line graphs, the "reference build id" has an incorrect name, listing R-4_201406061215; that is, it has R-4_ instead of R-4.4-. 

In the "re-run", I tried adding the parameter baseline-prefix=R-4.4-_; the default, if not specified, is supposed to be the full build ID: R-4.4-201406061215. 

So, something is "odd" there. 
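To make the label mismatch concrete, here is a small Python sketch that checks the two labels quoted above against the two prefixes mentioned. It assumes, without confirmation from the analysis code, that baselines are selected by a plain prefix match and that the "suffix" is whatever follows the first `_`; `matching_baselines` and `split_suffix` are hypothetical helpers, not part of the analysis tool.

```python
# Labels quoted in this bug; the matching rule below is an assumption.
OBSERVED_LABELS = [
    "R-4.4-201406061215_201406061215",  # baseline on the M4 "details" page
    "R-4_201406061215",                 # "reference build id" in the line graphs
]


def matching_baselines(labels, prefix):
    """Labels a plain str.startswith() check would accept.

    Hypothetical: assumes baselines are chosen by prefix only, with no
    special handling of '_' or other characters.
    """
    return [label for label in labels if label.startswith(prefix)]


def split_suffix(label):
    """Split 'ID_SUFFIX' at the first '_' into (ID, SUFFIX); SUFFIX may be ''."""
    build_id, _, suffix = label.partition("_")
    return build_id, suffix


if __name__ == "__main__":
    for prefix in ("R-4.4-201406061215", "R-4.4-_"):
        print(prefix, "->", matching_baselines(OBSERVED_LABELS, prefix))
    for label in OBSERVED_LABELS:
        print(label, "->", split_suffix(label))
```

Under that assumption, the default prefix R-4.4-201406061215 selects only the suffixed label, the R-4.4-_ value matches neither label, and splitting R-4_201406061215 leaves R-4 as the apparent build ID, the same truncation seen in the line graphs. If the tool treats `_` specially, this sketch does not capture that.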

Obviously, much more study is needed, but I suspect we need to collect the data slightly differently, with a bit more "data" added to the database. Previously (years ago), that data could be added "after the fact", but the same assumptions no longer apply.
Comment 2 Eclipse Genie CLA 2020-01-23 15:44:14 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug.

If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.