| Summary: | Binary Data Transfer Format for Profiling (Scalability) |
|---|---|
| Product: | z_Archived |
| Reporter: | Chris Elford <chris.l.elford> |
| Component: | TPTP |
| Assignee: | Stanislav Polevic <stanislav.v.polevic> |
| Status: | CLOSED FIXED |
| QA Contact: | |
| Severity: | enhancement |
| Priority: | P3 |
| CC: | analexee, asaf.yaffe, chris.l.elford, igor.alelekov, jgwest, jkubasta, kiryl.kazakevich, mikhail.sennikovskiy, Mikhail.Voronin, nmehrega, paulslau, slavescu, sluiman, stanislav.v.polevic, vasily.v.levchenko |
| Version: | unspecified |
| Keywords: | plan |
| Target Milestone: | --- |
| Hardware: | All |
| OS: | All |
| URL: | http://www.eclipse.org/tptp/groups/Architecture/documents/features/hf_196713.html |
| Whiteboard: | closed460 |
| Bug Depends on: | 209342, 209343 |
| Bug Blocks: | 218946 |
Description
Chris Elford
There is another option to consider here which is more localized to the profiler runtime and does not affect the workbench/client at all: when the profiler is running in standalone mode, the generation of the output XML file is the largest contributor to the profiler overhead. This overhead can be reduced dramatically (I estimate it can go down to less than 10x for full call-graph) if a more efficient output format is used. When the profiling session ends, the resulting output can be converted to the XML format expected by the loaders. Perhaps we should either (a) clone this enhancement request to talk about standalone collection format optimization or (b) treat this standalone binary format as the same as the preferred/optimized streaming format. If we cannot scope/resource (b) for 4.5, perhaps we could at least scope (a) along with a data file importer in the workbench as a first step toward an eventual optimized network path for the future. If we get an optimized binary network path in 4.5, I would hope that we could leverage that same format for the standalone optimized binary format rather than requiring YAFFE (yet another format for everything :-).

Tagged with 4.5 for 4.5 discussion. Higher priority may be appropriate.

Updated feature description URL and time estimates.

Approved by the AG for TPTP 4.5 with the following comments:
- This is in effect a new API. We need to discuss and clearly document the format.
- How will this binary format be documented? We will probably need something similar to the Event Specification for Java profiling (http://www.eclipse.org/tptp/platform/documents/resources/profilingspec/XML4Profiling.htm).
- The binary format should be reviewed after the design is complete.
- Will stand-alone profiling be supported? If yes, an output format parameter and an import wizard are required.
- This will require a negotiation protocol for initializing this format for backward compatibility.
- Will message exploiters be actual pluggable extensions? They are listed under the Extension Points section.
- What is the difference between system and data messages?
- In the second table, why is the descriptor ID a 1?
- What is the performance improvement?
- Very nice compression ratio.

(In reply to comment #5)
> -This is in effect a new API. We need to discuss and clearly document the
> format.
Ok.
> -How will this binary format be documented? We will probably need something
> similar to the Event Specification for Java profiling
> (http://www.eclipse.org/tptp/platform/documents/resources/profilingspec/XML4Profiling.htm).
I'm looking at it now. It seems I can produce something similar in a day.
> -Binary format should be reviewed after the design is complete.
You mean code review?
> -Will stand-alone profiling be supported? If yes, a output format parameter
> and import wizard is required.
The command-line parameter is 'format=[binary|xml]'. I will add a few lines about the wizard.
> -This will require a negotiation protocol for initializing this format for
> backward compatibility.
The negotiation protocol is mentioned in general. I will elaborate on it.
> -Will message exploiters be actual pluggable extensions? They are listed under
> the Extension Points section.
On both the Java and C++ sides each message will be represented by a class implementing a general interface. Can it be classified as an extension point?
> -What is the difference between system and data message?
System messages control and configure binary stream behavior. Currently there is only one system message - the stream header... Data messages correspond to the XML tags used in the current format - they carry the actual profiling data inside...
> -In the second table, why is the descriptor ID a 1?
Well, this is the stream header; I believe it should be the first one...
> -What is the performance improvement?
In standalone mode - binary data is not converted into strings, so less data is written to the disk. In controlled mode - less data to send via the AC to the client. While importing a huge generated file on the client - lower memory consumption - no swapping.
> -Very nice compression ratio.
Thank you :)

I would like to add to the API discussion.

"This is in effect a new API. We need to discuss and clearly document the format."
This is not a new API. The data format emitted by the collector is not an API. The data format following the load into the data model (EMF Trace Model) is an API, and that is not being modified. Users can consume the EMF Trace Model the way they are consuming it today.

"This will require a negotiation protocol for initializing this format for backward compatibility."
This is being addressed in the implementation.

(In reply to comment #7)
> I would like to add to the API discussion.
>
> This is in effect a new API. We need to discuss and clearly document the
> format.
> This is not a new API. The data formats emitted by the collector is not an API.
> The data format following the load into the data model (EMF Trace Model) is an
> API and that is not being modified. Users can consume the EMF Trace Model the
> way they are consuming today.
>
> -This will require a negotiation protocol for initializing this format for
> backward compatibility.
> This is being addressed in the implementation.
Guru, we have also considered the composition of data emitted by an agent and loaded into the EMF model by the loaders equivalent to API. Of course, it is not a Java API (see http://www.eclipse.org/tptp/home/documents/process/development/api_contract.html), but it is used as the interface by agents.
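The system/data message split and the descriptor-ID framing discussed in comment #6 could look roughly like the sketch below. This is illustrative only: the field layout, widths, and names (`writeMessage`, `methodEntryPayload`, the descriptor values other than the stream header's 1) are invented here and are not the actual TPTP wire format. It shows why raw binary fields beat the equivalent XML tag on size.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of a descriptor-ID-framed binary event stream. */
public class BinaryFrameSketch {
    // Per the discussion, the stream header system message has descriptor ID 1.
    static final int STREAM_HEADER_ID = 1;
    // Hypothetical data message ID standing in for e.g. a methodEntry event.
    static final int METHOD_ENTRY_ID = 2;

    /** Frame = descriptor ID (2 bytes) + payload length (4 bytes) + payload. */
    static byte[] writeMessage(int descriptorId, byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeShort(descriptorId);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen on in-memory stream
        }
    }

    /** A method-entry event as raw binary fields instead of an XML tag. */
    static byte[] methodEntryPayload(int methodId, int threadId, long timestamp) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(methodId);   // 4 bytes
            out.writeInt(threadId);   // 4 bytes
            out.writeLong(timestamp); // 8 bytes
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] binary = writeMessage(METHOD_ENTRY_ID,
                methodEntryPayload(42, 7, 123456789L));
        // An equivalent XML fragment, for a size comparison.
        String xml = "<methodEntry methodId=\"42\" threadId=\"7\" time=\"123456789\"/>";
        System.out.println("binary: " + binary.length + " bytes, xml: "
                + xml.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The 22-byte frame versus the ~57-byte XML tag also skips the integer-to-string conversion that comment #6 identifies as a standalone-mode cost.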
We publish the specifications for these data formats (http://www.eclipse.org/tptp/platform/documents/resources/profilingspec/XML4Profiling.htm for trace, http://www.eclipse.org/tptp/platform/documents/resources/dtd_xsd/testExecution.xsd for test execution, and http://www.eclipse.org/tptp/platform/documents/resources/dtd_xsd/statistical.xsd for statistical) so users/consumers can emit XML fragments from custom agents and load them into the EMF models. My point from the AG review is that we need to document this new data format clearly so users/consumers can leverage its benefits.

(In reply to comment #6)
Thanks, Stanislav, for your response.
> > -Binary format should be reviewed after the design is complete.
> You mean code review?
Yes.
> > -Will message exploiters be actual pluggable extensions? They are listed under
> > the Extension Points section.
> In both Java and C++ sides each message will be represented by a class
> implementing general interface. Can it be classified as extension point?
Sounds fine. We should move this into the design section of the Description Document or rename that section Extensions.
> > -In the second table, why is the descriptor ID a 1?
> Well, this is the stream header, I believe, it should be the first one...
Makes sense.
> > -What is the performance improvement?
> In standalone mode - binary data is not converted into strings, less data is
> written to the disc. In controlled mode - less data to send via AC to the
> client. While importing huge generated file on the client - lesser memory
> consumption - no swapping.
It would be nice to measure the baseline and then the final performance improvement.

(In reply to comment #7)
> This is not a new API. The data formats emitted by the collector is not an API.
If the data formats emitted by the collector follow the XML4Profiling spec, then they follow the API. You are right about defining alternate optimized methods that you can keep provisional/internal in TPTP instead of making them API. Here is the other enhancement, bug 97886, where we discussed a compressed XML approach. There are some good ideas discussed there. The same ideas can be applied to the binary format, especially those described in comment https://bugs.eclipse.org/bugs/show_bug.cgi?id=97886#c6. It would also be good if you could take a look at bug 107521. The main point is to reduce the necessary processing time and the amount of transported data in order to optimize the produce and load parts of the process. These compressed modes will make an especially big improvement in the full trace modes. For best profiler performance, the statistical/aggregated modes should be used (to identify the right filters) before a full trace mode is used with the right filters.

(In reply to comment #0)
> If we are willing to sacrifice cycles (increase overhead), you could imagine
> feeding the XML thru a compression algorithm (e.g., zlib). TPTP workbench
> seems to store .trcaxmi files zipped today and gets about a 10x reduction in
> overall size.
If you try to collect lots of data, including parameters/return values, this kind of compression could visibly increase the performance of the profiler/workbench for the XML4Profiling, compressed XML, or binary modes. It would be good to do some stress testing with the current XML format to see if there is any improvement when the data is compressed at the source; I expect to see some improvements.

On APIs...

I do not believe that this is new API.

http://www.eclipse.org/tptp/platform/documents/resources/profilingspec/XML4Profiling.htm is horribly out of date. When I first started looking at TPTP a few years ago, Guru pointed me at this spec, and while it bears a similarity to the trace files that were generated, it is not identical. This caused some confusion for me because I was thinking of creating a standalone tool to read the XML...
Examples: agentCreate includes an undocumented agentType field; there is no documentation of the existence of an agMethodEntry record type; and there are lots more examples... There are also lots of fields documented that I have yet to see in any trace file that I have created using TPTP agents. If it were API of some sort, I would have expected it to be up to date and correct. As it is, I consider it, historically and now, a loose document for implementors rather than a contract. Having said that, I do believe that the format produced in binary should be documented for our own sanity and future maintenance purposes. It is not, however, API. IMO, API begins when the loader inserts the data into the model... Thx, Chris

(In reply to comment #12)
> On APIs...
>
> I do not believe that this is new API
>
> http://www.eclipse.org/tptp/platform/documents/resources/profilingspec/XML4Profiling.htm
> is horribly out of date. When I first started looking at TPTP a few years ago
> Guru pointed me at this spec and while it bears a similarity to the trace files
The fact that the XML4Profiling.htm document is out of date doesn't preclude the fact that the format is considered API. We have always seen that format and all the other XML fragments (from Test, CBE, Statistical, etc.) as APIs, and we made sure that we wouldn't break backward compatibility. That state should be maintained until they are marked deprecated. Having a binary format internal to TPTP should be fine, although in the end you want to enable the consumer products to use it in their tools, and in that case it needs to become public and API. Eventually the techniques used to create the format defined here should be applied (incrementally) to other event formats, not only to the profiling events.

Replying to #13>
> The fact that the XML4Profiling.htm document is out of date doesn't preclude
> the fact that the format is considered API.
I disagree. I would say that if it has been out of date for 2 years without any consumers noticing or caring enough to complain with defects in any explicit way, then it is no longer API even if it once was.
In reply to #14>
Note that the XML format (the current one which is inconsistent with XML4Profile) will continue to be supported. Consumers that want data partially compatible with XML4Profile can still enable that mode. There is no need to create a binary format contract even if XML mode has a partially fulfilled format contract because XML will continue to be available.
I believe the binary format should be internal for the foreseeable future. I want to ensure that each agent that chooses to implement binary transfer has the flexibility to create its own exchange format and data model loader, and to evolve both from release to release based on packet analysis to make compression more efficient over time.
I fully expect for very few agents to choose to provide a binary path. Most agents do not produce data streams of sufficient size to necessitate it.
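Comment #11 above suggested compressing the XML at the source (e.g. zlib) and stress-testing the result. A quick way to sanity-check the expected ratio on repetitive trace output, using only `java.util.zip`; the event content below is made up for illustration, not real agent output:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

/** Sketch: zlib-deflate an XML trace stream to estimate the compression ratio. */
public class XmlDeflateSketch {
    static byte[] deflate(String xml) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // try-with-resources closes the stream, which finishes the deflation.
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen on in-memory stream
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Trace output is highly repetitive, which is why zlib does well on it.
        StringBuilder trace = new StringBuilder("<TRACE>");
        for (int i = 0; i < 1000; i++) {
            trace.append("<methodEntry methodId=\"").append(i % 50)
                 .append("\" threadId=\"1\" time=\"").append(1000000 + i)
                 .append("\"/>");
        }
        trace.append("</TRACE>");
        byte[] raw = trace.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(trace.toString());
        System.out.printf("raw=%d bytes, deflated=%d bytes (%.1fx)%n",
                raw.length, packed.length, (double) raw.length / packed.length);
    }
}
```

This trades CPU cycles for size, as comment #0 anticipated, so it measures only the transport/storage side of the overhead, not the string-formatting cost that the binary format also avoids.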
(In reply to comment #15)
> Replying to #13>
> > The fact that the XML4Profiling.htm document is out of date doesn't preclude
> > the fact that the format is considered API.
> I disagree. I would say that if it has been out of date for 2 years without
> any consumers noticing or caring enough to complain with defects in any
> explicit way that is is no longer API even if it once was.
I agree that the document is out of date and lately wasn't actually used to drive the work in this area; a more up-to-date document that describes the Trace/Profiling events is platform/org.eclipse.tptp.platform.models/src-trace/model/traceEvents.xsd (http://dev.eclipse.org/viewcvs/index.cgi/platform/org.eclipse.tptp.platform.models/src-trace/model/traceEvents.xsd?root=TPTP_Project&view=markup). I don't agree that in the last 2 years nobody cared about the profiling events format (and I'm not talking about the XML4Profiling.htm document), as the TPTP TI profiler agent (which was moved from Tech Preview to GA in TPTP 4.4, June 2007) was developed using this format (in the past 2 years).
> I believe the binary format should be internal for the forseeable future. I
> want to ensure that each agent that chooses to implement binary transfer should
> have flexibilty to create their own exchange format and data model loader and
> evolve both from release to release based on packet analysis to make
> compression more efficient over time.
>
> I fully expect for very few agents to choose to provide a binary path. Most
> agents do not produce data streams of sufficient size to necessitate it.
This is in sync with what I mentioned above, with only one concern: the longer the new optimized mode is kept internal/provisional, the harder it will be to promote the optimized profiling agent (and other agents) mode to consumers or integrators. In my view, TPTP should provide a platform in the first place and then an exemplary implementation.
Today the loader infrastructure is flexible enough to support any XML fragment format (including the compressed XML mode described in bug 97886, with the use of simple proxy loaders); new non-XML input formats (like the binary one discussed here) can be further optimized through some small changes (initially investigated in bug 107521).

Today it is also possible to plug in binary formats, without any changes to the EMF loaders or loader infrastructure, just by setting a specialized handler/parser in org.eclipse.hyades.loaders.util.XMLLoader before the XMLLoader.loadEvent methods are called. This can be done by overriding XMLLoader.makeScaner or by creating an org.eclipse.tptp.platform.models.fragment_handler extension point instance and setting the class attribute to the new handler/parser (which needs to implement org.eclipse.hyades.loaders.util.XMLFragmentHandler; ignore the XML prefix, this handler interface can support any data format). The specialized parser should be able to handle the new binary format and trigger the startDocument/startElement/attributeName/attributeValueCharacters/characters/endElement/endDocument callbacks defined in the org.eclipse.hyades.loaders.util.IXMLLoader interface. This can be easily optimized by using int IDs instead of names and char arrays instead of Strings (as in org.eclipse.hyades.models.hierarchy.util.internal.IExtendedLoader).

The new specialized handler/parser could also check whether the format is the current XML format and delegate to the current XML handler/parser (for example, the first event is <TRACE> or <TRACE type="compressedXML" version="4.5"> for XML, and for binary mode <TRACE type="binary" version="4.5"> or any other magic word you'll choose). The binary format loaders can be registered through the same mechanism as the XML ones are registered today. So with a simple extension point instance as described in the previous paragraph you could enable binary event mode and still support the XML event mode.
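The magic-word dispatch just described could be sketched as follows. `FragmentHandler` is a simplified stand-in interface invented for this sketch (the real entry point would be an implementation of org.eclipse.hyades.loaders.util.XMLFragmentHandler, which despite its name can carry any data format); only the routing logic is the point here:

```java
import java.nio.charset.StandardCharsets;

/** Simplified stand-in for the TPTP fragment handler; the real interface
 *  drives loader callbacks rather than returning a String. */
interface FragmentHandler {
    String handle(byte[] data);
}

/** Peeks at the first event and routes to the XML or the binary parser. */
public class FormatDispatchSketch implements FragmentHandler {
    private final FragmentHandler xmlHandler;
    private final FragmentHandler binaryHandler;

    public FormatDispatchSketch(FragmentHandler xml, FragmentHandler binary) {
        this.xmlHandler = xml;
        this.binaryHandler = binary;
    }

    /** The proposed magic word <TRACE type="binary" ...> selects binary mode;
     *  anything else falls through to the existing XML path. */
    public String handle(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 64),
                StandardCharsets.UTF_8);
        if (head.startsWith("<TRACE type=\"binary\"")) {
            return binaryHandler.handle(data);
        }
        return xmlHandler.handle(data);
    }

    public static void main(String[] args) {
        FormatDispatchSketch d = new FormatDispatchSketch(b -> "xml", b -> "binary");
        System.out.println(d.handle(
                "<TRACE version=\"4.5\">".getBytes(StandardCharsets.UTF_8)));
        System.out.println(d.handle(
                "<TRACE type=\"binary\" version=\"4.5\">".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because the dispatcher defaults to the XML path, plugging it in would not disturb existing agents that keep emitting the current format.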
The missing part will be the negotiation of these new event format modes through the profile/attach launch configurations (to ensure backward/forward compatibility).

To plug in a specialized model infrastructure (non-EMF based) will require extensive changes, because most of the TPTP UI is EMF driven and in many cases it is not done in a scalable way. Although as of today many views are query/filter based, there are still lots of places where direct access to the model is used, and sometimes a full traversal of the model is done to produce the data required for analysis/views/reporting. Initially, for the Log model, we had implemented a scalable resource infrastructure (aka Large Resource Support), which provides an RDB-backed EMF resource. This allows the EMF model to be stored/retrieved incrementally in/from an RDB database. Also, an optimized event load mode was created for Log events (CBEtoCSVOutputter) which puts data directly into the RDB database without going through the XML event loaders and EMF model. The recommended and optimal way to navigate the model in these RDB-backed resources is through the query model defined in the Hierarchy.Extensions package. The same query model (queries) is used today to navigate/filter the XMI-based resources in all the Log/Symptom views, many of the Trace views, and some of the Test views.

Optimizing any step in the processing pipeline will help; optimizing all the steps (at least for a few important scenarios) will have the most impact on the user experience of the TPTP profiler (and not only the profiler, as the same mechanism can be used in the other TPTP domains).

(In reply to comment #16)
The discussion thread here is getting out of whack. Moving to a binary format event, a "pre-aggregated" event, or a compressed event are all interesting ideas, each having its own merit. Along with using something "smaller" than XMI for persistence, these have chronically been the popular ideas to improve speed and footprint, aka scalability.
This particular enhancement describes using binary events, and that should not assume only a mirror of the XML event and open binary stream currently supported. If some new events that leverage aggregation at the collection end are proposed, they should be given due consideration. These would be the normal things covered in design discussion and review. Introduction of these new events would make them candidates for new API, but they would normally need to go through the provisional state like all other API.

Regarding API: in Hyades/TPTP, the data format on the wire is API. That has been a declared API since Hyades 1.0, and it has served us well to have this API. The various event formats and associated loaders are the contract for how to get the models loaded. Yes, the document on the web site is old, as is the EMF model documentation and much of the web site documentation, but that is a separate issue. The people that have been owning and working with the event formats are aware that the maintenance of this material has been a concern for several releases, and pointing to yet another document is not the answer. The poor maintenance is not a reason to declare that this is not API. If that were the case, a great deal of the project would have become questionable in the last few years, along with much new API that shouldn't be. Exploiters have gotten by through talking to the committers, as I did recently with Alex about the monitor events, which btw turned up yet another version of the document.

So I think we need to clean up old documentation, a request I make every release, and as noted above we should get on with the design discussion of this enhancement and assess the new and improved event formats. As the only actual committer in the model component right now, I will start an effort to consolidate and clean up the documentation.
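Comment #16 notes that negotiation of the event format modes through the launch configurations is still missing. One plausible shape for it, sketched here with invented format names and method names (this is not an agreed protocol): the workbench advertises the formats it can load, and the agent picks its most preferred mutual format, defaulting to plain XML for older clients that advertise nothing:

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of format-mode negotiation for backward/forward compatibility. */
public class FormatNegotiationSketch {
    // Agent's preference order: newest/most compact first, plain XML last.
    static final List<String> AGENT_FORMATS = Arrays.asList("binary-4.5", "xml");

    /** Pick the first agent-preferred format the client also supports. */
    static String negotiate(List<String> clientFormats) {
        for (String f : AGENT_FORMATS) {
            if (clientFormats.contains(f)) {
                return f;
            }
        }
        // Pre-4.5 clients advertise nothing, so fall back to the current XML.
        return "xml";
    }

    public static void main(String[] args) {
        // A 4.5 workbench advertises both modes; an older one only XML.
        System.out.println(negotiate(Arrays.asList("xml", "binary-4.5")));
        System.out.println(negotiate(Arrays.asList("xml")));
    }
}
```

Keeping XML as the universal fallback is what lets the binary mode stay internal/provisional, as argued in comments #14 and #15, without breaking any existing client.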
(In reply to comment #17)
> This particular enhancement describes using binary events, and that should not
> assume only a mirror of the XML event and open binary stream currently
> supported. If some new events that leverage aggregation at the collection end
> are proposed, they should be given due consideration. This would be the normal
> things covered in design discussion and review. Introduction of these new
> events would be candidates as new API but would normally need to go through
> the provisional state like all other api.
I wouldn't mix new aggregated or non-aggregated events with the binary format; as far as I can see, this enhancement is only about a binary (or compressed XML) representation for the current event schema. A similar request was made a while ago in bug 97886. New aggregated or non-aggregated events should be covered in separate new enhancements. They will mostly require new loaders, and probably EMF model and UI changes.
> So I think we need to clean up old documentation, a request I make every
> release, and as noted above we get on with the design discussion of this
> enhancement and assess the new and improved event formats.
> As the only actual committer in the model component right now I will start an
> effort to consolidate and clean up the documentation.
That's great; as I pointed out above, the traceEvents.xsd is pretty up to date, so it's a good starting point. Let me know if you need any help. Originally the trace events DTD/schema was owned and maintained by the profiler agent team, and I still think they will need to maintain it because they are driving the largest part of this specification.

For 4.5, I am not particularly concerned with consumption of binary agent data streams by consumers that build on top of TPTP. I am primarily concerned with making TPTP usable/scalable for direct users of the open source TPTP profiler. I am not opposed to eventual creation of an API suitable for consumers that build on top of TPTP.
I consider that outside the scope of this enhancement and outside of what would be desirable for 4.5. Enhancements beyond binary data transfer/import [e.g., RDB-backed EMF or an EMF alternative] for profiler data are outside the scope of this enhancement.

There is clearly some difference of opinion on exactly what constitutes API for the existing XML-based profile. I understand that something has been API since Hyades... Is it one of the three different specifications that have been found? Is it the current implementation? Is it one of the three specifications that were found, with differences versus the current implementation tagged as provisional? I would suggest that making XML4Profile up to date be tagged as a separate defect [or enhancement request] and scoped accordingly. I'd love to hear from the JVMTI profiler guys as to what extent they used the specifications versus using the existing profiler codebase as a template. I doubt that the specification was used extensively, or at all.

There seems to be some confusion about the aggregated events that I was talking about. Aggregated callgraph has been around in the JVMPI [and TI] profiler for years now. However, you would not see that if you read the spec, even though the profiling XML spec has been around since something like TPTP 4.1 or 4.2. There may well be some compressed event types that get created as part of the binary stream creation. For 4.5, these would be totally internal.

(In reply to comment #19)
> For 4.5, I am not particularly concerned with consumption of binary agent data
> streams by consumers that build on top of TPTP. I am primarily concerned with
> making TPTP usable/scalable for direct users of the open source TPTP profiler.
> I am not opposed to eventual creation of an API suitable for consumers that
> build on top of TPTP. I consider that that outside the scope of this
> enhancement and outside of what would be desirable for 4.5.
As a consumer (and similarly for any TPTP component other than the profiler agent), I would really like to use these new optimized event transport formats in my scenarios and avoid building a similar stack myself.
> Enhancements beyond binary data transfer/import [e.g., rdb backed emf or emf
> alternative] for profiler data is outside the scope of this enhancement.
I agree; I just pointed them out to show that only by optimizing the whole processing stack will we see the most beneficial impact.
> There is clearly some difference of opinion on exactly what constitutes API for
> the existing XML based profile. I understand that something has been API since
> Hyades... Is it one of three different specifications that have been found?
> Is it the current implementation? Is it one of the three specifications that
> were found with differents versus the current implementation tagged as
> provisional? I would suggest that the making of XML4Profile up to date be
> tagged as an separate defect [or enhancement request] and scoped accordingly.
>
> I'd love to hear from the JVMTI profiler guys as to what extent they used the
> specifications versus using the existing profiler codebase as a template. I
> doubt that the specification was used extensively or at all.
Setting aside any outdated event spec XSD or DTD, the code itself is API, and the code is the ultimate place where you can see what exactly is supported. For example, there are events in the XSD/DTD that are never emitted by the TPTP profiling agents, and some of them still have associated loaders. The rule is that any event that TPTP currently uses/supports (or some consumer requested) has an associated loader. Actually, everything should still be valid in those XSD/DTD documents; the problem is that some new events are missing. org.eclipse.hyades.loaders.trace.TraceXMLLoadersFactory.getSupportedElements() contains the up-to-date supported list.
After a quick look, there seem to be only 5 events missing from the spec (only two of them being actually used today in the JVM PI/TI agents).
> There seems to be some confusion about the aggregated events that I was talking
> about. Aggregated callgraph has been around in the JVMPI [and TI] profiler for
> years now. However, you would not see that if you read the spec for the
> profiling XML spec has been around since something like TPTP 4.1, 4.2?
We have supported statistical modes on the EMF model side since Hyades, but only in TPTP 4.2 did we introduce some aggregated events, aka agMethodEntry/agMethodExit, with support initially from the JVMPI agent and later from JVMTI.
> There may well be some compressed event types that get created as part of the
> binary stream creation. For 4.5, these would be totally internal.
As I mentioned above, please don't mix the event type/spec (the one defined in the XSD or DTD, which is the right way to model the events) with the event transport format (using XML, text, or binary structures). We can have any number of output/transport formats (binary being one of them) for the same event type/spec; see the TPTP GLA outputter for a similar approach. Of course, you can make both the new event types and event transport formats internal, and that way you wouldn't have to worry about integration or compatibility.

Regarding (aggregated) profiling structures, you may want to take a look at the TPTP XRay mechanism; although JVMPI based, I recall it having very compact and optimized data structures. Those will be helpful in local scenarios where you could bypass several steps in the processing pipeline (with the possibility to create EMF objects directly instead of going through the loader infrastructure and lookup services).

(In reply to #20)
I hope that we will eventually get to new API in this space (binary transfer). I think there is value here for consuming products in this space.
Given limited resourcing and the potential additional support burden associated with API, that will unfortunately need to be sometime in the future. I also hope that we will get to an alternate backing store (such as a db or some other scalable form). Honestly, I would think that both TPTP users and consuming products will want both this and scalable streaming formats. We unfortunately did not have the resources available to go after this problem for 4.5. Another big area that we didn't have the resources to touch was evolving heap profiling.

I would suggest moving the discussion about XML4Profile onto another thread or defect, since the existing XML is not being modified. I agree that getting this documented and understood by everyone is important. I am curious about the historical log of when these other events (you mention 5) became provisional API and when they were approved to move from provisional to full API. I don't think it is directly related to this enhancement, however.

>> As I mentioned above, please don't mix the event type/spec (the one
>> defined in the XSD or DTD which is right way to model the events) with
>> the event transport format (using XML, text or binary structures).
Stanislav, can you comment on whether the design is a 1:1 mapping to record types, changing only the transport layer? As I understand the most recent doc that I have seen, it is only a transport change.

>> Of course you can make both the new event types and event transport formats
>> internal and that way you wouldn't worry about integration or compatibility.
My original hope was that the design would be a 1:1 mapping in about 80-90% of cases, with potential creation of some internal records/transports in maybe a few key areas. Stanislav, can you comment about the XRay suggestion from Marius? thx -- Chris

Guys, sorry for the late reply. I implemented the binary format emitting part as a 1:1 mapping to the existing events taken from the DTD, even though some of the events are not in use.
Also, I have implemented a set of binary loaders based on the existing XML loaders to reuse the data structures and the code dealing with EMF. It seems to work. Regarding XRay: I shall take a look at it to reuse some good practices.

(In reply to comment #22)
> Guys, sorry for late reply.
> I implemented binary format emitting part as 1:1 mapping to existing events
> taken from DTD even though some of the events are not in use.
>
> Also, I have implemented a set of binary loaders based on existing XML loaders
> to reuse the data structure and the code dealing with EMF. It seems to work.
It is a bit concerning from an open source practices standpoint that you have already implemented the event generation and loaders and the code/design has not been reviewed or checked in incrementally. Such a sizable contribution is now going to require an IP scan etc. When will we have a review of the event formats and loaders? I was about to start regenerating the XML event documentation; we will need to have documentation for these binary events as well. I am assuming the event creation code has been reviewed by the TI team. I hope the loaders actually use, and don't copy, a lot of the existing loader code so that the performance improvements we make can be shared.

(In reply to comment #23)
> It is a bit concerning from an open source practices standpoint that you have
> already implemented the event generation and loaders and the code/design has
> not been reviewed or checked in incrementally. Such as sizable contribution is
> now going to require an IP scan etc..
Both the profiler and client side implementations just extend current functionality, and these are written from scratch. Also, I shall provide a description of the format events, which are equal to the current XML events (according to the XML format DTD).
> When will we have a review of the event formats and loaders?
Well, the code requires some internal testing and review; after that I shall provide the patches for both client and profiler for review.

> I was about to
> start regenerating the XML event documentation, we will need to have
> documentation for these binary events as well. I am assuming the event creation
> code has been reviewed by the TI team. I hope the loaders actually use and
> don't copy a lot of the existing loader code so that the performance
> improvements we make can be shared.
>

Yes, I extended the existing XML loaders' functionality, using the same extension point for retrieving BF loaders.

Thanks Stanislav. By IP review I meant the scan the foundation will now have to do, the entries needed in the project IP log, and so on: all extra management overhead we like to avoid for feature work.

It has been noted on a few threads that the DTD was a bit out of date and the XSD is actually the best source we have. There are only a few differences, but they are worth noting. So the sooner you can provide a description of the event formats, the sooner I can help ensure we are all consistent. Given that only the TI agent will emit these events, this should keep things simple ;-)

By extending the XML events, I didn't mean the extension points; I meant following the same coding pattern and inheriting some of the implementation, such as the addYourselfInContext methods, since this is where the mapping to the model takes place. Consistent error recovery etc. would also be important to have.

Getting public reviews going before the code is complete seems like the right thing to do.

(In reply to comment #25)
>
> By extending the XML events, I didn't mean the extension points, I meant
> following the same coding pattern and inheriting some of the implementation,
> such as the addYourselfInContext methods since this is where the mapping to the
> model takes place. Consistent error recovery etc. would also be important to
> have.
Yes, I inherited the existing XML loaders - providing a completely new implementation abuses OOP fundamentals :)

> Getting public reviews going before the code is complete seems like the right
> thing to do.
>

I'm not familiar with the Eclipse processes - could you advise me on doing that?

(In reply to comment #26)
> I'm not familiar with the Eclipse processes - could you advise me on doing
> that?
>

I suggest you start with some Intel folks first ;-) Sometimes there is more internal than external process. I am sure Chris Elford or Alexander Alexeev would be glad to help you.

(In reply to comment #27)
> I suggest you start with some Intel folks first ;-)
> Sometimes there is more internal than external process.
>
> I am sure Chris Elford or Alexander Alexeev would be glad to help you.
>

Ok. The code will be reviewed tomorrow by the TI team. After that I will provide the BF emitting part (TI profiler side) and add the events description to the enhancement document.

I'd like to defer this feature to i6 since it has to pass code review and internal testing before being committed into the source. Note that the current implementation is not 100% of the feature coverage. Next week I will provide the patch which includes BF (Binary Format) events generation from the TI profiler.

I have attached the initial implementation of the C++ part responsible for producing the events. See bug 209342 for details.

Created attachment 88288 [details]
BF events enlisted
Attaching the latest revision of the BF document, which includes all events transmitted by the TI profiler for the binary format.
Several widely used events do not use a lot of their attributes, so the format can be lightened by eliminating these attributes entirely or by introducing a new field in the event header - a bitmask of used fields (say 32) - but this will introduce the number of attributes to the number of bits in the bit-mask (16/32/64/...).
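The "bitmask of used fields" idea just described could be sketched roughly as follows. This is a minimal illustration, not the actual TPTP binary format: the field names, bit assignments, and layout are all invented for the example. Each event carries a 32-bit mask; only attributes whose bit is set are serialized, so events that leave most attributes unset pay no per-attribute cost.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a 32-bit mask in the event header marks which
// optional attributes follow, so unset attributes occupy no space.
// Note the trade-off mentioned in the comment: a 32-bit mask caps the
// event at 32 optional attributes.
public class BitmaskEventSketch {
    static final int F_THREAD_ID = 1 << 0; // bit 0: threadId (long)
    static final int F_TIMESTAMP = 1 << 1; // bit 1: timestamp (long)
    static final int F_SEQUENCE  = 1 << 2; // bit 2: sequence (int)

    // Encode only the attributes whose bits are set in the mask.
    static byte[] encode(int mask, long threadId, long timestamp, int sequence) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 8 + 4); // worst case
        buf.putInt(mask);
        if ((mask & F_THREAD_ID) != 0) buf.putLong(threadId);
        if ((mask & F_TIMESTAMP) != 0) buf.putLong(timestamp);
        if ((mask & F_SEQUENCE)  != 0) buf.putInt(sequence);
        byte[] out = new byte[buf.position()]; // trim to bytes actually used
        buf.flip();
        buf.get(out);
        return out;
    }
}
```

With this layout, an event using only threadId takes 12 bytes (4-byte mask plus the 8-byte value) instead of always paying for every attribute.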
(In reply to comment #32)
> - bitmask of used fields (say 32), but this will
> introduce the number of attributes to the number of bits in bit-mask
> (16/32/64/...).

Sorry, I meant 'this will limit the number of attributes'.

Created attachment 89144 [details]
Stream header changed, two system messages introduced
Changes:
- stream header is introduced
- two new system messages are introduced
- binary trace files should have the trcbin extension
Client-side initial implementation is attached to bug 209343.

Notes:
I use the existing XML[*]Loader classes to avoid spawning an additional set of binary loaders; however, the binary format requires support for primitive types, i.e. short, int, long, double, etc. To solve this I'd like to introduce an additional interface with an overloaded addAttribute method for each of the primitive data types.

Currently, XMLFragmentLoader#addAttribute accepts a String value and converts it into the destination primitive type, so I made the following workaround: since all XML loaders extend IgnoredXMLFragmentLoader, I implemented the new addAttribute() methods there, and each method converts the primitive value to a String and calls addAttribute(String, String).

That, in turn, causes a double-conversion effect, which is not efficient but is enough to show the main idea, which is to introduce overloaded addAttribute methods in each XML loader and make addAttribute(String, String) call the overloaded methods for each primitive type (thus remaining compatible with the current XML format).

So, implementing overloaded addAttribute methods in each XML loader requires additional effort, and it can be done later if the code owner (or Harm?) approves this refactoring.

There is a more detailed description document of the format that looks very good and complete. Could someone attach it or update the current description document with the newer, more detailed one?

(In reply to comment #35)
> Client-side initial implementation is attached to bug 209343.
> Notes:
> I use existing XML[*]Loader classes to avoid spawning additional set of binary
> loaders, however, binary format requires support for primitive types, i.e.
> short, int, long, double etc.
> To solve this I'd like to introduce additional interface with overloaded
> addAttribute method for each of the primitive data types.
> Currently, XMLFragmentLoader#addAttribute accepts String value and converts it
> into the destination primitive type, so I made the following workaround - since
> all XML loaders extend IgnoredXMLFragmentLoader, I implemented the new
> addAttribute() methods there and each method converts primitive value to String
> and calls addAttribute(String, String).
>
> That, in turn, causes double conversion effect, which is not efficient, but is
> enough to show the main idea, which is to introduce overloaded addAttribute
> methods in each XML loader and make addAttribute(String, String) call
> overloaded methods for each primitive type (thus, compatible with current
> XML format)
>
> So, implementing overloaded addAttribute methods in each XML loader requires
> additional effort and it can be done later if the code owner (or Harm?) will
> approve this refactoring.
>

Actually, this more general addAttribute approach is relatively new, but unfortunately it is by definition now API and needs to be preserved. It is OK for string-based events like XML (although it is a set and not an add), but I agree a direct setXyz(value) method is more appropriate. A direct set will avoid the lookup and the extra conversion.

Good. So, currently, I have left a possibility for future optimization without changing the existing XMLFragmentLoader interface. Thanks.

(In reply to comment #37)
> Actually this more general addAttribute approach is relatively new but
> unfortunately is by definition now API and needs to be preserved. It is OK for
> string-based events like XML (although it is a set and not an add), but I
> agree a direct setXyz(value) method is more appropriate. A direct set will
> avoid the lookup and the extra conversion.

The addAttribute approach has been there since Hyades 1.0; it works pretty well for data coming from SAX-based events (which is the only case supported before TPTP 4.5).
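The interim workaround and the proposed optimization discussed above can be contrasted in a small sketch. The class and field names here are illustrative stand-ins, not the real TPTP loader classes: `BaseLoader` plays the role of the String-based loader API, the primitive overload shows the long -> String -> long double conversion, and the subclass shows the direct set that skips both conversions.

```java
// Hypothetical sketch of the interim workaround: primitive addAttribute
// overloads convert to String and delegate to the existing String-based
// addAttribute, which parses the value back into a primitive.
public class LoaderOverloadSketch {
    static class BaseLoader {
        long threadId;

        // Existing String-based API (stands in for XMLFragmentLoader).
        void addAttribute(String name, String value) {
            if ("threadId".equals(name)) threadId = Long.parseLong(value);
        }

        // Interim primitive overload: double conversion long -> String -> long.
        void addAttribute(String name, long value) {
            addAttribute(name, String.valueOf(value));
        }
    }

    // The proposed optimization: override the primitive overload per loader
    // so the value is assigned directly, with no String round-trip.
    static class OptimizedLoader extends BaseLoader {
        @Override
        void addAttribute(String name, long value) {
            if ("threadId".equals(name)) threadId = value; // direct set
        }
    }
}
```

Both paths produce the same model state; the overridden version simply avoids the format/parse pair, which is the efficiency argument made in the thread.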
(In reply to comment #38)

For binary events, which usually should be handled at once (after they are received completely), there is org.eclipse.hyades.models.hierarchy.util.internal.ExtendedFragmentLoader, which in conjunction with the generated Java code (this was recently removed from CVS) for /org.eclipse.tptp.platform.models/src-trace/model/traceEvents.xsd could handle any event type.

Currently, addYourselfInContext is called on endElement, but because we don't really want to intermix XML with binary events on the same data channel, we just call the right controller and pass the session-related data channel input stream for each case (BinaryLoader in the binary case). Or the XMLLoader (the class name should change, at least) could be changed in such a way that it automatically detects and handles both cases (for example, check the header of the input stream before passing it to the event parser).

The BinaryLoader/parser would build the beans and then pass them to the corresponding loader through the addYourselfInContext(Object bean) method, which for existing loaders could just call the old addYourselfInContext after setting the internal bean field with the required casting.

The existing loaders could be easily migrated to use this approach, basically moving the loader attributes which are today directly set through the XMLFragmentLoader.addAttribute method into bean attributes corresponding to each loader. Another approach would be to copy the bean content into the existing attributes before calling the old addYourselfInContext() method.

If this approach were used, the existing loaders (which would need binary support) would be able to transparently handle both XML and binary events, so extra registration or new classes with duplicated code are avoided.

Take also a look at the new methods in org.eclipse.hyades.models.hierarchy.util.internal.IExtendedLoader; for example, endElement could be used to trigger the new addYourselfInContext(Object bean) method.
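The header-detection idea suggested above (check the input stream's header before choosing a parser) might look roughly like this. The magic byte sequence is invented for the sketch; the actual trcbin stream header is defined in the attached BF document, not here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical sketch: peek at the first bytes of the data channel and
// route to the binary or XML parser accordingly. The peeked bytes are
// pushed back so the chosen parser sees the full stream.
public class FormatSniffSketch {
    static final byte[] MAGIC = { 'T', 'R', 'C', 'B' }; // assumed magic value

    static boolean isBinaryStream(InputStream in) {
        try {
            PushbackInputStream pin = new PushbackInputStream(in, MAGIC.length);
            byte[] head = new byte[MAGIC.length];
            int n = pin.read(head);
            if (n > 0) pin.unread(head, 0, n); // restore bytes for the real parser
            return n == MAGIC.length && Arrays.equals(head, MAGIC);
        } catch (IOException e) {
            return false; // sketch only: treat unreadable streams as non-binary
        }
    }
}
```

An XML stream would start with '<' and fail the magic check, so the same entry point could dispatch to either loader path without per-session registration.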
Marius,

Thanks for your notes. Please see the answers below.

> For binary events which usually should be handled at once (after they are
> received completely), there is
> org.eclipse.hyades.models.hierarchy.util.internal.ExtendedFragmentLoader which
> in conjunction with the generated Java code (this was recently removed from
> CVS) for /org.eclipse.tptp.platform.models/src-trace/model/traceEvents.xsd
> could handle any event type.

Yes, these events are handled at once after the parsing is complete. I see some limitations in using ExtendedFragmentLoader#getAttribute(int, Object):
- performance degradation caused by conversion from primitive types to their wrapper classes and vice versa in FragmentLoader (when assigning to primitive attributes);
- the implementation of addAttribute(int, Object) will consist of if/else logic checking the exact class instance (Short/Integer/Long, etc.), which is also not efficient.

I implemented a new interface, BinaryFragmentLoader, which is nothing more than a manifestation of overloaded addAttribute(String, [byte|short|int|long|double]) methods. It would require some additional effort to be implemented without changing the existing XMLFragmentLoader interface, which is critical for the extension point that uses this interface.

> Currently the addYourselfInContext is called on endElement, but because we
> don't really want to intermix XML with binary events on the same data channel,
> then we just call the right controller and pass the session related data
> channel input stream for each case (BinaryLoader in the binary case).
> Or the XMLLoader (the class name should change at least) could be changed in
> such way that it will automatically detect and handle both cases (check the
> header of the input stream for example, before passing it to the event parser).

Currently, I have inherited XMLLoader to be prepared for selection of XML/binary formats at runtime.
> BinaryLoader/parser would build the beans and then pass them to the
> corresponding loader through the addYourselfInContext(Object bean) method,
> which for existing loaders could just call the old addYourselfInContext after
> setting the internal bean field with the required casting.

This is what I've done, but I use the existing loaders as beans, just invoking the corresponding addAttribute methods. I think it is enough for now. After the details of the format are finalized, I (or somebody else) can refactor these events in the approved way.

> The existing loaders could be easily migrated to use this approach, basically
> moving the loader attributes which are today directly set through
> XMLFragmentLoader.addAttribute method into the bean attributes corresponding
> to each loader. Another approach would be to copy the bean content in the
> existing attributes before calling the old addYourselfInContext() method.

Again, this refactoring should be approved.

> If this approach would be used, the existing loaders (which would need binary
> support) would be able to transparently handle both XML and binary events, so
> extra registration or new classes with duplicated code are avoided.

That is what they do.

> Take also a look at the new methods from
> org.eclipse.hyades.models.hierarchy.util.internal.IExtendedLoader, for example
> endElement could be used to trigger the new addYourselfInContext(Object bean)
> method.
>

I see it uses an attributeId rather than a name, so I anticipate much effort in converting id->name, since I use the existing loaders almost as-is :)

Please take a look at the patch to see how it is implemented.

See org.eclipse.hyades.loaders.hierarchy.IgnoredXMLFragmentLoader for the integration of the new interface into the existing loaders.
See org.eclipse.hyades.loaders.internal.binary.v1.ParserImpl1 for the parsing process itself.
See org.eclipse.hyades.loaders.internal.binary.v1.BFAgentCreateParser as an example of parsing a binary event and filling up the existing loader.

Thanks.

Changing dependencies to child defects.

Created attachment 93126 [details]
Handshaking protocol description added
Changes:
- compatibility issues revisited
- handshaking protocol described
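The handshaking/negotiation requirement raised by the AG (initializing the binary format while staying backward compatible) could be sketched as below. This is a minimal illustration under assumptions, not the protocol in the attached document: the version numbers, message shape, and the use of "version 0" for the legacy XML transport are all invented for the example.

```java
import java.util.List;

// Hypothetical sketch of format negotiation: the client offers the binary
// format versions it understands; the agent picks the highest version it
// also supports, or falls back to the legacy XML transport (version 0 here)
// when talking to an older peer.
public class HandshakeSketch {
    static final int XML_FALLBACK = 0; // legacy XML transport (assumed sentinel)

    // Agent-side selection: highest client version the agent also supports.
    static int negotiate(List<Integer> clientVersions, List<Integer> agentVersions) {
        int best = XML_FALLBACK;
        for (int v : clientVersions) {
            if (agentVersions.contains(v) && v > best) best = v;
        }
        return best;
    }
}
```

The fallback branch is what preserves backward compatibility: a client that never announces a binary version simply keeps receiving XML events.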
Resolving as FIXED given that the patches for 209342 and 209343 are committed to HEAD.

As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and is considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.