Bug 342268 - Performance improvement for reading IBM dumps - First heap pass
Summary: Performance improvement for reading IBM dumps - First heap pass
Status: RESOLVED FIXED
Alias: None
Product: MAT
Classification: Tools
Component: Core
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 277422
 
Reported: 2011-04-08 07:35 EDT by Brian Peacock CLA
Modified: 2012-08-28 10:41 EDT (History)
0 users

See Also:


Attachments
DTFJIndexBuilder Pass 1 multithreaded (9.72 KB, patch)
2011-04-08 07:39 EDT, Brian Peacock CLA
andrew_johnson: iplog-
Alternative patch for speeding pass 1 (8.28 KB, patch)
2011-04-11 11:02 EDT, Andrew Johnson CLA
no flags

Description Brian Peacock CLA 2011-04-08 07:35:33 EDT
Build Identifier: 

The two most time-consuming sections of DTFJIndexBuilder.fill() are the two passes over the heaps that visit every object. This enhancement is a first attempt at making the first pass multi-threaded.

The basic idea is that most machines used for this sort of work will be either multi-processor servers or laptops, which nowadays are at least dual-core. Running the major sections multi-threaded should therefore reduce the elapsed time.

The first heap scan (Pass 1) doesn't do much work per object: it basically adds object addresses to an array, creating a basic index that is used throughout the rest of the code. Testing showed that making it truly multi-threaded (splitting the array into segments for each processor, then combining them at the end) gave no significant improvement over this basic change, given that we can't split the iteration of the heap in the DTFJ code. However, we can get some improvement by having DTFJ create lists of objects on one thread and hand these lists to the array builder on another thread. Clearly this is only a 2-CPU solution and the work isn't split equally, which means we get between 120% and 140% CPU usage ... but every little helps. Also, the two threads currently spin: the main thread spins if both stacks of objects to process are full, and the worker thread spins if there are no stacks of objects to process. Making the code more architecturally correct did not increase throughput in testing; in fact it generally decreased it.
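
For readers without the patch to hand, the hand-off described above might look roughly like the sketch below. It is not the attached patch: it swaps the spin-wait for a bounded BlockingQueue (capacity 2, mirroring the two stacks) purely for readability, and the class name Pass1Handoff, the method collectAddresses, the batch size, and the Iterator<Long> standing in for DTFJ heap iteration are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one thread walks the heap, a second thread stores the batches.
final class Pass1Handoff {
    private static final int BATCH_SIZE = 1024;        // assumed batch size
    private static final long[] END = new long[0];     // sentinel: no more batches

    /** Returns every address from the iterator as one sorted array. */
    static long[] collectAddresses(Iterator<Long> heapObjects) {
        // At most two batches may be pending, like the two stacks in the description.
        BlockingQueue<long[]> queue = new ArrayBlockingQueue<>(2);
        List<long[]> received = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                // Take batches until the sentinel arrives (identity comparison).
                for (long[] b = queue.take(); b != END; b = queue.take()) {
                    received.add(b);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "pass1-array-builder");
        worker.start();

        try {
            // Producer side: the heap iteration itself stays on a single thread.
            long[] batch = new long[BATCH_SIZE];
            int n = 0;
            while (heapObjects.hasNext()) {
                batch[n++] = heapObjects.next();
                if (n == BATCH_SIZE) {
                    queue.put(batch);                  // blocks while both slots are full
                    batch = new long[BATCH_SIZE];
                    n = 0;
                }
            }
            if (n > 0) {
                queue.put(Arrays.copyOf(batch, n));    // final partial batch
            }
            queue.put(END);
            worker.join();                             // makes 'received' safe to read here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during pass 1", e);
        }

        // Concatenate the batches and sort them into the final index.
        int total = 0;
        for (long[] b : received) {
            total += b.length;
        }
        long[] result = new long[total];
        int pos = 0;
        for (long[] b : received) {
            System.arraycopy(b, 0, result, pos, b.length);
            pos += b.length;
        }
        Arrays.sort(result);
        return result;
    }
}
```

Note that the description reports lower throughput when the spinning was replaced with a more conventional structure, so a blocking queue like this one may well be slower than the patch on real dumps.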

Thus, this solution could be viewed by purists as a "hack" but it appears to provide the best improvement in elapsed time.

This is the first in a series of changes; others will build on it. As we progress through this class we may come back and tune this particular section later.

This code also adds some setup code, such as working out how many processors are available, that subsequent changes will use.
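
As a rough illustration of that setup step, the processor count can be read with the standard Runtime.availableProcessors() call. The class WorkerSizing and the method useHandoffThread are hypothetical names, and the threshold of two processors is an assumption drawn from the "2-CPU solution" wording above, not from the patch.

```java
// Hypothetical sketch: decide whether the two-thread hand-off is worth enabling.
final class WorkerSizing {
    /** True when at least two processors are available for the hand-off. */
    static boolean useHandoffThread(int availableProcessors) {
        return availableProcessors >= 2;
    }

    /** Convenience overload that asks the JVM for the processor count. */
    static boolean useHandoffThread() {
        return useHandoffThread(Runtime.getRuntime().availableProcessors());
    }
}
```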



Reproducible: Always
Comment 1 Brian Peacock CLA 2011-04-08 07:39:46 EDT
Created attachment 192815 [details]
DTFJIndexBuilder Pass 1 multithreaded
Comment 2 Andrew Johnson CLA 2011-04-08 09:56:31 EDT
Interesting, though I'd like to see where DTFJ thread safety is documented as guaranteed.

DTFJHeapObjectReader has code to protect against multiple threads reading from a DTFJ dump at the same time.
Comment 3 Andrew Johnson CLA 2011-04-11 10:04:58 EDT
A simpler improvement is to avoid looking at all the superclasses of a JavaClass if we have already looked at that class. This also requires adding all the superclasses when we find the classes via the class loaders.

I've got some code for that.

The TreeSet might be a bit expensive as it involves finding a class address for every comparison. If JavaClass hashCode and equals are faster then it might be better to have a regular HashSet and then sort it.
Comment 4 Andrew Johnson CLA 2011-04-11 11:02:23 EDT
Created attachment 192935 [details]
Alternative patch for speeding pass 1

Avoid getting super class more than once, and leave sorting (which involves getting the class address) to after collecting the classes.
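
A minimal sketch of that idea, not the attached patch: the ClassNode interface is a hypothetical stand-in for a DTFJ JavaClass (the real API is not used here), and ClassCollector and SimpleNode are illustrative names. The sketch assumes that hashCode and equals on the class objects are cheap compared with obtaining the address.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: collect classes in a HashSet, sort by address only once at the end.
final class ClassCollector {
    /** Stand-in for a DTFJ JavaClass, reduced to what this sketch needs. */
    interface ClassNode {
        ClassNode superclass();   // null for the root class
        long address();           // assumed to be comparatively expensive to obtain
    }

    /** Minimal implementation, present only so the sketch can be exercised. */
    static final class SimpleNode implements ClassNode {
        private final ClassNode parent;
        private final long addr;

        SimpleNode(ClassNode parent, long addr) {
            this.parent = parent;
            this.addr = addr;
        }

        @Override public ClassNode superclass() { return parent; }
        @Override public long address() { return addr; }
    }

    private final Set<ClassNode> seen = new HashSet<>();

    /** Records cls and its superclasses, stopping at the first class already recorded. */
    void add(ClassNode cls) {
        // seen.add returns false for a known class; its superclasses were added with it.
        for (ClassNode c = cls; c != null && seen.add(c); c = c.superclass()) {
            // nothing else to do per class
        }
    }

    /** Sorts after collection, so address() is only called during this final sort. */
    List<ClassNode> sortedByAddress() {
        List<ClassNode> out = new ArrayList<>(seen);
        out.sort(Comparator.comparingLong(ClassNode::address));
        return out;
    }
}
```

Calling add once per object found means the superclass chain is only walked the first time each class is encountered; every later call for the same class returns after a single set lookup.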
Comment 5 Andrew Johnson CLA 2012-08-28 10:40:40 EDT
No more work is planned to be done under this defect - we'll use another defect if there are more ideas.
Comment 6 Andrew Johnson CLA 2012-08-28 10:41:39 EDT
Comment on attachment 192815 [details]
DTFJIndexBuilder Pass 1 multithreaded

Patch not used