Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 234460

Summary: TPTP hashtable implementation experiences table-entry loss in certain cases
Product: z_Archived Reporter: Jonathan West <jgwest>
Component: TPTPAssignee: Jonathan West <jgwest>
Status: CLOSED FIXED QA Contact:
Severity: critical    
Priority: P1 CC: jkubasta, sluiman, stanislav.v.polevic, xubing
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Whiteboard: closed460
Bug Depends on:    
Bug Blocks: 221278    
Attachments:
Description Flags
Bug 234460 patch
none
Bug 234460 - Patch 2 none

Description Jonathan West CLA 2008-05-28 14:29:33 EDT
There are four hashtable.c implementations in the agent controller, each of which originated from a file of the same name that was contributed to TPTP by Scapa as part of Perfmon. Three of the four implementations are used by Perfmon soley, while the fourth is provided for use to the AC codebase by the TPTPUtil component.

This HashTable implementation is used in the tptpProcessController, namedPipeTL, sharedMemTL, socketTL, and TPTPAgentCompTL/TPTPClientCompTL through BaseTL.

The HashTable has no fixed size, and grows when a certain load factor is reached (hardcoded to 75% of capacity, initial capacity is 101). 

Unfortunately, the _TPTPUtil version of HashTable.c_ contains a bug in it's tableRehash(...) function: Whenever the hashtable rehashes (after a certain number of entries are added to it) one of its entries is unintentionally dropped from the table. That is, when the HashTable reaches its load factor, the array size of the table is doubled, all entries are rehashed and copied, and in doing so one entry fails to get copied to the new array.

The _original version_ of the TPTP code, and all other implementations _presently_ in use (e.g. PerfMon) look like this:


	for (i = oldCapacity; i-- > 0;) {
		for (old = oldMap[i]; old != NULL; ) {
			(.... rehash code: copy oldMap[i] into newMap ...)
		}
		oldMap[i] = NULL;
	}



The _present version_ of the TPTP hashtable.c file looks like this:

	for (i = oldCapacity-1; i-- > 0;) {
		for (old = oldMap[i]; old != NULL; ) {
			(.... rehash code: copy oldMap[i] into newMap ...)
		}
		oldMap[i] = NULL;
	}


Notice that 'oldCapacity' has become 'oldCapacity-1'. The contributor misread the for statement semantics: for(a;b;c) { d } will execute a,b,(d,c) and not a,d,c,b. Thus i will start at oldCapacity-2, and not oldCapacity-1 as it should.

The change was made as part of bug 134361 by Kevin P O'Leary at Intel in April 2006. 

Every hashtable in use has an initialCapacity of 101, and will be rehashed when 76 entries are contained in the table. Most of the AC tables will never reach this capacity. However, BaseTL (and therefore ClientCompTL) does not remove connections from its hashtable when they are completed, and thus will inevitably trigger this behaviour after 76 or so Hyades client connections.

Unfortunately, this is the cause of bug 221278, and has also been observed as a contributing factor to other critical/blocker bugs as well.

Because we are late into the TPTP release period, my recommendation to reduce the risk associated with this change is as follows:
1. Fix the rehash function (return oldCapacity to its original value)
2. Increase the capacity of the hashtable, in order to reduce the incidence of the rehash code being called (thereby reducing the inherent risk of late delivery of this change into the code)
3. Trace through the hashtable code, by hand, to ensure correctness.

For the first recommendation, this bug blocks (at least) 1 critical bug, and thus IMHO should be delivered into 4.5.

For the second recommendation, the thinking here is that by increasing the hashtable size ten fold (to 1001, or perhaps more) we nearly eliminate calls to the the rehash function for most users. This is a temporary measure until we can run the corrected rehash function through a full development test pass (post 4.5). There are only ten or so hashtable instances in the agent controller, and this increase would add approximately (4 bytes per pointer * 900 additional entries * 10 entries) 36,000 bytes, or about 35 kilobytes to the agent controller footprint (double on 64-bit systems).

The third is an additional check to ensure code correctness.
Comment 1 Jonathan West CLA 2008-05-28 14:29:47 EDT
Joanna, can you assign to me?
Comment 2 jkubasta CLA 2008-05-28 20:54:37 EDT
Harm, Jonathan spoke to me about this change. Hash table is incorrect but may be too risky to include in 4.5. Any strong opinion either way?
Comment 3 Jonathan West CLA 2008-05-28 21:51:24 EDT
Created attachment 102551 [details]
Bug 234460 patch
Comment 4 Jonathan West CLA 2008-05-28 21:54:25 EDT
I don't want to forestall a discussion of this bug, but, since TP2 is close, Stanislav can you review this patch? If approved, we don't want to have requested a review too late and thus missed the TP2 start.. Thanks!
Comment 5 Harm Sluiman CLA 2008-05-28 22:47:23 EDT
(In reply to comment #2)
> Harm, Jonathan spoke to me about this change. Hash table is incorrect but may
> be too risky to include in 4.5. Any strong opinion either way?
> 

I honestly don't see why there is such concern about the risk. Jonathan has done a great job of investigation.The error seems rather obvious, and it clearly fails when it fails. I would fix the code, and temporarily reduce the threshold to make testing easier. The perfmon component is deprecated so it seems there is only one component affected by the change.

So I would apply the fix and pick a size slightly larger than typical needs (it seems the current number is fine) An intelligent set of test will confirms the fix . And do this all ASAP....
Comment 6 Jonathan West CLA 2008-05-29 16:55:50 EDT
As per Harm's comments above, I will test functionality with a small initial hashtable size (one :) ), and produce a patch for check-in that leaves the initial hashtable size at 101. Thus, testing during TP2 (and for the 4.5.0 release) will use the hashtable implementation with an initial hashtable capacity of 101.
Comment 7 Jonathan West CLA 2008-06-01 20:31:07 EDT
Created attachment 103054 [details]
Bug 234460 - Patch 2
Comment 8 Jonathan West CLA 2008-06-01 20:38:02 EDT
Stanislav, can you review this second patch?
Comment 9 Stanislav Polevic CLA 2008-06-02 08:02:24 EDT
Jonathan, your patch looks good.
Comment 10 jkubasta CLA 2008-06-02 09:09:39 EDT
Patch committed to Head with PMC approval. Copyright updated on check in
Comment 11 Paul Slauenwhite CLA 2009-06-30 12:12:19 EDT
As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open.