Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 290605

Summary: [evaluator] Inadequate support for large integers
Product: [Modeling] OCL Reporter: Oliver Wong <owong>
Component: CoreAssignee: OCL Inbox <mdt-ocl-inbox>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: achim.demelt, alexander.igdalov, dvorak.radek, ed, owong
Version: unspecifiedKeywords: plan
Target Milestone: 3.1.0Flags: ed: indigo+
Hardware: PC   
OS: All   
Whiteboard: Compliance
Bug Depends on: 318048    
Bug Blocks: 156363    
Attachments:
Description Flags
AST/CST change for Big numbers
none
Use of BigDecimal and BigInteger internally none

Description Oliver Wong CLA 2009-09-25 17:11:34 EDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/3.0.195.21 Safari/532.0
Build Identifier: I20090611-1540

In the OCL specification at http://www.omg.org/docs/ptc/06-05-01.pdf section 11.4.2, it says that the OCL type "Integer" refers to the mathematical concept of an integer, with no upper bound. However, when we try to pass in a query with large integers, we get a parse exception. For example, the query "1196888373 < 9999999999" generates the following stack trace:

Caused by: java.lang.NumberFormatException: For input string: "9999999999"
	at java.lang.NumberFormatException.forInputString(Unknown Source)
	at java.lang.Integer.parseInt(Unknown Source)
	at java.lang.Integer.valueOf(Unknown Source)
	at org.eclipse.ocl.parser.AbstractOCLParser.createIntegerLiteralExpCS(AbstractOCLParser.java:516)
	at org.eclipse.ocl.parser.OCLParser.ruleAction(OCLParser.java:1154)
	at lpg.lpgjavaruntime.DeterministicParser.processReductions(DeterministicParser.java:55)
	at lpg.lpgjavaruntime.DeterministicParser.parse(DeterministicParser.java:115)
	at org.eclipse.ocl.parser.OCLParser.parseTokensToCST(OCLParser.java:106)
	at org.eclipse.ocl.lpg.AbstractParser.parseTokensToCST(AbstractParser.java:220)
	at org.eclipse.ocl.parser.OCLAnalyzer.parseConcreteSyntax(OCLAnalyzer.java:140)
	at org.eclipse.ocl.parser.OCLAnalyzer.parseInvOrDefCS(OCLAnalyzer.java:263)
	at org.eclipse.ocl.internal.helper.HelperUtil.parseQuery(HelperUtil.java:164)
	at org.eclipse.ocl.internal.helper.OCLHelperImpl.createQuery(OCLHelperImpl.java:175)
	... 3 more

It appears that the current implementation is mapping OCL's Integer type onto Java's int type. In our EMF models, we're using fields representing timestamps in the form of milliseconds since epochs, and so we frequently use numbers which are too big to be store in Java ints, and which need to be instead be stored in longs.

It seems like the "most correct" solution would be to map OCL's Integer onto Java's BigInteger, but I guess that might be a performance problem. Perhaps we can change the implementation to use long "for now" as a quick patch solution, until a solution is found to support truly arbitrarily large integers?

Reproducible: Always
Comment 1 Ed Willink CLA 2009-09-25 17:44:23 EDT
There is support for BigDecimal and BigInteger values that originate in models, but as you observe this is missing in the concrete syntax.

I think the correct solution is to check the string length and parse for a BigInteger when too many digits or too large a value for Integer.

Internally the generic java.lang.Number inheritance should sort out the alternatives.
Comment 2 Ed Willink CLA 2009-12-18 13:12:22 EST
*** Bug 298205 has been marked as a duplicate of this bug. ***
Comment 3 Ed Willink CLA 2010-01-12 06:21:08 EST
I'm trying to support BigInteger and BigDecimal in the revised evaluator in the ReflectiveLibrary branch.

The observable problem is that the parser method createIntegerLiteralExpCS uses Integer.valueOf to parse the lexed string. This is easily fixed but then

a) IntegerLiteralExpCS.integerSymbol is an Integer
b) IntegerLiteralExp.integerSymbol is an Integer

(ditto for UnlimitedNatural, Real)

Therefore we

either introduce BigIntegerLiteralExp etc and surprise clients with new classes
or change IntegerLiteralExp.integerSymbol to be a Number (Short,Integer,Long,BigInteger or Float,Double,BigDecimal in practice).

Introducing new classes will give tightly-coupled clients run-time failures since the requisite handlers will be missing.

Changing the type to Number will give tightly-coupled clients compile-time failures.

I prefer the change to Number since variant IntegerLiteralExp or RealLiteralExp classes are not supported by the OCL specification, and since this exposes rather than hides the API change.
Comment 4 Ed Willink CLA 2010-01-12 06:26:28 EST
*** Bug 246895 has been marked as a duplicate of this bug. ***
Comment 5 Ed Willink CLA 2010-01-12 14:36:01 EST
Created attachment 155901 [details]
AST/CST change for Big numbers

The attached patch changing from Integer/Double to Number for {Integer,Real,UnlimitedNatural}LiteralExp{,CS} surprisingly does not break any OCL or QVTd test. Extra JUnit tests validate the parsing, serialisation and deserialisation of Big numbers.

The patch causes no compilation problems for QVTo. I am unable to tell whether QVTo run-time is affected; 790/815 tests appear to fail in my workspace.

NB. Previous behaviour was to NFE on big numbers. 

The new behaviour parses, serialises and deserialises big numbers.

Maths on Big numbers is poorly supported. The rewrite of operations in the ReflectiveLibrary branch will fix this.

Temporarily some high precision Double numbers will be parsed as BigDecimal causing failure when used in mathematical operations.
Comment 6 Ed Willink CLA 2010-01-13 05:10:01 EST
(Patch seems to have a few missing @since's.)

If we're changing the type of integerSymbol and realSymbol anyway, we could introduce new wrapper types org.eclipse.ocl.types.Integer and org.eclipse.ocl.types.Real that encapsulate the choice of representation type. This would localize some of the Double or BigDecimal logic. [I'm undecided whether this is a good idea, since it creates an extra object for every numeric literal and is not backwards compatible with the serialised AST, so I raise this possibility inviting positive/negative comments.]

We probably need some xxxOption to allow users to control the threshold at which arithmetic promotes from Double to BigDecimal (Integer promotion is easy; grow on overflow). The patch chooses the precision of a litertal based on whether the string representation has 15 or more letters so -1.0 is Double and -1.000000000e00 is BigDecimal. When to promote Double / Double to BigDecimal is very subjective. BigDecimal has a precision, so choosing an appropriate precision is also subjective.
Comment 7 Ed Willink CLA 2010-01-13 06:02:51 EST
The test_supportForELongAttributes_198451 provides some insight into the existing behaviour.

Integer, Long, BigInteger, Double and BigDecimal

are preferred representations.

Byte, Short, Float

are tolerated representations that are converted to Integer, Double.

The test

assertEquals(1, evaluate(helper, thing, "numeros->at(1).asLong() - numeros->at(2).asLong()"));

passes because the Long - Long maths returns Long but the result which is 1 is passed through NumberUtil.coerceNumber which tries to find a simpler representation and converts long 1 to integer 1 which is equal to the asserted integer 1.

This behaviour is a little expensive but well-defined. The patch extends this support for arbitrary intermediate results to arbitrary persisted results. We just want to refine the new createRealNumber to use BigDecimal for anything with any fractional digits or more than 15 integer digits and to coerceNumber the result back to Double where possible. No change to createIntegerNumber which is already 'optimal'. [0.1 cannot be precisely represented by Double, whereas it is trivial with BigDecimal.]
Comment 8 Ed Willink CLA 2010-01-13 06:34:37 EST
Further:

            assertEquals((int) maxIntMinusOne, evaluate(helper, thing,
                String.format("(%d + %d).div(2) - 1", maxInt, maxInt)));

passes when adding an Integer overflow because all numeric arguments are promoted to a higher precision before any numeric operation.

Ow! we are incurring a promotion cost on every numeric argument, and a demotion cost on every numeric result.

Ow! we are losing precision on decimals; bug 299477.

Ow! we are not detecting long overflows; Java does not report arithmetic issues.

Given the costs we are incurring through conversions, we might as well just use BigInteger and BigDecimal throughout. The AST once again has narrow types.
Comment 9 Ed Willink CLA 2010-01-14 09:45:13 EST
Created attachment 156100 [details]
Use of BigDecimal and BigInteger internally

[Attached patch applies to latest CVS, in which generated code has been regenerated to avoid reviewing 100 files of gratuitous differences on this patch.] 

Changing the internal and persisted representations of Real and Integer to BigDecimal and BigInteger has a bit more impact than changing to Number, but leads to more predictable simpler behaviour and solves most of the problems other than extensibility and legibility. API Compatibility is enhanced for scalar values by invoking NumberUtil.coerceNumber only on the final result rather than every intermediate calculation.

[Since UnlimitedNaturalLiteralExp.intergerSymbol is changing type, we might as well track Issue 14196 and change its spelling at the saem time.]

Impact on tests: non-scalar result checks must use BigDecimal or BigInteger.
Impact on QVTd: two one-line changes are required to resolve compilation errors.
Impact on QVTo: no changes needed for compilation problems.

Visible change:

RealLiteralExp(CS).realSymbol is BigDecimal rather than Double.
IntegerLiteralExp(CS).integerSymbol is BigInteger rather than Integer.
UnlimitedNaturalLiteralExp(CS).integerSymbol is BigInteger rather than Integer and name unlimitedNaturalSymbol.

setRealSymbol, setIntegerSymbol provide some compatibibility for original signatures.

Scalar return from OCL.evaluate unchanged because coerceNumber invoked on result.
Collection/Tuple query results contain BigDecimal or BigInteger rather than Double, Integer.

New: OCL.rawEvaluate supports return of BigDecimal or BigInteger bypassing coerceNumber.

Hidden changes:

oclAsType now works and is tested for Integer/Real conversions.
BigInteger and BigDecimal internal results should not throw exceptions
-* now works and is tested
Comment 10 Adolfo Sanchez-Barbudo Herrera CLA 2010-01-15 11:51:26 EST
Ed,

When trying apply the patch with in the ReflectiveLibrary branch I have a lot of problems. Fewer problems (just some merge problems in in a tests plugin) while applying against HEAD, so I guess that "latest CVS" = HEAD.

Looking into it.

Cheers,
Adolfo.
Comment 11 Ed Willink CLA 2010-01-15 12:33:52 EST
Yes. The patch is for HEAD.

I have just committed the change to the ReflectiveLibrary Branch which has now caught up with LPG2 etc. The Ecore tests run with about 55 failures; to be investigated. The UML tests do not yet run.

Switching to BigDecimal/BigInteger makes the revised approach much easier too. Take a peek at org.eclipse.ocl.evaluator.operations to see how modular it now is.

It would be good to have a preliminary approval that we want to change to BigInteger/BigDecimal always. Then perhaps the nitty gritty review can be a little slower.

The patch probably fixes 20 to 50 obscure problems, but may well creates nearly as many similarly obscure problems. It is only with the ReflectiveLibrary that we can really clear these problems. I expect to clear at least 15 bugzillas in one go.
Comment 12 Adolfo Sanchez-Barbudo Herrera CLA 2010-01-15 13:50:29 EST
Ed,

To be honest, I'm learning rather than doing a nice reviewing with this bugzilla, so my contribution on this is being simply reduced to catch some noise. Let's see if I can help ;).

About the approach. In principle, (and you know my position about MDT-OCL 3.0.0), I'm not worried about any inconvenience related to breaking API, or any backwards compatibility. Actually, we must take advantage to improve the implementation, and it seems that you are fixing a lot of numeric conversions related nuisances. So, 

1. I'm happy with the solution (my +1 to the approach)
2. I wouldn't include "setRealSymbol, setIntegerSymbol methods to provide some compatibibility for original signatures", since 
	- it partially provides compatibility.
	- I don't like the not generated code.
So if the API need to be broken for the expression symbol, let's learn clients to use the new one instead of including partial compatibility solutions. 

Apart from that, have a look to the following comments

- In the OCLCST.ecore model, the references to the BigInteger/Double are established via platform:/plugin/.... URI instead of http://www.eclipse.org/... . This unnecessary change leads to, for instance:
	- Load the same Ecore package twice (you can check the Sample core editor, which provides you two EBigIntegers
	- A new usedGenPackages entry in the genmodel.
	
I feel this may be fixed, and the code regenerated again.

- The same for OCL.ecore
- Number still remains in OCL.uml.
- What happened with the "user_doc" in the AST generated code?, the "This value may change when the model code is regenerated. It is subject to change without notice" comment, which I think has recently added, is not longer present.
- In ObjectUtil.equals method, why are we considering that Util method won't receive any Integer, Long or Double object (They have been removed), if you feel that they shouldn't be used along this util class, why not removing any consideration in other methods such as hashCode or isPrimitive ?
- I find a missing @since in ExpressionsPackageImpl.getUnlimitedNaturalLiteralExp_UnlimitedNaturalSymbol() method.

Have a nice weekend,
Adolfo.
Comment 13 Ed Willink CLA 2010-01-21 15:48:38 EST
Thanks for the review/observations:

The plugin references seem toi be a limitation of the UML codegen. I can't see any difference between EDouble and EBigDecimal in the UML or Ecore profile so ... ?

This code has evolved slightly to provide better compatibility in the ReflectiveBranch.

I'm no longer inclined to commit this for 3.0.0 if the ReflectiveLibrary doesn't also go in 3.0.0.
Comment 14 Alexander Igdalov CLA 2010-02-01 07:40:18 EST
(In reply to comment #13)
 
> I'm no longer inclined to commit this for 3.0.0 if the ReflectiveLibrary
> doesn't also go in 3.0.0.

I agree. Let's include it into 4.0.0.

-----------
Unlike Java (and other multipurpose languages) there is no variety of different-precision types for numbers in OCL. I have no clear opinion on whether we should we submit a request to OMG to add types representing Java's int, long and double to OCL. Perhaps, it is ok for OCL to be slower but less bloated?
Comment 15 Ed Willink CLA 2010-02-01 12:19:04 EST
The OCL evaluation exhibits significant control-code bloat so that the arithmetic overhead is not that bad. I judged, without measurement, that the costs of regular precision conversions in the current code were comparable and perhaps worse than the cost of always using BigXXX.

When a code Synthesis tool is used to produce efficient Java/C++/VHDL/... it will benefit from a numeric analysis to determine the numeric bounds of intermediate results and so allow synthesis of short or whatever.

OCL should perhaps allow OCL definition rather than augmentation of classes so that I can specify

def: MyShort conformsTo Integer
inv: -32768 <= self and self <= 32767
Comment 16 Ed Willink CLA 2011-02-01 02:35:13 EST
The new evaluator use BigInteger and BigDecimal throughout behind IntegerValue and RealValue.
Comment 17 Ed Willink CLA 2011-05-27 06:40:47 EDT
Resolved for Indigo is 3.1.0 not 3.2.0.
Comment 18 Ed Willink CLA 2012-05-01 16:29:02 EDT
Bug 366368 imptoves the legacy support so that Long selectively working is improved to probably fully working. In particular numeric and tuple literals may be Long. Arithmetic corners for Integer/Long are tested.
Comment 19 Ed Willink CLA 2012-05-29 13:21:02 EDT
Closing all bugs resolved in Indigo.