Porting from Udanax-Gold

In porting Udanax-Gold from Smalltalk to the Java-based Abora-White, a number of issues have come up. For example, Smalltalk integers are efficiently encoded within object references while they fit in 30 bits or so, and transparently expand to large-integer objects beyond that size, whereas Java ints cannot grow beyond their maximum value, and BigIntegers are second-class citizens at the syntax level. So what should the Java conversion do? The purpose of this page is to document these problems and the final outcome I have chosen for Abora-White.

BooleanVar

This was simply replaced with the Java boolean primitive data type.

IntegerVar

The Smalltalk native support for Integers is very powerful. For small numbers (fewer than about 29 bits) a compact, unboxed representation held directly in the object reference is used to minimise performance overhead. If an integer outside that range is needed, the system automatically switches to a boxed Integer object, transparently to the application code. The boxed and unboxed versions provide the same protocol, and both support C-like operators such as +. The only weirdness is that there are no operator precedence rules: binary operators are simply evaluated from left to right, so 2 + 3 * 4 yields 20 rather than 14.

These are intended as 32-bit integers. In the server we were not anticipating using infinite precision Integers.
Note that the translator to X++ did manage the transition to C++ precedence correctly.

The X++ version of Udanax-Gold seems to support just 32-bit integers, which is a surprise. I guess this is either a misunderstanding on my part, or perhaps unlimited precision support was a pending feature? C++ allows automatic conversions between int and the IntegerVar class representation.

The 'Var' suffix meant allocated on the stack as opposed to on the heap. IntegerVar was defined so that we could control alignment, operators, additional type checking, etc., but can be thought of as a 32-bit int.

The Java native support for integers is split between primitive data types of various bit sizes and the BigInteger class, which provides unlimited precision integers. There is no automatic promotion from primitives to BigInteger. Additionally, arithmetic operators such as + are supported for the primitive data types but not for BigIntegers, which must use methods such as add() instead.
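
To make the difference concrete, here is a small sketch contrasting primitive int overflow with BigInteger. The class and method names are illustrative only. Math.addExact (Java 8 and later) is included as one possible middle ground if a port keeps int but wants overflow reported rather than silently wrapped.

```java
import java.math.BigInteger;

// Contrasts Java's fixed-width int arithmetic with BigInteger, the nearest
// Java analogue of Smalltalk's self-expanding integers.
class IntegerWidthDemo {

    // Primitive int arithmetic silently wraps around on overflow.
    static int intSum(int a, int b) {
        return a + b;
    }

    // BigInteger never overflows, but needs method calls instead of +.
    static BigInteger bigSum(int a, int b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static int checkedSum(int a, int b) {
        return Math.addExact(a, b);
    }
}
```

For example, intSum(Integer.MAX_VALUE, 1) returns -2147483648, bigSum(Integer.MAX_VALUE, 1) returns 2147483648, and checkedSum(Integer.MAX_VALUE, 1) throws ArithmeticException.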

I recommend mapping to int in Java.

Assuming Udanax-Gold IntegerVars are required to support unlimited precision, I have temporarily gone for a wrapped version of BigInteger. The downsides are a performance hit for both small and large numbers, and no arithmetic operator support. The worst of both worlds!
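
One way such a wrapper might look is sketched below. The class name WrappedIntegerVar and its methods (make, plus, times, isLessThan, asInt) are hypothetical and not the actual Abora-White class; the sketch mainly shows where the costs come from: every value is a heap object, and every + becomes a method call.

```java
import java.math.BigInteger;

// Hypothetical sketch of an immutable IntegerVar that wraps BigInteger.
// Names and methods are illustrative, not the actual Abora-White API.
final class WrappedIntegerVar {
    private final BigInteger value;

    private WrappedIntegerVar(BigInteger value) {
        this.value = value;
    }

    static WrappedIntegerVar make(long n) {
        return new WrappedIntegerVar(BigInteger.valueOf(n));
    }

    // No operator overloading in Java, so every + becomes a method call.
    WrappedIntegerVar plus(WrappedIntegerVar other) {
        return new WrappedIntegerVar(value.add(other.value));
    }

    WrappedIntegerVar times(WrappedIntegerVar other) {
        return new WrappedIntegerVar(value.multiply(other.value));
    }

    boolean isLessThan(WrappedIntegerVar other) {
        return value.compareTo(other.value) < 0;
    }

    // Narrowing back to int fails loudly (Java 8+) rather than truncating.
    int asInt() {
        return value.intValueExact();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof WrappedIntegerVar
                && value.equals(((WrappedIntegerVar) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value.toString();
    }
}
```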

Java Primitive Data Types

The Java primitive data types don't completely match those of X++.

All Java integral primitives are signed except for char. This contrasts with X++, where there is full control over signed and unsigned types. This is particularly significant for the UInt32 type, which is used often.

Mostly UInt32 was used for indices, for which negative numbers never made sense. They were also used for hashes. These could almost all be translated as int and it would be fine.
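
If UInt32 values are kept in Java ints, the bit patterns survive unchanged; only widening, comparison and division/remainder need care once the top bit is set. A minimal sketch using the unsigned helpers added to Integer in Java 8 (the UInt32Demo class and its method names are illustrative):

```java
// Sketch: treating a Java int as an X++ UInt32 (Java 8+ helpers).
class UInt32Demo {

    // Widen to long to see the unsigned value, e.g. 0xFFFFFFFF -> 4294967295.
    static long toUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // Unsigned ordering differs from signed ordering once the top bit is set.
    static boolean unsignedLess(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // A bucket index derived from a hash must be non-negative, but the
    // signed % operator returns a negative result for a negative hash.
    static int bucketFor(int hash, int buckets) {
        return Integer.remainderUnsigned(hash, buckets);
    }
}
```

For example, toUnsigned(0xFFFFFFFF) yields 4294967295L, and bucketFor(-1, 10) yields 5 where the signed expression -1 % 10 would give -1.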

The second case has to do with Strings: Java string characters are unsigned 16-bit Unicode (UTF-16) code units, whereas default X++ string characters are unsigned 8-bit.
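
Where 8-bit X++ strings have to cross into Java, one option (an assumption, not necessarily what Abora-White does) is to hold them as byte arrays, mask each signed Java byte to recover its 0 to 255 value, and use the ISO-8859-1 charset, which maps every byte value to the char with the same code. OctetDemo is an illustrative name.

```java
import java.nio.charset.StandardCharsets;

// Sketch: moving between X++-style unsigned 8-bit characters and Java chars.
class OctetDemo {

    // Java bytes are signed (-128..127); masking recovers the 0..255 value.
    static int unsignedOctet(byte b) {
        return b & 0xFF;
    }

    // ISO-8859-1 maps byte value n to the char with code n, so any 8-bit
    // string converts to a Java String without loss.
    static String fromOctets(byte[] octets) {
        return new String(octets, StandardCharsets.ISO_8859_1);
    }

    // The reverse direction; chars above 0xFF cannot be represented.
    static byte[] toOctets(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Note that toOctets replaces any char above U+00FF with '?', so the round trip is only lossless for strings that started out as 8-bit data.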

Exceptions

Udanax-Gold makes extensive use of exceptions. I believe that on the Smalltalk side, the instance-based exception mechanism of the Smalltalk language is used. On the X++ side I think a custom exception mechanism was implemented, possibly because it was written before exceptions were added to the C++ specification.

I'm pretty sure they map easily to instances of type-based exceptions, so the mapping to Java should be straightforward.

Java supports a reasonable class-based exception mechanism. One facility beyond the Udanax-Gold exceptions is support for declaring the types of exceptions thrown by a method.

The initial take is to follow the instance style, and I have added an org.abora.white.exception.AboraRuntimeException, together with a whole bunch of constants extracted from the exception uses in Udanax-Gold. As the name suggests, this is implemented as a Java RuntimeException, so it does not have to be declared in a method's throws clause. It is too early to try to properly define the relevant throws clauses for each method.

The pattern for throwing one of these exceptions is:

	throw new AboraRuntimeException(AboraRuntimeException.NULL_INSERTION);
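
For readers unfamiliar with this instance style, a minimal sketch of what such an exception class could look like follows. It is hypothetical: the real AboraRuntimeException defines many more constants, and their type (shown here as String) is an assumption. ExceptionDemo.insert is an illustrative caller.

```java
// Hypothetical sketch of an instance-style exception: one class, with the
// kind of failure carried as a constant rather than as a subclass.
class AboraRuntimeException extends RuntimeException {
    static final String NULL_INSERTION = "NULL_INSERTION";

    private final String problem;

    AboraRuntimeException(String problem) {
        super(problem);
        this.problem = problem;
    }

    String getProblem() {
        return problem;
    }
}

class ExceptionDemo {
    // Unchecked, so neither this method nor its callers need a throws clause.
    static void insert(Object item) {
        if (item == null) {
            throw new AboraRuntimeException(AboraRuntimeException.NULL_INSERTION);
        }
    }
}
```

A caller that cares about one particular failure has to catch AboraRuntimeException and then compare getProblem() against the constant, which is the main drawback of the instance style.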

Over time I would like to move to a class-based approach, and also to make as much use of the Java built-in exceptions as possible.
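
A class-based version might instead give each failure kind its own subclass, so callers can catch exactly what they expect, and reuse built-in exceptions such as IndexOutOfBoundsException where they already fit. AboraException, NullInsertionException and ClassBasedDemo are hypothetical names used only for this sketch.

```java
// Hypothetical class-based alternative: each failure kind is its own type.
class AboraException extends RuntimeException {
    AboraException(String message) {
        super(message);
    }
}

class NullInsertionException extends AboraException {
    NullInsertionException() {
        super("null insertion");
    }
}

class ClassBasedDemo {
    static void insert(Object item) {
        if (item == null) {
            throw new NullInsertionException();
        }
    }

    // Where a built-in exception already expresses the problem, reuse it.
    static Object at(Object[] items, int index) {
        if (index < 0 || index >= items.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index];
    }
}
```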

Using Steppers

When iterating with a Stepper, the Smalltalk version of Udanax-Gold naturally uses Smalltalk BlockClosures to pass in the code that will be executed for every step. See the #forEach and #forPositions methods for example implementations:

	{void} forEach: fn {BlockClosure} 
		[| elem {Heaper} |
		[(elem _ self fetch) ~~ NULL]
			whileTrue:
				[fn value: elem.
				self step]]
			valueNowOrOnUnwindDo: [self destroy]!

The X++ translation does not support BlockClosures, so it instead uses preprocessor macros such as FOR_EACH to expand the contents of the forEach method inline at each use. So there is no forEach method in the X++ source.

Java has a feature comparable to Smalltalk's BlockClosures, and so could implement and use forEach methods. This would typically be done with anonymous inner classes implementing, in this case, something like Niladic/Monadic/DuoadicValuable interfaces as needed. Unfortunately this technique is often frowned upon in the Java community as not matching the standard Java programming style, and each use carries the implementation overhead of an extra (hidden) class.
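
A sketch of the block-style alternative follows. MonadicValuable is modelled on the interface names mentioned above, but its method name valueWith and the ArrayStepper and ForEachDemo classes are assumptions made to keep the example self-contained.

```java
// Hypothetical one-argument callback, standing in for a Smalltalk block.
interface MonadicValuable {
    void valueWith(Object element);
}

// Minimal stepper over an array; real Udanax-Gold steppers are richer.
class ArrayStepper {
    private final Object[] elements;
    private int index = 0;

    ArrayStepper(Object[] elements) {
        this.elements = elements;
    }

    // Returns the current element, or null once the stepper is exhausted.
    Object fetch() {
        return index < elements.length ? elements[index] : null;
    }

    void step() {
        index++;
    }

    void destroy() {
        // Placeholder for releasing resources held by the stepper.
    }

    // Same shape as the Smalltalk forEach: loop until fetch() yields null,
    // and always destroy the stepper (cf. valueNowOrOnUnwindDo:).
    void forEach(MonadicValuable fn) {
        try {
            Object elem;
            while ((elem = fetch()) != null) {
                fn.valueWith(elem);
                step();
            }
        } finally {
            destroy();
        }
    }
}

class ForEachDemo {
    static int totalLength(Object[] words) {
        final int[] total = {0}; // anonymous classes can only capture final locals
        new ArrayStepper(words).forEach(new MonadicValuable() {
            public void valueWith(Object element) {
                total[0] += ((String) element).length();
            }
        });
        return total[0];
    }
}
```

With Java 8 or later, MonadicValuable is a functional interface, so the anonymous class can be written as a lambda (stepper.forEach(e -> ...)), which removes most of the syntactic objection.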

Java has no built-in preprocessor, so the behind-the-scenes code expansion of the FOR_EACH macro is not possible.

Initially I have decided, again, to follow the worst possible route and effectively hand-inline the contents of the forEach method at each use. I will review this approach later in the project.

As an example, here is the Java code for one such hand-expanded forEach call:

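	TableStepper stepper = myArray.stepper();
	try {
		Heaper e;
		// fetch() returns null once the stepper is exhausted
		while ((e = (Heaper) stepper.fetch()) != null) {
			newArray.atIntStore(myDsp.ofInt(stepper.index()), e);
			stepper.step();
		}
	} finally {
		// always release the stepper, like valueNowOrOnUnwindDo: [self destroy]
		stepper.destroy();
	}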

As I recall we tried to make sure that it could be translated as a for-loop. That would be a little more readable.

...and the original Smalltalk source:

	(s _ myArray stepper) forEach: [ :e {Heaper} |
		newArray atInt: (myDsp ofInt: s index) store: e].
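
The for-loop shape suggested above could look like the following sketch. Heaper and TableStepper here are stripped-down stand-ins (assumptions) so the code compiles on its own, and a plain int offset stands in for myDsp.

```java
// Minimal stand-ins; the real Heaper and TableStepper are much richer.
class Heaper {
}

class TableStepper {
    private final Heaper[] items;
    private int i = 0;
    boolean destroyed = false;

    TableStepper(Heaper[] items) {
        this.items = items;
    }

    Object fetch() {
        return i < items.length ? items[i] : null;
    }

    int index() {
        return i;
    }

    void step() {
        i++;
    }

    void destroy() {
        destroyed = true;
    }
}

class ForLoopDemo {
    // The hand-expanded forEach as a for-loop: fetch-and-test in the
    // condition, step() as the update clause, destroy() still in finally.
    static Heaper[] copyShifted(TableStepper stepper, int size, int offset) {
        Heaper[] newArray = new Heaper[size];
        try {
            for (Heaper e; (e = (Heaper) stepper.fetch()) != null; stepper.step()) {
                newArray[stepper.index() + offset] = e;
            }
        } finally {
            stepper.destroy();
        }
        return newArray;
    }
}
```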

Object Birth

Object creation usually begins with a call to a static factory method, often named make. This maps generally to Java. The main complication is that a static method defined in a subclass whose signature matches one in a superclass cannot change the return type of the superclass definition (Java 5 and later allow it to narrow the return type to a subtype, but nothing more).

Yes. Exposing the allocation behavior of a class when you want an instance of it prevents opportunities for caching, refactoring, etc. so we tried to use factory methods consistently.
Yup, they definitely will need some renaming for Java.

I have used Java constructors in place of the create methods. This generally works out, except that constructors have a number of 'strange' properties in Java. Constructors are not inherited, so I have often had to add intervening constructor definitions that simply call a super constructor. Following the general factory pattern, the constructors are made protected.

Exactly the right thing. Part of the reason for the generic naming of create methods is that they were supposed to map to constructors.
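
A minimal sketch of this factory arrangement, using hypothetical SpanRegion and EmptySpanRegion classes: clients call the static make(), the constructors are protected, the subclass restates a constructor only to reach super(), and the factory is free to return a cached instance, which is exactly the flexibility a public constructor would give away.

```java
// Hypothetical classes illustrating factory methods with protected constructors.
class SpanRegion {
    protected final int start;
    protected final int stop;

    protected SpanRegion(int start, int stop) {
        this.start = start;
        this.stop = stop;
    }

    // Static factory: may validate, cache, or choose a subclass.
    static SpanRegion make(int start, int stop) {
        if (start == stop) {
            return EmptySpanRegion.INSTANCE; // shared instance, no allocation
        }
        return new SpanRegion(start, stop);
    }

    boolean isEmpty() {
        return false;
    }
}

class EmptySpanRegion extends SpanRegion {
    static final EmptySpanRegion INSTANCE = new EmptySpanRegion();

    // Intervening constructor: constructors are not inherited, so the
    // subclass must declare one that just calls the super constructor.
    protected EmptySpanRegion() {
        super(0, 0);
    }

    @Override
    boolean isEmpty() {
        return true;
    }
}
```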

Some open issues in this area concern new.Become, and possibly similar methods. Java has no capability that matches the Smalltalk become feature. Depending on the Smalltalk implementation, become comes in one of two forms: one-way or two-way. A one-way become changes all objects referencing object A to reference object B instead. A two-way become changes all objects referencing object A to reference object B, and all objects referencing object B to reference object A.

We did not use two-way become except to emulate behavior we could accomplish in C++ some other way. We did use one-way become for converting stubs into objects during unmarshalling. In Java, you would need to use Proxies and maintain the layer of forwarding, I think (or not use lazy unmarshalling).
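
One way to realise that forwarding layer is sketched below with java.lang.reflect.Proxy: the stub stands in for a not-yet-unmarshalled object, materialises it on first use, and then forwards every call. The Document interface and LazyStub class are hypothetical, the Supplier plays the role of the unmarshaller, and the approach only works for objects accessed through interfaces, since Proxy cannot subclass concrete classes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical interface for an object that may arrive lazily.
interface Document {
    String title();
}

// Forwarding stub: instead of a one-way become, the stub stays in place
// and forwards every call to the real object once it has been loaded.
final class LazyStub implements InvocationHandler {
    private final Supplier<?> loader;
    private Object target; // null until the first call

    private LazyStub(Supplier<?> loader) {
        this.loader = loader;
    }

    @SuppressWarnings("unchecked")
    static <T> T stubFor(Class<T> type, Supplier<? extends T> loader) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(),
                new Class<?>[] { type }, new LazyStub(loader));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (target == null) {
            target = loader.get(); // "unmarshal" on first use
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the real object's exception unwrapped
        }
    }
}
```

Unlike a one-way become, existing references keep pointing at the stub, so every call pays an extra indirection and identity comparisons (==) see the stub rather than the real object.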

Object Death

Both Smalltalk and Java have Garbage Collection built in. Java additionally supports Weak references and finalization. Newer Smalltalks contain similar features, and it is assumed that the Udanax-Gold Smalltalk included these as well.

We used the post-mortem finalization support from ObjectWorks. That's where it was invented, after all :-) For various reasons, WeakArrays are much better than individual weak references, but you can still do everything with individual weak references.
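
Post-mortem finalization maps onto java.lang.ref: register a PhantomReference with a ReferenceQueue, and run the cleanup after the collector enqueues it. The cleanup action must not refer to the object itself, which is what makes it post-mortem (unlike Object.finalize(), the object cannot be resurrected). A minimal sketch follows; PostMortem and Tombstone are hypothetical names, and Java 9 and later package the same idea as java.lang.ref.Cleaner.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

// Sketch of post-mortem finalization: cleanup runs after the object is dead,
// using only data captured at registration time.
final class PostMortem {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the references themselves reachable until they are processed.
    private final Set<Tombstone> pending = new HashSet<>();

    private static final class Tombstone extends PhantomReference<Object> {
        final Runnable cleanup;

        Tombstone(Object referent, ReferenceQueue<Object> q, Runnable cleanup) {
            super(referent, q);
            this.cleanup = cleanup;
        }
    }

    // The cleanup must not capture obj, or obj could never become unreachable.
    Reference<Object> register(Object obj, Runnable cleanup) {
        Tombstone t = new Tombstone(obj, queue, cleanup);
        pending.add(t);
        return t;
    }

    // Runs the cleanup for every object reported dead so far; returns the count.
    int drain() {
        int n = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Tombstone t = (Tombstone) r;
            pending.remove(t);
            t.cleanup.run();
            n++;
        }
        return n;
    }
}
```

In a test, calling enqueue() on the returned reference simulates the collector reporting the object dead, which keeps the behaviour deterministic.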

C++ doesn't support Garbage Collection in the default implementation, and it is assumed that X++ included some kind of Garbage Collection system.

Yes. That code should be available somewhere. It was a pretty impressive GC system.

To be researched: is destruct/destroyed only relevant to X++, or does it make sense for the Smalltalk and Java implementations, and for distributed and file-based garbage collection support?

My recollection is that it was only for X++. We might have used the distinction elsewhere, but we tried not to. One of the last things we did was move the X++ GC to use post-mortem finalization (and not destructors).

Equality and Hashing

To be researched.

Casting

The castInto mechanism is quite extensively used. For example:

	other
		cast: AndFilter into: [:af |
			^af subFilters isEqual: self subFilters]
		others: [^false].

This is simply hand-translated into the following, using the standard Java instanceof operator:

	if (other instanceof AndFilter) {
		AndFilter af = (AndFilter) other;
		return af.subFilters().isEqual(subFilters());
	} else {
		return false;
	}

One case to watch out for is when no others section is defined. Reading the X++ notes for the comparable BEGIN_CHOOSE/BEGIN_KIND/etc. feature, one sees that in this case an exception should be thrown, rather than quietly falling through. See choosex.hxx.
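
Where there is no others branch, a small helper can make the failure explicit. The Casts.castInto helper below is hypothetical, and IllegalStateException is an arbitrary choice of exception. Note that a bare Java cast would also throw (ClassCastException) on a type mismatch, but would let null through unchecked; the helper rejects null as well, on the assumption that a missing value should not silently match.

```java
// Hypothetical helper for a castInto with no "others" clause: either the
// object has the requested type, or the translation fails loudly.
final class Casts {
    private Casts() {
    }

    static <T> T castInto(Object obj, Class<T> type) {
        if (type.isInstance(obj)) { // isInstance(null) is false
            return type.cast(obj);
        }
        throw new IllegalStateException("castInto failed: expected " + type.getName()
                + " but got " + (obj == null ? "null" : obj.getClass().getName()));
    }
}
```

The AndFilter example without its others clause would then start with AndFilter af = Casts.castInto(other, AndFilter.class);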

Class Instance Variables

Smalltalk supports both Class Variables and Class Instance Variables. Class Variables effectively match the static fields common in C++ and Java. Such a variable is declared in one class, and a single value is associated with it that can be shared by all subclasses. Class Instance Variables are again declared once in association with a class, but the difference is that every subclass has its own value bound to the variable.

Java does not natively support class instance variables, but they are relatively straightforward to emulate by following the X++ approach: declare and define a private static variable in the original host class of the class instance variable and again in each of its subclasses. Because each class declares its own static field, each class has its own variable and code will bind to its own class's version, while the private access ensures that code can't use a superclass version instead. The only downside is the duplicate definitions when it comes to code maintenance.
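
A minimal sketch of the technique, with hypothetical Counter and TallyCounter classes whose per-class creation count plays the role of a class instance variable:

```java
// Each class redeclares its own private static and its own static accessors.
class Counter {
    private static int created = 0; // Counter's own value

    static int created() {
        return created;
    }

    static Counter make() {
        created++;
        return new Counter();
    }
}

class TallyCounter extends Counter {
    private static int created = 0; // independent value for TallyCounter

    static int created() { // hides, rather than overrides, Counter.created()
        return created;
    }

    static TallyCounter make() { // narrower return type: allowed since Java 5
        created++;
        return new TallyCounter();
    }
}
```

Because static members are bound at compile time, a method declared only in Counter that reads created always sees Counter's value; any method that must see the subclass's value has to be duplicated as well, which is the maintenance cost mentioned above.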

Category

Udanax-Gold appears to use instances of its Category class to support runtime class information. Smalltalk and Java both include built-in class objects that are available in a deployed runtime application. I assume that C++ of the time didn't, and that this, together with support for alternative client languages, prompted the creation of the Category class.

Category was part of Smalltalk. We stuffed information there that would normally be inline in a Java or C++ file (like instance variable types). In Java, there should be no need for them.

For the moment, Java Class objects are being used in place of Categories. In the future the Category class may need to be re-added as more is learnt about its use.
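
As an illustration, the kind of runtime type questions Category answered can be asked of java.lang.Class directly. The Categories helper and its method names are hypothetical.

```java
// Sketch: Category-style queries answered with java.lang.Class.
final class Categories {
    private Categories() {
    }

    // Is sub the same class as sup, or a subclass (or implementor) of it?
    static boolean isEqualOrSubCategoryOf(Class<?> sub, Class<?> sup) {
        return sup.isAssignableFrom(sub);
    }

    // Walks the superclass chain, much like browsing a Category hierarchy.
    static String ancestry(Class<?> c) {
        StringBuilder sb = new StringBuilder(c.getSimpleName());
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            sb.append(" < ").append(s.getSimpleName());
        }
        return sb.toString();
    }
}
```

For example, ancestry(Integer.class) returns "Integer < Number < Object".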

See: Category API

Potential Bugs

During the course of porting from Udanax-Gold a number of potential bugs have been found, either through close reading of the code or by creating JUnit tests for it. I felt it was useful to list my discoveries here: to help me later when I might run into problems with code relying on this, to help others who may attempt a port of Udanax-Gold to other systems, and so that the active readers amongst you can double-check my claims and hopefully show me that I was wrong.

I have listed these items as potential bugs because I could well be mistaken about the original purpose of the code, be overlooking details of the Smalltalk/X++ low-level implementation, or quite possibly be guilty of gross user incompetence on my side.

As a closing statement, I hope nobody takes offence at such a list. I clearly have great respect for Udanax-Gold to even consider spending this amount of time on it, and we all know that no reasonable application can be defect free.

  • PrimIntegerArray:indexPastInteger - when nth is < 0, it won't find a match at index=0 (it should have a result >= 0)
  • Pair:isEqual - throws exception if you use obsolete pairs with alternative null values: (Pair.pairWithNulls(aHeaper, null)).isEqual(Pair.pairWithNulls(null, aHeaper))
  • MuSet & ImmuSet use incompatible contentsHash mechanisms. ActualHashSet adds element hashes together, whereas ImmuSets (inherited from ScruSet) bitXor element hashes together. I assume this is an oversight rather than a deliberate, but undocumented, mechanism to force a MuSet and an ImmuSet with the same elements to produce different contentsHash values.
  • IntegerRegion:below was actually a duplicate of the IntegerRegion:above method.
    The scary thing is that I remember this bug, so your code may be a slightly-earlier-than-final snapshot.
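
To illustrate the MuSet/ImmuSet item above, the following sketch combines the same element hashes by addition (the ActualHashSet style) and by XOR (the ScruSet style). It is a simplified model, not the Udanax-Gold code; ContentsHash is an illustrative name.

```java
import java.util.List;

// Two order-independent ways of combining element hashes; each is a valid
// set hash on its own, but they disagree for the same contents.
final class ContentsHash {
    private ContentsHash() {
    }

    static int bySum(List<?> elements) { // ActualHashSet style
        int h = 0;
        for (Object e : elements) {
            h += e.hashCode();
        }
        return h;
    }

    static int byXor(List<?> elements) { // ScruSet style
        int h = 0;
        for (Object e : elements) {
            h ^= e.hashCode();
        }
        return h;
    }
}
```

With the elements 1 and 3 (whose Integer hash codes are their values) bySum gives 4 while byXor gives 2, so two sets with identical contents would report different contentsHash values. The two only agree by coincidence, for example when no two element hashes share a set bit.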