Monday, March 30, 2015

Microtyping in Java revisited

Kawaii is a small library aimed at exploring microtyping in Java - below I set out what microtyping is, the problems with using it in Java, approaches for minimising these problems, and possible ways forward.

What is microtyping?

"Microtyping", also known as "tiny typing", refers to a pattern in which code avoids using primitive classes such as String or Integer and instead uses "microtypes", which wrap a primitive value to confer a specific type identity on it. It is seen as a cure for primitive obsession. For example, instead of:



by using microtyping we can write:
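As a minimal sketch of this contrast, the code below shows the same method written first with primitives and then with microtypes. The names Greeter, FirstName and LastName are hypothetical, chosen for illustration only, and are not taken from Kawaii.

```java
// Hypothetical microtypes: each wraps a single String and gives it its own type identity.
final class FirstName {
    private final String value;
    FirstName(String value) { this.value = value; }
    String value() { return value; }
}

final class LastName {
    private final String value;
    LastName(String value) { this.value = value; }
    String value() { return value; }
}

final class Greeter {
    // Primitive version: nothing stops a caller from swapping the two arguments.
    static String greet(String firstName, String lastName) {
        return "Hello " + firstName + " " + lastName;
    }

    // Microtyped version: swapping the arguments is a compile-time error.
    static String greet(FirstName firstName, LastName lastName) {
        return "Hello " + firstName.value() + " " + lastName.value();
    }
}
```

With the second form, a call such as greet(new LastName("Lovelace"), new FirstName("Ada")) does not compile, whereas the equivalent mistake with two Strings goes unnoticed.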

Microtyping allows us to extend the type system to confer the advantages of typing on primitive values:
  • Making the code self-documenting.
  • Allowing the compiler to offer better compile time security and other tools to offer better inspection/refactoring support.
  • Allowing specific microtypes to define validity rules - eg an Integer microtype that allows only positive values.
but to be useful, this has to be convenient enough that the developer does not feel overburdened with additional boilerplate or difficulty integrating with existing libraries.
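The validity rule mentioned in the last bullet could look something like the sketch below. PositiveInteger is a hypothetical class written for illustration, and throwing IllegalArgumentException is one possible choice rather than anything prescribed by Kawaii.

```java
// Hypothetical microtype that only admits positive values.
final class PositiveInteger {
    private final int value;

    PositiveInteger(int value) {
        // The rule is checked once, at construction, so every instance is known to be valid.
        if (value <= 0) {
            throw new IllegalArgumentException("Expected a positive value but got " + value);
        }
        this.value = value;
    }

    int value() { return value; }
}
```

Any method that accepts a PositiveInteger can then rely on the rule without re-checking it.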

Why is this a problem in Java?

I have seen microtyping used from time to time, and it's been discussed before, but in my experience microtyping is not common in Java. This is because, unlike some other languages (Ada, D, Go etc), microtyping in Java is unwieldy:
  • Primitive types are final (for security amongst other reasons) and cannot be subtyped.
  • But the type system offers no alternative to subtyping, ie no alias or typedef functionality.
And as far as I can see, neither new language functionality in Java 8 nor proposals for Java 9 improve on this, although I'd love to be proved wrong.

The only option this leaves is wrapping the primitive inside another class - but this can potentially involve a lot of boilerplate code when creating, and clumsy integration when using.

Boilerplate

Although Project Lombok reduces the boilerplate when creating value classes, and Java 8 adds some interesting possibilities for value object creation, value objects are not microtypes:
  • A value object will typically wrap one or more primitive values or other value objects - but it won't act as if it is a single primitive value.
  • A microtype will always wrap a single primitive value, and to some extent will be interoperable with it.
In other words, a microtype can be seen as a special case of a value object, but with additional library integration.

Given this, Kawaii uses an abstract base wrapper class (MicroType.java) which is extended to create a hierarchy of microtypes - for example JSONString.java which extends MicroType<String> (JSONString is abstract, since we would expect other specific concrete JSON classes to extend this, for example PersonJSONString.java). 

The major boilerplate issue here is that due to Java inheritance rules, each subtype needs a constructor. Lombok could be used, but given that most IDEs will auto-generate the class and constructor, and once created it's not going to need to be looked at again, I don't feel this is actually a practical problem.
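To give a feel for how little code each microtype needs, here is a rough sketch of a base wrapper and two subtypes. It is an illustration written for this post, not the actual MicroType.java from Kawaii: the null check, equals, hashCode and toString behaviour are my assumptions, and Name and Email are hypothetical subtypes.

```java
import java.util.Objects;

// Hypothetical base wrapper: holds one non-null value and defines equality by type and value.
abstract class MicroType<T> {
    private final T value;

    protected MicroType(T value) {
        this.value = Objects.requireNonNull(value, "value");
    }

    public T value() { return value; }

    @Override
    public boolean equals(Object other) {
        // Different microtypes never compare equal, even when they wrap the same value.
        return other != null
                && getClass() == other.getClass()
                && value.equals(((MicroType<?>) other).value);
    }

    @Override
    public int hashCode() { return Objects.hash(getClass(), value); }

    @Override
    public String toString() { return value.toString(); }
}

// Each concrete microtype needs only its one-line constructor.
class Name extends MicroType<String> {
    Name(String value) { super(value); }
}

class Email extends MicroType<String> {
    Email(String value) { super(value); }
}
```

The constructor in each subtype is the only boilerplate that has to be repeated.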

Integration

Since Kawaii wraps rather than extends primitives, it's not possible to use the microtypes as direct replacements for primitives when calling Java or 3rd party libraries. For example, given a microtype "Name" extending MicroType<String>:
  • Can't call "doSomething(String)" directly - have to unbox by calling "name.value()"
  • Can't infer method returns of String to Name - have to box by calling "new Name(value)"
This has the potential to pollute code, but by pushing the boxing/unboxing down as far as possible, and using adapters for external libraries, the issue can largely be avoided.
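A small sketch of the adapter idea follows. ThirdPartyGreetings stands in for an external library that only understands String, and CustomerName, Greeting and GreetingsAdapter are hypothetical names invented for this example.

```java
// Hypothetical microtypes, kept minimal so the sketch compiles on its own.
final class CustomerName {
    private final String value;
    CustomerName(String value) { this.value = value; }
    String value() { return value; }
}

final class Greeting {
    private final String value;
    Greeting(String value) { this.value = value; }
    String value() { return value; }
}

// Stand-in for a third-party API that only understands String.
final class ThirdPartyGreetings {
    static String greetingFor(String name) {
        return "Hello, " + name + "!";
    }
}

// Adapter: the only place where unboxing (value()) and boxing (new Greeting) happen.
final class GreetingsAdapter {
    static Greeting greetingFor(CustomerName name) {
        return new Greeting(ThirdPartyGreetings.greetingFor(name.value()));
    }
}
```

Code that calls GreetingsAdapter never sees a raw String, so the boxing and unboxing stay confined to one class.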

For instance, JacksonJSONTest.java demonstrates how:
  • ClassPathResources.java can be used to allow reading a file from the classpath into a MicroType<String>.
  • KawaiiObjectMapper.java can be used to read and write from a Java POJO that contains microtypes to a JSONString microtype, using the Jackson JSON processor. 
  • A custom Jackson serializer (MicroTypeSerializer.java) ensures that JSON produced when Jackson serialises a POJO containing microtypes is the same as if it had contained primitives. 
resulting in code which does not use the String primitive directly at all:


Note that in cases where a method returns a new microtype instance, it is necessary to pass in the class, so for instance instead of "writeValueAsString(Object)" we have "writeValueAs(Class<? extends JSONString>, Object)". New microtype instances are reflectively created in these cases using the static helper MicroTypes.java.
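The reflective creation step could be sketched roughly as below. This is not the MicroTypes.java helper from Kawaii: TextType, PersonJson and MicroTypeFactory are hypothetical names, and the sketch assumes every microtype declares a constructor taking a single String.

```java
import java.lang.reflect.Constructor;

// Hypothetical String-wrapping base type, kept minimal for the sketch.
abstract class TextType {
    private final String value;
    protected TextType(String value) { this.value = value; }
    public String value() { return value; }
}

class PersonJson extends TextType {
    public PersonJson(String value) { super(value); }
}

final class MicroTypeFactory {
    // Looks up the single-String constructor of the requested class and invokes it.
    static <M extends TextType> M newInstance(Class<M> type, String value) {
        try {
            Constructor<M> constructor = type.getDeclaredConstructor(String.class);
            constructor.setAccessible(true);
            return constructor.newInstance(value);
        } catch (ReflectiveOperationException e) {
            // Covers a missing constructor, an abstract class, or a constructor that threw.
            throw new IllegalArgumentException(
                    "Cannot create " + type.getName() + " from a String", e);
        }
    }
}
```

Passing the class token is what lets a generic method hand back the caller's specific microtype rather than a bare String.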

What next?

Firstly, some questions I do not intend to answer in this post:
  1. Is all this extra effort worth it? This is a subjective question and depends for instance on the size of the team/codebase, complexity, style etc - but for the purposes of this discussion I will assume there are at least some circumstances where it is.
  2. Why not just use language X which already supports microtypes? For some projects this may be an option, but choice of languages depends on many factors. I will assume there are at least some existing Java projects which will benefit, or new projects which will use Java for other reasons.

In my view, the major hurdle required to move forward is adopting a standard approach. Short of adding language-level support in Java, the next best would be to have a MicroType class in the JDK libraries, in the same way Optional was added in Java 8. This would allow 3rd-party libraries to add support, as is already happening for Optional. I could of course continue to expand Kawaii by adding adapters for other libraries (JAX-RS, JDBC, JAXB etc), but this will always be inferior to the libraries directly supporting a standardised MicroType.

Comments? Is microtyping an idea whose time has come in Java? Or is its practical utility too small for it to ever catch on?

Thursday, March 26, 2015

For IT projects, almost all problems are scaling problems

Why is it that it's relatively easy for a small team to write a system for a small user base, but it's so difficult to scale that to a long-term project with a big team and many users?

It has struck me how many issues in large IT development projects can be characterised as scaling problems - I find it's a useful mental short-cut to think in these terms, and to always attempt to minimise the overall scale of a project. The resulting decisions are often subjective trade-offs, but at least they are conscious ones.

Some aspects of scaling are:

  • Requirements: Few, well understood to many, poorly understood - How much does the system need to do to meet user requirements? How confident are we that the requirements are well understood?
  • Process: Lightweight to heavyweight - Are the processes in place around requirement gathering, development, testing and release simple enough not to unnecessarily impede development, but rigorous enough to maintain reliability?
  • Code: "Simple" to "complex" - How coherent, decoupled and understandable is the code? To what extent can developers have confidence in changes (through typing, tests etc)?
  • Technology: Few to many - How many technologies (programming languages, persistence mechanisms, messaging, tools etc) are used in the project?
  • Team: Small to large - Where on the spectrum does team composition lie? (small team (<10) located in the same room, large team (>10) located in the same office, individual developers located in several offices, several teams in several offices)
  • Data: Small, non-critical to large, critical - How large will the volume of data grow? How diverse is it? How critical is it?
  • Users: Few, uniform to many, diverse - How many users will there be? How will they be distributed geographically?
  • Time: Short term to long term - How long is the system likely to exist? What systems is this system aiming to replace? What other systems may eventually take over some functionality of this one?

Many of the initial decisions when starting a project should seek to balance these, but this is an ongoing process during the project's lifetime. Some typical examples are:

  • Adding a technology: This will increase the technological complexity - does it decrease one or more of the other aspects (typically code complexity, data size) to warrant consideration? Can we realistically entirely replace one of the existing technologies? Are we confident the new technology is sustainable in the organisation over the time-scale the project is expected to exist?
  • Setting up a new separate team: Do we have sufficient communication bandwidth to mitigate Conway's Law? Given the code and technology, how much training/pairing is required before the team is productive? Do we have procedures such as code review in place to maintain code coherence?
  • Onboarding a new group of users: Does this group of users have a different set of requirements that will affect the code size? If the requirements are sufficiently different, would a new separate system be better? What effect will this have on data size? Do we need to scale the system deployment? Do we need to co-locate developers or support teams with the users?

In summary, "scale" is a useful heuristic to consider when making decisions about IT projects - by acting to minimise the effects of scaling, the project is more likely to be successful in the long term.

Thursday, March 19, 2015

Optional in Java: Why it can shield (but not save) you from null

With the introduction of the Optional type in Java 8, there has been a lot of discussion about its use and its relation to the null problem (anything can potentially be null, aka the billion dollar mistake). In particular, the view of some is that adding Optional to Java without fixing the null problem doesn't save you from null pointer exceptions, adds complexity without providing sufficient benefit, and that it would have been better to provide safe navigation operators instead - my views are set out below.

What Optional is, and when to use it

In plain English, Optional means exactly that: "optional". It's used to designate, via the type system:
  • An input parameter which is not necessarily required, or
  • A return value or field not guaranteed to be present.
Previously null (which is effectively outside the type system), would have been used for this purpose, but Optional has three advantages:
  • The type system describes what is required or optional without the need for additional API documentation.
  • Static compile-time enforcement via the type system that optionality has been catered for in the code.
  • Optional values can be cascaded to perform successive optional operations, which is particularly useful in functional style programming.
As an aside, another way of thinking of the Optional type is that it is a specialised collection which can contain only zero or one elements, and supports similar operations to collections/streams, plus some of its own specialised operations. A method which returns a collection expresses optionality ("There may be zero or more results"), whereas Optional is more specific ("There may be zero or one result").
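To illustrate the cascading of optional operations, here is a small sketch using the Java 8 Optional. The Customer, Address and PostcodeLookup classes are hypothetical and exist only for this example.

```java
import java.util.Optional;

// Hypothetical model: a customer may have no address, and an address may have no postcode.
final class Address {
    private final Optional<String> postcode;
    Address(Optional<String> postcode) { this.postcode = postcode; }
    Optional<String> postcode() { return postcode; }
}

final class Customer {
    private final Optional<Address> address;
    Customer(Optional<Address> address) { this.address = address; }
    Optional<Address> address() { return address; }
}

final class PostcodeLookup {
    // Each step runs only if the previous one produced a value.
    static String postcodeLabel(Optional<Customer> customer) {
        return customer
                .flatMap(Customer::address)
                .flatMap(Address::postcode)
                .map(String::toUpperCase)
                .orElse("UNKNOWN");
    }
}
```

The missing case is handled once, at the end of the chain, rather than with a null check after every step.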

When not to use Optional

Or "How to avoid the Law of the instrument". For input parameters this is fairly clear - don't use Optional when they are in fact always required. But for method return values it's less so. 

On the one hand, returning Optional allows the caller to decide how to handle the missing case - but on the other, it means the caller always has to handle the missing case even if it's something that cannot realistically be recovered from. This is analogous to methods which throw checked exceptions when the caller cannot be expected to recover - in both cases the caller is likely to have to throw an unchecked exception, which adds noise to the code if this has to be repeated many times. I think the answer here is subjective, but the metric I personally use is "Are there inputs which the caller can provide which are valid, but could still produce no result?" - if yes, then the return should be Optional, if no, then the method should treat the invalid inputs as a bug and throw a runtime exception.
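The metric above might play out as in the following sketch. UserDirectory is a hypothetical class, and the "contains @" check is a deliberately crude stand-in for real validation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup service used to illustrate the "valid input, no result" metric.
final class UserDirectory {
    private final Map<String, String> namesByEmail = new HashMap<>();

    void register(String email, String name) {
        namesByEmail.put(email, name);
    }

    // A well-formed email is valid input but may match nobody, so the result is Optional.
    Optional<String> findNameByEmail(String email) {
        if (email == null || !email.contains("@")) {
            // A malformed email is a caller bug, so fail immediately with a runtime exception.
            throw new IllegalArgumentException("Not an email address: " + email);
        }
        return Optional.ofNullable(namesByEmail.get(email));
    }
}
```

The caller only has to handle the empty case for inputs that could legitimately match nothing.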

Another case where the use of Optional has to be considered carefully is where it acts against the fail-fast principle - by cascading optional operations we can defer failing to a later point, but sometimes it would be better to fail immediately, particularly if this means more useful error diagnostic information can be provided. Often the deciding factor here is whether the process is interactive (in which case it may be sensible to ask the user to take some action) vs batch (where it often makes more sense to fail as early as possible with as much detail as possible).

I intend to go into more detail on these cases in a future article.

Elvis is not Optional

Some languages provide operators to make null handling simpler, such as the Elvis and related safe navigation operators in Groovy.

Although these operators also address issues caused by the presence of null, they are fundamentally different from Optional:
  • Safe navigation operators are syntactic sugar, useful when writing code in a style in which null is an expected value.
  • Optional, on the other hand, is an extension of the type system, and should be used when writing code in a style in which null is not considered to be an expected value.
In a language which allows null, both can be useful - I believe adding safe navigation operators to Java is still being considered. However it would probably be a mistake to mix the two styles in a single code base (other than for interfaces to third-party code).

So how will Optional save you from null?

In theory, it won't. Since nothing fundamental has changed in Java 8 with relation to nullability, it's possible for a variable of type Optional to be set to null, or even more insidiously set to an instance of Optional containing null.

But in practice we can ignore this: When writing code using Optional, we should never set anything to null or test for null. If a bug results in null being referenced, a null pointer exception will be thrown at runtime as usual - but hopefully the chances of this are reduced because the use of Optional in the type system will have forced us to design the code to cater for all cases of optionality.

The main problem with this approach is dealing with legacy or library code which does confer meaning to null, but it is usually possible to either refactor the code, or provide a facade.
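A facade of this kind can be very thin, as in the sketch below. LegacySettings stands in for legacy or library code that returns null to mean "not found", and both class names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for legacy code that returns null to mean "not found".
final class LegacySettings {
    private final Map<String, String> values = new HashMap<>();

    void put(String key, String value) { values.put(key, value); }

    String get(String key) { return values.get(key); }
}

// Facade: null is converted to Optional at the boundary and goes no further.
final class Settings {
    private final LegacySettings legacy;

    Settings(LegacySettings legacy) { this.legacy = legacy; }

    Optional<String> get(String key) {
        return Optional.ofNullable(legacy.get(key));
    }
}
```

Code written against Settings never needs to test for null, while the legacy class is left untouched.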

The null situation itself in Java may improve in future - although it's extremely unlikely null can ever be removed from Java entirely, efforts such as the Checker Framework mean that static analysis of Java code to identify potential issues is increasingly useful.

Conclusion

When used in code written in a style in which null is never intentionally used, Optional can help express "optionality" in an elegant manner that the compiler can understand and enforce. However since the concept of null is so deeply embedded in the Java language, it's not likely to go away anytime soon - in practice this may matter less than you think.

Sunday, March 15, 2015

On verbosity in programming languages

Introduction

Over the years I've had many discussions on the "verbosity" of different programming languages - lately, these have mostly taken the form of: "Why should we use X, which requires pages of code to perform this task, when it can be done in a few lines of Y?", where X is typically a C++ family language such as Java, and Y is a scripting or functional language.

Inevitably this is a somewhat subjective question, and depends on factors other than verbosity, such as available skills in the team and organisation, expected lifespan of the code base, deployment environment, integration with external systems and so on - but I think it's still useful to set out what I see as some of the more interesting areas which directly affect verbosity.

Typed vs untyped languages

These are somewhat vague terms, but here I will define these as meaning:
  • Typed languages: Favour building type models to encapsulate data/concepts.
  • Untyped languages: Favour using primitive types (strings, numeric types) and collections (arrays, associative arrays) to contain data.
Typed languages are generally more verbose, and require more initial effort, but have important advantages:
  • Correctness: Types add a level of static error checking that is not present in untyped languages, which require additional unit tests to achieve the same level of confidence.
  • Readability: The encapsulation provided by types makes maintenance of the code easier since it's clearer what the structure of inputs and outputs is.
  • "Refactorability": When used in conjunction with an IDE which understands the type system, typed languages offer far more scope for safe refactoring than untyped languages.
In general the smaller the code base and the shorter its expected lifespan, the stronger the argument for an untyped language as the additional effort of setting up types may not be justified.

This is an area in which there has been much recent advancement, two of the most important being gradual typing (purporting to give the best of both worlds) and improved type inference (reducing the need to redundantly repeat type information in code).

Language syntax size

The size of a language's syntax is generally inversely proportional to verbosity - that is to say languages with more syntactic features are more expressive and therefore less verbose.

But a larger syntax size also tends to make code less readable, due to the use of obscure or confusing features - C++ is a prime example of this, with many style guides prohibiting or discouraging the use of some language features such as operator overloading and templates. Shell scripting languages also feature a confusingly large number of built in operators.

On the other hand, small syntax size tends to make the language feel clumsy - for example, until Java 8, Java lacked syntactic support for lambda expressions which made "functional in the small" style coding painful.

Coding style

Most programming languages tend to have a dominant style in which most code is written - this is a result of either an official style guide or community consensus. In some cases, this has evolved substantially over time both as new features have been added to the language and as community consensus has changed, for example one effect of the software craftsmanship movement was to emphasise the importance of naming.

Some areas touched by this that affect verbosity are:
  • Variable/method/class name length: ie one letter names vs descriptive names. Besides being a matter of style, this is also affected by IDE support (see Tooling below).
  • Brace/block style: Whether or not blocks are expected to always be explicitly defined, and whether the braces are expected to be on separate lines has a substantial effect on vertical size of code.
  • Operations per line: Languages in which compactness is seen as a virtue tend to have styles where many operations are performed on the same line (Perl being particularly notorious), whereas those which favour readability tend to have one or two.
  • "Institutionalisation": By which I mean the degree to which the coding style is affected by the perceived needs of large institutions, for example the over-use of design patterns.

Tooling

Some languages are generally edited in a text editor, others in an IDE which has some level of understanding of the code. This affects what is considered to be acceptable verbosity, because the IDE will:
  • Hide some verbosity by eg offering structural views of the code.
  • Automate creation of boilerplate code.
  • Support name completion and refactoring of the code making it more practical to use long descriptive naming.
To some extent this has resulted in a backlash, with some developers feeling that an over-reliance on IDEs has resulted in unacceptably verbose code. Others feel that thinking of code as a text document is outdated and that it should be considered a data structure, inseparable from the IDE.

Conclusion

Verbosity is clearly not as simple a matter as "fewer lines are better" - at the very least we need to make a considered trade-off with readability and maintainability, but there are other important factors to be considered, such as tooling and the code style which will be used.

What do you think? Which languages strike the best verbosity compromises? Or is this dependent on the problem being solved?

Friday, March 13, 2015

There can be only one

Highlander adds the missing "only" operation to Java.

A common programming task is to find a single match from a collection of possibilities, for example finding exactly one person by email address from a collection of people. This may sound simple, but there are complexities in doing so correctly and succinctly.

A naive solution to the above problem might be written in Java 7 as:
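A sketch of what such a naive solution typically looks like is given here. The Person class and the NaivePersonFinder name are hypothetical and written for illustration.

```java
import java.util.List;

// Hypothetical person type used throughout this example.
final class Person {
    private final String name;
    private final String email;

    Person(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }
    String getEmail() { return email; }
}

final class NaivePersonFinder {
    static Person findByEmail(List<Person> people, String email) {
        for (Person person : people) {
            if (person.getEmail().equals(email)) {
                return person; // returns the first match, even if there are others
            }
        }
        return null; // no match
    }
}
```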

However this code exhibits two related problems:
  • If there is no match, null is returned. This may result in a null pointer exception being thrown later in the code. (Note that if the requirement is that no match is a valid case, we should return Optional<Person>, discussed briefly later in this article).
  • If there are multiple matches, the first is returned - this will result in the wrong result being used later in the code.
In both cases it would be desirable to "fail fast", in other words throw an exception immediately when an assumption is violated. Updating the code to handle these two cases results in:
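A fail-fast version in Java 7 style might look like the following sketch. Person and StrictPersonFinder are hypothetical names, and IllegalStateException is just one reasonable choice of exception.

```java
import java.util.List;

// Hypothetical person type used throughout this example.
final class Person {
    private final String name;
    private final String email;

    Person(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }
    String getEmail() { return email; }
}

final class StrictPersonFinder {
    static Person findByEmail(List<Person> people, String email) {
        Person found = null;
        for (Person person : people) {
            if (person.getEmail().equals(email)) {
                if (found != null) {
                    // A second match violates the "exactly one" assumption.
                    throw new IllegalStateException("More than one match for " + email);
                }
                found = person;
            }
        }
        if (found == null) {
            // No match also violates the assumption, so fail here rather than return null.
            throw new IllegalStateException("No match for " + email);
        }
        return found;
    }
}
```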

    We now have code that behaves correctly, but given how common this problem is, can we find a more compact way to write this? If we are able to use Java 8 we could consider Streams and lambda expressions:
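A sketch of the Streams approach, using only the standard Java 8 library, could be written as follows. Person and StreamPersonFinder are hypothetical names for illustration.

```java
import java.util.List;

// Hypothetical person type used throughout this example.
final class Person {
    private final String name;
    private final String email;

    Person(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }
    String getEmail() { return email; }
}

final class StreamPersonFinder {
    static Person findByEmail(List<Person> people, String email) {
        return people.stream()
                .filter(person -> person.getEmail().equals(email))
                .findFirst() // stops at the first match and ignores any later ones
                .get();      // throws NoSuchElementException when there is no match
    }
}
```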

    This is certainly more succinct, and the findFirst().get() line will throw a NoSuchElementException if there are no matches - but still incorrectly returns the first match if there are multiple matches. As it turns out, there is no built in construct in the Streams library that will return the only element with fail-fast behaviour if there are multiple elements.

    So how do we solve this problem? One answer is to write a static generic utility method, perhaps named "only", that encapsulates the desired logic - but we will soon discover there are three different language constructs in Java we will want to support:
    • Arrays
    • Iterables (the more general case of Collections)
    • Streams
    We would also want to cater for the requirement that no match is a valid case by returning an Optional.

    Lastly, we would want to support both Java 7 and 8, and both the Guava Optional in Java 7 and the inbuilt Optional in Java 8.
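To make the idea concrete, a hand-rolled utility for the Iterable case alone might look like the sketch below. This is not Highlander's implementation or API: the Only class, the optionalOnly method name and the choice of exceptions are assumptions made for illustration, and it targets the Java 8 Optional only.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical utility covering Iterables only; null elements are not supported.
final class Only {
    // Returns the single element, failing fast on zero or on more than one.
    static <T> T only(Iterable<T> elements) {
        return optionalOnly(elements).orElseThrow(
                () -> new NoSuchElementException("Expected one element but found none"));
    }

    // Returns the single element if there is one, empty if there are none,
    // and still fails fast if there is more than one.
    static <T> Optional<T> optionalOnly(Iterable<T> elements) {
        Iterator<T> iterator = elements.iterator();
        if (!iterator.hasNext()) {
            return Optional.empty();
        }
        T first = iterator.next();
        if (iterator.hasNext()) {
            throw new IllegalArgumentException("Expected at most one element but found more");
        }
        return Optional.of(first);
    }
}
```

Even this small sketch would need near-duplicate overloads for arrays and streams, and a second variant for the Guava Optional, which is where a shared library starts to pay for itself.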

    This is the problem Highlander solves - a micro-library that allows retrieval of the only or optional only element in arrays, iterables/collections and streams by providing static "only" methods that cover all these cases. Using this, we can solve the original problem correctly and succinctly - in Java 7 we might do this using the Guava filter method:

    and in Java 8 the code can be reduced to a single line without sacrificing readability or correctness by using the convenience form of "only" that accepts a collection and predicate lambda: