Sunday, March 15, 2015

On verbosity in programming languages

Introduction

Over the years I've had many discussions on the "verbosity" of different programming languages - lately, these have mostly taken the form of: "Why should we use X, which requires pages of code to perform this task, when it can be done in a few lines of Y?", where X is typically a C-family language such as Java, and Y is a scripting or functional language.

Inevitably this is a somewhat subjective question, and depends on factors other than verbosity, such as available skills in the team and organisation, expected lifespan of the code base, deployment environment, integration with external systems and so on - but I think it's still useful to set out what I see as some of the more interesting areas which directly affect verbosity.

Typed vs untyped languages

These are somewhat vague terms, but here I will define them as follows:
  • Typed languages: Favour building type models to encapsulate data/concepts.
  • Untyped languages: Favour using primitive types (strings, numeric types) and collections (arrays, associative arrays) to contain data.
Typed languages are generally more verbose, and require more initial effort, but have important advantages:
  • Correctness: Types add a level of static error checking that is not present in untyped languages, which require additional unit tests to achieve the same level of confidence.
  • Readability: The encapsulation provided by types makes maintenance of the code easier since it's clearer what the structure of inputs and outputs is.
  • "Refactorability": When used in conjunction with an IDE which understands the type system, typed languages offer far more scope for safe refactoring than untyped languages.
In general, the smaller the code base and the shorter its expected lifespan, the stronger the argument for an untyped language, as the additional effort of setting up types may not be justified.
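As a rough illustration of the two styles, here is the same calculation written both ways in Python. This is only a sketch: the Order type, its fields and the function names are invented for the example, and Python's dataclasses stand in for a "typed" language's type model.

```python
from dataclasses import dataclass


# "Untyped" style: a plain dict of primitives. Short to write, but the
# structure is implicit and a misspelt key is only caught at run time.
def total_untyped(order):
    return order["quantity"] * order["unit_price"]


# "Typed" style: a small type model. More lines up front, but the fields
# are declared in one place and tools can check attribute names.
@dataclass(frozen=True)
class Order:  # hypothetical type, invented for this example
    quantity: int
    unit_price: float

    def total(self) -> float:
        return self.quantity * self.unit_price


untyped_total = total_untyped({"quantity": 3, "unit_price": 2.5})
typed_total = Order(quantity=3, unit_price=2.5).total()
```

Both versions compute the same value; the difference is in how much structure is written down, and therefore how much a reader or an IDE can rely on.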

This is an area in which there has been much recent advancement, two of the most important being gradual typing (purporting to give the best of both worlds) and improved type inference (reducing the need to redundantly repeat type information in code).
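A minimal sketch of what gradual typing can look like, using Python's optional type hints (the function names are invented, and the list[float] syntax assumes Python 3.9 or later). Note that Python itself does not enforce the annotations at run time; an external checker such as mypy is assumed to do the static checking.

```python
# Unannotated: fine for a quick script; a checker treats the values as dynamic.
def scale(values, factor):
    return [v * factor for v in values]


# Annotated later, once the code has proved worth keeping. A static checker
# can now flag a call such as mean("abc") before the code runs.
def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# No annotation is needed here: a checker can infer the type of result
# from the return annotation on mean().
result = mean(scale([1.0, 2.0, 3.0], 2))
```

The annotated and unannotated functions coexist in the same file, so the extra verbosity can be added only where it pays for itself.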

Language syntax size

The size of a language's syntax is generally inversely related to verbosity - that is to say, languages with more syntactic features tend to be more expressive and therefore less verbose.

But a larger syntax size also tends to make code less readable, due to the use of obscure or confusing features - C++ is a prime example of this, with many style guides prohibiting or discouraging the use of some language features such as operator overloading and templates. Shell scripting languages also feature a confusingly large number of built-in operators.

On the other hand, small syntax size tends to make the language feel clumsy - for example, until Java 8, Java lacked syntactic support for lambda expressions, which made "functional in the small" style coding painful.
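To give a feel for the difference, here is a sketch in Python rather than Java. The LowerCaseKey class is invented for the example and only approximates the ceremony of a pre-Java 8 anonymous inner class; the second version uses lambda syntax.

```python
names = ["delta", "Bob", "alice"]


# Without lambda syntax: a named, single-purpose class whose only job is
# to carry one small function.
class LowerCaseKey:  # hypothetical helper, invented for this example
    def __call__(self, name):
        return name.lower()


sorted_with_class = sorted(names, key=LowerCaseKey())

# With lambda syntax: the same behaviour, written inline.
sorted_with_lambda = sorted(names, key=lambda name: name.lower())
```

The two calls produce the same case-insensitive ordering; only the amount of surrounding code differs.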

Coding style

Most programming languages tend to have a dominant style in which most code is written - this is a result of either an official style guide or community consensus. In some cases, this has evolved substantially over time, both as new features have been added to the language and as community consensus has changed. For example, one effect of the software craftsmanship movement was to emphasise the importance of naming.

Some areas touched by this that affect verbosity are:
  • Variable/method/class name length: ie one letter names vs descriptive names. Besides being a matter of style, this is also affected by IDE support (see Tooling below).
  • Brace/block style: Whether or not blocks are expected to always be explicitly defined, and whether the braces are expected to be on separate lines, both have a substantial effect on the vertical size of code.
  • Operations per line: Languages in which compactness is seen as a virtue tend to have styles where many operations are performed on the same line (Perl being particularly notorious), whereas those which favour readability tend to have one or two.
  • "Institutionalisation": By which I mean the degree to which the coding style is affected by the perceived needs of large institutions, for example the over-use of design patterns.
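The "operations per line" point can be sketched as follows, with the same task written in a compact style and in a spread-out style with descriptive names. The word list and variable names are made up for the example.

```python
words = ["verbose", "terse", "verbose", "typed", "terse", "verbose"]

# Compact style: several operations packed into one line.
top_compact = max(set(words), key=words.count)

# Spread-out style: one operation per line, with named intermediates.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
top_spread = max(counts, key=counts.get)
```

Both find the most frequent word; which one counts as "better" depends largely on the dominant style of the language and the team.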

Tooling

Some languages are generally edited in a text editor, others in an IDE which has some level of understanding of the code. This affects what is considered to be acceptable verbosity, because the IDE will:
  • Hide some verbosity by eg offering structural views of the code.
  • Automate creation of boilerplate code.
  • Support name completion and refactoring of the code, making it more practical to use long descriptive naming.
To some extent this has resulted in a backlash, with some developers feeling that an over-reliance on IDEs has resulted in unacceptably verbose code. Others feel that thinking of code as a text document is outdated and that it should be considered a data structure, inseparable from the IDE.

Conclusion

Verbosity is clearly not as simple a matter as "fewer lines are better" - at the very least we need to make a considered trade-off with readability and maintainability, but there are other important factors to be considered, such as tooling and the code style which will be used.

What do you think? Which languages strike the best verbosity compromises? Or is this dependent on the problem being solved?

6 comments:

  1. One of the considerations in 8th (8th-dev.com) was the free-form syntax associated with Forth-like languages. You have the choice to be as verbose or terse as you wish, without the language forcing you into its mold. That's my ideal language.

    1. I'm not sure if this applies to 8th, but a problem I have seen with many languages which have a large/flexible syntax (eg Perl or Scala) is that code written by one developer can look so different from code written by another that they might as well be different languages. This can be problematic on large-scale projects where it's desirable to maintain a common style across the team and over time, and also when trying to understand library code written by a third party.

    2. That's true. It's a potential issue in 8th as well, because of the *lack* of syntax. Since the programmer may effectively use any naming convention (including a ridiculous one) it could get out of hand quickly. On the other hand, it's also possible to write more expressive and beautiful code (for some value of same; your mileage may vary of course).

      The solution for that is to enforce some conventions (like 'stick with names that are meaningful'); but that's not something unique to any particular language.

    3. Agreed - as with most issues around verbosity, I think it's a trade-off: The larger the possible syntax size, the more expressive the language - but the greater the vigilance needed to ensure that the team is sticking to the agreed conventions.

      The "refactorability" factor also has some bearing here: If a codebase can be easily refactored, it makes it possible to bring code not following conventions into line much more easily.

  2. Confusion and redundancy of languages is increasing with time, and they moved away from the modern form of the original goals set by their creators.
