Thursday, March 26, 2015

For IT projects, almost all problems are scaling problems

Why is it relatively easy for a small team to write a system for a small user base, but so difficult to scale that to a long-term project with a big team and many users?

It has struck me how many issues in large IT development projects can be characterised as scaling problems - I find it a useful mental short-cut to think in these terms, and to always attempt to minimise the overall scale of a project. The resulting decisions are often subjective trade-offs, but at least they are conscious ones.

Some aspects of scaling are:

  • Requirements: Few, well understood to many, poorly understood - How much does the system need to do to meet user requirements? How confident are we that the requirements are well understood?
  • Process: Lightweight to heavyweight - Are the processes in place around requirement gathering, development, testing and release simple enough not to unnecessarily impede development, but rigorous enough to maintain reliability?
  • Code: "Simple" to "complex" - How coherent, decoupled and understandable is the code? To what extent can developers have confidence in changes (through typing, tests etc)?
  • Technology: Few to many - How many technologies (programming languages, persistence mechanisms, messaging, tools etc) are used in the project?
  • Team: Small to large - Where on the spectrum does team composition lie? (small team (<10) located in the same room, large team (>10) located in the same office, individual developers located in several offices, several teams in several offices)
  • Data: Small, non-critical to large, critical - How large will the volume of data grow? How diverse is it? How critical is it?
  • Users: Few, uniform to many, diverse - How many users will there be? How will they be distributed geographically?
  • Time: Short term to long term - How long is the system likely to exist? What systems is this system aiming to replace? What other systems may eventually take over some functionality of this one?

Many of the initial decisions when starting a project should seek to balance these aspects, but balancing them is an ongoing process during the project's lifetime. Some typical examples are:

  • Adding a technology: This will increase the technological complexity - does it decrease one or more of the other aspects (typically code complexity or data size) enough to warrant consideration? Can we realistically replace one of the existing technologies entirely? Are we confident the new technology is sustainable in the organisation over the time-scale the project is expected to exist?
  • Setting up a new separate team: Do we have sufficient communication bandwidth to mitigate Conway's Law? Given the code and technology, how much training/pairing is required before the team is productive? Do we have procedures such as code review in place to maintain code coherence?
  • Onboarding a new group of users: Does this group of users have a different set of requirements that will affect the code size? If the requirements are sufficiently different, would a new separate system be better? What effect will this have on data size? Do we need to scale the system deployment? Do we need to co-locate developers or support teams with the users?

In summary, "scale" is a useful heuristic to consider when making decisions about IT projects - by acting to minimise the effects of scaling, the project is more likely to be successful in the long term.
