Software Design: Defining Complexity When Discussing Technical Approaches
Introduction
When starting work on a larger feature, a new system architecture, or a substantial technical initiative, it’s typical to hold some kind of discussion within the team to validate the intended approach and share knowledge about it.
One approach that I think helps facilitate these discussions is the ‘Design Twice’ method, which advocates contrasting two radically different designs to explore the strengths and weaknesses of each.
Sometimes, an engineer assessing a proposal will provide feedback to the effect of “This approach seems pretty complicated”. You might expect this to be followed by a structured exploration of that complexity and a deep dive into its causes, but in practice I’ve found the outcomes vary between:
- Structured exploration of the complexity and consensus reached on the suitability of the approach (ideal)
- Unstructured exploration of complexity with mixed conclusions
- Impasse and frustration, with no common understanding of the complexity
I’ve been thinking a lot about how to guide engineering teams to better outcomes in this area. In part, it’s been beneficial to highlight instructional works that cover these themes in detail. I particularly like John Ousterhout’s ‘A Philosophy of Software Design’ and ‘Software Engineering at Google’ by Titus Winters, Hyrum Wright and Tom Manshreck.
But I also wanted something to empower engineers to participate without steep prerequisites. People are tired, after all…
So here are some simple, non-prescriptive questions to help structure the exploration of complexity in your team.
1. How difficult is it to describe this approach to another engineer?
Even extremely talented engineers can fail to provide context, establish the key abstractions affected by a change, or highlight the strengths and weaknesses of an approach. If you find that a conversation has jumped immediately to code, and you’re watching an engineer highlight functions and explain their purpose, it may be an indication that a step back is needed.
First, try to establish a broad statement of intent that summarizes the approach in an easy-to-digest manner. For example:
We’re shortly going to be extending our reporting platform to introduce an export feature for our end users that will allow them to pick and choose dimensions and measures for a report. Our reporting API is limited in expressiveness, as it relies on a fixed set of dimensions and measures.
I would like to discuss reorganizing the API to be more flexible about what dimensions and measures can be accepted in a request. I need some help validating that this would better encapsulate the complexity of our report generation, for both our export and dashboard report use-cases.
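To make a statement of intent like this more concrete, it can help to sketch the shape of the change alongside it. Here is a minimal, hypothetical illustration (the type names and fields are invented, not a reference to any real API) of moving from a fixed set of dimensions and measures to a flexible one:

```typescript
// Hypothetical request shapes, for illustration only.

// Today: the API accepts a fixed set of dimensions and measures.
interface FixedReportRequest {
  startDate: string;
  endDate: string;
  groupByRegion: boolean;   // each new dimension requires a new field
  includeRevenue: boolean;  // each new measure requires a new field
}

// Proposed: callers pick and choose dimensions and measures.
interface FlexibleReportRequest {
  startDate: string;
  endDate: string;
  dimensions: string[];     // e.g. ["region", "product"]
  measures: string[];       // e.g. ["revenue", "unitsSold"]
}
```

Even a sketch at this level tends to surface questions about validation, defaults and backwards compatibility before any detailed design work begins.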
From here, move on to establish the core abstractions required to model the problem, and highlight changes to the existing approach if one exists. Pay special attention to revisions along the lines of special-general mixtures, and how they affect the ‘layering’ of the application.
Once the core abstractions have been outlined, move on to illustrate how they communicate with each other to solve the business problem.
This is often enough to facilitate the discussion with other team members and maximize the potential for contribution. The team members who are more familiar with the problem may highlight specific, tactical problems that need to be discussed. The team members who are less familiar will likely provide feedback about the general communication pattern and modeling.
In cases where the discussion is still too abstract, the ‘resolution’ of the problem can be increased by whiteboarding some of the interactions. This should be a natural evolution of the ‘preparation’ for these kinds of discussions.
2. Are we using the right abstractions to solve this problem?
Often, it’s necessary to reflect on the complexity that has built up over time in a system, frequently the result of a number of tactical decisions that are beginning to cause drag. If symptoms like these are becoming clearer:
- Change amplification — Changes must be made in several places to achieve the desired outcome.
- Obviousness — Making a desired change requires navigating multiple ‘gotchas’, causing high cognitive overhead.
- Overexposure — Unrelated concerns must be taken into account to make a desired change.
- Special / General Mixtures — Behavior specific to one set of user requirements pollutes other general-purpose capabilities (sketched below).
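As a hedged illustration of that last symptom, here is a small sketch (the functions, types and customer name are invented for illustration) of a special case leaking into a general-purpose capability, alongside one way of keeping the two separated:

```typescript
// Invented types and helpers, for illustration only.
interface Row { date: Date; value: number; }
interface Report { total: number; }
const isWeekend = (d: Date) => d.getDay() === 0 || d.getDay() === 6;
const sum = (rows: Row[]) => rows.reduce((acc, r) => acc + r.value, 0);

// A general-purpose capability polluted by one customer's requirement.
function generateReport(rows: Row[], customerId: string): Report {
  // Special case leaking into general code: one customer wants weekends
  // excluded. Every future change must now account for this branch.
  if (customerId === "acme-corp") {
    rows = rows.filter(r => !isWeekend(r.date));
  }
  return { total: sum(rows) };
}

// A cleaner separation: the general code accepts a predicate,
// and the special case lives with the caller that needs it.
function generateReportGeneral(
  rows: Row[],
  keep: (r: Row) => boolean = () => true
): Report {
  return { total: sum(rows.filter(keep)) };
}
```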
Then it may be time to revisit the core abstractions used to model the problem. When discussing these abstractions, try to evaluate them based on their:
- Width — Narrow or wide? Which is better? Be conscious of over-exposing configurable values vs. providing sensible defaults (see the sketch below).
- Depth — How much complexity does this abstraction hide?
- Purpose — Does this abstraction simplify a cohesive problem? Multiple problems?
If this is difficult, try temporarily setting aside the existing services (typically micro-services) that physically solve the problem today, and return to the domain boundaries as a way of evaluating the abstractions.
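To illustrate the width and depth questions, here is a rough sketch with invented names, contrasting a wide, shallow interface that over-exposes configuration with a narrower, deeper one that hides it behind sensible defaults:

```typescript
// Hypothetical example of the width/depth trade-off.

// Wide and shallow: every knob is exposed, so callers carry the complexity.
interface ShallowExporter {
  export(
    reportId: string,
    format: "csv" | "xlsx",
    delimiter: string,
    encoding: string,
    pageSize: number,
    retryCount: number,
    timeoutMs: number
  ): Promise<void>;
}

// Narrow and deep: a small interface with sensible defaults, hiding
// pagination, retries and encoding behind the abstraction.
interface DeepExporter {
  export(reportId: string, options?: { format?: "csv" | "xlsx" }): Promise<void>;
}
```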
3. How difficult is it to change this approach?
This question is aimed at facilitating a discussion on risk, along the lines of continuous experimentation principles. It is common in software engineering teams for there to be tension between:
- Capitalizing on existing investments — Typically driven by the ‘proto-manager’ persona in a discussion, this position advocates making small, incremental changes to an existing approach and building on the investments already made.
- Exploring new approaches — Typically driven by the ‘engineer’ persona in a discussion, a curious mindset always seeks to find better ways to solve a problem, potentially discarding previous investments.
Often, the balance in this dynamic is found by identifying meaningful ways to measure the benefits of exploring a new approach and by discussing risk.
When discussing an approach, consider how the blast radius of unforeseen consequences could be limited. If using a new language, can it be confined to a small subsystem and evaluated over time? Do we have a fallback mechanism if problems occur (e.g. feature switches or canaries)? How quickly can we reduce the uncertainty around a problem, and by what means? Performance tests? A/B tests?
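As one way to make the fallback discussion concrete, here is a minimal sketch of a feature switch guarding a new code path; the flag lookup and function names are hypothetical stand-ins for whatever mechanism your team actually uses:

```typescript
// Hypothetical stand-ins, for illustration only.
interface Report { rows: unknown[]; }
declare function fetchReportViaFlexibleApi(reportId: string): Promise<Report>;
declare function fetchReportViaLegacyApi(reportId: string): Promise<Report>;

// Stand-in for a real feature-flag service.
const enabledFlags = new Set<string>(["flexible_report_api"]);
const isEnabled = (flag: string): boolean => enabledFlags.has(flag);

async function fetchReport(reportId: string): Promise<Report> {
  // Gate the new approach behind a switch so it can be rolled back
  // quickly if unforeseen problems occur.
  if (isEnabled("flexible_report_api")) {
    try {
      return await fetchReportViaFlexibleApi(reportId);
    } catch (err) {
      // Fall back to the proven path while the new approach is evaluated.
      console.warn("Flexible report API failed, falling back to legacy path", err);
    }
  }
  return fetchReportViaLegacyApi(reportId);
}
```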
4. What dependencies does this create for the team?
Build systems, code libraries, frameworks, languages, and cloud-provided services all create dependencies and constraints for the team. Spending time deliberately discussing these dependencies can help identify pitfalls that may only manifest over longer timelines in a software system. When discussing dependencies, consider:
- Whether cloud services establish ‘version-locks’ on OSS libraries, and whether this is acceptable for your team (e.g. EMR and Spark)
- The licensing model for new dependencies, engineering support, community, and velocity. Will any costs paid for a dependency scale as the system grows? Is there enough support in the community, and enough development happening on a core library?
- How opinionated is a framework? Frameworks that are overly prescriptive may not be suitable for systems with monolithic properties. Discuss how the scope of a problem is likely to change over time, and whether the framework facilitates the intended growth.
Be conscious of the positive and negative freedoms when making decisions.
- Positive freedom — The freedom to do something, e.g. to choose the programming language you prefer
- Negative freedom — The freedom from things happening to you, e.g. the freedom from having to support additional programming languages (even if others would prefer to use them).
Remember the next generation of your team, who will have to own the decisions you make today, and be kind to them.
5. How are we affecting the cognitive load of the team?
Supporting autonomy and mastery in a team can be a question of providing well-defined, correctly sized responsibilities and boundaries. Try to be conscious of this when making decisions that affect these boundaries, and be mindful of overloading the team. It may be practical to rebuild a component within the team rather than adopt a more limited off-the-shelf solution, but does doing so overburden the team in terms of maintenance and cognitive load?
Here are some common ways that the cognitive load of a team is over-encumbered by tactical engineering decisions:
- Too much in-housing of software owned and operated by the team. E.g. ‘we rolled our own configuration service’ or ‘we rolled our own scheduler’. Know what’s a commodity, and what function only your team can perform.
- Too many different ways of achieving the same thing. For example, running ECS and EKS at the same time.
- Proliferation of programming languages. Yes, certain languages are better suited to specific problems, but each language takes time to master. Keep the number of languages in use appropriate to your team size.
This is in addition to the cognitive-load overhead that stems from software design-related choices. Try to limit the number of ‘exploration’ activities in flight at the same time, to avoid falling into a prototype ‘bake-off’ with little value being generated (note: this applies more to value-stream-aligned teams; for enabling/facilitating teams it can make sense).
Conclusion
Being deliberate in your discussions around complexity, and applying structure to these discussions, can help establish a design ritual that is satisfying, engaging and valuable.
These questions are non-prescriptive; there are certainly others that are valuable to ask to help guide recurring discussions around architecture and approach. The goal is to continuously improve the quality of design discussions over time.
If these conversations are not common in your team, or you think they could benefit from fresh direction, why not try them at your next meeting? I would enjoy hearing feedback about what worked and what could be improved.