"Bottlenecks" in software development
If you look at the deployment frequency of large companies, you will find huge differences. Where does this discrepancy come from? How do you find the "bottleneck" that slows down the software development process? And what are the possible solutions once the bottleneck has been found?
How often do leading IT companies deploy new software?
Why is it important for companies to bring new software into production frequently?
Doesn't the principle "never change a running system" apply?
To maintain competitiveness in the current VUCA world, time to market is an extremely important factor. Who wants to lag behind the competition?
And that means deploying new functionality quickly, i.e. bringing it into production successfully. But what does "deploy quickly" actually mean?
If the figures circulating on the web can be trusted, the following companies are on one side of the spectrum:
- Amazon: 23,000 deployments per day
- Google: 5,500 deployments per day
- Facebook: 1 deployment per day
Typical large companies in Germany - on the other side of the spectrum - have about 4 major releases per year, about one smaller release per month, and maybe a few hotfix releases squeezed in between.
This works out to roughly one deployment every two weeks.
Where does this enormous discrepancy come from?
Are Amazon and Google taking bigger risks here, or are they more careless with their releases? Hardly - both companies depend just as much on error-free, continuously functioning software in production as the "typical large company" described here, so they too proceed carefully.
Wouldn't other companies also like to bring releases into production more frequently?
What is stopping them? Let's take a look at a typical value chain in software development and ask ourselves whether and where there might be a bottleneck.
Analysis of the software development process
- At the beginning of the chain is the business department with its wishes and specifications and its significant involvement in the functional design (requirement definition - let's set agile development models aside here).
- The technical design and the technical implementation are the responsibility of application development.
- The result of this work must be verified in testing - with the participation of both the business department and application development.
- After a successful test, the software goes into deployment/release, usually carried out by the operations colleagues with the participation of application development.
Where are the showstoppers and bottlenecks that prevent more than 20 releases a year?
As a first step, let's take a close look at the business department: it has more ideas than it has budget to implement them.
Is that the factor slowing down the entire chain?
Let's ask ourselves what would happen if the department had more budget. Would everything go faster then?
Probably not. Even the funded implementations take a long time to reach release. Although the budget is a "bottleneck", it is not the key factor slowing down the entire chain.
(The business department can also become a bottleneck if work results, e.g. analyses, take too long to complete - but we won't pursue that further here.)
Let's take a look at the next element of the value chain:
Application development has to juggle many software projects running in parallel. The resulting setup times slow down development. And a developer can either do project work or solve a production problem - not both.
This step in the chain is clearly not running optimally, but application development could still deliver more than actually goes into production in the end. We must keep looking.
How good or bad do things look in testing?
Despite numerous efforts to work on the basis of Test-Driven Development (TDD) and to automate the tests, testing is still complex and mostly carried out manually with expert know-how. As with implementation, this is not the critical bottleneck.
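To make the "push of a button" idea concrete, here is a minimal sketch of what an automated test looks like in Python. The function under test and its behavior are invented for illustration (a simplified bank-transfer check, not code from any real system):

```python
# Illustrative production code: a hypothetical, simplified transfer calculation.
def execute_transfer(balance, amount):
    """Debit `amount` from `balance`; reject invalid amounts and overdrafts."""
    if amount <= 0:
        raise ValueError("transfer amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

# Automated checks: runnable as often as desired, at any time of day or night.
def test_successful_transfer():
    assert execute_transfer(100, 30) == 70

def test_overdraft_is_rejected():
    try:
        execute_transfer(100, 200)
    except ValueError:
        pass  # expected: the overdraft is refused
    else:
        raise AssertionError("overdraft should have been rejected")

if __name__ == "__main__":
    test_successful_transfer()
    test_overdraft_is_rejected()
    print("all tests passed")
```

In practice such tests would be collected by a test runner (e.g. pytest) and executed automatically on every change; the point is only that the expert's knowledge is captured in executable form instead of a manual test script.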
Now let's look at the actual deployment and release execution at the end of the chain:
If a company works with formal releases and stabilization phases/"frozen zones" ("enterprise release", "monthly release", "infrastructure release", etc.), then the number of releases limits how much the business department can bring into production.
In addition to this organizational or procedural limit, there are often many manual steps that make the actual deployment process slow and error-prone. Operations teams still have a long way to go before deployments are fully scripted. Many tools now support deployment, but orchestrating them so that they run automatically at the click of a button often remains an unaddressed task.
As long as a person - for example the person responsible in release management - still has to give their written OK for deployment based on submitted test reports, deployment figures like those of Amazon or Google cannot be achieved. At this point, the lack of complete test automation also becomes a co-showstopper.
With the deployment step, we have found the central bottleneck in the value chain under consideration. If we can improve this step, we immediately optimize the entire chain.
But how can we achieve this optimization?
Where do these large, rare individual deployments (releases) come from?
They are often due to great complexity and a great need for assurance. The complexity results from the interaction of numerous software components, which is difficult to test by manual means.
The need for assurance results from the criticality of the applications for the company, which cannot afford malfunctions in production (e.g. bank transfers must be executed without errors - otherwise the damage to the company's image or bottom line would be immense).
But doesn't the same need for error-free operation apply to Amazon and Google, the companies mentioned at the beginning? What are these companies doing differently in their deployments?
The answer: they have fully automated their testing and their deployments. They have eliminated (in the "Infrastructure as Code" sense) all the manual steps that are slow and error-prone.
They have removed the barriers between application development and the operations team and leveraged synergy and collaboration in DevOps teams where work was previously "thrown over the fence". The result is called "Continuous Integration" (CI) and "Continuous Deployment" (CD).
"That doesn't work for us"
I can already hear the objections: "Maybe Google can do that, but it won't work for us." Why not?
The technology is available to everyone. There are no technical reasons that would prevent a company from proceeding in a similar way to the big IT companies. Rather, an improvement in tests and deployments is often not pursued as a primary goal - and thus the possible benefit is never realized.
There are approaches to automating deployment, but they often fall short. Many tools are used, but they are not consistently connected to each other through scripting and the like, so experts again have to complete the deployment with manual interventions (for example, assigning passwords for technical users in production).
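The missing "glue" between individual tools can be as simple as one script that chains the steps and stops on the first failure. The following sketch uses placeholder `echo` commands where a real pipeline would invoke its build, test, and deployment tools; all step names are invented for illustration:

```python
# Sketch of connecting separate deployment tools into one scripted chain.
# The commands below are placeholders; a real pipeline would call its
# actual build tool, test runner, and deployment tool here.
import subprocess
import sys

PIPELINE = [
    ["echo", "build: compile and package"],
    ["echo", "test: run the automated test suite"],
    ["echo", "deploy: roll out to production"],
]

def run_pipeline(steps):
    """Run each step in order; abort on the first failure (fail fast)."""
    for cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"step failed: {cmd}", file=sys.stderr)
            return False
        print(result.stdout.strip())
    return True

if __name__ == "__main__":
    sys.exit(0 if run_pipeline(PIPELINE) else 1)
```

A CI server (Jenkins, GitLab CI, GitHub Actions, etc.) plays exactly this orchestrating role; the point is that no expert has to step in between the tools by hand.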
The situation is similar with test automation. The argument that is then often put forward is that maintaining the automated tests is as time-consuming as running the tests manually. That may even be the case.
But in one case I get the opportunity to check complex software for correct functionality at the push of a button - as often as I want, at any time of the day or night.
Otherwise I have to wait for the testers to find time for a test run, and then document the results manually in the appropriate tools so that someone else can look at them and declare the test "completed" and the software "released". Are these really value-adding activities? No.
The better path, then, is to move toward more frequent daily builds and deployments.
How do I increase the number of deployments?
I have already mentioned the key points on the way to faster deployment - as a precursor to "continuous deployment". They are:
- Building DevOps teams
- Transition to 100% automated testing
- Adoption of "Infrastructure as Code" to automate the deployment work
DevOps teams address the topic at the organizational level: one team works towards a common goal - nothing needs to be "thrown over the fence".
Fully automated testing enables fast tests at any time and with any frequency.
If the configuration of the infrastructure is treated like code, then the first step towards versioning, scripting, etc. is made.
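The core idea of Infrastructure as Code can be sketched in a few lines: the desired infrastructure is declared in a versionable file, and a script computes which changes are needed to reach that state. Server names and attributes below are invented for illustration; real tools (Terraform, Ansible, etc.) follow the same declare-then-reconcile pattern:

```python
# Declared (versioned) target state of the infrastructure - illustrative only.
desired = {
    "app-server-1": {"cpu": 4, "ram_gb": 16},
    "app-server-2": {"cpu": 4, "ram_gb": 16},
}

# Actual state as reported by the environment; server 1 has drifted,
# server 2 does not exist yet.
actual = {
    "app-server-1": {"cpu": 2, "ram_gb": 16},
}

def plan(desired, actual):
    """Return the changes needed to bring `actual` in line with `desired`."""
    changes = []
    for name, spec in desired.items():
        if name not in actual:
            changes.append(("create", name))
        elif actual[name] != spec:
            changes.append(("update", name))
    for name in actual:
        if name not in desired:
            changes.append(("delete", name))
    return changes

print(plan(desired, actual))
# → [('update', 'app-server-1'), ('create', 'app-server-2')]
```

Because the `desired` state lives in a file, it can be versioned, reviewed, and applied automatically - the manual configuration steps disappear.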
This raises the question of which toolstack should be used to map the whole thing. This cannot be answered in general, but only in the context of your technical architecture and your software architecture. We at it-economics will be happy to advise you.
If far more deployments are then possible than before, the deployment frequency resulting from these measures can be used, for example, as a measure of success (see Fig. DORA metrics).
Many more companies than those mentioned at the beginning are already very successful on this path (e.g. Netflix, Adobe or Sony Pictures Entertainment).