A Brief History of Software Operations
Operations Landscape
In this article, we'll briefly survey the operations landscape for managing software over the past two decades.
Over the last several years, containers have become an essential component of how many modern applications are built and delivered. The largest companies, the smallest startups, and individual contributors (software engineers, data scientists, DevOps practitioners, or anyone managing applications or models) can all benefit from containerization.
Containers are often treated as synonymous with “Docker containers” because Docker first popularized container technology for mainstream use. Several alternatives to Docker and its tooling exist today, such as Podman and containerd, along with many others worth exploring.
The precursor to containers was the virtual machine, a virtualization of an entire computer system that is still in common use today.
Virtual Machines
A virtual machine is not based on container technology. Rather, a virtual machine contains an operating system and all of the dependencies needed to run an application: essentially everything within a computer system required to run the application, including both the user space and kernel space of the OS. The server hardware is virtualized, and each VM shares resources from the host. The benefits are similar to those of containers: spinning up new applications quickly, minimizing security update complexity, and enhancing code portability. However, compared to containers, the benefits of running virtual machines for modern applications fall short:
- Virtual machines take a non-trivial amount of time to start up because VMs boot a full OS.
- Virtual machines have a much larger footprint than containers. Since a VM contains an entire operating system, a VM image can easily be several gigabytes, which makes VMs significantly slower to download and deploy across a distributed network.
- In addition to their size, virtual machines also carry more runtime overhead, consuming additional compute resources such as CPU, memory, and disk space.
- Virtual machines are challenging to scale both vertically and horizontally. Increasing compute resources (CPU, memory, storage, etc.) requires provisioning and booting a new VM. Scaling out is also slow: even with automation, starting a new VM is often not fast enough to meet real-time demand.
Overall, virtual machines are much less efficient and performant than their container counterparts for building, deploying, and running applications, especially at distributed scale.
History of Containers
To solve some of the challenges of managing applications, the software industry borrowed a clever idea from the shipping industry — the shipping container. In the 1950s, a truck driver named Malcolm McLean proposed an alternative to unloading goods individually from truck trailers and then loading them separately onto ships: the truck body, or container, could instead be loaded directly onto the ship, saving a massive amount of labor. A simple idea in hindsight, but its effect on the shipping industry was transformative.
By separating the container from the truck, portability, reliability, and cost-efficiency improved significantly. The same concept applies, abstractly, to software containers, and the same efficiencies of scale are realized: the contents of an application can be packaged, detached from the origin host, and delivered to any destination host with relative ease.
Initial conceptions of software containers date back to the early 2000s in the Linux operating system world. Originally, containers were a way to partition an operating system so that multiple applications could run securely without interfering with one another. Isolation is implemented through namespaces and control groups (cgroups), features of the Linux kernel. Namespaces allow the different components of an OS to be sliced into isolated workspaces, while control groups provide fine-grained control of resource utilization, so containers can be managed independently and no single container can consume all system resources.
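To make these kernel interfaces a bit more concrete, here is a minimal sketch in Python, assuming a Linux host with a cgroup v2 hierarchy mounted at /sys/fs/cgroup. It simply reads the standard kernel files that namespaces and control groups expose; it is not tied to any particular container tool.

```python
import os
from pathlib import Path

# Namespaces: each entry in /proc/self/ns is a symlink such as "pid:[4026531836]".
# Two processes in the same namespace see the same inode number.
for ns in sorted(Path("/proc/self/ns").iterdir()):
    print(f"{ns.name:<12} -> {os.readlink(ns)}")

# Control groups (cgroup v2): a container runtime caps resources by writing
# limits into files like these. On an unconstrained host they may read "max".
cgroup_root = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point
for limit_file in ("memory.max", "cpu.max", "pids.max"):
    path = cgroup_root / limit_file
    if path.exists():
        print(f"{limit_file}: {path.read_text().strip()}")
```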
Interacting with these kernel features directly is onerous and not developer friendly. In turn, Linux Containers (LXC) were eventually introduced to abstract away some of the complexity, underpinning the various technologies behind what is now commonly called a container.
Virtual Machines vs Containers
A virtual machine contains everything an application needs to run, but with a lot of excess. A virtual machine image can be ten to a hundred times larger than an optimized container image. Since a VM contains many programs, libraries, and other artifacts unrelated to the application, a large portion of its space is wasted. In effect, this slows down transferring VMs across the network (e.g., during deployments or scaling automation).
Moreover, a virtual machine runs on a virtualized (or emulated) CPU, so an application can run significantly slower in a VM than in an equivalent container. Containers, on the other hand, run directly on the underlying host CPU with no virtualization overhead slowing down performance.
Virtual machines also demand more storage space, whereas container images can be optimized down to the bare-bones dependencies needed to run an application. In addition, container images are built from filesystem layers that allow for caching and composability with other container images.
For instance, a base image such as an official Python Docker image only needs to be downloaded once on a build host and can be referenced by subsequent container image builds. When building an image, a layer is only downloaded if it is not already cached, saving disk space, network bandwidth, and build time.
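The sketch below illustrates the caching idea in plain Python rather than Docker's actual implementation: layers are treated as content-addressed blobs, and a layer is only "downloaded" when its digest is missing from a local cache, so two images that share a base layer fetch it once. The layer contents and the fetch_layer helper are hypothetical.

```python
import hashlib

# Hypothetical local cache of layers, keyed by content digest.
layer_cache: dict[str, bytes] = {}

def digest(content: bytes) -> str:
    """Content-address a layer, registry-style, by its SHA-256 hash."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

def fetch_layer(content: bytes) -> str:
    """'Download' a layer only if its digest is not already cached."""
    key = digest(content)
    if key in layer_cache:
        print(f"cache hit   {key[:19]}...  (download skipped)")
    else:
        print(f"downloading {key[:19]}...")
        layer_cache[key] = content
    return key

# Two images that share the same base layer: the base is fetched only once.
python_base = b"python:3.12 base filesystem"
image_a = [python_base, b"app A source and dependencies"]
image_b = [python_base, b"app B source and dependencies"]

for image in (image_a, image_b):
    for layer in image:
        fetch_layer(layer)
```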
Containers were a logical next step to improve upon the inefficiencies of virtual machines, isolating individual applications into their own virtual environments running on the same OS kernel. Containers are isolated at the filesystem level, and as mentioned above, LXC was an early implementation of the technology. However, Linux containers were difficult to use and tightly coupled to the host OS. Then, in 2013, came a major breakthrough in containerization with the introduction of Docker.
Docker Revolution
Docker debuted to the public in 2013 at PyCon, the largest annual conference for the Python programming language. Shortly thereafter, Docker (written in Go) was released as open-source software on March 20, 2013. With the public release of Docker, containerization was suddenly mainstream and accessible to many developers.
Docker made containers mainstream by introducing a developer-friendly packaging of kernel features. Docker is container-based technology that uses OS-level virtualization to deliver software in packages called containers. Essentially, containers are encapsulated, individual units of software that run as isolated instances on the same kernel, virtualized on the host operating system.
Containers share the same operating system (host OS kernel). At a low level, a container is a set of segregated processes running from a distinct image that provides all the files necessary to execute. Containers are isolated from one another and bundle their own internals (application environment, contents, dependencies, etc.), like containers on a cargo ship. Unlike physical containers on a cargo ship, however, Docker containers can communicate with each other. This form of virtualization pools resources on a single operating system kernel, using fewer resources than virtual machines.
Moreover, the flexibility to confidently build, ship, and run any application anywhere sets Docker apart from past solutions. Portability, flexibility, and reusability are key. Docker offers the assurance that an application running in a container on your local computer will execute exactly the same on a server, a virtual machine, or a cloud platform, as long as the Docker Engine is supported.
Docker containers are consistent and repeatable across runtimes and environments. Because Docker always runs the same defined environment, this infrastructure parity improves the reliability and deployability of an application to any location. The value of Docker is best described by its methodology for shipping, testing, and deploying code quickly.
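As a minimal sketch of that workflow, the example below uses the Docker SDK for Python (the docker package) to run a throwaway container from a public image; because the client honors the DOCKER_HOST environment variable, the same code can target a laptop, a server, or a cloud host running a Docker Engine. The image and command shown are illustrative choices.

```python
import docker

# Connect to the Docker Engine (reads DOCKER_HOST and related settings,
# so the same code works against a local or remote engine).
client = docker.from_env()

# Run a short-lived container from a public image; the result should be
# identical wherever a Docker Engine is available.
output = client.containers.run(
    "python:3.12-slim",  # illustrative base image
    ["python", "-c", "print('hello from a container')"],
    remove=True,         # delete the container after it exits
)
print(output.decode())
```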
Software Management
Docker, and container technology as a whole, is a response to the question of how to efficiently manage software applications: creating reliable and reproducible deployments while running application services across distributed environments.
Deploying software requires not only the application itself, but also the dependencies needed to run it: standardized environments, libraries, interpreters, compilers, and so on. Application configuration may also be needed, such as app settings, connection strings, passwords, and anything else that makes the application a usable service.
Chef, Puppet, and Ansible were a few of the notable attempts at solving the problem of software management. These tools provided configuration management systems to install, run, configure, and update software shipments. Languages with their own packaging mechanisms, such as Python wheels, are not sufficient to solve the problem of portability: in the case of Python, pip is still needed to install the package and a Python interpreter is needed to run it.
Containers are the unit of deployment, packaging an application as a single artifact. They are the unit of reuse, because container images can be used as components of many different services. They are the unit of resource allocation for an application. And they are the unit of scalability, enabling multiple independent instances of an application to be run from a single container image build.
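To illustrate that last point, here is a small sketch using the Docker SDK for Python that starts several independent containers from one image and then removes them; the image, container names, and instance count are arbitrary examples.

```python
import docker

client = docker.from_env()

# Start several independent instances of the same image; each one is an
# isolated container created from the same image (the unit of scalability).
instances = [
    client.containers.run(
        "nginx:alpine",   # illustrative image
        detach=True,
        name=f"web-{i}",  # hypothetical naming scheme
    )
    for i in range(3)
]

for container in instances:
    print(container.name, container.short_id)

# Tear the example instances back down.
for container in instances:
    container.remove(force=True)
```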
Maintaining dependencies and compatibility for an application across different computer systems or servers is challenging for small applications, let alone at a very large scale. An application executed within a container depends only on a compatible container runtime on the host machine; everything else needed to run the application is self-contained within the container.
Software containers and their orchestration systems, particularly Kubernetes, created a major shift in software management that enabled new platforms and an explosion of growth, reliability, portability, efficiency, and continuous delivery of software, just as the shipping container did for cargo and common goods in the mid-1900s.