NB This document is a very rough draft, under active development. Suggestions and feedback welcome, but don’t take any of it as gospel!


At a high level, the current focus is on getting the project to alpha-grade functionality, where the primary goal is to enable interested members of the Aegir community to participate in further development. The Next Steps section below maps out a rough set of pieces that need to be tackled in order to get us to a Release Candidate. The first few steps will likely need to be covered in the order given, but later steps will need to be clarified as we go.

This document should be regularly updated as we iterate, to reflect current progress and status.

Developer Experience (DX)

One of the overarching design goals of Aegir5 is to enable a simpler developer experience than has been possible in Aegir3. We want to enable someone who wants to “deploy Wordpress on Aegir” to do so with minimal knowledge of its internals. One aspect of this is making clear boundaries in the form of APIs and a protocol definition for how the different components can interact.

Which Developers?

We see four distinct levels of “developer” that may interact with Aegir, and we aim to allow each of these roles to operate in a reasonably contained fashion:

0. Site Builder

  • Create new sites
  • Manage site workflows: backup/restore, update, migrate, clone, etc.

1. System Administrator / DevOps / Site Reliability Engineer (SRE)

  • Pull custom tasks, along with their entities and fields, together into Operations.
  • Custom tasks might run an Ansible playbook, a Terraform apply command, or similar, in order to achieve a certain coherent provisioning component (e.g. a server, a service, a codebase, a database, or a site).
  • Can configure Drupal’s robust Entity system, via its front end, to build out desired Operations (a collection of Tasks) to coordinate provisioning and management of Servers, Services, Codebases, and Types.

2. Backend Developer

  • Ordinarily there would be a “reference implementation”, a standard way to do X (e.g. deploy Drupal 8+, Wordpress, Hugo).
  • Admin can take that and extend it, to customize for their use case, incorporating other custom tasks:
    • Add a Redis or Solr service into the mix.
    • Require a Terraform aspect.
    • Incorporate S3 buckets for storage of the sites being hosted.
    • Enable an HTTPS certificate (LE, custom, self-signed).
  • A developer should be able to use an Ansible module (or similar) that hides the details of the communications protocol between the front and back ends, and simply says “send this data back there, for this UUID”.

3. Maintainer

  • Developers of the Aegir system itself.
  • Ideally, the “plumbing” that we are responsible for becomes a minimal system to drive “the protocol” along with a Drupal application coordinating the operation, provisioning, and lifecycle management of the provisioned components.
  • Ultimately, this should give rise to alternative “client” or front-end implementations, with different focal points or feature sets. An Aegir set-up might be used in production and thus require ironclad blue-green deployment and locked down access to the environment, or it might be used to facilitate web application development workflows, spinning up per-branch environments automatically on the basis of whatever conditional trigger we can imagine. 😃

Next Steps

0. General Housekeeping

Following a two-year hiatus of active development, the Consensus team is undertaking a re-invigoration of the project and leading the charge to bootstrap us back to a working prototype.

To this end, there are a few initial tasks which are currently in progress:

  • Issue #60 - set up SSH connectivity between Lando containers to facilitate inter-component communications.
  • Issue #8
    • Refresh the CI pipeline and Behat test suite to restore them to their previous working state.
    • Fix the existing test suite so it works again in the new multi-container set-up.
  • Issue #58 - fix @javascript tests in CI.
  • Issue #59 - create container images to simplify and speed up localdev environment creation.
  • Re-triage the issue board and make sprint-sized backlogs.
  • Review and update these docs, with an eye to enabling new developers and contributors to get started.
    • Flesh out Lando/Drumkit environment details and workflows.
    • Ensure high-level architecture docs reflect current reality.

1. Backend -> Frontend communication via Message Queue

Currently we have the Drupal front-end components requesting tasks be run on the back-end via the dispatcherd Celery/RabbitMQ message queue component, triggering Ansible tasks to be run on the server side. With the move to Lando for local development, we’ve more closely modeled the decoupled architecture of the front- and back-end components, where all communication should be managed through the message queue.

What we don’t yet have is the mechanism to pass data back to the front-end via the message queue. The first (and special) case is log data from the Ansible playbooks being run.

See Issue #51 for details.

While the focus initially is on passing log information back to the front-end, this entails completing the end-to-end communication over the message queue, which opens the door to building higher-level functionality.
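
As an illustration of the log-passing case described above, here is a minimal sketch of a message envelope that a back-end worker could put on the queue for each log line. The `LogMessage` class and its field names (`task_uuid`, `sequence`, `line`) are assumptions made for this example, not the project's actual wire format; see Issue #51 for the real design discussion. The front-end is a Drupal application, so the decoding side is shown in Python only to demonstrate that the payload round-trips cleanly.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class LogMessage:
    """Hypothetical envelope for one log line sent from back-end to front-end."""

    task_uuid: str  # identifies the front-end Task that requested the run
    sequence: int   # position of this line within the task's log
    line: str       # raw log output from the playbook

    def to_json(self) -> str:
        # Serialise to a JSON string suitable for use as a queue payload.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "LogMessage":
        # Rebuild the message on the receiving side.
        return cls(**json.loads(payload))


# Simulate one log line travelling from the back-end to the front-end.
task_id = str(uuid.uuid4())
sent = LogMessage(task_uuid=task_id, sequence=1, line="TASK [Gathering Facts] ok")
received = LogMessage.from_json(sent.to_json())
```

Carrying the task UUID and a sequence number in every message would let the receiver attach lines to the right Task and keep them in order, even if messages are delivered asynchronously.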

2. Pluggable Backends

In fairly short order we’d like to explore the idea of making the back-end tasks “pluggable”, in the sense that Ansible playbooks are only one possibility. Currently the obvious second back-end type would be Terraform, but we’ll use this case as a basis to make Aegir flexible as to what kind of tasks to run in the back-end (shell scripts, other config management or provisioning tools, etc.).

See Issue #52 for more details.
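
One way to picture “pluggable” back-ends is a small registry of back-end types that share a common interface. The sketch below is an assumption-laden illustration: the `Backend` class, the `register` decorator, and `command_for` are hypothetical names, not part of Aegir. It only builds command lines and does not execute them. The command-line options used (`ansible-playbook --extra-vars`, `terraform -chdir=... apply -auto-approve -var`) are standard options of those tools.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class Backend(ABC):
    """Hypothetical interface that each pluggable back-end type implements."""

    @abstractmethod
    def build_command(self, target: str, variables: Dict[str, str]) -> List[str]:
        """Return the argv list that would run the task."""


# Registry mapping a back-end type name to its implementation (assumed design).
BACKENDS: Dict[str, Type[Backend]] = {}


def register(name: str) -> Callable[[Type[Backend]], Type[Backend]]:
    def decorator(cls: Type[Backend]) -> Type[Backend]:
        BACKENDS[name] = cls
        return cls
    return decorator


@register("ansible")
class AnsibleBackend(Backend):
    def build_command(self, target: str, variables: Dict[str, str]) -> List[str]:
        # ansible-playbook accepts a JSON document via --extra-vars.
        extra_vars = json.dumps(variables, sort_keys=True)
        return ["ansible-playbook", target, "--extra-vars", extra_vars]


@register("terraform")
class TerraformBackend(Backend):
    def build_command(self, target: str, variables: Dict[str, str]) -> List[str]:
        # -chdir is a global option, so it precedes the "apply" subcommand.
        command = ["terraform", f"-chdir={target}", "apply", "-auto-approve"]
        for key, value in sorted(variables.items()):
            command += ["-var", f"{key}={value}"]
        return command


def command_for(backend_type: str, target: str, variables: Dict[str, str]) -> List[str]:
    """Look up the back-end by name and build its command line."""
    return BACKENDS[backend_type]().build_command(target, variables)


ansible_cmd = command_for("ansible", "site.yml", {"domain": "example.com"})
terraform_cmd = command_for("terraform", "infra", {"region": "ca-central-1"})
```

With this shape, supporting shell scripts or another provisioning tool would mean registering one more class, while the code that dispatches tasks stays unchanged.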

3. Build out the general purpose E2E API

With the previous two pieces in place, we can start to flesh out the protocol to enable arbitrary end-to-end round-trip communication between front-end and back-end.

The goal here is to provide a flexible but encapsulated mechanism to allow the front-end components (Operations, Tasks, Codebases, Sites, etc.) to call into a back-end process to say “run this task”, providing some relevant parameters. In turn, the back-end should then have a generic way to feed data back to the front-end about the status and details of the process it runs, without having to know too much about the specifics of the front-end per se.

We are currently envisioning this as an Ansible module which playbooks can call to say “give this data back” to the front-end task/operation which initiated it. This might be “the backup you requested now lives here” or “the site you requested has been provisioned”, for example. Obviously with pluggable backends this is likely to change, but the basic idea remains.

What this does for us is create a decoupled system of reliable but asynchronous communication between front and back-ends, which needn’t have direct HTTP access to each other. In turn this allows for the possibility of swapping in alternate front-end interfaces, or alternate back-end systems, provided they implement the protocol.

See Issue #42 for details.
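
The round trip described above can be sketched with two in-memory queues standing in for the message queue. Everything here is illustrative: the function names, the message fields (`uuid`, `task`, `params`, `status`, `data`), and the `backup_site` handler are assumptions for this example, and a real deployment would use the Celery/RabbitMQ component rather than `queue.Queue`.

```python
import queue
import uuid
from typing import Any, Callable, Dict

# In-memory stand-ins for the two directions of the message queue.
request_queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()
response_queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def frontend_dispatch(task_name: str, params: Dict[str, Any]) -> str:
    """Front-end side: enqueue a 'run this task' request and return its UUID."""
    task_uuid = str(uuid.uuid4())
    request_queue.put({"uuid": task_uuid, "task": task_name, "params": params})
    return task_uuid


def backend_work_once(handlers: Dict[str, Handler]) -> None:
    """Back-end side: take one request, run it, and feed the result back."""
    request = request_queue.get_nowait()
    result = handlers[request["task"]](request["params"])
    # The back-end only echoes the UUID; it knows nothing else about the front-end.
    response_queue.put({"uuid": request["uuid"], "status": "success", "data": result})


def frontend_collect() -> Dict[str, Dict[str, Any]]:
    """Front-end side: drain all responses, keyed by the originating UUID."""
    collected: Dict[str, Dict[str, Any]] = {}
    while True:
        try:
            message = response_queue.get_nowait()
        except queue.Empty:
            return collected
        collected[message["uuid"]] = message


def backup_site(params: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for real work; reports where the (hypothetical) backup lives.
    return {"backup_path": f"/backups/{params['site']}.tar.gz"}


task_id = frontend_dispatch("backup_site", {"site": "example.com"})
backend_work_once({"backup_site": backup_site})
results = frontend_collect()
```

The UUID is the only thing tying a response to its request, which is what would allow either side to be swapped out as long as it speaks the same message format.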

4. Re-implement Aegir3 workflows

  • Focus on D8+ as the primary application to deploy, and aim for feature parity with the Ops-focused workflows that Aegir3 does very well for Drupal 7, 8, and 9 today.
    • Create Codebases (traditionally called “platforms” in Aegir), provisioning them on one or more Servers (later containers) which have the Services to run them.
    • Create Site instances on top of those Codebases, which represent a particular install at a specific URL.
    • Automated and manual backups of Sites and Codebases, to designated Server(s).
    • Reliable restoration of backup data, to spin up a copy of a Site at a point in time.
    • Migrate Site instances between Servers and Codebases (even Services, e.g. MySQL to MariaDB), with blue-green deploy, automatic backups and rollback.
    • HTTPS, different Webservers, etc.
  • Currently, when we verify a codebase, we scan for profiles, pick up their .info files, and post those back to the front-end. This creates a profile “entity” on the front-end as an option available to site installs, etc.
    • This is application-specific: D7 is different from D8+.
    • In D8+ we will probably use the composer.lock in a similar way.
    • In general there should be an abstract form of this “set of packages” a Codebase has installed, that arbitrary applications (Wordpress, Grav) can define in their own terms.
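
To make the “set of packages” idea above more concrete, here is a minimal sketch that derives such a set from the contents of a composer.lock file. The `Package` class and the helper functions are hypothetical names for this example. The only facts relied on about composer.lock are that it is JSON with a top-level `packages` array whose entries carry `name`, `version`, and `type`; the sample data is made up.

```python
import json
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Package:
    """Hypothetical application-neutral description of one installed package."""

    name: str
    version: str
    kind: str  # e.g. "drupal-profile", "drupal-module", "library"


def packages_from_composer_lock(lock_text: str) -> FrozenSet[Package]:
    """Build the package set for a Codebase from composer.lock contents."""
    data = json.loads(lock_text)
    return frozenset(
        Package(entry["name"], entry["version"], entry.get("type", "library"))
        for entry in data.get("packages", [])
    )


def install_profiles(packages: FrozenSet[Package]) -> List[str]:
    """Names of packages that could be offered as install profiles."""
    return sorted(p.name for p in packages if p.kind == "drupal-profile")


# Made-up sample standing in for a real composer.lock file.
SAMPLE_LOCK = json.dumps({
    "packages": [
        {"name": "drupal/core", "version": "9.5.11", "type": "drupal-core"},
        {"name": "example/my_profile", "version": "1.0.0", "type": "drupal-profile"},
    ]
})

codebase_packages = packages_from_composer_lock(SAMPLE_LOCK)
profiles = install_profiles(codebase_packages)
```

Another application could supply its own function that returns the same `Package` set from whatever metadata it has, leaving the front-end to work with one common shape.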

?. Documentation & Disciplined Refactoring

  • One goal: define DX APIs to allow Aegir community devs to plug-in.
    • Need to figure out the ideal level of docs to allow other Aegir community folks interested in contributing to do so.
  • Another goal is to think through some outstanding architectural and workflow questions, considering our current understanding and goals for Aegir overall, which are in some ways broader than ever. For example, for the data we pass between the front-end and back-end: where do we store its metadata? How do we maintain the state of the system overall?
    • Sometimes we have transient data (e.g. state of a task), and the front-end polls for it, generally.
    • Sometimes we have a blob of data (e.g. the set of packages associated with a codebase), relatively stable, but periodically needing refresh.
    • Sometimes we have a list of data (e.g. a task log in progress), where we append to a list of field values regularly, and want to update the page in real-time, if somebody’s watching.
    • So the question is: how do we handle these different kinds of data? In what ways are they different? What’s the conceptual model for front-end and back-end to use these well?
  • Also architecture-wise: Codebases and Sites are currently bundleable entity types. To create one, we inherit from a template/base class; there’s no other behaviour.
    • We might want to add more separate entity types for other kinds of things, such as different behaviours for servers and services than for codebases and sites. (Maybe?)
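
One possible way to think about the three kinds of data listed above (transient state, relatively stable blobs, and growing lists) is to give each its own update rule. The sketch below is purely a thinking aid under that assumption; `EntityData`, `apply_update`, and the kind names are hypothetical and do not reflect any decision the project has made.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EntityData:
    """Hypothetical holder for data the back-end reports about one entity."""

    state: Dict[str, str] = field(default_factory=dict)        # transient: last write wins
    blobs: Dict[str, Any] = field(default_factory=dict)        # stable: replaced on refresh
    lists: Dict[str, List[str]] = field(default_factory=dict)  # growing: append only


def apply_update(entity: EntityData, kind: str, name: str, value: Any) -> None:
    """Apply one incoming update according to the rule for its kind."""
    if kind == "transient":
        entity.state[name] = value
    elif kind == "blob":
        entity.blobs[name] = value
    elif kind == "append":
        entity.lists.setdefault(name, []).append(value)
    else:
        raise ValueError(f"unknown update kind: {kind}")


task = EntityData()
apply_update(task, "transient", "status", "queued")
apply_update(task, "transient", "status", "running")
apply_update(task, "blob", "packages", {"drupal/core": "9.5.11"})
apply_update(task, "append", "log", "PLAY [all]")
apply_update(task, "append", "log", "TASK [Gathering Facts]")
```

Separating the rules this way makes the differences explicit: transient values can be polled and overwritten, blobs can be cached until the next refresh, and appended lists are the natural candidates for real-time page updates.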

Other Purpose(s):

  • Better document architecture
    • Initially from a relatively high-level, to give new contributors a conceptual framework of the project overall.
    • Later, API documentation auto-generated from code, and code-coverage reports published with the main docs site.
    • Consider graphical elements like class mappings, block diagrams etc.
  • Remove unnecessary “boilerplate” in the codebase. The Entity system (and the rest of Drupal) has matured immensely since the first Aegir5 prototype was built, and we may be able to trim down a lot of the “custom” entity-type code where the functionality has since been incorporated into core, or similar.
  • Code coverage via BDD: The target is 100%; we need to get this back up.
    • New features should meet that standard. 😃
    • Resolve weirdness with Behat’s “code coverage” plugin.

?. D9+ Readiness

  • Ensure that the front end is running Drupal 9, 10, or the latest stable major version.
    • This is longer term, but let’s be aggressive.

?. Distribution and Packaging

  • Figure out what packaging looks like. Whatever the deployment method, building the distribution packages (.deb, .rpm, Snap/Flatpak, etc.) should be automated as much as possible and must be triggered automatically on release. Eliminate the current day-long release cycle.
    • Currently this is painful: multiple projects, specific order, build scripts, and tests that can break things.
    • Our target pipeline: tag it, forget it, and the release is out. Then we can do minor point releases regularly, fearlessly.
    • Ideally we’d be doing this by the time we tag RCs.