Upgrades of the tool itself, or of any of its prerequisites (like the kernel or libc), are tricky in any automated management tool. ISconf is specifically designed to make these upgrades possible and reproducible (see http://www.infrastructures.org/papers/turing/turing.html), but they still require some attention to detail. On this page, we describe some of those details, and at the bottom we provide proposals for how the process might be made even easier.
Your input is welcome; feel free to edit this page (see LoginHelp if you aren't logged in already).
Version number format
ISconf version numbers use the following format:
isconf-X.Y.Z.N
| X | major release -- significantly new user interface and/or usage |
| Y | protocol version -- increment only when wire or cache file formats change |
| Z | release status -- odd is testing, even is stable |
| N | changeset -- SVN changeset number |
See the Roadmap for details of a particular release.
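As an illustration of the format above, here is a minimal Python sketch that splits a version string into its four fields. The names `IsconfVersion`, `parse_version` and `same_protocol` are hypothetical and are not part of ISconf. The sketch also assumes that two releases can talk to each other when their X and Y fields match, which is an inference from the table rather than documented behavior.

```python
import re
from typing import NamedTuple


class IsconfVersion(NamedTuple):
    major: int      # X: major release
    protocol: int   # Y: protocol version (wire or cache file format)
    status: int     # Z: release status (odd = testing, even = stable)
    changeset: int  # N: SVN changeset number

    @property
    def is_stable(self) -> bool:
        # Even Z means stable, odd Z means testing.
        return self.status % 2 == 0


_VERSION_RE = re.compile(r"^isconf-(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_version(name: str) -> IsconfVersion:
    """Parse a string of the form isconf-X.Y.Z.N."""
    match = _VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"not an isconf-X.Y.Z.N version string: {name!r}")
    return IsconfVersion(*(int(field) for field in match.groups()))


def same_protocol(a: IsconfVersion, b: IsconfVersion) -> bool:
    # Assumption: wire/cache compatibility is determined by the X.Y pair.
    return (a.major, a.protocol) == (b.major, b.protocol)
```

For example, `parse_version("isconf-4.2.6.168")` yields major 4, protocol 2, status 6 (stable) and changeset 168.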
Version compatibility
If you have an existing infrastructure running an old version of isconf (for instance, 4.2), and you upgrade that infrastructure to a version which uses a new wire protocol or cache file format (e.g. 4.3), you will tend to transition through the following phases (from A thru E going forward in time):
| phase | A | B | C | D | E |
| hosts running 4.2 | all | some | some | some | none |
| hosts running 4.3 | none | some | some | some | all |
| install images using 4.2 | all | all | some | none | none |
| install images using 4.3 | none | none | some | all | all |
| still need 4.2 peer(s) on net | yes | yes | yes | yes | no |
Note that, as long as any of your install images are still using the older version (phases A thru C), you need to assume that some of your hosts still run the older version, because that's what is going to happen when you install a new host. To move from phase C to D, you'll need to take a checkpoint image snapshot to upgrade each of your images.
Likewise, even after you've moved to phase D by upgrading all of your install images, you are still likely to have some 4.2 hosts hidden in corners, temporarily powered off, and so on. You'll need to ferret these out and get them upgraded to move to phase E.
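The phase table can be read as a small decision procedure. The following Python sketch is one way to express it; the function names and the idea of counting hosts and images are assumptions made for illustration, not an existing ISconf feature. Combinations the table does not list (for instance, all hosts upgraded while every image is still old) are classified by the image state, following the reasoning above that old images will keep producing old hosts.

```python
def upgrade_phase(old_hosts: int, new_hosts: int,
                  old_images: int, new_images: int) -> str:
    """Return the phase letter (A-E) for the given host and image counts."""
    if old_hosts + new_hosts == 0 or old_images + new_images == 0:
        raise ValueError("need at least one host and one install image")
    if new_images == 0:
        # All images still install the old version: phase A or B.
        return "A" if new_hosts == 0 else "B"
    if old_images > 0:
        # Images are mixed, so new installs may still come up old.
        return "C"
    # All images upgraded; only stray old hosts separate D from E.
    return "D" if old_hosts > 0 else "E"


def need_old_peers(phase: str) -> bool:
    # Per the last table row, old-protocol peers are needed until phase E.
    return phase != "E"
```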
To help with this transition, it would be nice to be able to leave the 4.2 network service daemons running on one or more hosts, even after they've been upgraded to 4.3. We used to do this in 4.1, and it worked for the 4.1->4.2 transition as well; 4.2.6.168 doesn't support it though.
Starting in 4.2.X we went to microtasks (see FlowBasedProgramming) instead of separate daemons for the different isconf services. So what we want to do is retain backward compatibility with older wire protocols by running a microtask (rather than a daemon) which supports the old protocol.
To deal with file format changes, the file storage area (under /var/is) which each microtask refers to on local disk also needs to be segregated by version (e.g. /var/is/4.2 and /var/is/4.3), so that old microtasks can serve old files and new microtasks can serve new ones. This is also a technique which worked well in 4.1. Here's one nice thing: because of the way ISconf's journal replay works, we never need to serve old files from the new daemon or vice versa.
As part of the upgrade, we also need to place a permanent lock on the old journal, so that all new transactions are entered from an upgraded machine and go into the new journal. (This might already be the default behavior if we do the /var/is/{version} thing.)
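A minimal sketch of the per-version layout, in Python. It assumes the directory name is the X.Y prefix of the version (as in the /var/is/4.2 example); `storage_dir` is a hypothetical helper, not an existing ISconf function.

```python
from pathlib import PurePosixPath

VAR_ROOT = PurePosixPath("/var/is")


def storage_dir(version: str, root: PurePosixPath = VAR_ROOT) -> PurePosixPath:
    """Map a version such as '4.2.6.168' or '4.3' to its storage tree."""
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version starting with X.Y: {version!r}")
    # Assumption: only X.Y matters, since Y tracks file format changes.
    return root / ".".join(parts[:2])
```

With this layout, a microtask speaking the 4.2 protocol would only ever look under `storage_dir("4.2")`, and the 4.3 microtask only under `storage_dir("4.3")`.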
To keep port numbers from proliferating, we need to have a dispatcher listen on the TCP port, routing the incoming traffic to the correct microtask. The alternative, which we don't want, would be to assign a new port number for each new protocol version. We still might have to do that occasionally, but only when we change the dispatch protocol itself; this might be a once-every-few-years event rather than once-each-protocol-upgrade. This dispatcher code existed early in 4.2 development; it just needs to be ported forward.
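The dispatcher idea can be sketched as a lookup table from protocol version to handler. This is not the dispatcher code from early 4.2 development; it is an illustrative Python sketch that assumes, purely for the example, that each message starts with a line naming the protocol version. The real dispatch protocol may look nothing like this.

```python
from typing import Callable, Dict

# A handler takes the message payload and returns the reply.
Handler = Callable[[bytes], bytes]


class Dispatcher:
    """Route messages arriving on one port to per-protocol handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, protocol: str, handler: Handler) -> None:
        self._handlers[protocol] = handler

    def unregister(self, protocol: str) -> None:
        # Used once a site no longer needs to serve an old protocol.
        self._handlers.pop(protocol, None)

    def dispatch(self, message: bytes) -> bytes:
        # Hypothetical framing: first line is the protocol version.
        header, _, payload = message.partition(b"\n")
        protocol = header.decode("ascii", errors="replace").strip()
        handler = self._handlers.get(protocol)
        if handler is None:
            raise LookupError(f"no handler for protocol {protocol!r}")
        return handler(payload)
```

Only a change to the framing itself (here, the first-line convention) would force a new port number; adding a protocol version is just another `register` call.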
Once a given site is confident that their infrastructure is in phase E above, they will want to be able to deactivate the microtasks serving the old protocol. An environment variable in /etc/is/main.cf should do the trick.
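As a rough sketch of how such a switch might be read, the Python below parses a comma-separated list of protocol versions from the environment. The variable name `IS_PROTOCOLS` and the default set are invented for this example; no such variable is defined by ISconf today.

```python
import os
from typing import Iterable, Mapping, Optional, Set


def enabled_protocols(environ: Optional[Mapping[str, str]] = None,
                      default: Iterable[str] = ("4.2", "4.3")) -> Set[str]:
    """Return the set of protocol versions this host should serve."""
    env = os.environ if environ is None else environ
    raw = env.get("IS_PROTOCOLS")  # hypothetical variable name
    if raw is None:
        # Unset: serve everything we know about (assumed default).
        return set(default)
    return {item.strip() for item in raw.split(",") if item.strip()}
```

A site that has reached phase E would then set the variable to `4.3` only, and the old-protocol microtask would not be started.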
Proposal A
So, to summarize, in 4.2.6 we have to purposely leave at least one backlevel machine running while we upgrade disk images and so on. That's bad, and here's what we need to do to restore the functionality we had in 4.1:
- split /var/is up into /var/is/{version} (do we need to do this in /etc/is as well?)
- have each protocol version and /var/is/{version} tree served by its own microtask
- ensure permanent locking of any volumes which are managed by obsolete daemons
- have a dispatcher which routes the traffic to the right microtask
- provide a way to enable/disable protocol versions from environment
This work is tracked in ticket #57.
Proposal B
An alternative would be to always use a new port number, disable the UNIX domain socket code (or feed it a bogus pathname), and just run the old daemon as a standalone process. We don't need to talk to the old daemon at all (except maybe for monitoring). The advantage is simpler code; the disadvantage is more complex management. Right now I favor A.
Proposal C
Overall, this is a barrier-type problem, where we need to manage the state of an entire infrastructure as a whole, and not advance to the next state until all machines are "at the barrier". In this case, the barrier is (for example) a 4.2->4.3 isconf upgrade -- we need to preserve a means of serving backlevel machines until we're sure that there are no backlevel machines left.
The "right" way to manage this is probably via whatever mechanism isconf will use to handle internode barrier problems, such as those we run into when building HA clusters. In 2.X we used a standalone barrierd to keep builds synced between nodes; in 4.X we are likely to want to add a wire protocol primitive to the existing architecture.
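The barrier condition itself is simple to state in code. The Python sketch below assumes we somehow have an inventory mapping each host name to the version it last reported; how that inventory is gathered is exactly the open question above, and both function names are hypothetical.

```python
from typing import List, Mapping


def barrier_reached(host_versions: Mapping[str, str], target: str) -> bool:
    """True only when every known host reports the target version."""
    # An empty inventory is treated as "not reached": we cannot be
    # sure there are no backlevel machines if we know of no machines.
    return bool(host_versions) and all(
        version == target for version in host_versions.values()
    )


def stragglers(host_versions: Mapping[str, str], target: str) -> List[str]:
    """Hosts still holding the infrastructure back from the barrier."""
    return sorted(
        host for host, version in host_versions.items() if version != target
    )
```

Note that this only covers hosts the inventory knows about; machines that are powered off or hidden in corners would need to be accounted for separately.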
We might benefit from a "site journal" to help manage sitewide state -- and this might be part of the answer for environmental configuration files as well. See EnvConf.
