Canary Deployment for Queue Workers

Deployment of Microservices

One of the best understood and most widely practiced aspects of the SDLC is the deployment of your code or service. The world of microservices mostly follows rolling deployments, where the new version of the code is gradually rolled out to all instances of the service without requiring downtime.

Canary Deployments

Canary deployment is a pattern for rolling out releases to a subset of users or servers. The idea is to first deploy the change to a small subset of servers, test it, and then roll the change out to the rest of the servers. The canary deployment serves as an early warning indicator with a limited blast radius: if the canary deployment fails, the rest of the servers aren't impacted. The typical flow is:

  1. Deploy the change to a small subset of servers (the canaries).
  2. Test, or wait until satisfied.
  3. Deploy to the remaining servers.
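As a concrete illustration, the whole pattern fits in a few lines. The sketch below assumes hypothetical `deploy`, `is_healthy`, and `rollback` callbacks that you would wire up to your own deployment tooling and monitoring; none of them is a real API.

```python
import time

def canary_rollout(servers, new_version, deploy, is_healthy, rollback,
                   canary_count=2, soak_seconds=300):
    """Deploy to a small canary subset, verify, then roll out to the rest."""
    canaries, rest = servers[:canary_count], servers[canary_count:]

    for server in canaries:            # 1. deploy to the canary subset
        deploy(server, new_version)

    time.sleep(soak_seconds)           # 2. test, or wait until satisfied
    if not all(is_healthy(server) for server in canaries):
        for server in canaries:        # canary failed: only the subset
            rollback(server)           #    was affected, so revert it
        raise RuntimeError("canary failed; remaining servers untouched")

    for server in rest:                # 3. deploy to the remaining servers
        deploy(server, new_version)
```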

Canary Deployments with Web Nodes

The canary deployment model with nodes or pods serving web or HTTP/gRPC traffic is quite straightforward, and it includes the following:

  1. Canary nodes: A small, fixed number of nodes running the new version of the code.
  2. Baseline nodes: A predefined and fixed number of nodes running the older version of the code; they are used for comparing metrics with the canary nodes.
  3. Regular nodes: They serve the majority of the traffic for the service and can be scaled up or down.
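At its core, the canary analysis is a comparison: for each key metric, the canary must stay within some tolerance of the baseline. The sketch below is a minimal illustration with assumed metric names and thresholds; production setups usually delegate this judgment to a dedicated tool such as Spinnaker's Kayenta.

```python
def canary_analysis(canary, baseline,
                    max_error_ratio=1.5, max_latency_ratio=1.3):
    """Pass the canary only if it stays within the allowed ratios of the
    baseline. Metric keys and thresholds are illustrative assumptions."""
    if canary["error_rate"] > baseline["error_rate"] * max_error_ratio:
        return False   # error rate significantly above baseline
    if canary["p99_ms"] > baseline["p99_ms"] * max_latency_ratio:
        return False   # tail latency significantly above baseline
    return True

# Example: passes, since both metrics are within tolerance of the baseline.
ok = canary_analysis({"error_rate": 0.021, "p99_ms": 190},
                     {"error_rate": 0.020, "p99_ms": 185})
```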

Canary Deployments with Queue Worker Nodes

Canary deployment for an application that uses async processing through queues and queue workers becomes a little more complex. Before we go into canary deployment for queue workers, one point is worth discussing: deploy the worker nodes first, and then the web nodes. The workers (consumers) must be able to handle the new message format before the web nodes (producers) start emitting it, and since a queue can always hold residual messages produced by the older code, the newly deployed workers must also stay backward compatible. The sketch below shows what that means in practice.
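The example assumes a hypothetical schema change in which an `amount` field moves into a nested `payment` object; the new worker must parse both shapes, because old-format messages can still be sitting in the queue when it starts.

```python
import json

def parse_amount(body: bytes) -> float:
    """Handle both the old and the new message schema (hypothetical change:
    `amount` moves into a nested `payment` object). The queue can still
    contain messages produced by the older web nodes, so the newly
    deployed worker must remain backward compatible."""
    msg = json.loads(body)
    if "payment" in msg:       # new schema: {"payment": {"amount": 42.0}}
        return msg["payment"]["amount"]
    return msg["amount"]       # old schema: {"amount": 42.0}
```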
Approach 1: Dedicated Canary Queue

In this approach, the web nodes selectively push a small fraction of messages to a dedicated canary queue, which only the canary worker node consumes. The rollout proceeds in two phases, workers first:

  1. Deploy the canary worker node, consuming from the dedicated canary queue.
  2. Compare the canary metrics of the canary worker node with those of the baseline worker node.
  3. Proceed if the canary analysis passes, else revert the canary worker node.
  4. Deploy the canary web node.
  5. Compare the canary metrics of the web and worker canary nodes with those of the baseline nodes.
  6. Proceed if all is fine, else revert the canary deployment.
  7. Deploy the rest of the worker nodes.
  8. Deploy the rest of the web nodes.

Pros:

  1. The two-phase canary analysis also helps in testing backward compatibility.
Cons:

  1. Additional logic is needed on the web nodes to selectively push messages to a canary queue, as shown in the sketch below. This becomes more complex when you have multiple queues for various use cases, since every producer path needs the routing logic.
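The routing itself is small, but it has to live in every producer path. The sketch below assumes a pika (RabbitMQ) channel, an illustrative `orders` queue naming scheme, and a 5% canary fraction; those names and numbers are assumptions for illustration.

```python
import json
import random

CANARY_FRACTION = 0.05  # assumed: route ~5% of messages to the canary queue

def publish(channel, queue_name, payload, canary_enabled=True):
    """Publish a message, diverting a small fraction to a dedicated canary
    queue (e.g. "orders" -> "orders.canary") consumed only by the canary
    worker node. Uses pika's default exchange, which routes by queue name."""
    if canary_enabled and random.random() < CANARY_FRACTION:
        queue_name = f"{queue_name}.canary"
    channel.basic_publish(exchange="",
                          routing_key=queue_name,
                          body=json.dumps(payload))
```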
Approach 2: Shared Queue

In this approach, there is no dedicated canary queue: the canary worker node consumes from the same queue as the baseline and regular worker nodes.

  1. Deploy the canary worker node, consuming from the shared queue.
  2. Compare the canary metrics of the canary worker node with those of the baseline worker node.
  3. If the canary analysis passes, deploy the new version of the code to all the worker nodes; else revert the canary worker node.
  4. Deploy the canary web node.
  5. Compare the canary metrics of the canary web node with those of the baseline web node.
  6. If the canary analysis passes, deploy the new version of the code to all the web nodes; else revert the canary web node.

Pros:

  1. Simple setup with no additional, dedicated canary queues. Canary and baseline workers can still be compared, as long as each worker labels its metrics with the code version it runs, as in the sketch after this list.
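Since every worker consumes the same queue here, canary analysis depends on each worker tagging its own metrics with its code version. The sketch below assumes a pika-style RabbitMQ consumer callback, with placeholder `process` and `emit_metric` helpers standing in for your business logic and metrics client.

```python
import json
import time

APP_VERSION = "v2"  # canary and baseline workers differ only in this label

def process(message): ...                      # placeholder: business logic
def emit_metric(name, value=1, **tags): ...    # placeholder: metrics client

def handle(channel, method, properties, body):
    """pika on_message_callback: process one message and emit metrics
    tagged with the worker's version, so the canary analysis can compare
    canary ("v2") workers against baseline ("v1") workers."""
    start = time.monotonic()
    try:
        process(json.loads(body))
        emit_metric("messages_processed", version=APP_VERSION)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        emit_metric("messages_failed", version=APP_VERSION)
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
    finally:
        emit_metric("latency_ms", (time.monotonic() - start) * 1000,
                    version=APP_VERSION)
```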

Summary

You can choose either of the above canary deployment strategies based on your requirements. The dedicated queue approach is good for testing both the happy flow and backward compatibility, while the shared queue approach only tests backward compatibility in consuming messages: during the worker canary analysis, every message in the shared queue was still produced by the older web code. Testing backward compatibility matters because the deployment happens in a rolling fashion and a queue can always have residual messages from the older code. The shared queue approach is preferable when you need simplicity in your code and infrastructure and you are confident that the happy flow is well covered by your functional/integration tests; the dedicated queue approach tests all the scenarios but comes with its own complexity.
