At-Least-Once Message Processing
On at least one occasion I’ve had to explain why at-least-once is the preferred if, not only, choice of message processing for most scenarios. I distinguish message processing from message delivery on the assumption that most message delivery infrastructures (such as Kafka, ActiveMQ and RabbitMQ) will continue to deliver a message on failure/restart until it is acknowledged.
Lets assume that the processing may involve a mix of calling web services, persisting to database, persisting to NoSql data stores and publishing further outbound messages to downstream systems before the inbound message is acknowledged.
To achieve exactly-once processing all of the systems used would need enrolled into a distributed two-phase transaction that is committed in one go. I’ll leave to one side the discussion of the technical problems with achieving this. Usually this approach is simply not an option as while most systems will support an atomic workload they all do not support distributed transaction as described above.
Achieving at-most-once processing is simple as one simply acknowledges the message before starting any of the processing. If the process is terminated in an abrupt way the collective system will be left in an inconsistent state as only the systems that were interacted with before the termination will updated. In finance and other areas where consistency is essential this also is not an option.
Ingredients of Successful At-Least-Once Message Processing
With the alternatives eliminated, what is needed for at-least-once processing? When a message is re-applied the overall state of the system needs to remain the same as when the initial message was applied. On subsequent message application:
- Data stores should not change.
- Downstream messaging or interaction should be done with the same message content (which will ultimately have no effect provided given all systems are following at-least-once message processing).
The following help make this possible.
unique identifier: In order to detect if the message has been processed before, each message must have a unique identifier. This identifier will usually need to be used by or passed to every system interacted with. The identifier will ideally be generated when the processing is initiated, as random identifier generation is not a consistently repeatable operation. New unique identifiers can be generated based on inbound message characteristics, which obviously, is consistently repeatable.
repeatable action verb: It can help if the message action verb is something that makes sense if repeated so favour ‘refresh’ or ‘upsert’ verb over ‘insert’ and ‘update’ and remember that a ‘delete’ is successful even if no row is actually deleted.
Top Up Example
Consider a mobile phone top-up processor the maintains a credit balance. The inbound message is to add £10 to an account, the unique identifier included in the message is 2363. We need to be able to identify if the top up has happened so we have a top-up transaction log and call the message a TopUpTransaction. The processing is as follows.
If entry 2363 does not exist in the transaction log:
- Increase the balance by £10
- Add 2363 to the transaction log
On first application the balance is increased by £10, on second application the balance is unchanged.