Production Notes #02 · Binary and Beyond
The first integration always sounds simple.
Connect the commerce platform to the ERP.
Push orders when they are paid.
Sync inventory overnight.
Update the CRM when a customer registers.
On a diagram, this is a line between two boxes. Maybe an arrow. Sometimes a label: REST API.
Nobody calls it a distributed system.
They call it an integration.
When the line between two boxes becomes a system
Imagine a team that has just signed off on a straightforward brief.
"When an order is placed, send it to the warehouse system."
The first version ships in a sprint. A webhook listener receives the payload, maps a few fields, calls the warehouse API, and returns success. Tests pass. Demo looks clean. Everyone is relieved.
Then production happens.
The warehouse API is slow on Monday mornings.
The commerce platform retries webhooks when the first attempt times out.
A product manager manually refunds an order in the admin panel while the original webhook is still in flight.
The warehouse receives two fulfilment requests for the same order.
Meanwhile, the ERP sync job — added in phase two because finance needed it — runs every fifteen minutes and has not seen the order yet.
Customer support opens a ticket: "The customer was charged but the warehouse says they have no record."
Nobody changed the code in weeks.
The integration still "works."
It simply works like a distributed system that nobody designed as one.
We underestimate what a line on a diagram actually adds
An integration is not a cable.
It is a commitment between two systems that evolve independently, fail independently, and interpret the same event differently.
Every time you add one, you add:
- Another clock. The source system thinks the event happened at T₀. The target learns about it at T₀ + network latency + retry delay + time spent waiting in a queue.
- Another failure domain. The network, the credentials, the rate limit, the maintenance window, the deploy that changed a field name without telling you.
- Another opinion about truth. The commerce platform says the order exists. The warehouse says it never arrived. The ERP will say something else entirely until the batch job runs.
- Another retry policy. Yours, theirs, or both — each behaving exactly as configured, each capable of duplicating work the business considers singular.
This is not an argument against integrations. Enterprise software is made of them. Agencies live inside them. The argument is against the vocabulary we use when we plan them.
We say connect when we mean couple.
We say sync when we mean eventually agree.
We say real-time when we mean fast enough that nobody has complained yet.
Distributed systems theory did not become irrelevant when we moved to SaaS and APIs. It became invisible — buried under friendlier words that hide the same failure modes.
The first API is easy. The tenth is architecture.
Production Notes #01 was about state: many systems, many versions of reality.
This piece is about how those versions multiply.
One integration between two systems creates two observers of a business process.
Add a CRM update, an analytics event, a fraud check, a tax calculation service, an AI enrichment step, and a nightly reconciliation export.
You do not have one application with seven integrations.
You have eight systems that must agree, or disagree gracefully, about what happened.
The complexity does not arrive when you adopt microservices or message queues.
It arrives when the second system starts making decisions based on information that might already be stale.
I have seen teams spend months debating whether they need an event bus while running twelve point-to-point HTTP integrations in production. Each one with its own timeout, its own error handler, its own manual replay script in a shared spreadsheet.
They already had a distributed system.
They just had not drawn the boundary around it.
What changes when you name it correctly
Once you accept that every integration is a distributed system, several design conversations become clearer.
Timeouts are not polish. They define how long two systems are allowed to disagree before someone intervenes.
Idempotency is not an edge case. It is how you survive the fact that messages arrive more than once — because they will.
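A minimal sketch of that idea, assuming the webhook payload carries a stable order identifier. The class `FulfilmentHandler`, the payload keys `event` and `order_id`, and the in-memory set are all hypothetical; a real handler would need a durable store with an atomic insert-if-absent.

```python
from dataclasses import dataclass, field


@dataclass
class FulfilmentHandler:
    """Hypothetical webhook handler that forwards each order at most once."""

    # Assumption: stands in for a durable table with a unique constraint.
    seen_keys: set = field(default_factory=set)
    # Assumption: stands in for the call to the warehouse API.
    sent_to_warehouse: list = field(default_factory=list)

    def handle(self, payload: dict) -> str:
        # The key names the business operation, not the delivery attempt,
        # so a retried webhook maps to the same key as the original.
        key = f"{payload['event']}:{payload['order_id']}"
        if key in self.seen_keys:
            return "duplicate_ignored"
        self.seen_keys.add(key)
        self.sent_to_warehouse.append(payload["order_id"])
        return "forwarded"


handler = FulfilmentHandler()
first = handler.handle({"event": "order.paid", "order_id": "A-1001"})
retry = handler.handle({"event": "order.paid", "order_id": "A-1001"})
```

Here the retried delivery is acknowledged but the warehouse sees the order once. Whether to record the key before or after the side effect is its own design decision, and this sketch does not settle it.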
Retries are not free. They amplify load, duplicate side effects, and reorder events. (Production Notes #05 will go further into this.)
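One common way to limit the load amplification is capped exponential backoff with jitter and a fixed attempt budget. The sketch below is an illustration, not a prescription: `TransientError`, `call_with_retries`, and the default values are assumptions of mine. It does nothing about duplicated side effects or reordering, which still need idempotent handling on the receiving side.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as timeouts."""


def call_with_retries(operation, max_attempts=4, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random):
    """Run operation, retrying transient failures with capped, jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure instead of looping
            # The ceiling doubles per attempt but never exceeds the cap.
            ceiling = min(cap, base * 2 ** (attempt - 1))
            # A random wait below the ceiling keeps clients from retrying in lockstep.
            sleep(rng.uniform(0, ceiling))


calls = {"count": 0}


def flaky_warehouse_call():
    # Assumption: fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("warehouse timed out")
    return "accepted"


waits = []  # capture the waits instead of actually sleeping
result = call_with_retries(flaky_warehouse_call, sleep=waits.append)
```

The attempt budget matters as much as the delay: an unbounded retry loop turns a slow Monday morning into sustained extra load on a system that is already struggling.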
Batch jobs and webhooks are not alternatives. They are two different consistency models pretending to describe the same business event.
Monitoring an integration is not the same as monitoring an API. You need to know whether the business operation completed across boundaries — not whether a single HTTP call returned 200.
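As a rough sketch of what cross-boundary monitoring can look like, the check below compares what the source recorded with what the target holds, and only flags orders that have been missing for longer than an agreed window. The function name, the order ids, and the fifteen-minute window are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def overdue_orders(source_orders, target_ids, now, allowed_lag):
    """Return ids the source recorded more than allowed_lag ago that the target lacks."""
    return sorted(
        order_id
        for order_id, paid_at in source_orders.items()
        if order_id not in target_ids and now - paid_at > allowed_lag
    )


now = datetime(2024, 1, 8, 9, 30, tzinfo=timezone.utc)
commerce_orders = {
    "A-1001": now - timedelta(minutes=50),  # paid, already in the warehouse
    "A-1002": now - timedelta(minutes=40),  # paid, missing for too long
    "A-1003": now - timedelta(minutes=2),   # paid, still inside the window
}
warehouse_ids = {"A-1001"}

overdue = overdue_orders(commerce_orders, warehouse_ids, now, timedelta(minutes=15))
```

Every HTTP call in this scenario may have returned 200. The check still reports A-1002, because it asks about the business operation rather than the transport.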
These are not exotic concerns. They are the baseline cost of connecting software that belongs to different owners, release cycles, and definitions of correctness.
The human layer makes it worse
Technical integrations fail in predictable ways.
Organizational integrations fail in expensive ones.
The warehouse team changes a status code meaning without a changelog.
Finance disables an API key over a billing dispute.
The agency's staging environment points at production because someone copied the wrong environment variable three months ago.
A client insists the sync must be synchronous because "we need it real-time" — and nobody writes down what should happen when the downstream system is down for maintenance.
Enterprise integrations fail for human reasons as often as technical ones. The system design must assume both.
That is why mature teams document not just endpoints but ownership: who is authoritative for which field, who gets paged when two systems disagree, and how long disagreement is allowed to persist before a human reconciles it.
Without that, every integration becomes an informal negotiation conducted through logs and support tickets.
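Ownership can be written down as data as well as in a wiki. A small sketch, assuming three systems and three fields chosen purely for illustration; `FIELD_OWNERS` and `resolve` are hypothetical names.

```python
# Assumption: which system is authoritative for which field. The entries are
# illustrative, not a recommendation for any particular stack.
FIELD_OWNERS = {
    "shipping_address": "commerce",
    "stock_level": "warehouse",
    "invoice_total": "erp",
}


def resolve(field_name, values_by_system, owners=FIELD_OWNERS):
    """Pick the value from the documented owner, or fail loudly if none exists."""
    owner = owners.get(field_name)
    if owner is None:
        raise LookupError(f"no documented owner for {field_name!r}")
    if owner not in values_by_system:
        raise LookupError(f"owner {owner!r} has no value for {field_name!r}")
    return values_by_system[owner]


# Commerce and warehouse disagree about stock; the warehouse is authoritative.
chosen = resolve("stock_level", {"commerce": 4, "warehouse": 3})
```

The useful part is the failure case: a field with no documented owner raises an error instead of being settled silently by whichever system wrote last.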
AI does not simplify the picture
If anything, AI workflows are integration stacks wearing a new interface.
A retrieval step pulls from a knowledge base.
A tool call hits the CRM.
Another call checks inventory.
A policy service validates the output.
Each step is an integration with latency, failure modes, and stale data — wrapped around a model that confidently synthesizes whatever it was given.
When the answer is wrong, teams often retrain, reprompt, or swap models.
Sometimes the right fix is humbler: stop asking the model to reason across five sources that have not agreed with each other since Tuesday.
Good AI architecture is good integration architecture with an additional requirement: the consumer of the data cannot see the seams. That makes invisible disagreement more dangerous, not less.
That's why production AI delivery has to include the integration plumbing — not just the model headline.
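One piece of that plumbing can be sketched as a freshness gate that runs before anything reaches the model. It assumes each source snapshot carries a `fetched_at` timestamp; the field names, the source names, and the six-hour limit are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def split_by_freshness(sources, now, max_age):
    """Partition source names into (fresh, stale) by the age of each snapshot."""
    fresh, stale = [], []
    for source in sources:
        age = now - source["fetched_at"]
        (fresh if age <= max_age else stale).append(source["name"])
    return fresh, stale


now = datetime(2024, 1, 12, 16, 0, tzinfo=timezone.utc)
sources = [
    {"name": "knowledge_base", "fetched_at": now - timedelta(minutes=5)},
    {"name": "crm", "fetched_at": now - timedelta(hours=2)},
    {"name": "inventory", "fetched_at": now - timedelta(days=3)},
]
fresh, stale = split_by_freshness(sources, now, max_age=timedelta(hours=6))
```

What to do with the stale list is a product decision: refetch, answer with a caveat, or decline. The point of the gate is that the seam becomes visible before the model smooths it over.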
Questions worth asking before the next integration
I have started using a short checklist before approving a new connection — whether it is a client request, a partner API, or an internal service.
- What happens if this message arrives twice?
- Which system owns this field when they conflict?
- How long can the target operate on stale data before the business is harmed?
- What is the recovery path when the target is unavailable for an hour?
- Can we observe cross-system success, not just transport success?
- Who is paged when finance and operations see different numbers?
If the answers are vague, you are not scoping an integration.
You are scoping a distributed system without a reliability budget.
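The checklist can also live next to the code, so that a vague answer shows up as a gap rather than a feeling. A minimal sketch, with one hypothetical field per question above; the class and field names are mine.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IntegrationContract:
    """Hypothetical record of the checklist answers for one integration."""

    duplicate_handling: Optional[str] = None   # message arrives twice
    field_owner: Optional[str] = None          # who wins on conflict
    max_staleness: Optional[str] = None        # tolerable stale-data window
    outage_recovery: Optional[str] = None      # target down for an hour
    cross_system_signal: Optional[str] = None  # business-level success check
    escalation_contact: Optional[str] = None   # who is paged on disagreement


def unanswered(contract):
    """List the checklist items that still have no written answer."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


contract = IntegrationContract(
    duplicate_handling="dedupe on order_id",
    field_owner="commerce",
)
gaps = unanswered(contract)
```

A non-empty `gaps` list is a reasonable signal that the work is not ready to estimate yet.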
A different standard for "done"
An integration is not done when the API call works in staging.
It is done when you can answer what happens under delay, duplication, partial failure, and manual intervention — without improvising.
That standard feels heavy for a single webhook.
It feels necessary by the tenth.
The teams I trust in production are not the ones with the most integration experience on paper. They are the ones who stopped pretending that integration work was categorically different from running distributed software.
Because it never was.
Every integration is a distributed system.
The only real choice is whether you design for that fact — or discover it in a support queue at 4 p.m. on a Friday.
Originally published on LinkedIn as part of the Binary and Beyond newsletter.
