Every IIoT project I’ve worked on — from small pilot sites to global rollouts spanning 30+ manufacturing facilities — eventually comes down to the same five questions.
I used to think architecture was about picking the right protocols or drawing clean zone diagrams. And sure, that matters. But the real work happens when you sit down with plant managers, IT teams, and business stakeholders who all want different things, and you have to figure out what you’re actually building and why.
These five questions came up in every vendor evaluation, every strategy workshop. They’re the ones that separate projects that scale from projects that stall at the pilot phase.
1. What data do we actually need, and why?
This sounds obvious, but it’s where most projects go sideways.
I’ve seen teams connect hundreds of thousands of tags to the cloud because they could, not because anyone asked for it. Then six months later, someone from finance asks why the cloud bill is so high, and nobody has a good answer.
On one large IIoT program, we started by asking business users what decisions they wanted to make. Predictive maintenance? Then we need vibration data and temperature trends, not every PLC bit that flips during a batch. Real-time dashboards for production? Then we need line state, good count, reject count — not the full historian archive.
Turns out, when you actually talk to people, you need way less data than you think. And the data you do need has to arrive fast, with context, and in a format someone can use.
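The "work backward from decisions" approach can be captured in code: maintain an explicit map from approved use cases to the tags they need, and connect only the union. A minimal sketch — the use-case and tag names here are hypothetical, not from any real site:

```python
# Derive the tag list from the decisions the business actually wants to
# make, instead of connecting everything. Use-case and tag names below
# are illustrative placeholders.

USE_CASES = {
    "predictive_maintenance": ["motor_vibration_rms", "bearing_temp_c"],
    "production_dashboard": ["line_state", "good_count", "reject_count"],
}

def required_tags(active_use_cases):
    """Union of tags needed by the use cases someone has actually asked for."""
    tags = set()
    for uc in active_use_cases:
        tags.update(USE_CASES.get(uc, []))
    return sorted(tags)

print(required_tags(["production_dashboard"]))
# → ['good_count', 'line_state', 'reject_count']
```

The point isn't the code — it's that the tag list becomes a reviewable artifact tied to a business decision, so the finance question six months later has an answer.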
My unpopular opinion: Most IIoT platforms are over-engineered because nobody asked this question first.
2. Where does the data need to go, and how fast does it need to get there?
I worked on a project where the edge team streamed everything to the cloud in real time via MQTT. Beautiful architecture. Then we found out the analytics team wanted daily batch data in Snowflake, and the MES team needed live OPC UA subscriptions on-premises.
So now we had three data flows — real-time streaming, micro-batch uploads, and local historian queries — all pulling from the same thousands of tags. Nobody planned for that.
The question isn’t just “cloud or edge?” It’s:
- Does this data need to be available in 1 second or 1 hour?
- Who’s consuming it — a dashboard, an ML model, a compliance audit?
- Do we need store-and-forward buffering if the network goes down?
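The store-and-forward question in particular is worth answering with code before the network goes down for real. Here's a minimal in-memory sketch of the pattern — in production you'd persist the buffer to disk (SQLite or similar) so a gateway power cycle doesn't lose data, but the ordering and only-drop-after-confirmed-send logic is the core idea:

```python
from collections import deque

class StoreAndForward:
    """Minimal store-and-forward sketch: buffer readings while the uplink
    is down, flush in order once it comes back. A real edge gateway would
    persist this buffer to disk; this only shows the control flow."""

    def __init__(self, send, max_buffered=10_000):
        self.send = send  # callable that raises ConnectionError on failure
        self.buffer = deque(maxlen=max_buffered)  # oldest dropped when full

    def publish(self, reading):
        self.buffer.append(reading)
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return  # uplink still down; keep buffering
            self.buffer.popleft()  # drop only after a confirmed send
```

Note the `maxlen`: you also have to decide what happens when the outage outlasts the buffer. Dropping the oldest readings is one policy; blocking the producer is another. Either way, pick it deliberately.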
3. How do we make the data meaningful?
Raw PLC tags are useless.
I’ve looked at tags named things like “DB45.DBD12” or “PV_3847_A.” Nobody knows what that means six months later, let alone across 30+ sites in different countries.
We spent a lot of time on this — building a Unified Namespace (UNS) with a consistent plant hierarchy and naming convention aligned to ISA-95. Every tag gets contextualized with site, area, line, equipment, and process step.
For example, instead of “TIC_401,” you get something like “Site A / Product / Line 3 / Blender / Temperature / Setpoint.” Now an engineer in Germany can look at data from a site in the U.S. and understand it immediately.
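In practice this contextualization is just a maintained mapping from raw addresses to hierarchy paths, enforced at the edge. A sketch of the idea — the addresses and hierarchy below are illustrative, not real site data:

```python
# Per-site map from raw PLC tag names to an ISA-95-style path.
# Entries here are made up for illustration.

TAG_MAP = {
    "TIC_401": ("SiteA", "Line3", "Blender", "Temperature", "Setpoint"),
    "DB45.DBD12": ("SiteA", "Line3", "Filler", "Pressure", "ProcessValue"),
}

def uns_topic(raw_tag):
    """Translate a raw tag into a UNS topic path. Raising on unknown tags
    (rather than passing them through) keeps the mapping complete — an
    uncontextualized tag fails loudly instead of polluting the namespace."""
    try:
        return "/".join(TAG_MAP[raw_tag])
    except KeyError:
        raise KeyError(f"uncontextualized tag: {raw_tag!r}")

print(uns_topic("TIC_401"))  # → SiteA/Line3/Blender/Temperature/Setpoint
```

The fail-loudly choice is deliberate: if unknown tags silently flow through under their raw names, you're back to "DB45.DBD12" in the cloud within a month.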
Batch context is even harder. You need to link time-series data to batch IDs, product codes, and process steps — either at the edge or in the cloud. Both approaches work, but you have to pick one and stick with it, or you’ll end up with two different versions of the truth.
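If you do the linking at the edge, the core operation is stamping each time-series point with the batch whose interval contains its timestamp. A sketch of that join, using a sorted list of batch start times — the batch IDs and timestamps are made up:

```python
from bisect import bisect_right

# (start_ts, batch_id), sorted by start time; illustrative values only.
batches = [(0, "B-1001"), (100, "B-1002"), (250, "B-1003")]
starts = [b[0] for b in batches]

def batch_for(ts):
    """Find the batch running at timestamp ts (None before the first batch)."""
    i = bisect_right(starts, ts) - 1
    return batches[i][1] if i >= 0 else None

def contextualize(points):
    """points: iterable of (ts, value) -> list of (ts, value, batch_id)."""
    return [(ts, v, batch_for(ts)) for ts, v in points]
```

Whether this runs at the edge or in the cloud matters less than running it in exactly one place — the moment two systems do this join independently, their batch boundaries will drift apart.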
4. How do we keep it secure and compliant?
This is the one that makes or breaks projects in regulated industries.
In a regulated environment, we had to meet GxP, FDA 21 CFR Part 11, and strict data integrity rules. That meant:
- Network segmentation
- Role-based access control with multi-factor authentication
- End-to-end encryption
- Full audit trails with five-year retention
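One building block worth understanding is what makes an audit trail tamper-evident: each entry carries the hash of the previous one, so rewriting history breaks the chain. The sketch below illustrates only that chaining idea — a real Part 11 implementation also needs electronic signatures, validated storage, and the retention policy, none of which this covers:

```python
import hashlib
import json

class AuditTrail:
    """Hash-chained audit log sketch. Each entry's hash covers its content
    plus the previous entry's hash, so any retroactive edit is detectable."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def record(self, user, action):
        entry = {"user": user, "action": action, "prev": self._prev}
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self._prev = digest
        return digest

    def verify(self):
        """Recompute every hash; False if any entry was altered or reordered."""
        prev = self.GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("user", "action", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```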
Compliance isn’t optional. If you skip it during the pilot, you’ll have to rebuild everything later.
5. How do we actually deploy and maintain this at scale?
Pilots are easy. Scaling to 30+ sites is where things fall apart.
I’ve seen plenty of one-off architectures that worked great at a single site but couldn’t be replicated. Custom scripts, hard-coded IP addresses, configurations stored on someone’s laptop. You can’t scale that.
We built deployment playbooks — step-by-step guides with RACI matrices, workload estimates, and clear ownership. Every site followed the same process: validate network, install edge gateways, configure the UNS, connect to data sources, test store-and-forward, validate data in the cloud.
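A playbook like that only scales if deviation is detectable. One way to enforce it is to treat the step sequence as data and let tooling track each site against it — a sketch, with step names mirroring the process above:

```python
# The standard rollout sequence, as data rather than tribal knowledge.
PLAYBOOK = [
    "validate_network",
    "install_edge_gateway",
    "configure_uns",
    "connect_data_sources",
    "test_store_and_forward",
    "validate_cloud_data",
]

def next_step(site, steps_done):
    """Return the site's next playbook step, or None when rollout is complete.
    Raises if the recorded steps deviate from the standard order, which is
    exactly the deviation you want surfaced before it multiplies across sites."""
    if steps_done != PLAYBOOK[: len(steps_done)]:
        raise ValueError(f"{site}: steps deviate from playbook: {steps_done}")
    remaining = PLAYBOOK[len(steps_done):]
    return remaining[0] if remaining else None
```

The same idea works in a spreadsheet or a ticketing workflow; the point is that the sequence is written down once and every site is checked against it, not re-invented per rollout.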
And then there’s governance. Who maintains the connectors when a vendor releases a new firmware? Who updates the naming conventions when a site adds a new production line? Who monitors the platform health and responds when something breaks?
If you don’t answer these questions before you scale, you’ll end up with a mess — 30 different configurations, no central visibility, and every site doing their own thing.
So, what’s the real answer?
There isn’t one architecture that works for everyone. Edge-centric, platform-centric, hybrid — it depends on your use cases, your existing infrastructure, and your organization’s maturity.
But if you can answer these five questions clearly, you’re already ahead of most projects I’ve seen.
The technology is the easy part. The hard part is aligning stakeholders, understanding what success looks like, and building something that actually works in production — not just in a demo.
