
Architect & To B Interview Guide: High Availability, Distributed Systems & Multi-Tenancy

If you are interviewing for a staff engineer, architect, or technical lead role — especially one with a To B or enterprise focus — the questions shift. They stop being about how to invert a binary tree and start being about how to keep a system alive at 3 AM when a downstream dependency melts down, or how to design tenant isolation that won't leak customer data when someone fat-fingers a SQL query.

I have sat on both sides of this table. Here are the 23 questions that actually get asked, with answers grounded in production experience rather than textbook recitation.

System Design & High Availability

How do you design a high-availability system? What dimensions do you evaluate?

No single points of failure anywhere. Services run in multiple instances behind a load balancer. Databases use primary-standby or multi-primary replication. Storage has redundant copies. Avoid single-machine, single-AZ, single-region architectures.

Fault isolation is equally important. Split services with independent thread pools or coroutine pools. Use circuit breakers and fallbacks so one failing dependency does not cascade.

Observability is non-negotiable. Metrics, distributed tracing, structured logging, and alerting must let you detect, locate, and postmortem any incident. Then disaster recovery: multi-region active-active or active-passive, regular backup drills, and clearly defined RTO and RPO targets. Finally, shipping: canary deploys, blue-green, feature flags, and a rollback plan for every release.

What do SLA, RTO, and RPO mean in practice?

Availability is measured in nines. 99.9% means roughly 8.76 hours of downtime per year. 99.99% means about 52 minutes. You measure it as successful requests over total requests, or uptime over total time.

SLA — Service Level Agreement — is the external promise you make: latency, error rate, availability. It is often tied to billing or compensation clauses. RTO — Recovery Time Objective — is "how long can the service be down before the business screams." RPO — Recovery Point Objective — is "how much data can we afford to lose," measured in time. RPO dictates whether you go synchronous replication (near-zero RPO, higher latency and cost) or asynchronous (some data loss window, cheaper).
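The arithmetic behind the nines is worth having at your fingertips: the downtime budget is simply (1 - availability) times the period. A minimal Go sketch:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	year := 365 * 24 * time.Hour
	for _, a := range []float64{0.999, 0.9999, 0.99999} {
		// Downtime budget = (1 - availability) * period.
		budget := time.Duration((1 - a) * float64(year))
		fmt.Printf("%.3f%% allows %v of downtime per year\n", a*100, budget.Round(time.Minute))
	}
}
```

Per month the budgets shrink proportionally, which is why 99.99% in a monthly SLA leaves barely four minutes of slack.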

How do you do capacity planning? What happens before a major sale or new product launch?

You start by estimating QPS, data volume, and downstream call fan-out. Then benchmark a single instance to find its ceiling. From there, calculate how many instances, how many DB connections, how much Redis memory you need.

Stress testing follows. Full-link tests with shadow tables or isolated traffic. Single-API benchmarks. Push to 1.2x-1.5x of expected peak and find the bottleneck — usually the database, sometimes a downstream RPC, a thread pool, or GC.

Then provisioning: set elastic scaling policies or reserve capacity. Pre-configure rate limits and degradation switches. Write runbooks for what gets degraded first. After the event, compare actual QPS, latency, and error rates against predictions and update your capacity model.
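The sizing arithmetic itself is trivial; the judgment is in the headroom multiplier and the benchmarked ceiling. A hedged sketch (the 1.3x headroom and per-instance numbers are illustrative, not universal):

```go
package main

import (
	"fmt"
	"math"
)

// instancesNeeded sizes a service tier from the expected peak load,
// a benchmarked single-instance ceiling, and a headroom multiplier.
func instancesNeeded(peakQPS, perInstanceQPS, headroom float64) int {
	return int(math.Ceil(peakQPS * headroom / perInstanceQPS))
}

func main() {
	// e.g. 50k QPS expected peak, one instance sustains 1,200 QPS in benchmarks,
	// and we want 1.3x headroom for estimation error and failover.
	fmt.Println(instancesNeeded(50000, 1200, 1.3)) // 55
}
```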

High Concurrency & Performance

Where do bottlenecks hide in high-concurrency systems? How do you find and fix them?

The usual suspects: the database (connection pool exhaustion, slow queries, row-level locks), cache stampedes, downstream RPC timeouts, saturated thread pools, GC pauses, or single-machine CPU/IO saturation.

Tracing is your first tool — follow a single request through every hop and find where time disappears. Then dig into DB slow logs and lock waits. Profile CPU, memory, and goroutines with pprof or async profiler. Check Redis for big keys, hot keys, and slow commands.
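In a Go service, pprof costs one import to wire up; a minimal sketch (the port is an arbitrary choice and should only be reachable from an internal network):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve profiling endpoints on an internal-only port, away from app traffic.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the actual service starts here ...
	select {}
}
```

From there, `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30` grabs a 30-second CPU profile, and the goroutine and heap endpoints cover stuck goroutines and memory growth.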

Fixes are layered: add caching, separate reads from writes, shard databases, apply backpressure with rate limiting and circuit breakers, offload work to message queues, tune connection pools and timeouts, and in code reduce lock contention, reduce serialization overhead, and batch where possible.

How do you design a read service that handles millions of QPS?

Multiple cache tiers. First, in-process cache (like a bounded LRU). Then a distributed cache layer (Redis cluster). Hot data lives as close to the CPU as possible — the fewer network hops, the better.

Shard everything. Data sharded by key means stateless service instances scale horizontally. Route read requests by user ID or key hash to a fixed node so local cache hit rates stay high.

Serve from the edge where possible. CDN for static assets and even short-lived API responses. Multi-region deployment with geo-routed traffic.

Degrade gracefully. If cache expires and the downstream is slow, return stale data rather than nothing. Rate-limit to protect the origin. Use read replicas or specialized read-optimized stores for the hot path.
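The "stale beats nothing" behavior usually comes from a logical (soft) TTL: keep entries past their freshness window and refresh them asynchronously. A sketch with hypothetical types; the 30-second window is arbitrary:

```go
import (
	"sync"
	"time"
)

type entry struct {
	val     []byte
	staleAt time.Time // logical expiry; the data is still usable after this
}

type staleCache struct {
	mu   sync.RWMutex
	data map[string]entry
	load func(key string) ([]byte, error) // hypothetical origin loader
}

// Get serves fresh data when possible and stale data when the origin is
// slow or failing; it only misses on keys never loaded at all.
func (c *staleCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	e, ok := c.data[key]
	c.mu.RUnlock()
	if !ok {
		return nil, false // true cold miss: caller must go to origin
	}
	if time.Now().After(e.staleAt) {
		go c.refresh(key) // async refresh; never block the reader
	}
	return e.val, true // possibly stale, never empty
}

func (c *staleCache) refresh(key string) {
	v, err := c.load(key)
	if err != nil {
		return // keep serving stale data until the origin recovers
	}
	c.mu.Lock()
	c.data[key] = entry{val: v, staleAt: time.Now().Add(30 * time.Second)}
	c.mu.Unlock()
}
```

A single-flight guard (see the multi-level cache question below) would collapse the concurrent refreshes this naive version can trigger.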

When the database is the bottleneck, what optimization and sharding strategies work?

Start with the easy wins: indexing, slow query optimization, avoiding large transactions and long-held locks, connection pooling, and read-write splitting with read replicas.

When that stops working, shard. Vertical sharding first — split by business domain (users in one cluster, orders in another). Then horizontal sharding: pick a shard key (user_id modulo N, or a hash range) and route queries accordingly. Use a sharding middleware or a smart client library.
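The routing itself is deterministic math on the shard key; a sketch of both common styles (plain modulo resharding moves most keys, which is why consistent hashing or range mapping often replaces it later):

```go
import "hash/fnv"

// shardByID: user_id modulo N. Trivial, but growing N remaps almost every key.
func shardByID(userID, n uint64) uint64 {
	return userID % n
}

// shardByKey: hash a string key into one of n buckets.
func shardByKey(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}
```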

Separate hot and cold data. Archive old records to cold storage or object stores. Keep only the working set in the online database. Introduce fit-for-purpose stores: Elasticsearch for search, Redis for cache, columnar or OLAP engines for analytics.

Microservices & Distributed Systems

What are the principles for splitting microservices? What happens if you go too fine-grained?

Split along business boundaries and bounded contexts. High cohesion within a service, loose coupling between them. Each service should be independently deployable and independently scalable. Start from the business loop (everything a domain needs to complete its function end to end) before hunting for technical seams. Avoid circular dependencies.

When you over-split, call chains grow long, latency stacks up, and the blast radius of any failure widens. Operational overhead explodes: more pipelines, more dashboards, more on-call rotations. Distributed transactions become harder. Team coordination costs rise. The right grain size is what a single team can own and reason about, balanced against how often the boundary actually changes.

How do service discovery and load balancing work in a microservice world?

When an instance starts, it registers with a registry — Nacos, Consul, etcd — providing its address and metadata. Callers pull or subscribe to the registry for the service's instance list and cache it locally, refreshing periodically. When instances go down, they deregister or their heartbeat times out and they are evicted.

For load balancing, client-side LB is common: the caller picks an instance from the cached list using round-robin, random, weighted, or least-connections. Server-side LB routes through a gateway or LB appliance. Canary deployments route by version tag or header to a subset of instances.
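A client-side round-robin picker over the cached instance list is a few lines; a sketch (registry subscription and health eviction omitted):

```go
import (
	"sync"
	"sync/atomic"
)

type balancer struct {
	mu    sync.RWMutex
	addrs []string // refreshed from the registry on push or on a timer
	next  uint64   // atomic round-robin counter
}

// Pick returns the next instance in round-robin order.
func (b *balancer) Pick() string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if len(b.addrs) == 0 {
		return ""
	}
	i := atomic.AddUint64(&b.next, 1)
	return b.addrs[i%uint64(len(b.addrs))]
}

// Update swaps in a new instance list when the registry notifies us.
func (b *balancer) Update(addrs []string) {
	b.mu.Lock()
	b.addrs = addrs
	b.mu.Unlock()
}
```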

What are the common distributed transaction patterns and when do you use each?

Two-phase commit (2PC): A coordinator orchestrates prepare and commit across participants. Strong consistency but blocking, and the coordinator is a single point of failure. Use it for short transactions with few participants where strong consistency is mandatory.

TCC (Try-Confirm-Cancel): Try reserves resources, Confirm finalizes, Cancel releases. You write the compensation logic yourself. Flexible but expensive to build. Good for business workflows that demand precise control.

Saga: Decompose a long transaction into a sequence of local transactions, each with a compensating action. If any step fails, run the compensations for completed steps in reverse order. Eventually consistent. Works well for long-running flows like order-plus-inventory-plus-points.

Transactional outbox / transactional messaging: Write the business change and the message in the same database transaction. A separate process polls and publishes to MQ. Consumers are idempotent. Simple, eventually consistent, and suitable for most async decoupling needs.
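The heart of the outbox pattern is one local transaction; a sketch using database/sql with a hypothetical schema (an orders table plus an outbox table that a relay process polls and publishes):

```go
import (
	"context"
	"database/sql"
)

// createOrder commits the business row and its event atomically.
// A separate relay polls outbox rows, publishes them to the MQ,
// and marks them sent.
func createOrder(ctx context.Context, db *sql.DB, orderID string, payload []byte) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO orders (id, status) VALUES (?, 'created')`, orderID); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO outbox (topic, payload) VALUES (?, ?)`, "order.created", payload); err != nil {
		return err
	}
	return tx.Commit()
}
```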

My rule of thumb: if eventual consistency is acceptable, use MQ + idempotent consumers. If strong consistency is required and the scope is small, 2PC or TCC. Long workflows, use Saga.

How do you design a distributed ID? What are the trade-offs of Snowflake?

Requirements: globally unique, roughly ordered (helps database indexes), highly available, compact.

Options: UUID (random, long, ruins index locality). Database auto-increment (single point, poor scalability). Segment-based allocation like Meituan Leaf (batch allocate ID ranges). Snowflake: timestamp + worker ID + sequence number, generated locally, time-ordered.

Snowflake's strength is that it requires no coordination and is very fast. Its weakness is clock dependency — clock rollback can cause duplicates and must be handled explicitly. Worker ID assignment across machines and regions needs careful planning. Variants like Leaf-snowflake use etcd or Redis for worker ID allocation.
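A minimal Snowflake-style generator with the clock-rollback check made explicit; the bit widths follow the common 41/10/12 layout, and the epoch and worker assignment are left to deployment:

```go
import (
	"errors"
	"sync"
	"time"
)

type Snowflake struct {
	mu     sync.Mutex
	epoch  int64 // custom epoch, milliseconds
	worker int64 // 0..1023, assigned externally (e.g. via etcd)
	lastMs int64
	seq    int64 // 0..4095 within one millisecond
}

func (s *Snowflake) Next() (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := time.Now().UnixMilli()
	if now < s.lastMs {
		// Clock rollback: refuse rather than risk duplicate IDs.
		return 0, errors.New("clock moved backwards")
	}
	if now == s.lastMs {
		s.seq = (s.seq + 1) & 0xFFF
		if s.seq == 0 {
			// 4096 IDs exhausted in this millisecond; spin to the next one.
			for now <= s.lastMs {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		s.seq = 0
	}
	s.lastMs = now
	return (now-s.epoch)<<22 | s.worker<<12 | s.seq, nil
}
```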

Message Queues & Async Processing

How do you choose an MQ? When Kafka vs RocketMQ vs RabbitMQ?

The decision matrix: throughput, latency, durability, ordering guarantees, support for transactional and delayed messages, operational complexity, ecosystem, and team familiarity.

Kafka: extreme throughput, log-oriented, great for streaming, event sourcing, and analytics workloads. Excellent persistence and replay capability.

RocketMQ: strong support for ordered messages, delayed messages, and transactional messages. Rich Chinese documentation. Well suited for order processing, trading systems, and peak-shaving.

RabbitMQ: flexible routing with exchanges and bindings. Good for complex routing topologies and when latency and feature richness matter more than raw throughput.

Choose based on which dimension matters most: throughput (Kafka), messaging features (RocketMQ), or routing flexibility (RabbitMQ).

How do you guarantee no message loss, no duplicates, and ordered consumption?

No loss: producer waits for broker acknowledgment after persistence (acks=all). Broker replicates to multiple nodes. Consumer processes first, then commits offset. Failures trigger retry or dead-letter routing.

No duplicates: make consumers idempotent. Deduplicate by business key (order ID, etc.) or check-before-write. Attach a unique message ID and enforce a uniqueness constraint in the database.
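The uniqueness-constraint version in code; a sketch where the dedup table (processed_msgs with a UNIQUE msg_id) is an assumption and duplicate-key detection is driver-specific:

```go
import (
	"context"
	"database/sql"
	"strings"
)

// handleOnce inserts the message ID into a dedup table inside the same
// transaction as the business change, so a redelivered message is a no-op.
func handleOnce(ctx context.Context, db *sql.DB, msgID string, apply func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO processed_msgs (msg_id) VALUES (?)`, msgID); err != nil {
		// A unique-constraint violation means this message was already handled.
		// Real code should check the driver's error type (MySQL 1062,
		// Postgres SQLSTATE 23505); string matching here is illustrative only.
		if strings.Contains(err.Error(), "Duplicate") || strings.Contains(err.Error(), "23505") {
			return nil
		}
		return err
	}
	if err := apply(tx); err != nil {
		return err
	}
	return tx.Commit()
}
```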

Ordering: within a single partition, order is preserved. Publish with the same key to land in the same partition. Single consumer within a partition processes sequentially. If global order is not required — and it rarely is — partition by business key and you are done.

Caching & Storage

What are cache penetration, hot-key invalidation, and cache avalanche? How do you handle each?

Penetration: a request for a key that exists in neither the cache nor the database, so every lookup falls through the cache and hits the DB. Fix with a Bloom filter to reject impossible keys, or cache a nil value with a short TTL.

Hot-key invalidation: a heavily accessed key expires and suddenly all requests pound the database. Fix by never letting hot keys expire (use logical expiry with async refresh instead), or use a mutex so only one caller goes to origin while the others wait.

Avalanche: many keys expire simultaneously, or the cache cluster goes down entirely. Fix by adding random jitter to expiration times, using multi-level caching, circuit breaking, and running the cache in a highly available cluster so it does not fail as a single point.
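Two of these fixes are tiny in code; a sketch of TTL jitter plus a negative-cache sentinel (the 10% jitter and 30-second nil TTL are arbitrary choices):

```go
import (
	"math/rand"
	"time"
)

// jitteredTTL spreads expirations so keys written in the same batch do not
// expire together (anti-avalanche). base should be comfortably above 10ns.
func jitteredTTL(base time.Duration) time.Duration {
	return base + time.Duration(rand.Int63n(int64(base/10)))
}

// nilSentinel is what gets cached for keys that miss in the DB
// (anti-penetration); the short TTL bounds how long a row created
// later stays invisible.
var nilSentinel = []byte("__nil__")

const nilTTL = 30 * time.Second
```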

How do you design a multi-level cache (local + Redis)? How to handle consistency?

Requests hit the local cache first (in-process, bounded LRU). On miss, go to Redis. On Redis miss, query the database and backfill both Redis and the local cache.

Consistency anchors on the database. On writes: update DB first, then invalidate Redis, then broadcast to all instances to invalidate their local caches (or just wait for short TTL expiry). For read paths, if a brief dirty read is tolerable, rely on TTL alone. For tighter consistency, use a binlog listener like Canal to trigger cache invalidation.

Watch for local cache memory pressure, eviction policies, and the risk of thundering herd when many nodes discover a cold key simultaneously. Mutual exclusion or single-flight patterns help.
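In Go, singleflight is the standard answer to that thundering herd: concurrent misses for one key collapse into a single origin call. A sketch where the cache and DB helpers are hypothetical stand-ins:

```go
import (
	"context"

	"golang.org/x/sync/singleflight"
)

// Hypothetical cache/DB helpers, wired up elsewhere.
var (
	localGet func(key string) ([]byte, bool)
	localSet func(key string, val []byte)
	redisGet func(ctx context.Context, key string) ([]byte, error)
	redisSet func(ctx context.Context, key string, val []byte)
	loadDB   func(ctx context.Context, key string) ([]byte, error)
)

var group singleflight.Group

func get(ctx context.Context, key string) ([]byte, error) {
	if v, ok := localGet(key); ok {
		return v, nil
	}
	// Only one goroutine per key runs the function below;
	// everyone else blocks and shares its result.
	v, err, _ := group.Do(key, func() (interface{}, error) {
		if b, err := redisGet(ctx, key); err == nil {
			localSet(key, b)
			return b, nil
		}
		b, err := loadDB(ctx, key)
		if err != nil {
			return nil, err
		}
		redisSet(ctx, key, b) // backfill both tiers on the way out
		localSet(key, b)
		return b, nil
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}
```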

To B / Enterprise & Multi-Tenancy

What are the key architectural differences between To B and To C systems?

Tenancy and isolation: To B systems are inherently multi-tenant — enterprises, organizations — each needing data, configuration, and resource isolation. Roles and permissions are far more complex: RBAC, data-level ACLs, approval workflows.

Customization: enterprises demand customizations, private deployments, and integration with their internal systems. This requires configurability, plugin architectures, open APIs, and metadata-driven design to avoid one-off code branches.

Compliance and security: audit logs, data masking, regulatory requirements. Permissions must be fine-grained and every operation must be traceable.

Stability and SLAs: enterprise customers care deeply about availability and data security. Contracts often include SLAs with penalties. You need multi-AZ or multi-region disaster recovery, backup and restore, and defined change windows.

Performance profile: To C is spiky high-concurrency. To B may have lower concurrency but each request is complex, data volumes are large, and there are heavy reporting and export workloads. Optimize for single-request depth and batch throughput.

What are the common approaches to multi-tenant data isolation?

Database-per-tenant: each tenant gets its own database. Maximum isolation, independent backup and scaling, highest cost. For large customers or strict compliance requirements.

Shared database, separate schema: within a single instance, each tenant has its own schema. Good isolation, backup and scaling per schema. For mid-to-large tenants.

Shared tables with tenant_id: all tenants share tables, differentiated by a tenant_id column. Simplest to implement, lowest cost, but requires strict discipline — every query must include the tenant_id filter or you leak data. For SaaS with many small tenants.

Hybrid: flagship customers get dedicated databases; the long tail shares tables. Choose based on customer tier and isolation requirements.
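For the shared-table model, the discipline problem is best solved structurally: expose only a tenant-scoped handle so callers cannot forget the filter. A sketch with hypothetical types (the tenant ID comes from the authenticated session, never from request parameters):

```go
import (
	"context"
	"database/sql"
)

// TenantDB binds a connection to one tenant for the life of a request.
type TenantDB struct {
	db       *sql.DB
	tenantID string
}

// OrdersByStatus always injects the tenant_id predicate; no code path
// can query the table without it.
func (t *TenantDB) OrdersByStatus(ctx context.Context, status string) (*sql.Rows, error) {
	return t.db.QueryContext(ctx,
		`SELECT id, status FROM orders WHERE tenant_id = ? AND status = ?`,
		t.tenantID, status)
}
```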

How do you design enterprise-grade permissions (RBAC + data scope)?

RBAC at its core: users are assigned roles, roles carry permissions (menu items, buttons, API endpoints). Roles can inherit or compose. The API layer checks "does the current user's role include this permission" before executing.

Data scope controls what data a user can see, not just what they can do. For example: only their own records, their department's, their department and sub-departments, or everything. Implement by injecting data scope conditions into queries automatically (e.g., WHERE dept_id IN (...)). Data scope is typically configured per role or via a rules engine.
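The injection can be as small as a function mapping a role's scope to a WHERE fragment; a sketch where the scope names and columns are illustrative:

```go
import "strings"

// scopeClause expands a role's data scope into a SQL predicate and its args.
// For "dept_and_children", deptIDs is assumed pre-expanded to include
// all sub-departments.
func scopeClause(scope, userID string, deptIDs []string) (string, []interface{}) {
	switch scope {
	case "self":
		return "owner_id = ?", []interface{}{userID}
	case "dept", "dept_and_children":
		ph := strings.TrimSuffix(strings.Repeat("?,", len(deptIDs)), ",")
		args := make([]interface{}, len(deptIDs))
		for i, d := range deptIDs {
			args[i] = d
		}
		return "dept_id IN (" + ph + ")", args
	default: // "all"
		return "1=1", nil
	}
}
```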

Extensions: approval workflows, step-up authentication for sensitive operations, permission change auditing, SSO and LDAP integration.

How do you design an open API platform for enterprise customers?

Authentication: AppKey + AppSecret for server-to-server, or OAuth2 for delegated access. The gateway validates tokens or signatures and identifies the caller.

Authorization and rate limiting: throttle by application or tenant — QPS, concurrent connections, daily quotas. Authorize per API or per subscription tier. Reject unauthorized calls at the gateway.

Security: HTTPS everywhere, request signing to prevent tampering, sensitive field encryption, IP allowlists, anti-replay via timestamp and nonce.
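The signature is typically an HMAC over a canonical string that bakes in the timestamp and nonce, so a replayed or tampered request fails verification. A sketch; the canonical format here is an assumption, since every platform defines its own:

```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// sign computes the signature the gateway recomputes and compares.
// bodySHA256 is the hex-encoded SHA-256 of the request body.
func sign(secret, method, path, timestamp, nonce, bodySHA256 string) string {
	canonical := strings.Join([]string{method, path, timestamp, nonce, bodySHA256}, "\n")
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(canonical))
	return hex.EncodeToString(mac.Sum(nil))
}
```

On the gateway side, reject timestamps outside a small window and nonces seen before; that pair is what makes the scheme replay-proof.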

Observability: log every call with latency and error codes. Give customers a dashboard showing their usage, quota, and error breakdowns.

Versioning and compatibility: API versioning via URL path or header. Deprecate old versions with advance notice and canary cutovers. Maintain backward compatibility wherever possible.

Technology Selection & Execution

What factors do you weigh when making a technology choice?

Business fit first: does it solve the functional, performance, and scalability needs? Are there proven reference cases at similar scale?

Team and operations: does the team know it or can ramp up quickly? Is the community healthy, documentation solid, hiring pool deep? How complex is the operational surface — monitoring, troubleshooting, upgrading?

Cost: licensing, cloud service fees, and engineering maintenance cost. Build vs. buy vs. self-host vs. managed service.

Ecosystem and lock-in: how well does it integrate with the existing stack? Avoid excessive vendor lock-in unless the benefits clearly outweigh the cost.

Evolution: can you upgrade smoothly? Is there a migration path? Is the community active and the project likely to be maintained for years?

How do you drive technical decisions through a team? What if there is resistance?

Build alignment first. Explain the context, the goals, the expected payoff, and the risks. Back it with data or a proof of concept. Connect the decision to business value.

Ship incrementally. Pilot first, then expand. Feature flags, canary deploys, rollback plans — all reduce perceived risk and make adoption easier.

Bring key engineers into the design review. Incorporate their feedback genuinely. People resist less when they helped shape the outcome.

When there is pushback, listen for the real concern — technical doubt, historical scar tissue, resource anxiety. Address it concretely: compatibility plans, migration tooling, training. Escalate priority alignment with leadership only when necessary. Let the pilot results speak louder than any argument.

Soft Skills & Architectural Thinking

How do you run a technical design review? What do you look at first?

Scope and boundaries: are the requirements clear? Are non-functional concerns — performance, availability, security — explicitly addressed?

Architecture soundness: module boundaries, dependency direction, extension points. Are there single points of failure? Can a failure cascade?

Data and consistency: storage design, sharding strategy, consistency guarantees. Are there risks of double-charge, oversell, or data corruption?

Operability: deployment, rollback, monitoring, alerting, capacity planning, rate limiting. What does the on-call experience look like?

Risk and cost: technical risk, migration risk, resource and timeline estimates. Is there a fallback or rollback path if it does not work?

What does an architect do during a production incident?

Stop the bleeding first. Work with the on-call engineer to execute immediate mitigation — rate limit, degrade, rollback, drain traffic, remove the sick node. Restore service before finding root cause.

Then triage. Look at dashboards, logs, and traces. Narrow the scope: which service, which instance, which dependency. Preserve evidence — thread dumps, heap dumps, core dumps — before restarting anything.

Afterward, postmortem. Root cause analysis using 5 Whys and a timeline. Turn findings into concrete improvements: code changes, config changes, process changes, monitoring gaps. Assign owners and track follow-through. The goal is not to assign blame but to prevent recurrence.

How do you manage technical debt? When do you refactor vs. live with it?

Identify and quantify. Code complexity, duplication, testability. Incident frequency and change difficulty. The modules the team is afraid to touch.

Prioritize ruthlessly. Debt that threatens stability, security, or team velocity gets paid first. Purely cosmetic issues can wait.

Refactor when there is a business need touching that area, or when incidents keep happening there, or when new team members need to ramp up. Always refactor with tests alongside the actual feature work — never refactor for its own sake.

Live with it when business pressure is high, the team is stretched, and the module is stable and rarely changed. Document the known issues, add guardrails (monitoring, integration tests), and schedule the cleanup for a quieter period. Never do a big-bang rewrite on a critical path unless you have no choice.
