gRPC Deep Dive: Protocol Buffers, HTTP/2 Streaming, and How RPC Works at Scale

Tags: grpc, rpc, protocol-buffers, http2, microservices

What is RPC?

Imagine you are writing a program that needs to call a function on another computer. You want it to feel like a local function call — you pass arguments, you get a return value, and you move on. That is Remote Procedure Call (RPC).

Before gRPC, there were older RPC systems: CORBA, Java RMI, XML-RPC, SOAP. Each had its own way of serializing data, describing services, and handling network communication. Many were complex to operate, verbose on the wire, or tied to a single language or platform (Java RMI, for example, only works between JVMs).

gRPC, created by Google in 2015, solves this with a clean stack:

| Layer | Technology | Role |
|---|---|---|
| Interface Definition | Protocol Buffers (.proto files) | Define services and message shapes |
| Serialization | Protocol Buffers (binary wire format) | Compact, fast encoding of structured data |
| Transport | HTTP/2 | Multiplexed streams, headers, flow control |
| Code Generation | protoc + language plugin | Generate stubs and skeletons in any language |

gRPC generates client and server code from a .proto file. The client calls a local method, which serializes the arguments into protobuf bytes, sends them over HTTP/2 to the server, which deserializes them, calls the handler, and sends the response back.

How gRPC Works: Stubs and Skeletons

When you define a service in a .proto file and run the protoc compiler, it generates two pieces of code:

  • Stub (client side): An object with methods that match the service definition. Your application calls stub.GetUser(request), and the stub handles serialization, framing, and network I/O.
  • Skeleton (server side): A base class or interface that your application implements. You write the business logic in a handler, and the skeleton handles deserialization and response writing.
service UserService {
  rpc GetUser (GetUserRequest) returns (User);
}

After code generation, the stub translates GetUser into an HTTP/2 request with path /UserService/GetUser. The server skeleton routes this to your handler. The network is abstracted away — you work with native objects on both sides.

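// Generated client stub (JavaScript, using @grpc/grpc-js)
// The generated module paths below depend on your protoc plugin and are assumptions.
const grpc = require('@grpc/grpc-js')
const { UserServiceClient } = require('./user_grpc_pb')
const { GetUserRequest } = require('./user_pb')

// The target is host:port rather than a URL; createSsl() gives the TLS that "https" implied.
const client = new UserServiceClient('api.example.com:443', grpc.credentials.createSsl())
const request = new GetUserRequest()
request.setId(42)
client.getUser(request, (error, response) => {
  if (error) return console.error('GetUser failed:', error.code, error.details)
  console.log(response.getName()) // "Alice"
})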

Protocol Buffers: The Interface Definition Language

Protocol Buffers (protobuf) is both an IDL (Interface Definition Language) and a serialization mechanism. You write a schema in a .proto file, and protobuf compiles it into code for any supported language.

syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string roles = 5;
}

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User);
  rpc CreateUser (stream CreateUserRequest) returns (User);
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);
}

Key protobuf features:

  • Strong typing: Every field has a type (int32, string, bool, enum, nested message)
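  • Field numbers: Each field gets a unique number from 1 to 2^29 - 1 (536,870,911); 19000 to 19999 are reserved for the protobuf implementation. The number is the wire identifier, so never change it once the message is in use
  • repeated: Lists / arrays of a type (the counterpart of arrays in JSON)
  • stream: Marks a streaming RPC (client or server side)
  • oneof: At most one of several fields can be set at a time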
  • map<K,V>: Key-value pairs (like dictionaries)

Compared to JSON Schema or OpenAPI, protobuf is more compact and code-generation-first.

The Protobuf Wire Format

When protobuf serializes a message to bytes, it uses a compact binary format. Each field is encoded as a tag-value pair:

tag = (field_number << 3) | wire_type

There are only six wire types:

| Wire Type | Meaning | Used For |
|---|---|---|
| 0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 1 | 64-bit | fixed64, sfixed64, double |
| 2 | Length-delimited | string, bytes, embedded messages, packed repeated fields |
| 3 | Start group | Deprecated (proto2 only) |
| 4 | End group | Deprecated (proto2 only) |
| 5 | 32-bit | fixed32, sfixed32, float |
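
The tag formula can be tried out directly. The following is a minimal sketch; the helper names makeTag and parseTag are made up for illustration and do not come from any protobuf library. The tag value is itself written to the wire as a varint, so tags for field numbers above 15 take more than one byte.

```javascript
// Wire type numbers as listed in the table of wire types.
const WIRE_TYPE = { VARINT: 0, I64: 1, LEN: 2, SGROUP: 3, EGROUP: 4, I32: 5 }

// Combine a field number and a wire type into a tag value.
// ">>> 0" keeps the result unsigned for very large field numbers.
function makeTag(fieldNumber, wireType) {
  return ((fieldNumber << 3) | wireType) >>> 0
}

// Split a tag value back into its two parts.
function parseTag(tag) {
  return { fieldNumber: tag >>> 3, wireType: tag & 0x07 }
}

console.log(makeTag(1, WIRE_TYPE.VARINT).toString(16)) // "8"
console.log(makeTag(2, WIRE_TYPE.LEN).toString(16))    // "12"
console.log(parseTag(0x1a))                            // { fieldNumber: 3, wireType: 2 }
```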

Varint encoding packs integers into fewer bytes. The most significant bit (MSB) of each byte indicates whether more bytes follow. For values under 128, it is just one byte. For larger values, it takes as many bytes as needed.

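For example, take int32 id = 1 with value 150. The tag byte is 0x08 (field 1, wire type 0), and the value is encoded as the two-byte varint 0x96 0x01:

Binary: 10010110 00000001
  -> Strip MSBs: 0010110 0000001
  -> Reverse groups (the least significant group comes first on the wire): 0000001 0010110
  -> Value: 128 + 22 = 150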
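
The varint steps above can be sketched in a few lines of JavaScript. This is an illustration only: encodeVarint and decodeVarint are hypothetical helper names, and the sketch handles non-negative integers up to Number.MAX_SAFE_INTEGER, not the negative or full 64-bit cases a real protobuf library covers.

```javascript
// Encode a non-negative safe integer as a protobuf varint.
function encodeVarint(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('this sketch only handles non-negative safe integers')
  }
  const bytes = []
  let rest = value
  do {
    let byte = rest % 128            // low 7 bits
    rest = Math.floor(rest / 128)
    if (rest > 0) byte |= 0x80       // set the MSB: more bytes follow
    bytes.push(byte)
  } while (rest > 0)
  return Buffer.from(bytes)
}

// Decode a varint starting at `offset`; returns the value and the next offset.
function decodeVarint(buf, offset = 0) {
  let value = 0
  let multiplier = 1
  let pos = offset
  while (true) {
    if (pos >= buf.length) throw new Error('truncated varint')
    const byte = buf[pos++]
    value += (byte & 0x7f) * multiplier
    if ((byte & 0x80) === 0) break   // MSB clear: this was the last byte
    multiplier *= 128
  }
  return { value, next: pos }
}

console.log(encodeVarint(150).toString('hex'))        // "9601"
console.log(decodeVarint(Buffer.from([0x96, 0x01])))  // { value: 150, next: 2 }
```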

Protobuf messages are self-describing enough that tools can decode them without the schema (using field numbers and wire types), but field names are lost — that is why you need the .proto file to get meaningful output.

Worked example: each field is encoded as a tag (field_number << 3 | wire_type) followed by its value. Take this schema with the values id = 42, name = "Alice", email = "alice@example.com":

message Person { int32 id = 1; string name = 2; string email = 3; }

Wire format (28 bytes, shown as hex with one group per field):

082a | 1205416c696365 | 1a11616c696365406578616d706c652e636f6d

| Field | Size | Tag | Value | Raw |
|---|---|---|---|---|
| 1 (int32 id) | 2 bytes | 0x08 (varint, wire type 0) | 0x2a | 42 |
| 2 (string name) | 7 bytes | 0x12 (length-delimited, wire type 2) | len=5 [41 6c 69 63 65] | "Alice" |
| 3 (string email) | 19 bytes | 0x1a (length-delimited, wire type 2) | len=17 [61 6c 69 63 65 40 65 78 61 6d 70 6c 65 2e 63 6f 6d] | "alice@example.com" |

The same data as compact JSON takes 52 bytes, about 1.9x larger than protobuf:

{"id":42,"name":"Alice","email":"alice@example.com"}
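
To check the Person example by hand, here is a small hand-rolled encoder. It is a sketch for this one message shape, not a replacement for generated code: the helper names (varint, fieldTag, int32Field, stringField, encodePerson) are made up, and int32Field only supports non-negative values.

```javascript
// Varint encoding for non-negative integers, returned as an array of bytes.
function varint(n) {
  const out = []
  do {
    let byte = n % 128
    n = Math.floor(n / 128)
    if (n > 0) byte |= 0x80
    out.push(byte)
  } while (n > 0)
  return out
}

// tag = (field_number << 3) | wire_type, itself written as a varint.
const fieldTag = (fieldNumber, wireType) => varint(fieldNumber * 8 + wireType)

// proto3 leaves default values (0, "") off the wire, so both helpers skip them.
function int32Field(fieldNumber, value) {
  return value === 0 ? [] : [...fieldTag(fieldNumber, 0), ...varint(value)]
}

function stringField(fieldNumber, text) {
  const utf8 = [...Buffer.from(text, 'utf8')]
  return utf8.length === 0 ? [] : [...fieldTag(fieldNumber, 2), ...varint(utf8.length), ...utf8]
}

function encodePerson(p) {
  return Buffer.from([
    ...int32Field(1, p.id),
    ...stringField(2, p.name),
    ...stringField(3, p.email),
  ])
}

const person = { id: 42, name: 'Alice', email: 'alice@example.com' }
const wire = encodePerson(person)
const json = Buffer.from(JSON.stringify(person), 'utf8')

console.log(wire.toString('hex'))
// "082a1205416c6963651a11616c696365406578616d706c652e636f6d"
console.log(wire.length, json.length) // 28 52
```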

Schema Evolution: Adding Fields Without Breaking Everything

The superpower of protobuf is backward and forward compatibility. You can add fields, remove fields, and change types — as long as you follow the rules:

  • Never reuse a field number: When you delete a field, do not reuse its number. Mark it as reserved.
  • New fields are optional: Old code skips fields it does not recognize (unknown fields are preserved).
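  • Wire-type compatible changes: You can change int32 to int64 or uint32 (same wire type), but not int32 to string (different wire type). Even a compatible change can truncate or reinterpret values that do not fit the new type, such as a negative int32 read as uint32.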
message User {
  reserved 6, 10 to 15;  // These field numbers can never be used

  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string roles = 5;

  // New fields added later:
  string phone = 16;       // Safe: new number
  Address address = 17;    // Safe: new number
}

A client built with the old schema (without phone and address) can still deserialize a response from a new server: it skips the fields it does not know. A new client deserializing an old response gets "" for phone and an unset address, which most generated code exposes as null or as an empty default Address.

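Adding a field to a JSON API is just as easy, but nothing in JSON itself defines what a missing field means, so every client has to handle absence explicitly. Protobuf builds that rule into the format: a missing field reads as its type's default value.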
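
How an older reader skips fields it does not know can be sketched as follows. This is a simplified illustration: the "old" reader here only knows id (field 1) and name (field 2), the function names are hypothetical, and real libraries also keep the skipped bytes so they can be written back out.

```javascript
// Read a varint at `pos`; returns the value and the offset after it.
function readVarint(buf, pos) {
  let value = 0
  let multiplier = 1
  while (true) {
    if (pos >= buf.length) throw new Error('truncated varint')
    const byte = buf[pos++]
    value += (byte & 0x7f) * multiplier
    if ((byte & 0x80) === 0) return { value, next: pos }
    multiplier *= 128
  }
}

// Decode with an "old" schema; any other field is skipped using its wire type.
function decodeOldUser(buf) {
  const user = { id: 0, name: '' }
  let pos = 0
  while (pos < buf.length) {
    const tag = readVarint(buf, pos)
    pos = tag.next
    const fieldNumber = Math.floor(tag.value / 8)
    const wireType = tag.value % 8
    if (wireType === 0) {                       // varint: read it, keep it if known
      const v = readVarint(buf, pos)
      pos = v.next
      if (fieldNumber === 1) user.id = v.value
    } else if (wireType === 2) {                // length-delimited: length tells us how far to jump
      const len = readVarint(buf, pos)
      const payload = buf.subarray(len.next, len.next + len.value)
      pos = len.next + len.value
      if (fieldNumber === 2) user.name = payload.toString('utf8')
    } else if (wireType === 1) {
      pos += 8                                  // fixed 64-bit
    } else if (wireType === 5) {
      pos += 4                                  // fixed 32-bit
    } else {
      throw new Error('unsupported wire type ' + wireType)
    }
  }
  return user
}

// A message from a newer server: id, name, plus phone as field 16 (tag bytes 0x82 0x01).
const newer = Buffer.concat([
  Buffer.from('082a1205416c696365', 'hex'),
  Buffer.from('820108', 'hex'),
  Buffer.from('555-0100', 'utf8'),
])
console.log(decodeOldUser(newer)) // { id: 42, name: 'Alice' }
```

The wire type is what makes skipping possible: even without the schema, the reader knows how many bytes each unknown field occupies.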

HTTP/2: The Transport Layer

gRPC uses HTTP/2 as its transport. HTTP/2 provides features that HTTP/1.1 cannot:

  • Multiplexed streams: Multiple RPCs share a single TCP connection. This removes HTTP-level head-of-line blocking, although a lost TCP packet still stalls every stream on that connection.
  • Binary framing: Frames (HEADERS, DATA, SETTINGS, etc.) are binary, not text. Efficient parsing.
  • Flow control: Per-stream and per-connection flow control prevents a slow consumer from being overwhelmed.
  • Header compression (HPACK): Repeated headers (like content-type: application/grpc) are compressed to a few bytes.
  • Long-lived streams: A server can send many messages on one stream without a new request per response. gRPC streaming relies on this, not on HTTP/2 server push (PUSH_PROMISE), which gRPC does not use.

The gRPC protocol maps onto HTTP/2 like this:

  1. Each RPC creates a new HTTP/2 stream (client-initiated stream IDs are odd, so they increase by 2: 1, 3, 5, ...)
  2. Client sends HEADERS frame with :method = POST, :path = /package.Service/Method, content-type = application/grpc, te = trailers
  3. Client sends DATA frame(s) with the serialized protobuf request, then half-closes the stream with END_STREAM
  4. Server sends HEADERS frame with :status = 200 and content-type = application/grpc
  5. Server sends DATA frame(s) with the serialized protobuf response
  6. Server sends a final HEADERS frame (the trailers) with grpc-status = 0 and the END_STREAM flag, which closes the stream

For streaming RPCs, multiple DATA frames flow in one or both directions on the same stream. The stream stays open until both sides signal completion.
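
The request side of this mapping can be sketched as plain data. The function below only builds the header list a client would send; it does not open a connection, and the names buildRequestHeaders and clientStreamIds are made up for illustration.

```javascript
// Build the HTTP/2 request headers for one gRPC call.
// `service` is the fully qualified service name, for example "package.Service".
function buildRequestHeaders({ authority, service, method, scheme = 'https' }) {
  return [
    [':method', 'POST'],
    [':scheme', scheme],
    [':path', `/${service}/${method}`],
    [':authority', authority],
    ['content-type', 'application/grpc'],
    ['te', 'trailers'],
  ]
}

// Client-initiated HTTP/2 streams use odd IDs: 1, 3, 5, ...
function* clientStreamIds() {
  let id = 1
  while (true) {
    yield id
    id += 2
  }
}

const headers = buildRequestHeaders({
  authority: 'api.example.com',
  service: 'UserService',
  method: 'GetUser',
})
console.log(headers.find(([name]) => name === ':path')[1]) // "/UserService/GetUser"

const ids = clientStreamIds()
console.log(ids.next().value, ids.next().value, ids.next().value) // 1 3 5
```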

Unary RPC: The Classic Request-Response Pattern

Unary RPC is the simplest pattern: the client sends exactly one request message, and the server sends exactly one response message. It maps directly to how HTTP/1.1 works, but with HTTP/2 multiplexing and protobuf efficiency.

The key gRPC-unique behavior happens in the framing:

  • The request is prefixed with a 5-byte header: 1 byte for compression flag (0 = none), 4 bytes for message length (big-endian)
  • The response follows the same framing
  • A grpc-status trailer header is sent to signal success (0 = OK) or error (non-zero)
// Unary RPC call
const request = { id: 42 } // same field name as in the other GetUser examples
client.getUser(request, { deadline: Date.now() + 5000 }, (err, response) => {
  if (err) {
    console.error('gRPC error:', err.code, err.details)
    return
  }
  console.log('User:', response.name)
})

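Under the hood, this call produces the following HTTP/2 frame exchange on a single stream:

  1. Client -> Server: HEADERS (method, path, content-type)
  2. Client -> Server: DATA (5-byte prefix + serialized request)
  3. The server runs the UserService handler
  4. Server -> Client: HEADERS (:status = 200)
  5. Server -> Client: DATA (5-byte prefix + serialized response)
  6. Server -> Client: trailing HEADERS carrying grpc-status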
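
The 5-byte message prefix described earlier in this section can be sketched with Node's Buffer. The function name frameMessage is an assumption for illustration; gRPC libraries do this internally.

```javascript
// Prefix a serialized message with the gRPC framing header:
// 1 byte compression flag, then a 4-byte big-endian length.
function frameMessage(payload, compressed = false) {
  const header = Buffer.alloc(5)
  header.writeUInt8(compressed ? 1 : 0, 0)
  header.writeUInt32BE(payload.length, 1)
  return Buffer.concat([header, payload])
}

// 08 2a is the protobuf encoding of field 1 = 42.
const framed = frameMessage(Buffer.from('082a', 'hex'))
console.log(framed.toString('hex')) // "0000000002082a"
```

The framed bytes are what goes into the HTTP/2 DATA frames; HTTP/2 then adds its own frame headers around them.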

Server-Streaming RPC: Push Data from Server to Client

Server-streaming RPC is one of the patterns that gRPC makes easy but REST struggles with. The client sends a single request, and the server sends a stream of responses over time.

This is perfect for:

  • Listing a large dataset (paginated without pagination logic)
  • Real-time updates (the server sends new events as they happen)
  • Progress notifications during long-running operations
// Server-streaming RPC call
const call = client.listUsers({ role: 'admin' })
call.on('data', (user) => {
  console.log('Received user:', user.name)
})
call.on('end', () => {
  console.log('All users received')
})
call.on('error', (err) => {
  console.error('Stream error:', err)
})

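The server writes multiple response messages on the same HTTP/2 stream. Each message carries the same 5-byte prefix (compression flag plus length) used for unary calls, and a single message may span several HTTP/2 DATA frames. The stream stays open until the server sends the trailers containing grpc-status.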
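
Because message boundaries and DATA frame boundaries are independent, a receiver has to buffer bytes until a whole message is available. The sketch below shows one way to do that; MessageReader and lengthPrefixed are hypothetical names, and real clients also handle decompression and size limits.

```javascript
// Helper for the demo: 1-byte flag + 4-byte big-endian length + payload.
function lengthPrefixed(payload) {
  const header = Buffer.alloc(5)
  header.writeUInt32BE(payload.length, 1)
  return Buffer.concat([header, payload])
}

// Collects incoming chunks and emits every complete message found so far.
class MessageReader {
  constructor() {
    this.pending = Buffer.alloc(0)
  }

  push(chunk) {
    this.pending = Buffer.concat([this.pending, chunk])
    const messages = []
    while (this.pending.length >= 5) {
      const length = this.pending.readUInt32BE(1)
      if (this.pending.length < 5 + length) break // wait for the rest of this message
      messages.push({
        compressed: this.pending[0] === 1,
        payload: this.pending.subarray(5, 5 + length),
      })
      this.pending = this.pending.subarray(5 + length)
    }
    return messages
  }
}

// Two messages arrive split at an arbitrary byte boundary.
const streamBytes = Buffer.concat([
  lengthPrefixed(Buffer.from('aa', 'hex')),
  lengthPrefixed(Buffer.from('bbcc', 'hex')),
])
const reader = new MessageReader()
console.log(reader.push(streamBytes.subarray(0, 8)).length) // 1
console.log(reader.push(streamBytes.subarray(8)).length)    // 1
```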

Client-Streaming RPC: Batch Uploads from Client to Server

Client-streaming RPC reverses the pattern: the client sends multiple messages, and the server sends a single response after receiving all of them.

This is useful for:

  • File uploads (sending chunks)
  • Batch processing (sending records for bulk import)
  • Aggregation (sending data points for a summary)
// Client-streaming RPC call
const call = client.createUser((error, response) => {
  if (error) {
    console.error('Upload failed:', error)
    return
  }
  console.log('Created user:', response.name) // CreateUser returns a single User in the schema above
})

call.write({ name: 'Alice', email: 'alice@x.com' })
call.write({ name: 'Bob', email: 'bob@x.com' })
call.write({ name: 'Carol', email: 'carol@x.com' })
call.end()

The client sends DATA frames for each message. The server accumulates them or processes them incrementally. When the client calls end(), the server sends its single response.

Bidirectional Streaming RPC: Real-Time Two-Way Communication

Bidirectional streaming allows both client and server to send messages independently on the same stream. Unlike server-streaming (where the server responds to a single request) or client-streaming (where the server replies only after the client has finished sending), the two directions are not ordered relative to each other. Within each direction, messages still arrive in the order they were sent.

Messages flow asynchronously. The client can send 5 messages, then the server sends 3. Or the client sends 1, the server sends 1, the client sends 2. Any pattern works.

This is the foundation for:

  • Chat applications
  • Real-time collaboration (Google Docs-style)
  • Game state synchronization
  • Publish/subscribe with bidirectional negotiation
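// Bidirectional streaming RPC call
const call = client.chat()

call.on('data', (message) => {
  console.log('Server says:', message.text)
})
call.on('error', (err) => {
  console.error('Stream error:', err)
})
call.on('end', () => {
  console.log('Server closed the stream')
})

call.write({ text: 'hello', userId: 1 })
call.write({ text: 'how are you?', userId: 1 })

// Meanwhile, the server can send messages whenever it wants.
// Call call.end() once the client has nothing more to send.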

The key insight: bidirectional streaming in gRPC does NOT require the client and server to take turns. The stream is full-duplex on top of HTTP/2’s multiplexed framing.

Deadlines and Timeouts

Every gRPC call should have a deadline (client-side timeout). If the server does not respond within the deadline, the client cancels the request and receives a DEADLINE_EXCEEDED error.

// Set a deadline of 5 seconds from now
const deadline = new Date()
deadline.setSeconds(deadline.getSeconds() + 5)

client.getUser(request, { deadline }, (err, response) => {
  if (err && err.code === grpc.status.DEADLINE_EXCEEDED) {
    console.error('Request timed out')
  }
})

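Without a deadline, a gRPC client could wait forever if the server hangs. Deadlines should also propagate through the call chain: if service A calls B with a 5-second deadline, and B calls C, then C should get only the time that remains. Some implementations do this for you when you pass the incoming call context to the outgoing call (Go and Java, for example); in others you forward the remaining deadline yourself. This is called deadline propagation, and it stops downstream services from working on requests the original caller has already given up on.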

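// Server-side: check remaining time (handler style used by @grpc/grpc-js)
function getUser(call, callback) {
  // getDeadline() returns a Date or a timestamp; Infinity means no deadline was set
  const remaining = call.getDeadline() - Date.now()
  if (remaining < 500) {
    callback({ code: grpc.status.DEADLINE_EXCEEDED, details: 'Not enough time to process' })
    return
  }
  // Process normally
}

// Register the handler; UserServiceService is the assumed name of the generated service definition
server.addService(UserServiceService, { getUser })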
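
When deadlines have to be forwarded by hand, the arithmetic is small enough to sketch. The two helpers below are hypothetical, not part of any gRPC library: childDeadline computes the deadline to pass downstream, and encodeGrpcTimeout formats a duration the way the grpc-timeout request header expects (up to 8 digits followed by a unit such as m for milliseconds or S for seconds). The 50 ms safety margin is an arbitrary assumption.

```javascript
// Deadline to hand to a downstream call, keeping a margin for local work.
// Returns null when there is no time left, so the caller can fail fast.
function childDeadline(parentDeadlineMs, nowMs, marginMs = 50) {
  const deadline = parentDeadlineMs - marginMs
  return deadline > nowMs ? deadline : null
}

// Format a timeout in milliseconds as a grpc-timeout header value.
function encodeGrpcTimeout(ms) {
  if (!(ms > 0)) throw new RangeError('timeout must be positive')
  const MAX_VALUE = 99999999 // the value part is limited to 8 digits
  const units = [['m', 1], ['S', 1000], ['M', 60000], ['H', 3600000]]
  for (const [unit, size] of units) {
    const amount = Math.ceil(ms / size)
    if (amount <= MAX_VALUE) return `${amount}${unit}`
  }
  throw new RangeError('timeout too large')
}

// Service A gave B five seconds; B spent 1.2 s before calling C.
const receivedAt = 1000000
const incomingDeadline = receivedAt + 5000
const now = receivedAt + 1200
const outgoing = childDeadline(incomingDeadline, now)
console.log(outgoing - now)                    // 3750
console.log(encodeGrpcTimeout(outgoing - now)) // "3750m"
```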

Interceptors: Middleware for Your gRPC Calls

Interceptors are the gRPC equivalent of middleware in web frameworks. They wrap every RPC call with cross-cutting behavior, without modifying the business logic.

Client interceptors run on the client side, wrapping outgoing calls:

  • Logging (log method name, duration, status)
  • Authentication (attach tokens to metadata)
  • Retry logic (retry on transient failures)
  • Tracing (propagate distributed tracing headers)

Server interceptors run on the server side, wrapping incoming calls:

  • Authentication (validate tokens before the handler runs)
  • Rate limiting (check request quotas)
  • Metrics (count requests, measure latency)
  • Request validation (validate fields before the handler)
// Server-side interceptor (framework-agnostic sketch, not a specific gRPC library API)
async function loggingInterceptor(ctx, next) {
  const start = Date.now()
  console.log(`[gRPC] -> ${ctx.method}`)
  try {
    return await next(ctx)
  } finally {
    // Runs for both successful and failed calls
    const duration = Date.now() - start
    console.log(`[gRPC] <- ${ctx.method} (${duration}ms)`)
  }
}

Interceptors compose like a chain: the first interceptor wraps the next, which wraps the next, until the actual handler runs. The response flows back through the chain in reverse order. This pattern is called the middleware chain or pipeline pattern.

Interceptor execution order: when a request is sent, the client-side interceptors run first, in the order they were registered, then the call goes over the wire, then the server-side interceptors run in their registered order. With the example interceptors listed above, the chain looks like this:

Logging > Auth Token > Retry > (gRPC call) > Auth Verify > Metrics > Rate Limit
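
The chaining itself fits in one function. The sketch below uses the same (ctx, next) shape as the logging interceptor earlier in this section; compose and the demo interceptors are made-up names, and real gRPC libraries each have their own interceptor signatures.

```javascript
// Wrap `handler` so that interceptors[0] runs outermost.
function compose(interceptors, handler) {
  return interceptors.reduceRight(
    (next, interceptor) => (ctx) => interceptor(ctx, next),
    handler
  )
}

// Demo: record the order in which each layer is entered and left.
async function demo() {
  const trace = []
  const named = (name) => async (ctx, next) => {
    trace.push(`${name}:in`)
    const response = await next(ctx)
    trace.push(`${name}:out`)
    return response
  }
  const handler = async (ctx) => {
    trace.push('handler')
    return { ok: true, method: ctx.method }
  }
  const call = compose([named('logging'), named('auth'), named('retry')], handler)
  const response = await call({ method: 'GetUser' })
  return { trace, response }
}

demo().then(({ trace }) => console.log(trace.join(' > ')))
// logging:in > auth:in > retry:in > handler > retry:out > auth:out > logging:out
```

Using reduceRight means the list reads in execution order: the first element sees the request first and the response last.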

gRPC vs REST: When to Use Which

gRPC and REST solve the same problem — client-server communication — but with radically different trade-offs.

| Dimension | gRPC | REST |
|---|---|---|
| Serialization | Binary (protobuf) | Text (JSON/XML) |
| Schema | Required (.proto file) | Optional (OpenAPI is separate) |
| Streaming | Native (server, client, and bidirectional) | Polling or SSE (workarounds) |
| Browser support | gRPC-Web (limited) | Native (fetch, XMLHttpRequest) |
| Human readability | No (binary) | Yes (JSON) |
| Caching | No (POST only) | Yes (GET caching) |
| Code generation | Built-in | External tools (OpenAPI Generator) |
| Performance | Usually faster serialization and smaller payloads; the gain depends on the workload | Slower to parse, larger payloads |

gRPC excels in two scenarios:

  1. Internal microservices communication: Low latency, high throughput, polyglot teams, streaming requirements. gRPC is a common default for service-to-service calls.
  2. Real-time data flows: Chat, live dashboards, event streams, collaborative editing. REST needs SSE or WebSocket for these.

REST still wins for:

  1. Public APIs: Universal browser support, human debuggable, every HTTP client can call them.
  2. Simple CRUD: If you just need create/read/update/delete, REST is simpler and well-understood.
  3. Caching-heavy systems: HTTP caching (ETags, Cache-Control) is powerful and built into every proxy and CDN.

gRPC-Web: RPC in the Browser

Browser JavaScript APIs do not expose HTTP/2 frames or response trailers, so a web page cannot speak the gRPC protocol directly. gRPC-Web bridges this gap:

  1. The browser uses a gRPC-Web-compatible client library
  2. Requests go through a proxy (like Envoy) that translates gRPC-Web to standard gRPC
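  3. The proxy moves the trailers into the response body as a final length-prefixed frame, marked by the high bit of its flag byte (the grpc-web-text variant additionally base64-encodes the whole body)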
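// gRPC-Web client (browser)
// Generated module names depend on your protoc-gen-grpc-web options and are assumptions
import { UserServiceClient } from './user_grpc_web_pb'
import { GetUserRequest } from './user_pb'

const client = new UserServiceClient('https://api.example.com')
const request = new GetUserRequest()
request.setId(42)
client.getUser(request, {}, (err, response) => {
  // Same callback shape as the standard gRPC client; {} is the call metadata
})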

gRPC-Web has limitations:

  • No client-streaming or bidirectional streaming (only unary and server-streaming)
  • Trailers arrive inside the response body, not as real HTTP trailers
  • Proxies add latency
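
How trailers travel inside the body can be sketched for the binary gRPC-Web format. In that format every frame has the usual 5-byte prefix, and a frame whose flag byte has the high bit (0x80) set holds the trailers as header-style text lines. The function names below are made up for illustration, and the sketch ignores compression and the base64 text variant.

```javascript
// Helper for the demo: build one gRPC-Web frame.
function webFrame(flags, payload) {
  const header = Buffer.alloc(5)
  header.writeUInt8(flags, 0)
  header.writeUInt32BE(payload.length, 1)
  return Buffer.concat([header, payload])
}

// Split a complete gRPC-Web response body into messages and trailers.
function parseGrpcWebBody(body) {
  const messages = []
  let trailers = null
  let pos = 0
  while (pos + 5 <= body.length) {
    const flags = body[pos]
    const length = body.readUInt32BE(pos + 1)
    const payload = body.subarray(pos + 5, pos + 5 + length)
    if (payload.length < length) throw new Error('truncated frame')
    if (flags & 0x80) {
      // Trailer frame: "name: value" lines separated by CRLF.
      trailers = {}
      for (const line of payload.toString('utf8').split('\r\n')) {
        if (!line) continue
        const colon = line.indexOf(':')
        trailers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim()
      }
    } else {
      messages.push(payload)
    }
    pos += 5 + length
  }
  return { messages, trailers }
}

const webBody = Buffer.concat([
  webFrame(0x00, Buffer.from('082a', 'hex')),
  webFrame(0x80, Buffer.from('grpc-status:0\r\n', 'utf8')),
])
const parsedBody = parseGrpcWebBody(webBody)
console.log(parsedBody.messages.length, parsedBody.trailers) // 1 { 'grpc-status': '0' }
```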

For internal browser applications that need streaming, WebSocket or SSE are often better choices than gRPC-Web.

Load Balancing with gRPC

Load balancing gRPC is different from HTTP load balancing because gRPC connections are long-lived (HTTP/2 persistent connections) and do not use the typical request-per-connection model.

Client-side load balancing: The gRPC client maintains a list of server addresses and distributes calls across them. Popular strategies:

  • Round robin: Distribute calls evenly across servers
  • Pick first: Connect to the first address that works and send every call there
  • Weighted round robin: Distribute based on server capacity
  • Least load: Send to the server with the fewest active streams
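// Client-side load balancing with round robin
// With @grpc/grpc-js the policy is selected through the service config;
// 'grpc.lb_policy_name' is an option of the older native grpc package.
const client = new UserServiceClient('dns:///api.example.com:50051',
  grpc.credentials.createInsecure(),
  { 'grpc.service_config': JSON.stringify({ loadBalancingConfig: [{ round_robin: {} }] }) }
)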

Proxy-based load balancing: An L7 proxy (Envoy, Linkerd, NGINX) terminates the gRPC connection and distributes individual RPCs to backend servers. This is simpler and works with any language, but adds latency.

The key challenge: gRPC clients open a single HTTP/2 connection and multiplex many RPCs over it. If you do round-robin at the TCP level (L4), all RPCs go to the same server (the one connected). You must load-balance at the RPC level, not the connection level.
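
Balancing at the RPC level means choosing a backend for every call, not once per connection. A minimal sketch of a per-call round-robin picker follows; RoundRobinPicker and the backend addresses are made up for illustration, and real balancers also track connection health.

```javascript
// Picks a backend for each RPC in turn.
class RoundRobinPicker {
  constructor(backends) {
    if (backends.length === 0) throw new Error('at least one backend is required')
    this.backends = [...backends]
    this.nextIndex = 0
  }

  pick() {
    const backend = this.backends[this.nextIndex]
    this.nextIndex = (this.nextIndex + 1) % this.backends.length
    return backend
  }
}

const picker = new RoundRobinPicker(['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051'])

// Six RPCs over what would be one long-lived client: each backend is picked twice.
const counts = {}
for (let i = 0; i < 6; i++) {
  const backend = picker.pick()
  counts[backend] = (counts[backend] || 0) + 1
}
console.log(counts)
```

With connection-level (L4) balancing, the equivalent of pick() would run only once, when the connection is opened, and all six calls would land on the same backend.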

Reflection and Health Checking

gRPC reflection allows clients to discover services and methods at runtime without the .proto file. This is essential for tools like grpcurl, grpc_cli, and debugging consoles.

# grpcurl with reflection
grpcurl -plaintext localhost:50051 list
# Output:
# grpc.health.v1.Health
# UserService

grpcurl -plaintext localhost:50051 describe UserService.GetUser
# Output (exact formatting varies by grpcurl version):
# UserService.GetUser is a method:
# rpc GetUser ( .GetUserRequest ) returns ( .User );

grpcurl -plaintext -d '{"id": 42}' localhost:50051 UserService.GetUser
# Output:
# {
#   "id": 42,
#   "name": "Alice",
#   "email": "alice@example.com"
# }

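Enable reflection on your server:

// Reflection is registered in server code; it is not imported into your own .proto file.
// Sketch for Node, assuming the @grpc/reflection package and a packageDefinition from @grpc/proto-loader:
const { ReflectionService } = require('@grpc/reflection')
new ReflectionService(packageDefinition).addToServer(server)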

gRPC health checking uses a standard protocol (defined in grpc.health.v1.Health) to report service health. Kubernetes, Envoy, and other infrastructure components can use it to determine whether a service is ready to receive traffic.

grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check
# Output:
# {
#   "status": "SERVING"
# }

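The response carries one of these statuses:

  • SERVING: Ready to handle requests
  • NOT_SERVING: Alive but not accepting requests (e.g., warming up, draining)
  • UNKNOWN: The server has not determined its status
  • SERVICE_UNKNOWN: The requested service name is not known to the health service (returned by the streaming Watch method; Check answers with a NOT_FOUND error instead)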

Health checking and reflection together form the operational foundation for running gRPC services at scale.

Summary

gRPC is a modern, high-performance RPC framework that combines Protocol Buffers for serialization with HTTP/2 for transport. Its four RPC patterns (unary, server-streaming, client-streaming, bidirectional) cover most communication patterns a distributed system needs, from simple request-reply to full-duplex real-time messaging.

The key takeaways:

  • Protocol Buffers give you a strongly-typed contract with efficient binary serialization and safe schema evolution
  • HTTP/2 provides multiplexing, flow control, and bidirectional streaming across a single TCP connection
  • Code generation eliminates boilerplate and ensures client-server compatibility
  • Interceptors provide clean separation of cross-cutting concerns from business logic
  • gRPC is not a REST replacement — it is a complementary tool optimized for internal services and streaming workloads

When you need to move data between services efficiently, with strong contracts and native streaming, gRPC is the tool that gets out of your way and lets you focus on what matters: the business logic.