gRPC Deep Dive: Protocol Buffers, HTTP/2 Streaming, and How RPC Works at Scale

What is RPC?

Imagine you are writing a program that needs to call a function on another computer. You want it to feel like a local function call — you pass arguments, you get a return value, and you move on. That is Remote Procedure Call (RPC).

Before gRPC, there were older RPC systems: CORBA, Java RMI, XML-RPC, SOAP. Each had its own way of serializing data, describing services, and handling network communication. They were complex, slow, and tightly coupled to specific languages.

gRPC, open-sourced by Google in 2015, replaces those pieces with a clean stack:

| Layer | Technology | Role |
|-------|-----------|------|
| Interface Definition | Protocol Buffers (.proto files) | Define services and message shapes |
| Serialization | Protocol Buffers (binary wire format) | Compact, fast encoding of structured data |
| Transport | HTTP/2 | Multiplexed streams, headers, flow control |
| Code Generation | protoc + language plugin | Generate stubs and skeletons in any language |

gRPC generates client and server code from a .proto file. The client calls a local method, which serializes the arguments into protobuf bytes, sends them over HTTP/2 to the server, which deserializes them, calls the handler, and sends the response back.

How gRPC Works: Stubs and Skeletons

When you define a service in a .proto file and run the protoc compiler, it generates two pieces of code:

Stub (client side): An object with methods that match the service definition. Your application calls stub.GetUser(request), and the stub handles serialization, framing, and network I/O.
Skeleton (server side): A base class or interface that your application implements. You write the business logic in a handler, and the skeleton handles deserialization and response writing.

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
}

After code generation, the stub translates GetUser into an HTTP/2 request with path /UserService/GetUser. The server skeleton routes this to your handler. The network is abstracted away — you work with native objects on both sides.

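// Using the generated client stub (JavaScript)
// The target is host:port, not a URL; TLS is selected through the credentials object
const client = new UserServiceClient('api.example.com:443', grpc.credentials.createSsl())
const request = new GetUserRequest()
request.setId(42)
client.getUser(request, (error, response) => {
  if (error) {
    console.error('gRPC error:', error.code, error.details)
    return
  }
  console.log(response.getName()) // "Alice"
})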
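To make the stub and skeleton roles concrete, here is a minimal hand-written sketch of what generated code does conceptually. The names makeStub, handlers, and fakeTransport are hypothetical, and JSON stands in for protobuf encoding; real generated stubs serialize with protobuf and send over HTTP/2.

```javascript
// Hypothetical sketch: build a stub whose methods map to gRPC-style paths
function makeStub(serviceName, methodNames, transport) {
  const stub = {}
  for (const method of methodNames) {
    // camelCase method on the stub, PascalCase name in the path
    const jsName = method[0].toLowerCase() + method.slice(1)
    stub[jsName] = (request) => {
      const path = `/${serviceName}/${method}`
      const body = JSON.stringify(request) // stand-in for protobuf encoding
      return JSON.parse(transport(path, body)) // stand-in for protobuf decoding
    }
  }
  return stub
}

// Fake in-process transport standing in for the network and the server skeleton
const handlers = {
  '/UserService/GetUser': (req) => ({ id: req.id, name: 'Alice' }),
}
function fakeTransport(path, body) {
  const handler = handlers[path]
  if (!handler) throw new Error(`unimplemented: ${path}`)
  return JSON.stringify(handler(JSON.parse(body)))
}

const stub = makeStub('UserService', ['GetUser'], fakeTransport)
console.log(stub.getUser({ id: 42 }).name) // "Alice"
```

The caller only ever sees stub.getUser(...); the path, the encoding, and the routing to a handler all stay hidden behind it.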

Protocol Buffers: The Interface Definition Language

Protocol Buffers (protobuf) is both an IDL (Interface Definition Language) and a serialization mechanism. You write a schema in a .proto file, and protobuf compiles it into code for any supported language.

syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string roles = 5;
}

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User);
  rpc CreateUser (stream CreateUserRequest) returns (User);
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);
}

Key protobuf features:

Strong typing: Every field has a type (int32, string, bool, enum, nested message)
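Field numbers: Each field gets a unique number from 1 to 2^29 - 1 (536,870,911); 19000 through 19999 are reserved for the protobuf implementation. The number is the wire identifier, so never change it once the message is in use
repeated: Lists / arrays of a type (the counterpart of JSON arrays)
stream: Marks the request or the response of an RPC as a stream of messages
oneof: At most one of several fields can be set at a time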
map<K,V>: Key-value pairs (like dictionaries)

Compared to JSON Schema or OpenAPI, protobuf is more compact and code-generation-first.

The Protobuf Wire Format

When protobuf serializes a message to bytes, it uses a compact binary format. Each field is encoded as a tag-value pair:

tag = (field_number << 3) | wire_type

There are only six wire types:

| Wire Type | Meaning | Used For |
|-----------|---------|----------|
| 0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 1 | 64-bit | fixed64, sfixed64, double |
| 2 | Length-delimited | string, bytes, embedded messages, packed repeated fields |
| 3 | Start group | Deprecated (proto2 only) |
| 4 | End group | Deprecated (proto2 only) |
| 5 | 32-bit | fixed32, sfixed32, float |
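The tag formula above can be sketched in a few lines. The function names makeTag and splitTag are illustrative, not part of any protobuf library. JavaScript bitwise operators work on 32 bits, so the sketch converts the result to an unsigned value to cover the full field-number range.

```javascript
// Wire type numbers from the table above
const WIRE_VARINT = 0
const WIRE_LEN = 2

// Combine a field number and wire type into a tag value
function makeTag(fieldNumber, wireType) {
  return ((fieldNumber << 3) | wireType) >>> 0 // >>> 0 keeps the result unsigned
}

// Split a tag value back into its two parts
function splitTag(tag) {
  return { fieldNumber: tag >>> 3, wireType: tag & 0x07 }
}

console.log(makeTag(1, WIRE_VARINT).toString(16)) // "8"  (int32 id = 1)
console.log(makeTag(2, WIRE_LEN).toString(16))    // "12" (string name = 2)
console.log(splitTag(0x1a))                       // { fieldNumber: 3, wireType: 2 }
```

The tag itself is written as a varint, so field numbers 1 through 15 fit in a single byte.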

Varint encoding packs integers into fewer bytes. The most significant bit (MSB) of each byte indicates whether more bytes follow. For values under 128, it is just one byte. For larger values, it takes as many bytes as needed.

For example, int32 id = 1 with value 150:

Binary: 10010110 00000001
  -> Strip MSBs: 0010110 0000001
  -> Reverse groups: 0000001 0010110
  -> Value: 128 + 22 = 150
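The same steps can be written as a small encoder and decoder. This is a sketch that only handles non-negative values below 2^31 (the range where JavaScript bitwise operators are safe); encodeVarint and decodeVarint are illustrative names.

```javascript
// Encode a non-negative integer (below 2^31) as a protobuf varint
function encodeVarint(value) {
  const bytes = []
  while (value > 0x7f) {
    bytes.push((value & 0x7f) | 0x80) // low 7 bits, MSB set: more bytes follow
    value >>>= 7
  }
  bytes.push(value) // final byte, MSB clear
  return bytes
}

// Decode a varint starting at offset; returns the value and the next offset
function decodeVarint(bytes, offset = 0) {
  let value = 0
  let shift = 0
  while (true) {
    const byte = bytes[offset++]
    value |= (byte & 0x7f) << shift // groups arrive least-significant first
    if ((byte & 0x80) === 0) break
    shift += 7
  }
  return { value, next: offset }
}

console.log(encodeVarint(150))              // bytes 0x96 0x01
console.log(decodeVarint([0x96, 0x01]).value) // 150
```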

Protobuf messages are self-describing enough that tools can decode them without the schema (using field numbers and wire types), but field names are lost — that is why you need the .proto file to get meaningful output.
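As a rough illustration of schema-less decoding, the sketch below walks a serialized message and lists field numbers, wire types, and values. It assumes Node.js (for Buffer), handles only wire types 0 and 2, and guesses that every length-delimited field is a UTF-8 string, which a real tool cannot know without the schema. The names readVarint and inspect are hypothetical.

```javascript
// Read a varint (values below 2^31); returns the value and the next position
function readVarint(buf, pos) {
  let value = 0
  let shift = 0
  let byte
  do {
    byte = buf[pos++]
    value |= (byte & 0x7f) << shift
    shift += 7
  } while (byte & 0x80)
  return { value, pos }
}

// List the fields of a serialized message without any schema
function inspect(buf) {
  const fields = []
  let pos = 0
  while (pos < buf.length) {
    const tag = readVarint(buf, pos)
    pos = tag.pos
    const fieldNumber = tag.value >>> 3
    const wireType = tag.value & 0x07
    if (wireType === 0) {
      const v = readVarint(buf, pos)
      pos = v.pos
      fields.push({ fieldNumber, wireType, value: v.value })
    } else if (wireType === 2) {
      const len = readVarint(buf, pos)
      pos = len.pos + len.value
      // Could be a string, bytes, or a nested message; this sketch assumes a string
      fields.push({ fieldNumber, wireType, value: buf.subarray(len.pos, pos).toString('utf8') })
    } else {
      throw new Error(`wire type ${wireType} not handled in this sketch`)
    }
  }
  return fields
}

console.log(inspect(Buffer.from('082a1205416c696365', 'hex')))
// field 1 (varint) = 42, field 2 (length-delimited) = 'Alice'
```

The output carries numbers, not names: nothing in the bytes says that field 2 is called name.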

Wire Format Example

The tag-value encoding is easiest to see on a small message. Each field is encoded as a tag (field_number << 3 | wire_type) followed by its value.

message Person { int32 id = 1; string name = 2; string email = 3; }

With id = 42, name = "Alice", and email = "alice@example.com", the wire format is 28 bytes (shown in hex, one group per field):

082a | 1205416c696365 | 1a11616c696365406578616d706c652e636f6d

| Field | Size | Tag | Value | Raw |
|-------|------|-----|-------|-----|
| Field 1 (int32 id) | 2 bytes | 0x08 (wire_type = varint (0)) | 0x2a | 42 |
| Field 2 (string name) | 7 bytes | 0x12 (wire_type = length-delimited (2)) | len=5 [41 6c 69 63 65] | "Alice" |
| Field 3 (string email) | 19 bytes | 0x1a (wire_type = length-delimited (2)) | len=17 [61 6c 69 63 65 40 65 78 61 6d 70 6c 65 2e 63 6f 6d] | "alice@example.com" |

The id field is a tag plus a varint value; the name and email fields are a tag, a length, and the UTF-8 bytes.

The same data as compact JSON takes 52 bytes, about 1.9x larger than protobuf:

{"id":42,"name":"Alice","email":"alice@example.com"}

Schema Evolution: Adding Fields Without Breaking Everything

The superpower of protobuf is backward and forward compatibility. You can add fields, remove fields, and change types — as long as you follow the rules:

Never reuse a field number: When you delete a field, do not reuse its number. Mark it as reserved.
New fields are optional: Old code skips fields it does not recognize (unknown fields are preserved).
Wire-type compatible changes: You can change int32 to int64 or uint32 (same wire type, although values that do not fit the new type are truncated or reinterpreted), but not int32 to string (different wire type).

message User {
  reserved 6, 10 to 15;  // These field numbers can never be used

  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string roles = 5;

  // New fields added later:
  string phone = 16;       // Safe: new number
  Address address = 17;    // Safe: new number
}

A client built with the old schema (without phone and address) can still deserialize a response from a new server — it simply ignores the unknown fields. A new client deserializing an old response gets "" for phone and a default Address for address.

This is dramatically better than JSON, where adding a field is technically easy but every client must handle the missing field explicitly.
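The two compatibility directions described above can be sketched with a tiny schema-driven decoder. This is an illustration under stated assumptions, not the protobuf library: decode, oldSchema, and newSchema are hypothetical names, only varint and string fields are handled, the phone value is made up, and unknown fields are simply skipped (real implementations typically keep them so they survive re-serialization). It assumes Node.js for Buffer.

```javascript
function readVarint(buf, pos) {
  let value = 0, shift = 0, byte
  do {
    byte = buf[pos++]
    value |= (byte & 0x7f) << shift
    shift += 7
  } while (byte & 0x80)
  return { value, pos }
}

// schema maps field number -> { name, default }
function decode(buf, schema) {
  const result = {}
  // Start from defaults so fields missing on the wire still have a value
  for (const field of Object.values(schema)) result[field.name] = field.default
  let pos = 0
  while (pos < buf.length) {
    const tag = readVarint(buf, pos)
    pos = tag.pos
    const field = schema[tag.value >>> 3] // undefined for unknown field numbers
    const wireType = tag.value & 0x07
    if (wireType === 0) {
      const v = readVarint(buf, pos)
      pos = v.pos
      if (field) result[field.name] = v.value
    } else if (wireType === 2) {
      const len = readVarint(buf, pos)
      pos = len.pos + len.value // the length tells us how far to skip
      if (field) result[field.name] = buf.subarray(len.pos, pos).toString('utf8')
    } else {
      throw new Error(`wire type ${wireType} not handled in this sketch`)
    }
  }
  return result
}

const oldSchema = { 1: { name: 'id', default: 0 }, 2: { name: 'name', default: '' } }
const newSchema = { ...oldSchema, 16: { name: 'phone', default: '' } }

// id = 42, name = "Alice"
const oldBytes = Buffer.from('082a1205416c696365', 'hex')
// Same, plus field 16 (tag bytes 82 01) = "555-0100"
const newBytes = Buffer.from('082a1205416c6963658201083535352d30313030', 'hex')

console.log(decode(newBytes, oldSchema)) // { id: 42, name: 'Alice' }
console.log(decode(oldBytes, newSchema)) // { id: 42, name: 'Alice', phone: '' }
```

Skipping works because the wire type alone says how many bytes an unknown field occupies.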

HTTP/2: The Transport Layer

gRPC uses HTTP/2 as its transport. HTTP/2 provides features that HTTP/1.1 cannot:

Multiplexed streams: Multiple RPCs share a single TCP connection, which removes HTTP-level head-of-line blocking (a lost TCP packet still stalls every stream on that connection).
Binary framing: Frames (HEADERS, DATA, SETTINGS, etc.) are binary, not text. Efficient parsing.
Flow control: Per-stream flow control prevents a slow consumer from being overwhelmed.
Header compression (HPACK): Repeated headers (like content-type: application/grpc) are compressed to a few bytes.
Long-lived streams: A single stream can carry many DATA frames in either direction, so a server can send multiple responses without a new request per response. gRPC does not use HTTP/2 server push for this.

The gRPC protocol maps onto HTTP/2 like this:

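Each RPC opens a new HTTP/2 stream (client-initiated streams use odd IDs, so each new stream ID is 2 higher than the previous one)
Client sends HEADERS frame with :method = POST, :path = /package.Service/Method, content-type = application/grpc, te = trailers
Client sends DATA frame(s) with the serialized protobuf request, ending with the END_STREAM flag
Server sends HEADERS frame with :status = 200, content-type = application/grpc
Server sends DATA frame(s) with the serialized protobuf response
Server sends a final HEADERS frame (the trailers) carrying grpc-status = 0 and the END_STREAM flag, which closes the stream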

For streaming RPCs, multiple DATA frames flow in one or both directions on the same stream. The stream stays open until both sides signal completion.
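As a small illustration of the request side of this mapping, the sketch below builds the header set a client would send. buildRequestHeaders is a hypothetical helper, not a gRPC API; the :authority header and any custom metadata are left out for brevity.

```javascript
// Build the HTTP/2 request headers for a gRPC call.
// timeoutMs is optional; when given it is sent as the grpc-timeout header.
function buildRequestHeaders(fullServiceName, method, timeoutMs) {
  const headers = {
    ':method': 'POST',
    ':scheme': 'https',
    ':path': `/${fullServiceName}/${method}`,
    'content-type': 'application/grpc',
    te: 'trailers', // the client must be able to receive trailers
  }
  if (timeoutMs !== undefined) {
    headers['grpc-timeout'] = `${timeoutMs}m` // "m" is the millisecond unit
  }
  return headers
}

console.log(buildRequestHeaders('UserService', 'GetUser', 5000)[':path'])
// "/UserService/GetUser"
```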

Unary RPC: The Classic Request-Response Pattern

Unary RPC is the simplest pattern: the client sends exactly one request message, and the server sends exactly one response message. It maps directly to how HTTP/1.1 works, but with HTTP/2 multiplexing and protobuf efficiency.

The key gRPC-unique behavior happens in the framing:

The request is prefixed with a 5-byte header: 1 byte for compression flag (0 = none), 4 bytes for message length (big-endian)
The response follows the same framing
A grpc-status trailer header is sent to signal success (0 = OK) or error (non-zero)
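The 5-byte prefix is simple enough to write by hand. The following sketch, assuming Node.js for Buffer, frames a serialized message and splits a received buffer back into messages; frameMessage and parseFrames are illustrative names.

```javascript
// Prefix a serialized message with the 5-byte gRPC header
function frameMessage(payload, compressed = false) {
  const header = Buffer.alloc(5)
  header.writeUInt8(compressed ? 1 : 0, 0) // compression flag
  header.writeUInt32BE(payload.length, 1)  // message length, big-endian
  return Buffer.concat([header, payload])
}

// Split a buffer containing zero or more complete frames into messages
function parseFrames(buffer) {
  const messages = []
  let offset = 0
  while (offset + 5 <= buffer.length) {
    const compressed = buffer.readUInt8(offset) === 1
    const length = buffer.readUInt32BE(offset + 1)
    const start = offset + 5
    if (start + length > buffer.length) break // incomplete frame: wait for more data
    messages.push({ compressed, payload: buffer.subarray(start, start + length) })
    offset = start + length
  }
  return messages
}

const framed = frameMessage(Buffer.from('082a', 'hex'))
console.log(framed.toString('hex')) // "0000000002082a"
```

Because every message carries its own length, several messages can share one stream, which is what the streaming RPC types rely on.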

// Unary RPC call
const request = { userId: 42 }
client.getUser(request, { deadline: Date.now() + 5000 }, (err, response) => {
  if (err) {
    console.error('gRPC error:', err.code, err.details)
    return
  }
  console.log('User:', response.name)
})

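Under the hood, this triggers the following HTTP/2 frame exchange between the gRPC client and the gRPC server:

Client to server: HEADERS (:method, :path, content-type)
Client to server: DATA (5-byte prefix + serialized request, END_STREAM)
Server: runs the UserService handler
Server to client: HEADERS (:status = 200)
Server to client: DATA (5-byte prefix + serialized response)
Server to client: trailing HEADERS (grpc-status, END_STREAM)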

Server-Streaming RPC: Push Data from Server to Client

Server-streaming RPC is one of the patterns that gRPC makes easy but REST struggles with. The client sends a single request, and the server sends a stream of responses over time.

This is perfect for:

Listing a large dataset (paginated without pagination logic)
Real-time updates (the server sends new events as they happen)
Progress notifications during long-running operations

// Server-streaming RPC call
const call = client.listUsers({ role: 'admin' })
call.on('data', (user) => {
  console.log('Received user:', user.name)
})
call.on('end', () => {
  console.log('All users received')
})
call.on('error', (err) => {
  console.error('Stream error:', err)
})

The server writes multiple response messages on the same HTTP/2 stream. Each message has the standard 5-byte frame header. The stream stays open until the server sends a grpc-status trailer.

Client-Streaming RPC: Batch Uploads from Client to Server

Client-streaming RPC reverses the pattern: the client sends multiple messages, and the server sends a single response after receiving all of them.

This is useful for:

File uploads (sending chunks)
Batch processing (sending records for bulk import)
Aggregation (sending data points for a summary)

// Client-streaming RPC call
const call = client.createUser((error, response) => {
  if (error) {
    console.error('Upload failed:', error)
    return
  }
  console.log('Server response:', response.name) // CreateUser returns a single User
})

call.write({ name: 'Alice', email: 'alice@x.com' })
call.write({ name: 'Bob', email: 'bob@x.com' })
call.write({ name: 'Carol', email: 'carol@x.com' })
call.end()

The client sends DATA frames for each message. The server accumulates them or processes them incrementally. When the client calls end(), the server sends its single response.

Bidirectional Streaming RPC: Real-Time Two-Way Communication

Bidirectional streaming allows both client and server to send messages independently on the same stream. Unlike server-streaming (where the server responds to a single request) or client-streaming (where the server replies only after the client has finished sending), bidirectional streaming places no ordering constraint between the two directions. Within each direction, messages still arrive in the order they were sent.

Messages flow asynchronously. The client can send 5 messages, then the server sends 3. Or the client sends 1, the server sends 1, the client sends 2. Any pattern works.

This is the foundation for:

Chat applications
Real-time collaboration (Google Docs-style)
Game state synchronization
Publish/subscribe with bidirectional negotiation

// Bidirectional streaming RPC call
const call = client.chat()

call.on('data', (message) => {
  console.log('Server says:', message.text)
})

call.write({ text: 'hello', userId: 1 })
call.write({ text: 'how are you?', userId: 1 })

// Meanwhile, the server can send messages whenever it wants

The key insight: bidirectional streaming in gRPC does NOT require the client and server to take turns. The stream is full-duplex on top of HTTP/2’s multiplexed framing.


Deadlines and Timeouts

Every gRPC call should have a deadline (client-side timeout). If the server does not respond within the deadline, the client cancels the request and receives a DEADLINE_EXCEEDED error.

// Set a deadline of 5 seconds from now
const deadline = new Date()
deadline.setSeconds(deadline.getSeconds() + 5)

client.getUser(request, { deadline }, (err, response) => {
  if (err && err.code === grpc.status.DEADLINE_EXCEEDED) {
    console.error('Request timed out')
  }
})

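Without a deadline, a gRPC client could wait forever if the server hangs. Deadlines should also propagate through the call chain: if service A calls B with a 5-second deadline, and B calls C, B passes the remaining time on to C. Some implementations (for example Go and Java, through their context objects) do this automatically; in others, B must set the downstream deadline itself. This is called deadline propagation, and it stops downstream services from working on requests the original caller has already abandoned.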

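// Server-side: check remaining time before starting the work
// (UserService.service is the service definition loaded from the .proto file;
// the exact name depends on how your code was generated)
server.addService(UserService.service, {
  getUser(call, callback) {
    // getDeadline() returns a Date or a number (Infinity when no deadline is set)
    const remaining = Number(call.getDeadline()) - Date.now()
    if (remaining < 500) {
      callback({ code: grpc.status.DEADLINE_EXCEEDED, details: 'Not enough time to process' })
      return
    }
    // Process normally
  },
})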
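Where the deadline has to be passed on by hand, the calculation is short. The helper below is a hypothetical sketch (downstreamDeadline is not a gRPC API): it derives the deadline for a downstream call from the incoming one, optionally holding back some time for the caller's own work.

```javascript
// Compute the absolute deadline to pass to a downstream call.
// parentDeadline: absolute time in ms since epoch (same clock as Date.now())
// reserveMs: time held back for this service's own work after the call returns
function downstreamDeadline(parentDeadline, reserveMs = 0, now = Date.now()) {
  const deadline = parentDeadline - reserveMs
  if (deadline <= now) {
    throw new Error('DEADLINE_EXCEEDED: no time left for the downstream call')
  }
  return deadline
}

// Example: 4 seconds remain on the incoming call, keep 500 ms for ourselves
const now = 1000000
console.log(downstreamDeadline(now + 4000, 500, now) - now) // 3500
```

In a handler, the result would go into the options of the outgoing call, for example { deadline: downstreamDeadline(Number(call.getDeadline()), 100) }.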

Interceptors: Middleware for Your gRPC Calls

Interceptors are the gRPC equivalent of middleware in web frameworks. They wrap every RPC call with cross-cutting behavior, without modifying the business logic.

Client interceptors run on the client side, wrapping outgoing calls:

Logging (log method name, duration, status)
Authentication (attach tokens to metadata)
Retry logic (retry on transient failures)
Tracing (propagate distributed tracing headers)

Server interceptors run on the server side, wrapping incoming calls:

Authentication (validate tokens before the handler runs)
Rate limiting (check request quotas)
Metrics (count requests, measure latency)
Request validation (validate fields before the handler)

// Server-side interceptor (pseudocode)
function loggingInterceptor(ctx, next) {
  const start = Date.now()
  console.log(`[gRPC] -> ${ctx.method}`)

  return next(ctx).then(response => {
    const duration = Date.now() - start
    console.log(`[gRPC] <- ${ctx.method} (${duration}ms)`)
    return response
  })
}

Interceptors compose like a chain: the first interceptor wraps the next, which wraps the next, until the actual handler runs. The response flows back through the chain in reverse order. This pattern is called the middleware chain or pipeline pattern.
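The chaining itself fits in one reduceRight call. This sketch uses the same (ctx, next) shape as the pseudocode above but keeps everything synchronous for brevity; compose and the sample interceptors are illustrative, not a gRPC library API.

```javascript
// Compose interceptors so that the first one in the array is outermost
function compose(interceptors, handler) {
  return interceptors.reduceRight(
    (next, interceptor) => (ctx) => interceptor(ctx, next),
    handler
  )
}

const trace = []
const logging = (ctx, next) => {
  trace.push('logging:before')
  const response = next(ctx)
  trace.push('logging:after')
  return response
}
const auth = (ctx, next) => {
  trace.push('auth:before')
  if (!ctx.token) throw new Error('UNAUTHENTICATED')
  return next(ctx)
}
const handler = (ctx) => {
  trace.push('handler')
  return { name: 'Alice' }
}

const call = compose([logging, auth], handler)
call({ method: 'GetUser', token: 'abc' })
console.log(trace)
// [ 'logging:before', 'auth:before', 'handler', 'logging:after' ]
```

An interceptor can short-circuit the chain by throwing or returning without calling next, which is how the auth check keeps unauthenticated calls away from the handler.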

Interceptor Chain

When a request is sent, interceptors execute in order: client-side first (top to bottom), then the gRPC call, then server-side (top to bottom). With three client interceptors (Logging, Auth Token, Retry) and three server interceptors (Auth Verify, Metrics, Rate Limit), the execution order is:

Logging > Auth Token > Retry > Auth Verify > Metrics > Rate Limit

gRPC vs REST: When to Use Which

gRPC and REST solve the same problem — client-server communication — but with radically different trade-offs.

| Dimension | gRPC | REST |
|-----------|------|------|
| Serialization | Binary (protobuf) | Text (JSON/XML) |
| Schema | Required (.proto file) | Optional (OpenAPI is separate) |
| Streaming | Native (4 RPC types) | Polling or SSE (workarounds) |
| Browser support | gRPC-Web (limited) | Native (fetch, XMLHttpRequest) |
| Human readability | No (binary) | Yes (JSON) |
| Caching | No (POST only) | Yes (GET caching) |
| Code generation | Built-in | External tools (OpenAPI Generator) |
| Performance | Faster serialization, smaller payloads (5-10x is often quoted; it depends on the workload) | Slower, larger payloads |

gRPC excels in two scenarios:

Internal microservices communication: Low latency, high throughput, polyglot teams, streaming requirements. gRPC is the default choice for service-to-service calls.
Real-time data flows: Chat, live dashboards, event streams, collaborative editing. REST needs SSE or WebSocket for these.

REST still wins for:

Public APIs: Universal browser support, human debuggable, every HTTP client can call them.
Simple CRUD: If you just need create/read/update/delete, REST is simpler and well-understood.
Caching-heavy systems: HTTP caching (ETags, Cache-Control) is powerful and built into every proxy and CDN.


gRPC-Web: RPC in the Browser

Browsers cannot send raw HTTP/2 frames or use the gRPC trailers mechanism. gRPC-Web bridges this gap:

The browser uses a gRPC-Web-compatible client library
Requests go through a proxy (like Envoy) that translates gRPC-Web to standard gRPC
The proxy delivers the trailers inside the response body, as a final length-prefixed frame flagged as trailers (the whole body is base64-encoded when the grpc-web-text content type is used)

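// gRPC-Web client (browser)
// Both modules are generated by protoc with the grpc-web plugin;
// the file names depend on the name of your .proto file
import { UserServiceClient } from './user_grpc_web_pb'
import { GetUserRequest } from './user_pb'

const client = new UserServiceClient('https://api.example.com')
const request = new GetUserRequest()
request.setId(42)
client.getUser(request, {}, (err, response) => {
  // Same callback shape as the standard gRPC client
})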
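The trailer translation done by the proxy can be sketched as follows. This assumes the frame layout from the gRPC-Web protocol description, where a frame whose flag byte has its most significant bit set carries the trailers as a text header block. encodeTrailersFrame and decodeTrailersFrame are hypothetical names, and the sketch uses Node.js Buffer for brevity rather than browser APIs.

```javascript
const TRAILER_FLAG = 0x80 // MSB of the flag byte marks a trailers frame

// Encode trailers as a body frame: flag byte, 4-byte length, header block
function encodeTrailersFrame(trailers) {
  const block = Object.entries(trailers)
    .map(([key, value]) => `${key}:${value}\r\n`)
    .join('')
  const payload = Buffer.from(block, 'utf8')
  const header = Buffer.alloc(5)
  header.writeUInt8(TRAILER_FLAG, 0)
  header.writeUInt32BE(payload.length, 1)
  return Buffer.concat([header, payload])
}

// Decode a trailers frame back into an object
function decodeTrailersFrame(frame) {
  if ((frame.readUInt8(0) & TRAILER_FLAG) === 0) {
    throw new Error('not a trailers frame')
  }
  const length = frame.readUInt32BE(1)
  const text = frame.subarray(5, 5 + length).toString('utf8')
  const trailers = {}
  for (const line of text.split('\r\n')) {
    if (!line) continue
    const idx = line.indexOf(':')
    trailers[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return trailers
}

const frame = encodeTrailersFrame({ 'grpc-status': '0', 'grpc-message': 'OK' })
console.log(decodeTrailersFrame(frame)) // { 'grpc-status': '0', 'grpc-message': 'OK' }
```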

gRPC-Web has limitations:

No client-streaming or bidirectional streaming (only unary and server-streaming)
No real HTTP trailers: status and trailing metadata travel in the response body instead
Proxies add latency

For internal browser applications that need streaming, WebSocket or SSE are often better choices than gRPC-Web.

Load Balancing with gRPC

Load balancing gRPC is different from HTTP load balancing because gRPC connections are long-lived (HTTP/2 persistent connections) and do not use the typical request-per-connection model.

Client-side load balancing: The gRPC client maintains a list of server addresses and distributes calls across them. Popular strategies:

Round robin: Distribute calls evenly across servers
Pick first: Connect to the first reachable address in the list and send every call to it (the default policy)
Weighted round robin: Distribute based on server capacity
Least load: Send to the server with the fewest active streams

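// Client-side load balancing with round robin
// (@grpc/grpc-js selects the policy through the service config; the legacy
// native grpc package used the 'grpc.lb_policy_name' channel option instead)
const client = new UserServiceClient('dns:///api.example.com:50051',
  grpc.credentials.createInsecure(),
  { 'grpc.service_config': JSON.stringify({ loadBalancingConfig: [{ round_robin: {} }] }) }
)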

Proxy-based load balancing: An L7 proxy (Envoy, Linkerd, NGINX) terminates the gRPC connection and distributes individual RPCs to backend servers. This is simpler and works with any language, but adds latency.

The key challenge: gRPC clients open a single HTTP/2 connection and multiplex many RPCs over it. If you do round-robin at the TCP level (L4), all RPCs go to the same server (the one connected). You must load-balance at the RPC level, not the connection level.
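A toy simulation shows the difference between the two levels. The addresses, RoundRobinPicker, and simulate are made up for illustration and involve no real networking.

```javascript
// Pick a backend by rotating through the address list
class RoundRobinPicker {
  constructor(addresses) {
    this.addresses = addresses
    this.next = 0
  }
  pick() {
    const address = this.addresses[this.next]
    this.next = (this.next + 1) % this.addresses.length
    return address
  }
}

// Count how many of n RPCs each backend receives under a given pick function
function simulate(n, pick) {
  const counts = {}
  for (let i = 0; i < n; i++) {
    const address = pick()
    counts[address] = (counts[address] || 0) + 1
  }
  return counts
}

const backends = ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051']

// L4 balancing: the connection is assigned once, so every RPC lands on one backend
const connectionLevel = new RoundRobinPicker(backends).pick()
console.log(simulate(9, () => connectionLevel)) // { '10.0.0.1:50051': 9 }

// RPC-level balancing: each RPC gets its own pick
const picker = new RoundRobinPicker(backends)
console.log(simulate(9, () => picker.pick())) // 3 RPCs per backend
```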

Reflection and Health Checking

gRPC reflection allows clients to discover services and methods at runtime without the .proto file. This is essential for tools like grpcurl, grpc_cli, and debugging consoles.

# grpcurl with reflection
grpcurl -plaintext localhost:50051 list
# Output:
# grpc.health.v1.Health
# UserService

grpcurl -plaintext localhost:50051 describe UserService.GetUser
# Output:
# UserService.GetUser is a method:
# rpc GetUser ( .GetUserRequest ) returns ( .User );

grpcurl -plaintext -d '{"id": 42}' localhost:50051 UserService.GetUser
# Output:
# {
#   "id": 42,
#   "name": "Alice",
#   "email": "alice@example.com"
# }

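Enable reflection by registering the reflection service in your server code; your own .proto files need no changes. In Node.js, one option is the @grpc/reflection package:

// packageDefinition is the result of loading your .proto files with @grpc/proto-loader
const { ReflectionService } = require('@grpc/reflection')
new ReflectionService(packageDefinition).addToServer(server)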

gRPC health checking uses a standard protocol (defined in grpc.health.v1.Health) to report service health. Kubernetes, Envoy, and other orchestrators use this to determine if a service is ready to receive traffic.

grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check
# Output:
# {
#   "status": "SERVING"
# }

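The response carries one of four statuses:

UNKNOWN: The status has not been set
SERVING: Ready to handle requests
NOT_SERVING: Alive but not accepting requests (e.g., warming up, draining)
SERVICE_UNKNOWN: The requested service name is not registered with the health service (returned only by the streaming Watch method; Check fails with a NOT_FOUND error instead)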

Health checking and reflection together form the operational foundation for running gRPC services at scale.

Summary

gRPC is a modern, high-performance RPC framework that combines Protocol Buffers for serialization with HTTP/2 for transport. Its four RPC patterns (unary, server-streaming, client-streaming, bidirectional) cover every communication pattern a distributed system needs, from simple request-reply to full-duplex real-time messaging.

The key takeaways:

Protocol Buffers give you a strongly-typed contract with efficient binary serialization and safe schema evolution
HTTP/2 provides multiplexing, flow control, and bidirectional streaming across a single TCP connection
Code generation eliminates boilerplate and ensures client-server compatibility
Interceptors provide clean separation of cross-cutting concerns from business logic
gRPC is not a REST replacement — it is a complementary tool optimized for internal services and streaming workloads

When you need to move data between services efficiently, with strong contracts and native streaming, gRPC is the tool that gets out of your way and lets you focus on what matters: the business logic.
