Testing Deep Dive: Unit, Integration, E2E, and How to Test Effectively


Why Test?

Imagine you are a structural engineer designing a bridge. Before anyone crosses it, you run load simulations, test the materials, and inspect every weld. You would never tell people “we will check if the bridge holds once cars drive on it.”

Software is the same. Every line of code is a load-bearing beam. Every change is a new stress test. Without automated tests, you are sending users across a bridge that has never been inspected.

Testing is not a luxury. It is the engineering discipline that separates professional software from prototypes. A well-tested codebase is a codebase you can change with confidence. An untested codebase is a ticking time bomb.

The Real Cost of Not Testing

The argument against testing usually sounds like “we do not have time to write tests.” But consider what happens without them:

  • Every deploy is a high-anxiety event. You hold your breath and hope nothing breaks.
  • Regression bugs creep in silently. A change to one part of the system breaks a distant, seemingly unrelated feature.
  • Onboarding new developers takes weeks longer because the only documentation is the production code itself.
  • Refactoring is terrifying. Renaming a method, extracting a service, or upgrading a dependency feels like defusing a bomb.

The time you save by not writing tests is borrowed from future debugging sessions. And the interest rate is brutal.

Testing Goals

Every test you write serves one or more of three core goals:

Regression prevention. When you add a new feature, existing features should keep working. A test suite acts as a safety net: change something, run the tests, and know immediately if you broke anything. Regression tests are the most common type of test and form the bulk of any test suite.

Confidence. Tests give you the courage to change code. Without tests, a simple refactor becomes “do not touch it, it works.” With tests, you can restructure, optimize, and redesign with the certainty that if something breaks, you will know exactly what and where.

Documentation. Tests are executable documentation. They describe how the code is supposed to behave in a precise, unambiguous way. A new developer can often understand the system faster by reading the test suite than by reading the implementation. Unlike comments, tests cannot silently drift from the code: as long as a test passes, the behavior it asserts is still accurate.

Every test should serve at least one of these goals. If a test does not prevent regressions, build confidence, or document behavior, it is adding noise, not value.

The Testing Pyramid vs Trophy

Two mental models dominate how teams structure their test suites: the classic testing pyramid and the modern testing trophy.

The testing pyramid (coined by Mike Cohn) says: write lots of fast, isolated unit tests at the base, fewer integration tests in the middle, and very few slow end-to-end tests at the top. The shape matters. A wide base of unit tests gives you fast feedback. A narrow top of E2E tests covers critical paths without making your suite painfully slow.

The testing trophy (popularized by Kent C. Dodds) adds static analysis as a new base layer. Before you even run a test, your type checker and linter have already caught entire categories of bugs. The trophy also rebalances proportions: static analysis sits at the base, unit tests cover isolated business logic, integration tests get the largest share because they verify how components interact, and E2E tests cover only the most critical user journeys.

Which model is right? Both. The pyramid is a better starting point for dynamically typed languages (Python, Ruby, JavaScript without TypeScript). The trophy becomes more relevant when you have static typing (TypeScript, Rust, Haskell) and want to let the compiler do more of the work.

Figure: Testing pyramid vs testing trophy. The classic pyramid recommends many unit tests, fewer integration tests, and few E2E tests. The modern trophy adds static analysis and shifts proportions.

Understanding Test Types

Let us look at each test type in detail so you know exactly what belongs where.

Unit Tests

Unit tests verify one piece of behavior in isolation. They test a single function, method, or class with all external dependencies replaced by test doubles. The key properties of a unit test are speed and isolation.

// Jest/Vitest
import { describe, it, expect } from 'vitest'
import { calculateDiscount } from './pricing'

describe('calculateDiscount', () => {
  it('applies 20% premium discount', () => {
    expect(calculateDiscount(100, true)).toBe(80)
  })

  it('applies no discount for non-premium', () => {
    expect(calculateDiscount(100, false)).toBe(100)
  })

  it('caps discount at 50%', () => {
    expect(calculateDiscount(1000, true, 'MEGA')).toBe(500)
  })
})

# pytest
from pricing import calculate_discount

def test_premium_discount():
    assert calculate_discount(100, True) == 80

def test_no_discount():
    assert calculate_discount(100, False) == 100

def test_max_discount_cap():
    assert calculate_discount(1000, True, "MEGA") == 500

Unit tests are the workhorses of your test suite. They run in milliseconds, are easy to debug, and pinpoint failures precisely. If a unit test fails, you know exactly which function is broken.

Integration Tests

Integration tests verify that multiple components work together. They exercise real dependencies (databases, APIs, file systems) but skip the browser. They are slower than unit tests but catch bugs that unit tests miss — things like incorrect SQL queries, mismatched API contracts, and misconfigured middleware.

// Integration test with supertest
import { describe, it, expect, beforeAll, afterAll } from 'vitest'
import request from 'supertest'
import { app } from './app'
import { setupTestDb, teardownTestDb } from './test-helpers'

describe('POST /api/users', () => {
  beforeAll(async () => {
    await setupTestDb()
  })

  afterAll(async () => {
    await teardownTestDb()
  })

  it('creates a user and returns the profile', async () => {
    const res = await request(app)
      .post('/api/users')
      .send({ email: 'test@example.com', name: 'Test User' })

    expect(res.status).toBe(201)
    expect(res.body.email).toBe('test@example.com')
    expect(res.body.id).toBeDefined()
  })

  it('rejects duplicate emails', async () => {
    await request(app)
      .post('/api/users')
      .send({ email: 'dup@example.com', name: 'First' })

    const res = await request(app)
      .post('/api/users')
      .send({ email: 'dup@example.com', name: 'Second' })

    expect(res.status).toBe(409)
  })
})

Integration tests live in the middle of your pyramid. Not as numerous as unit tests, but far more important for catching real-world bugs. An application with perfect unit tests and no integration tests will still break in production because unit tests cannot verify that the database query you wrote actually returns the right rows.
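To make the point about real queries concrete, here is a minimal sketch in Python. It uses the standard library's in-memory sqlite3 purely as a stand-in so the example runs anywhere; a real suite would target the same engine as production. The `create_user` function and the `users` schema are hypothetical, chosen to mirror the duplicate-email case above.

```python
import sqlite3


def make_test_db():
    """Create a throwaway database with the (hypothetical) users schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, name TEXT)"
    )
    return conn


def create_user(conn, email, name):
    """Hypothetical data-access function: insert a user, return the new row id."""
    cur = conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?)", (email, name)
    )
    conn.commit()
    return cur.lastrowid


conn = make_test_db()

# The query really executes, so a typo in the SQL would fail here.
user_id = create_user(conn, "test@example.com", "Test User")
row = conn.execute(
    "SELECT email, name FROM users WHERE id = ?", (user_id,)
).fetchone()

# The UNIQUE constraint is enforced by the database, not by a mock.
try:
    create_user(conn, "test@example.com", "Second")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

A unit test with a stubbed repository would pass even if the INSERT statement were malformed or the unique constraint were missing. Only a test that runs the query can catch that.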

End-to-End Tests

E2E tests run your full application in a real browser. They click buttons, fill forms, navigate pages, and verify that everything works from the user’s perspective. These are the slowest and most expensive tests to write and maintain.

// Playwright
import { test, expect } from '@playwright/test'

test('user can sign up and complete onboarding', async ({ page }) => {
  await page.goto('/signup')
  await page.fill('[name="email"]', 'newuser@example.com')
  await page.fill('[name="password"]', 'secure123')
  await page.click('button[type="submit"]')
  await expect(page.locator('.welcome-message')).toHaveText('Welcome!')

  await page.click('text=Start Onboarding')
  await page.fill('[name="company"]', 'Acme Corp')
  await page.click('text=Finish')
  await expect(page).toHaveURL('/dashboard')
})

Keep E2E tests focused on critical business flows: sign-up, checkout, payment, core feature walkthroughs. Do not write an E2E test for every edge case — that is what unit and integration tests are for. A good rule of thumb: if a bug in this flow would cause a revenue-impacting incident, write an E2E test for it.

Test Doubles

Real dependencies are slow, unreliable, and hard to control in tests. A database might have stale data. An external API might be down. A file system might have permission issues. Test doubles replace these real dependencies with controlled substitutes.

There are four main types of test doubles, each suited for different scenarios:

Stubs return canned responses. You tell the stub “when method X is called, return Y.” Stubs are the simplest double: they ensure your code gets the data it needs without actually hitting the real dependency. Use stubs when you just need a predictable return value and do not care whether the method was called.

Mocks are stubs with expectations. A mock not only returns canned data but also verifies that specific methods were called with specific arguments. If the expected call does not happen, the test fails. Use mocks when you need to verify that your code interacts with a dependency correctly — for example, that it calls sendEmail with the right recipient and message.

Fakes are lightweight working implementations. Instead of a real PostgreSQL database, you use an in-memory hash map that implements the same interface. Fakes have real behavior (they store and retrieve data) but no side effects (no network, no disk I/O). Use fakes when you need realistic behavior without the overhead of real infrastructure.

Spies wrap real objects and record all interactions. The real implementation runs normally, but the spy tracks which methods were called, with what arguments, and how many times. Use spies when you want the real behavior but also need to verify interactions after the fact.
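The four doubles can be sketched side by side using Python's built-in `unittest.mock`. The `SignupService` and `InMemoryUserRepo` classes are hypothetical, invented only for this illustration.

```python
from unittest.mock import Mock


class InMemoryUserRepo:
    """Fake: a working repository backed by a dict instead of a database."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, email):
        self._users[user_id] = email

    def find_email(self, user_id):
        return self._users.get(user_id)


class SignupService:
    """Hypothetical service under test: stores a user, then sends a welcome email."""

    def __init__(self, repo, mailer):
        self.repo = repo
        self.mailer = mailer

    def register(self, user_id, email):
        self.repo.save(user_id, email)
        self.mailer.send_email(email, "Welcome!")
        return self.repo.find_email(user_id)


# Stub: returns a canned value; nothing is asserted about how it was called.
stub_repo = Mock()
stub_repo.find_email.return_value = "stub@example.com"
stub_result = SignupService(stub_repo, Mock()).register(1, "a@example.com")

# Fake + mock: the fake stores real data, and the mock mailer fails the test
# unless send_email was called exactly once with these arguments.
fake_repo = InMemoryUserRepo()
mock_mailer = Mock()
fake_result = SignupService(fake_repo, mock_mailer).register(2, "b@example.com")
mock_mailer.send_email.assert_called_once_with("b@example.com", "Welcome!")

# Spy: wraps a real object, so the real behavior runs while calls are recorded.
real_repo = InMemoryUserRepo()
spy_repo = Mock(wraps=real_repo)
SignupService(spy_repo, Mock()).register(3, "c@example.com")
spy_repo.save.assert_called_once_with(3, "c@example.com")
```

In `unittest.mock` the same `Mock` class plays the stub, mock, and spy roles. What separates them is how the test uses the object: by setting return values, by asserting on calls, or by wrapping a real instance.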

Figure: Test doubles. A Service queries a Repository backed by a database. With the real database, a call such as User.find(42) performs an actual SQL query against PostgreSQL: slow, stateful, and prone to flaky tests when data leaks between test cases. It is like sending a letter through the actual postal service: it works, but it takes time, costs money, and the letter might get lost.

When to Use Each Double

The choice of double depends on what you are testing:

  • Stubs for read operations where you control the data. Your service queries a repository, you stub the response with known data, and you verify the service processes it correctly.
  • Mocks for write operations where you need to verify side effects. Your service creates a user, you mock the email service to verify that a welcome email was sent with the right arguments.
  • Fakes for complex dependencies where you need realistic behavior. Your service runs a complex query with filtering and sorting. An in-memory fake gives you real results without a real database.
  • Spies when you want to verify that existing behavior was invoked correctly without changing the behavior itself. For example, you want to verify that a caching layer was checked before hitting the database.

A common anti-pattern is over-mocking — replacing every dependency with a mock and ending up with tests that verify implementation details instead of behavior. If changing the internal structure of a class breaks tests without changing any observable behavior, you are testing the wrong thing.
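The over-mocking problem can be illustrated with a small sketch. The two `total_*` functions and the `FlatTax` class are hypothetical: both functions return the same totals for this cart, but the second one was refactored to tax each item separately.

```python
from unittest.mock import Mock


class FlatTax:
    """Hypothetical tax rule: 10% of the amount, in whole cents."""

    def tax_for(self, amount):
        return amount // 10


def total_v1(prices, tax):
    subtotal = sum(prices)
    return subtotal + tax.tax_for(subtotal)


def total_v2(prices, tax):
    # Refactored internals: tax is computed per item instead of on the subtotal.
    return sum(price + tax.tax_for(price) for price in prices)


def behavior_test_passes(total_fn):
    # Asserts only on the observable result.
    return total_fn([100, 200], FlatTax()) == 330


def interaction_test_passes(total_fn):
    # Asserts on *how* the collaborator was called: an implementation detail.
    tax = Mock()
    tax.tax_for.return_value = 0
    total_fn([100, 200], tax)
    try:
        tax.tax_for.assert_called_once_with(300)
        return True
    except AssertionError:
        return False
```

The behavior test passes for both versions. The interaction test passes for `total_v1` and fails for `total_v2`, even though no caller could observe a difference for this input. That is the signature of a test coupled to implementation details.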

Property-Based Testing

Traditional example-based tests check specific inputs against specific outputs. You write:

test('reverse([1, 2, 3]) returns [3, 2, 1]', () => {
  expect(reverse([1, 2, 3])).toEqual([3, 2, 1])
})

This tests exactly one case. If the function works for [1, 2, 3] but fails for [7, 2, 9], your test will not catch it.

Property-based testing flips this around. Instead of specifying concrete examples, you specify a property that should always hold true for all valid inputs. The testing library generates random inputs and checks the property against each one. If a failing input is found, the library shrinks it — it tries to find the smallest, simplest input that still triggers the failure.

// fast-check with Vitest
import { describe, it, expect } from 'vitest'
import fc from 'fast-check'
import { reverse } from './reverse' // assumed module path

describe('reverse', () => {
  it('reverse(reverse(arr)) equals arr', () => {
    fc.assert(
      fc.property(fc.array(fc.integer()), (arr) => {
        expect(reverse(reverse(arr))).toEqual(arr)
      })
    )
  })
})

# Hypothesis
from hypothesis import given, strategies as st
from mylib import reverse

@given(st.lists(st.integers()))
def test_reverse_reverse_is_identity(arr):
    assert reverse(reverse(arr)) == arr

By default the library generates on the order of a hundred random arrays per run. If a bug exists (say, a reverse implementation that mishandles [7, 2, 9]), there is a good chance one of those arrays will trigger it. The library then shrinks the failing input, trying progressively smaller inputs until it finds a minimal reproduction, which makes debugging much easier.
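To show what generation and shrinking involve, here is a deliberately simplified sketch using only the Python standard library. It is not how fast-check or Hypothesis are implemented. The `buggy_reverse` function and its bug are hypothetical, and the shrinker uses a naive greedy strategy.

```python
import random


def buggy_reverse(items):
    """Hypothetical broken reverse: drops the last element of longer lists."""
    if len(items) > 2:
        return items[-2::-1]  # Bug: starts one element too early.
    return items[::-1]


def round_trip_holds(arr):
    return buggy_reverse(buggy_reverse(arr)) == arr


def shrink_candidates(arr):
    # Smaller lists first: drop one element at a time.
    for i in range(len(arr)):
        yield arr[:i] + arr[i + 1:]
    # Then simpler values: move each non-zero element toward zero.
    for i, value in enumerate(arr):
        if value != 0:
            yield arr[:i] + [int(value / 2)] + arr[i + 1:]


def shrink(arr, prop):
    """Greedily replace the failing input with any simpler input that still fails."""
    current = arr
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(current):
            if not prop(candidate):
                current = candidate
                improved = True
                break
    return current


def find_counterexample(prop, runs=200, seed=0):
    """Return a shrunk failing input, or None if every random input passes."""
    rng = random.Random(seed)
    for _ in range(runs):
        arr = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        if not prop(arr):
            return shrink(arr, prop)
    return None


minimal = find_counterexample(round_trip_holds)
```

Whatever random list fails first, the shrinker reduces it to `[0, 0, 0]`: three elements is the shortest list that triggers the bug, and zero is the simplest value. That points straight at the length check.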

Figure: Property-based testing. Property: reverse(reverse(arr)) equals arr for all arrays. Example-based tests only cover what you write. Property-based tests generate random inputs, find edge cases, and shrink failures to a minimal repro.

Properties to Test

Property-based testing works best for functions that have clear mathematical properties:

  • Idempotence: running the operation twice gives the same result as running it once (e.g., sorting, deduplication)
  • Invariance: some property of the output does not change (e.g., len(reverse(arr)) == len(arr))
  • Round-tripping: serialize then deserialize gives back the original (e.g., JSON encoding, encryption)
  • Metamorphic relations: changing the input in a predictable way changes the output predictably (e.g., adding an element to a list increases its length)

Property-based testing is not a replacement for example-based tests. It is a complement. Example-based tests document specific behaviors. Property-based tests find edge cases you did not think of.
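Each of the four property families above can be written as a one-line assertion. The sketch below uses a seeded standard-library random generator so it runs without extra dependencies; in practice a library such as Hypothesis would supply the inputs and the shrinking. The `dedupe` helper is hypothetical.

```python
import json
import random

rng = random.Random(42)  # Fixed seed keeps the run repeatable.


def random_list():
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]


def dedupe(items):
    """Remove duplicates while preserving first-seen order."""
    return list(dict.fromkeys(items))


def check_properties(runs=200):
    for _ in range(runs):
        arr = random_list()
        # Idempotence: applying the operation twice changes nothing further.
        assert sorted(sorted(arr)) == sorted(arr)
        assert dedupe(dedupe(arr)) == dedupe(arr)
        # Invariance: reversing never changes the length.
        assert len(arr[::-1]) == len(arr)
        # Round-tripping: decoding the encoded value gives back the original.
        assert json.loads(json.dumps(arr)) == arr
        # Metamorphic relation: appending one element grows the length by one.
        assert len(arr + [0]) == len(arr) + 1
    return runs
```

None of these assertions mentions a concrete expected list. Each one states a relationship that must hold for every input.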

Unit Testing Best Practices: FIRST Principles

Effective unit tests follow the FIRST principles: Fast, Isolated, Repeatable, Self-verifying, and Timely.

Fast. Tests should run in milliseconds. If your unit tests take seconds, something is wrong. Slow tests do not get run frequently, and infrequently run tests let bugs through. If a test requires a database connection, it is not a unit test.

# Fast test run
vitest run   # 142 tests in 1.2s

# If tests are slow, check:
# - Are you hitting real databases or APIs?
# - Are you instantiating heavy objects in beforeEach?
# - Are you running unnecessary setup for every test?

Isolated. Tests should not depend on each other or share mutable state. Each test sets up its own data, runs, and tears down. The order of test execution should not matter. Shared state is the number one cause of flaky tests.

// Bad: shared mutable state
const user = new User() // created once, mutated by every test

// Good: fresh state per test
it('sets email', () => {
  const user = new User()
  user.email = 'test@example.com'
  expect(user.email).toBe('test@example.com')
})

Repeatable. A test should produce the same result every time, regardless of environment. No dependencies on network availability, time of day, or unseeded random data. If a test fails sometimes and passes other times, it is worse than useless because it erodes trust in the entire suite.
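One common way to get repeatability is to pass the clock and the random source in as parameters instead of reading them from the environment. A minimal sketch, with hypothetical `is_expired` and `pick_winner` functions:

```python
import random
from datetime import datetime, timedelta


def is_expired(created_at, ttl, now):
    """Return True once `ttl` has elapsed. `now` is a callable so tests can fix the time."""
    return now() - created_at >= ttl


def pick_winner(entrants, rng):
    """Choose an entrant using the supplied random source."""
    return rng.choice(entrants)


def fixed_now():
    # Test clock: always returns the same instant.
    return datetime(2024, 1, 1, 12, 0, 0)


created = datetime(2024, 1, 1, 11, 0, 0)
expired_after_30_min = is_expired(created, timedelta(minutes=30), fixed_now)
expired_after_2_hours = is_expired(created, timedelta(hours=2), fixed_now)

# Two generators with the same seed produce the same choice.
first = pick_winner(["ana", "bo", "cy"], random.Random(7))
second = pick_winner(["ana", "bo", "cy"], random.Random(7))
```

Production code would pass `datetime.now` and an unseeded `random.Random()`. The tests pass fixed substitutes, so the outcome does not depend on when or where they run.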

Self-verifying. Each test has a boolean outcome: pass or fail. No manual interpretation, no log inspection, no visual confirmation. The assertion clearly states what is expected. If a test requires a human to look at output and decide if it is correct, it is not automated testing.

Timely. Write tests alongside the code they test. Tests written weeks after the code tend to be too vague (testing nothing meaningful) or too specific (testing implementation details). The best time to write a test is right before or right after you write the production code.

Integration Testing Strategies

Integration testing requires a different mindset from unit testing. You are no longer testing a single function in isolation. You are testing how your components work together with real infrastructure.

Strategy 1: Testcontainers

Testcontainers let you spin up real database instances in Docker containers for your tests. Unlike in-memory databases (SQLite in a unit test), Testcontainers run the exact same database engine you use in production. The container starts before your tests, the tests run against it, and the container is destroyed after.

// Using testcontainers with Vitest
import { beforeAll, afterAll, it, expect } from 'vitest'
import { PostgreSqlContainer } from '@testcontainers/postgresql'
import { Client } from 'pg'

let container
let client

beforeAll(async () => {
  container = await new PostgreSqlContainer().start()
  client = new Client({ connectionString: container.getConnectionUri() })
  await client.connect()
  await client.query(`
    CREATE TABLE users (
      id SERIAL PRIMARY KEY,
      email VARCHAR(255) UNIQUE NOT NULL,
      name VARCHAR(255)
    )
  `)
}, 30000) // 30s timeout for container startup

afterAll(async () => {
  await client.end()
  await container.stop()
})

it('inserts and retrieves a user', async () => {
  await client.query(
    'INSERT INTO users (email, name) VALUES ($1, $2)',
    ['test@example.com', 'Test']
  )
  const res = await client.query(
    'SELECT * FROM users WHERE email = $1',
    ['test@example.com']
  )
  expect(res.rows[0].name).toBe('Test')
})

Strategy 2: API Contract Tests

Contract tests verify that your API behaves according to its specification. They check status codes, response shapes, header values, and error formats. These tests catch the most common integration bugs: changes to response format that break frontend consumers.

describe('GET /api/users/:id', () => {
  it('returns 404 for non-existent user', async () => {
    const res = await request(app).get('/api/users/99999')
    expect(res.status).toBe(404)
    expect(res.body).toHaveProperty('error', 'not_found')
  })

  it('returns user with expected shape', async () => {
    const res = await request(app).get('/api/users/1')
    expect(res.status).toBe(200)
    expect(res.body).toMatchObject({
      id: expect.any(Number),
      email: expect.any(String),
      name: expect.any(String),
      createdAt: expect.any(String),
    })
    expect(res.body).not.toHaveProperty('passwordHash')
  })
})

Strategy 3: Database Migration Tests

Database schema changes are one of the riskiest operations in any application. A migration test verifies that your migrations run successfully on a real database and that the resulting schema matches your expectations.

describe('database migrations', () => {
  it('applies all migrations without errors', async () => {
    const result = await runMigrations()
    expect(result.success).toBe(true)
  })

  it('users table has expected columns', async () => {
    const res = await client.query(`
      SELECT column_name, data_type
      FROM information_schema.columns
      WHERE table_name = 'users'
    `)
    const columns = res.rows.map(r => r.column_name)
    expect(columns).toContain('id')
    expect(columns).toContain('email')
    expect(columns).toContain('name')
  })
})

End-to-End Testing with Playwright

Playwright is a widely used framework for E2E testing. It supports Chromium, Firefox, and WebKit with a single API. It auto-waits for elements before acting on them, can generate selectors through its codegen tool, and records test traces for debugging failures.

import { test, expect } from '@playwright/test'

test.describe('shopping cart', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/products')
  })

  test('adds item to cart and proceeds to checkout', async ({ page }) => {
    await page.click('[data-testid="product-1"] .add-to-cart')
    await expect(page.locator('.cart-count')).toHaveText('1')

    await page.click('.cart-icon')
    await expect(page.locator('.cart-item')).toHaveCount(1)

    await page.click('text=Proceed to Checkout')
    await expect(page).toHaveURL(/\/checkout/)
    await expect(page.locator('.order-summary')).toBeVisible()
  })

  test('shows empty cart message', async ({ page }) => {
    await page.click('.cart-icon')
    await expect(page.locator('.empty-cart')).toBeVisible()
    await expect(page.locator('.empty-cart')).toHaveText(
      'Your cart is empty'
    )
  })
})

Playwright runs test files in parallel by default, which can significantly reduce suite time. Each test gets its own browser context with isolated storage, cookies, and sessions, so browser state does not leak between tests. Server-side state, such as rows in a shared database, still needs its own cleanup.

Page Object Model

For maintainable E2E tests, use the Page Object Model. Each page or component gets a class that encapsulates selectors and actions:

import { test, expect, type Page, type Locator } from '@playwright/test'

class LoginPage {
  readonly errorMessage: Locator

  constructor(private readonly page: Page) {
    // Expose the error banner as a Locator so assertions can auto-wait on it
    this.errorMessage = page.locator('.error-message')
  }

  async goto() {
    await this.page.goto('/login')
  }

  async login(email: string, password: string) {
    await this.page.fill('[name="email"]', email)
    await this.page.fill('[name="password"]', password)
    await this.page.click('button[type="submit"]')
  }
}

test('shows error for invalid credentials', async ({ page }) => {
  const loginPage = new LoginPage(page)
  await loginPage.goto()
  await loginPage.login('wrong@example.com', 'badpassword')
  // toContainText retries until the text appears or the timeout expires
  await expect(loginPage.errorMessage).toContainText('Invalid email or password')
})

Code Coverage and Mutation Testing

Code coverage measures which lines of your code are executed during tests. It is a useful diagnostic tool, but it is dangerously easy to misinterpret.

Line coverage tells you that a line of code ran. It does not tell you that the test verified the correct behavior on that line. You can have 100% line coverage with tests that assert nothing meaningful — a phenomenon known as the coverage mirage.

Branch coverage is more meaningful. It tells you whether each branch of a conditional (if/else, switch cases) was executed. If you have an if (isPremium) block, branch coverage tells you whether both the true and false paths were tested.
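The gap between line and branch coverage shows up even in a four-line function. In this sketch the `shipping_fee` function and its bug are hypothetical.

```python
def shipping_fee(is_member):
    """Hypothetical pricing rule: members ship free, everyone else pays 5."""
    fee = 50  # Bug: should be 5. Only the non-member path exposes it.
    if is_member:
        fee = 0
    return fee


def check_member_ships_free():
    # Executes every line of shipping_fee, so line coverage reports 100%.
    assert shipping_fee(True) == 0


def check_non_member_pays_fee():
    # Only this check takes the path where the `if` body is skipped.
    assert shipping_fee(False) == 5
```

Running only `check_member_ships_free` yields full line coverage and a passing suite while the bug ships. Branch coverage would flag the untaken path of the `if`, and `check_non_member_pays_fee` fails as soon as it is added.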

Mutation testing takes this further. It introduces small changes to your code (mutations) and checks whether your tests detect them. For example, it changes > to >= or flips true to false in a conditional. If your tests still pass after the mutation, they are not actually testing that code — the mutation survived.

// Original code
function calculateDiscount(price, isPremium, coupon) {
  let discount = 0
  if (isPremium) discount += 0.2
  // ...
}

// A mutation tool would test:
// mutation: change `if (isPremium)` to `if (!isPremium)`
// If tests still pass, your test does not actually verify the premium discount

Figure: Code coverage analysis. Coverage tells you which code ran during tests, but not whether the tests actually verify correctness. In the scenario shown, line coverage is 100%, branch coverage is 75%, and the mutation score is 30%: all lines are executed, but assertions are weak. Branches are hit but not verified independently, and mutation testing reveals the tests are fragile. The isPremium and coupon branches are marked as covered; missed branches are flagged around the price < 100 and discount > 0.5 checks.

// calculateDiscount.ts
function calculateDiscount(price: number, isPremium: boolean, coupon?: string): number {
  let discount = 0;

  if (isPremium) {
    discount += 0.2;
  }

  if (coupon) {
    discount += 0.1;
  }

  if (price < 100) {
    discount = Math.min(discount, 0.1);
  }

  if (discount > 0.5) {
    discount = 0.5;
  }

  return price * (1 - discount);
}

Coverage Goals Are Traps

Treating a coverage target like “80% line coverage” as a goal in itself is often counterproductive. Teams game the metric by writing shallow tests that execute lines without verifying behavior. The result is a green coverage report and a suite of tests that catches few bugs.

Instead of chasing coverage percentages, ask yourself:

  • Would these tests catch a real bug in this code?
  • If I delete this test, would anyone notice?
  • Are we testing behavior or implementation details?

A small suite of meaningful tests is worth more than a large suite of coverage-padding tests.

Test Coverage vs Test Effectiveness

Let us dig deeper into the distinction between coverage and effectiveness because it is the single most misunderstood concept in testing.

Coverage is about quantity. It measures how much of your code executed during a test run. Tools like Istanbul (JavaScript), coverage.py (Python), and JaCoCo (Java) report numbers: 85% line coverage, 72% branch coverage.

Effectiveness is about quality. It measures whether your tests would catch a bug if one were introduced. Mutation testing measures effectiveness directly. But you can also reason about it: a test for calculateDiscount that only checks the isPremium = false case has low effectiveness because it would not catch a bug in the premium discount logic.
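The reasoning above can be made mechanical with a hand-rolled mutation check. This is only a sketch of the idea: real mutation tools generate and run mutants automatically, and every name here is hypothetical.

```python
def discount_rate(is_premium):
    return 0.2 if is_premium else 0.0


def mutant_discount_rate(is_premium):
    # Mutation: the premium constant 0.2 has been replaced with 0.0.
    return 0.0 if is_premium else 0.0


def non_premium_only_suite(fn):
    # Low effectiveness: never exercises the premium path.
    return fn(False) == 0.0


def full_suite(fn):
    # Checks both paths with exact expected values.
    return fn(True) == 0.2 and fn(False) == 0.0


def mutant_killed(suite, original, mutant):
    """A mutant is killed when the suite passes on the original and fails on the mutant."""
    return suite(original) and not suite(mutant)


weak_result = mutant_killed(non_premium_only_suite, discount_rate, mutant_discount_rate)
strong_result = mutant_killed(full_suite, discount_rate, mutant_discount_rate)
```

The mutant survives the non-premium-only suite and is killed by the full suite. A mutation score is the fraction of such mutants that a suite kills.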

The relationship between coverage and effectiveness follows the law of diminishing returns. As a rough rule of thumb, the first 60% of coverage catches the most obvious bugs. Getting from 80% to 90% is harder. Getting from 90% to 95% is harder still. And the final 5% often costs more than it saves because the remaining code is either trivial (simple getters, logging statements) or hard to exercise (error handling that is difficult to trigger).

When High Coverage Matters

Not all code is equally important. A function that handles credit card validation, medical record processing, or financial calculations deserves both high coverage and high effectiveness. A logging utility or a simple getter does not.

Apply the risk-based testing principle: the more impact a bug would have, the more thoroughly the code should be tested. A bug in the payment pipeline costs real money. A bug in the admin panel’s user count display is an inconvenience.

CI Pipeline and Flaky Test Management

Your tests are only valuable if they run consistently and give reliable signals. That means integrating them into a CI pipeline and systematically eliminating flaky tests.

A standard CI pipeline for a web application runs stages in a specific order. Fast checks run first to give quick feedback. Slower checks run after to verify deeper correctness. Parallel stages (like unit and integration tests) run simultaneously on different workers.

# GitHub Actions example
name: Test Pipeline
on:
  pull_request:
  push:
    branches: [main]  # the deploy job below only runs on pushes to main

jobs:
  lint-and-typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install
      - run: bunx tsc --noEmit
      - run: bunx eslint .

  test:
    needs: lint-and-typecheck
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install
      - run: bunx vitest run

  e2e:
    needs: lint-and-typecheck
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install
      - run: bunx playwright install --with-deps
      - run: bunx playwright test

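  deploy:
    needs: [test, e2e]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      # Each job starts on a fresh runner, so check out and install again
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install
      - run: bun run build && bunx wrangler pages deploy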

Lint and type check run first because they are the fastest and catch broad categories of errors. The test job runs unit and integration tests together in a single vitest invocation; splitting them into separate jobs would let them run in parallel because they are independent. The e2e job runs in parallel with the test job since it requires different infrastructure. Build and deploy only run if all tests pass.

Figure: CI test pipeline. A pull request triggers the pipeline. Independent test suites run in parallel. A failure at any stage stops the pipeline and prevents deployment. Stages: Lint, Type Check, Unit Tests, Integration Tests, E2E Tests, Build, Deploy.

Flaky Test Management

A flaky test is one that passes and fails without any code changes. Flaky tests are poison for a test suite. They erode trust, waste developer time, and eventually lead to teams ignoring test failures entirely.

Common causes of flakiness:

Race conditions. Async code where the test does not wait long enough for a side effect to complete. Fix with proper await and wait strategies.

Shared mutable state. Tests that leak database records, files, or global variables. Fix with proper isolation per test.

Timing dependencies. Tests that depend on time (current date, timeouts, delays). Fix by injecting a fake clock.

Network dependencies. Tests that call external APIs that go down or return unexpected responses. Fix by using test doubles for external services.

Order-dependent tests. Tests that assume a specific execution order. Fix by making every test fully self-contained.
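For the race-condition case, a "proper wait strategy" usually means polling for the expected state with a deadline rather than sleeping for a fixed time. A minimal sketch, assuming a hypothetical `wait_until` helper and a background timer standing in for the async side effect:

```python
import threading
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check in case the condition became true at the deadline.
    return condition()


results = []

# Simulated async side effect: completes roughly 50 ms after it is started.
worker = threading.Timer(0.05, lambda: results.append("done"))
worker.start()

# Returns as soon as the side effect is visible, instead of guessing a sleep length.
completed = wait_until(lambda: results == ["done"])
worker.join()
```

A fixed `time.sleep(0.05)` would either be too short on a loaded CI machine or waste time everywhere else. Polling with a generous timeout is fast in the common case and tolerant in the slow one.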

When a flaky test is identified, quarantine it immediately. Move it to a separate CI job that is allowed to fail. Fix the root cause before returning it to the main suite. Do not ignore flaky tests — they are symptoms of deeper problems in your test architecture.

Best Practices for CI Testing

  • Run the fastest tests first to give developers rapid feedback
  • Parallelize independent test suites to minimize total pipeline time
  • Set test timeouts to prevent hung tests from blocking the pipeline
  • Use test retries only as a temporary measure, never as a permanent solution for flaky tests
  • Generate and store test reports, screenshots, and traces for debugging failures
  • Fail the pipeline on test failures — a test suite that allows failures is a test suite that is ignored

Summary

Testing is not optional. It is the engineering practice that transforms software development from guesswork into a predictable process. The key takeaways:

  • Structure your test suite as a pyramid or trophy: many fast tests at the bottom, few slow tests at the top
  • Use test doubles (stubs, mocks, fakes, spies) to control dependencies and isolate the code under test
  • Follow FIRST principles for unit tests: Fast, Isolated, Repeatable, Self-verifying, Timely
  • Use Testcontainers and real databases for integration tests — in-memory substitutes hide real bugs
  • Keep E2E tests focused on critical business flows and use the Page Object Model for maintainability
  • Add property-based tests for functions with mathematical properties — they find edge cases you did not think of
  • Measure test effectiveness, not just coverage. 100% line coverage with weak assertions is worse than 70% coverage with strong assertions
  • Manage flaky tests systematically: quarantine, fix root cause, return to suite

The best test suite is the one that gives you confidence to ship. Start where you are, add tests for the most critical paths first, and build up from there. Testing is a practice, not a project. It never ends — and that is a good thing.