PACER has no REST API. Programmatic access requires web scraping — authenticating a session, navigating court-specific HTML interfaces, parsing inconsistently structured output, and managing session tokens that expire. This guide explains how that works, what the real engineering challenges are, and when it makes more sense to use an abstraction layer instead.
Before you build: If your goal is monitoring active cases for changes, review PACER vs CourtListener vs DocketLayer first. Building a PACER integration from scratch is a significant engineering investment. Most agent workflows are better served by an abstraction layer.
How PACER access works
PACER is not a single system. It is a unified login portal that routes to 184 separate court-hosted CM/ECF instances, each with its own URL, its own session management, and subtle differences in HTML structure. When you authenticate with PACER centrally, you receive a session token that is accepted by individual courts. Each court-level query then goes to that court's specific CM/ECF host.
This architecture means there is no single endpoint to call. A query to the Southern District of New York goes to ecf.nysd.uscourts.gov. The same query for Delaware bankruptcy goes to ecf.deb.uscourts.gov. Each court maintains its own session state.
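The routing can be sketched as a simple lookup table. The two hosts below come from the examples above; `docket_report_url` is an illustrative name, and real coverage means one entry per supported court.

```python
# Minimal sketch: per-court CM/ECF hosts. The two entries come from the
# examples in the text; a production table needs all supported courts.
COURT_HOSTS = {
    "nysd": "ecf.nysd.uscourts.gov",  # Southern District of New York
    "deb": "ecf.deb.uscourts.gov",    # Delaware bankruptcy
}

def docket_report_url(court_id: str) -> str:
    # DktRpt.pl is the docket report CGI endpoint on each court's host.
    return f"https://{COURT_HOSTS[court_id]}/cgi-bin/DktRpt.pl"
```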
Authentication
PACER authentication happens via a central login endpoint. The request is a standard HTML form POST:
POST https://pacer.login.uscourts.gov/csologin/login.jsf
# Form fields:
login = your_pacer_username
password = your_pacer_password
loginType = PACER
# On success: session cookie returned in Set-Cookie header
# Cookie name: PacerSession
# Must be included in all subsequent requests
The session token must be included in every subsequent request as a cookie. Sessions expire after periods of inactivity — typically 60 minutes — and must be refreshed. In a long-running normalization service, session management is an ongoing operational concern, not a one-time setup step.
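A minimal login sketch based on the form fields above. The function names are my own, and this is untested against the live endpoint; treat it as a starting point, not a verified client.

```python
import requests

LOGIN_URL = "https://pacer.login.uscourts.gov/csologin/login.jsf"

def build_login_payload(username: str, password: str) -> dict:
    # Field names taken from the form POST shown above.
    return {"login": username, "password": password, "loginType": "PACER"}

def pacer_login(username: str, password: str) -> requests.Session:
    session = requests.Session()
    resp = session.post(LOGIN_URL, data=build_login_payload(username, password))
    resp.raise_for_status()
    # The PacerSession cookie is the token all later requests must carry.
    if "PacerSession" not in session.cookies:
        raise RuntimeError("login returned 200 but no PacerSession cookie was set")
    return session
```

Because `requests.Session` stores cookies, every subsequent `session.get(...)` automatically carries the token; the expiry handling described below still has to sit on top of this.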
Querying a docket
With an active session, you can query a court's docket sheet. The request goes to the specific court's CM/ECF host:
GET https://ecf.nysd.uscourts.gov/cgi-bin/DktRpt.pl
# Query parameters:
case_id = 1:24-cv-01234
date_range_type = Filed
date_from = 2026-01-01
date_to = 2026-04-15
output_format = html
# Response: HTML page with docket entries table
# No JSON. No structured data. Parse the HTML.
The response is an HTML page. You must parse it with a library like BeautifulSoup (Python) or Cheerio (Node.js) to extract docket entry data. The table structure varies across courts.
Parsing the response
A basic Python example for extracting docket entries from the response HTML:
import requests
from bs4 import BeautifulSoup

# Assuming the PacerSession cookie has already been obtained via login
session = requests.Session()
session.cookies.set('PacerSession', your_token)

resp = session.get(
    'https://ecf.nysd.uscourts.gov/cgi-bin/DktRpt.pl',
    params={'case_id': '1:24-cv-01234'}
)

soup = BeautifulSoup(resp.text, 'html.parser')
table = soup.find('table', class_='docket')

# Table structure differs per court:
# SDNY format != Delaware format != NDCA format.
# You will write court-specific parsing logic.
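Supporting many courts usually means a dispatch layer over court-specific parsers. A hypothetical registry pattern is sketched below; the decorator name and the placeholder SDNY parser are illustrative, not real SDNY logic.

```python
from bs4 import BeautifulSoup

PARSERS = {}

def docket_parser(court_id):
    # Registers a parse function under a court ID for later dispatch.
    def register(fn):
        PARSERS[court_id] = fn
        return fn
    return register

@docket_parser("nysd")
def parse_nysd(html: str) -> list:
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("table tr")
    # Placeholder: a real parser maps SDNY's actual columns to named fields.
    return [{"raw": row.get_text(" ", strip=True)} for row in rows[1:]]

def parse_docket(court_id: str, html: str) -> list:
    parser = PARSERS.get(court_id)
    if parser is None:
        raise ValueError(f"no parser registered for court {court_id!r}")
    return parser(html)
```

The registry makes the coverage gap explicit: an unregistered court fails loudly instead of silently mis-parsing with another court's rules.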
The real engineering challenges
The code above is the easy part. The hard part is everything around it.
Session management at scale
Sessions expire. Tokens must be refreshed before they lapse. In a monitoring service querying hundreds of cases across dozens of courts, session management becomes an always-on operational problem. A failed session mid-query returns a login redirect, not an error code — your parser must detect this and re-authenticate.
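A sketch of redirect detection and re-authentication. The string markers below are assumptions that should be verified against real responses; the point is that detection must inspect where the response landed, not the status code.

```python
def looks_like_login_redirect(final_url: str, body: str) -> bool:
    # Assumption: a lapsed session bounces to the central login host
    # or serves the login form, still with HTTP 200.
    return ("pacer.login.uscourts.gov" in final_url) or ("csologin" in body)

def get_with_reauth(session, url, params, relogin):
    # `relogin` is a callable returning a fresh authenticated session.
    resp = session.get(url, params=params)
    if looks_like_login_redirect(resp.url, resp.text):
        session = relogin()
        resp = session.get(url, params=params)
    return session, resp
```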
Court-specific HTML variations
CM/ECF is nominally standardized, but courts have customized it for decades. Docket table structures, field names, date formats, and document link patterns all vary. A parser that works for SDNY will have edge cases in Delaware and will break outright in some smaller districts. You need test coverage against real output from every court you intend to support.
Rate limits and PACER standing
PACER imposes rate limits on commercial accounts. Aggressive querying — particularly bulk docket sheet retrieval — risks account throttling or suspension. Your normalization service must implement backoff logic and manage query volume carefully. Losing PACER access mid-production is a serious incident with no fast recovery path.
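One common shape for this is exponential backoff with jitter. The sketch below assumes throttling surfaces as HTTP 429 or 503, which should be confirmed against PACER's actual behavior before relying on it.

```python
import random
import time

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    # Exponential growth with jitter in [0.5x, 1.0x], capped at `cap` seconds.
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def polite_get(session, url, params, max_attempts: int = 5):
    for attempt in range(max_attempts):
        resp = session.get(url, params=params)
        # Assumption: throttling appears as 429/503; anything else returns.
        if resp.status_code not in (429, 503):
            return resp
        time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

The jitter matters: without it, a fleet of workers that got throttled together retries together and gets throttled again.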
Change detection
PACER has no native change detection. To know whether a case has new filings, you must retrieve the full docket sheet and compare it against your last snapshot. This means storing prior state, implementing comparison logic, and managing false positives from formatting changes that are not substantive updates.
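A sketch of the snapshot comparison. Keying entries on date and entry number, with whitespace-normalized descriptions, is one way to suppress false positives from cosmetic reformatting; the field names are assumptions about your parsed schema.

```python
import re

def entry_key(entry: dict) -> tuple:
    # Normalize whitespace and case in the description so cosmetic
    # reformatting does not register as a new filing.
    desc = re.sub(r"\s+", " ", entry["description"]).strip().lower()
    return (entry["date"], entry["number"], desc)

def new_entries(previous: list, current: list) -> list:
    # Returns entries in `current` that were not in the prior snapshot.
    seen = {entry_key(e) for e in previous}
    return [e for e in current if entry_key(e) not in seen]
```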
Cost management
PACER charges $0.10 per page. A docket sheet for an active, heavily-litigated case can be many pages. Retrieving it repeatedly for change detection accumulates costs quickly. Efficient polling — checking only what is necessary, only as often as necessary — requires deliberate engineering.
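The arithmetic is worth making explicit. A naive cost estimator, using the $0.10 per-page fee from above (the scenario numbers are illustrative):

```python
PRICE_PER_PAGE = 0.10  # PACER's per-page fee

def monthly_polling_cost(cases: int, pages_per_sheet: int,
                         polls_per_day: int, days: int = 30) -> float:
    # Naive polling re-fetches every full docket sheet on every poll.
    return cases * pages_per_sheet * polls_per_day * days * PRICE_PER_PAGE

# 200 monitored cases with 15-page sheets, polled twice a day,
# comes to roughly $18,000 a month before any optimization.
```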
PACER maintenance windows
PACER schedules maintenance windows that take courts offline temporarily. Your service must handle these gracefully — surfacing the expected downtime, queuing retries, and not alerting on failures that are expected system maintenance rather than genuine errors.
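A sketch of window-aware scheduling. The schedule shown is hypothetical, since actual windows vary by court and are announced by PACER; the pattern is to check the window before polling and before alerting.

```python
from datetime import datetime, time

# Hypothetical example schedule. Monday is weekday 0, so 5 means Saturday.
MAINTENANCE_WINDOWS = {
    "nysd": [(5, time(0, 0), time(4, 0))],  # Saturday 00:00-04:00
}

def in_maintenance_window(court_id: str, now: datetime) -> bool:
    for weekday, start, end in MAINTENANCE_WINDOWS.get(court_id, []):
        if now.weekday() == weekday and start <= now.time() < end:
            return True
    return False
```

A failure during a known window gets queued for retry instead of paged as an incident.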
Normalization
Once you have parsed HTML from multiple courts, you face a normalization problem. The raw output from SDNY and Delaware represents the same logical events — a motion filed, an order entered — in different formats. Building a uniform schema across courts requires either manual mapping rules per court (fragile, hard to maintain) or an AI normalization layer that can interpret varying formats into a consistent structure.
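As a sketch, a uniform schema might look like the dataclass below, with per-court raw values funneled through normalizers. The date formats listed are hypothetical examples of variation, not confirmed per-court behavior.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DocketEvent:
    court: str
    case_id: str
    entry_number: int
    filed: str        # ISO 8601 date
    description: str

# Hypothetical examples of date-format variation across courts.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y")

def normalize_date(raw: str) -> str:
    # Try each known format and emit a single canonical ISO date.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")
```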
DocketLayer uses Claude to handle this normalization step. For teams building their own PACER integration, an AI normalization layer is the practical path to coverage at scale — writing manual parsing rules for 184 courts is not feasible.
When to build vs when to abstract
Building directly against PACER makes sense when:
- You need document retrieval — actual PDFs, not just docket metadata
- You need courts not yet covered by any abstraction layer
- You are building the abstraction layer yourself (i.e., you are DocketLayer)
- Query volume is low enough that PACER fees and engineering overhead are acceptable
Using an abstraction layer makes sense when:
- You need structured JSON without building a scraping and normalization stack
- You need change detection without writing stateful comparison logic
- Your agents need to pay per query programmatically rather than manage a PACER account
- Engineering time has a higher cost than $0.99/query
DocketLayer handles authentication, session management, HTML parsing, normalization, and change detection internally. Your agent submits a case ID and a timestamp and receives structured JSON. The PACER complexity is invisible to the caller.