pikepdf

Read, write, repair, and transform PDFs in Python -- powered by qpdf.

pikepdf is based on qpdf, a mature, actively maintained C++ library for PDF manipulation and repair.

Python + qpdf = "py" + "qpdf" = "pyqpdf", which looks like a dyslexia test. Say it out loud, and it sounds like "pikepdf".

import pikepdf

# Open a PDF -- pikepdf (via qpdf) automatically repairs structural damage
with pikepdf.Pdf.open('input.pdf') as pdf:
    num_pages = len(pdf.pages)
    del pdf.pages[-1]
    pdf.save('output.pdf')

Installation

pip install pikepdf

Binary wheels are available for all common platforms -- Linux, macOS, and Windows on both x86-64 and ARM64/Apple Silicon. No compiler required.

For building from source, see installation. Commercial support is available.

What Can pikepdf Do?

Manipulate pages

Merge, split, rotate, and rearrange pages across PDFs.

from pikepdf import Pdf

# Merge multiple PDFs
with Pdf.new() as merged:
    for filename in ['first.pdf', 'second.pdf', 'third.pdf']:
        src = Pdf.open(filename)
        merged.pages.extend(src.pages)
    merged.save('merged.pdf')

# Rotate all pages in a document
with Pdf.open('input.pdf') as pdf:
    for page in pdf.pages:
        page.rotate(180, relative=True)
    pdf.save('rotated.pdf')
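
Splitting follows the same list-style pattern as merging. The following is a minimal sketch, not part of pikepdf itself: the helper names `chunk_bounds` and `split_pdf` and the `_partN.pdf` naming scheme are assumptions made for illustration. It relies only on `Pdf.open`, `Pdf.new`, slicing `pdf.pages`, and `pages.extend`.

```python
from pathlib import Path


def chunk_bounds(num_pages, pages_per_file):
    """Return (start, stop) index pairs covering num_pages in fixed-size chunks."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, num_pages))
        for start in range(0, num_pages, pages_per_file)
    ]


def split_pdf(src_path, pages_per_file, out_dir="."):
    """Write consecutive chunks of src_path into out_dir and return the new paths."""
    # Imported here so chunk_bounds() stays usable without pikepdf installed.
    from pikepdf import Pdf

    stem = Path(src_path).stem
    written = []
    with Pdf.open(src_path) as src:
        bounds = chunk_bounds(len(src.pages), pages_per_file)
        for part_number, (start, stop) in enumerate(bounds, start=1):
            # Hypothetical naming scheme: <original name>_part<N>.pdf
            out_path = Path(out_dir) / f"{stem}_part{part_number}.pdf"
            with Pdf.new() as part:
                # Slicing pdf.pages gives the pages to copy into the new file
                part.pages.extend(src.pages[start:stop])
                # Save while the source is still open
                part.save(out_path)
            written.append(out_path)
    return written
```

Calling `split_pdf('input.pdf', 10)` would write files of at most ten pages each next to the current directory. The last file holds whatever pages remain.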

Edit metadata

Read and write XMP metadata and DocumentInfo, with automatic synchronization between the two.

import pikepdf

with pikepdf.open('report.pdf') as pdf:
    with pdf.open_metadata() as meta:
        meta['dc:title'] = 'Quarterly Report'
        meta['dc:creator'] = ['Author Name']
    pdf.save('updated.pdf')

Extract images

Extract images losslessly from PDFs -- without re-encoding JPEGs or other compressed formats.

from pikepdf import Pdf, PdfImage

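with Pdf.open('document.pdf') as pdf:
    for page_number, page in enumerate(pdf.pages, start=1):
        for name, raw_image in page.images.items():
            image = PdfImage(raw_image)
            # Unique prefix per image so files do not overwrite each other;
            # extract_to() appends the appropriate file extension
            image.extract_to(fileprefix=f"page{page_number}_{name.lstrip('/')}")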

Encrypt and decrypt

Open password-protected PDFs and save with encryption (AES-256, AES-128, or RC4).

import pikepdf

# Open an encrypted PDF
with pikepdf.open('protected.pdf', password='secret') as pdf:
    pdf.save('decrypted.pdf')

# Save with encryption
with pikepdf.open('input.pdf') as pdf:
    pdf.save('encrypted.pdf', encryption=pikepdf.Encryption(
        user='readpassword', owner='adminpassword'
    ))

# Remove encryption if user password is not set
with pikepdf.open('protected.pdf') as pdf:
    pdf.save('decrypted.pdf', encryption=False)

(Digital signature-based encryption is not currently supported.)

Linearize to improve browser performance

Create "fast web view" PDFs optimized for streaming delivery.

import pikepdf

with pikepdf.open('input.pdf') as pdf:
    pdf.save('web_optimized.pdf', linearize=True)

Access PDF objects directly

Use a Pythonic API that mirrors the PDF specification -- dictionaries, arrays, streams, and names map directly to Python types.

from pikepdf import Pdf

with Pdf.open('input.pdf') as pdf:
    page = pdf.pages[0]
    page.MediaBox               # e.g. [0, 0, 612, 792]
    page.Resources.XObject      # image and form XObjects on this page
    page.Rotate = 90            # set page rotation directly
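
Because box values behave like ordinary Python sequences of numbers, they can be passed straight into plain functions. The sketch below converts a media box to a physical size. The helper names `box_size_inches` and `first_page_size` are hypothetical. It assumes the default of 72 points per inch and ignores any `/UserUnit` scaling or page rotation.

```python
def box_size_inches(box):
    """Convert a PDF box [x0, y0, x1, y1] in points to (width, height) in inches."""
    x0, y0, x1, y1 = (float(value) for value in box)
    # 72 points per inch by default; abs() tolerates boxes with swapped corners
    return abs(x1 - x0) / 72, abs(y1 - y0) / 72


def first_page_size(pdf_path):
    """Return the (width, height) in inches of the first page of pdf_path."""
    # Imported here so box_size_inches() stays usable without pikepdf installed.
    from pikepdf import Pdf

    with Pdf.open(pdf_path) as pdf:
        # The lowercase mediabox property also resolves a box inherited
        # from a parent node of the page tree
        return box_size_inches(pdf.pages[0].mediabox)
```

For a US Letter page with a media box of [0, 0, 612, 792], `box_size_inches` returns 8.5 by 11 inches.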

Use qpdf's Job API

Access qpdf's full command-line capabilities programmatically from Python.

from pikepdf import Job

# Check a PDF for errors
Job(['pikepdf', '--check', 'document.pdf']).run()

# Or use qpdf's JSON job interface
Job({'inputFile': 'input.pdf', 'outputFile': 'output.pdf', 'linearize': ''}).run()

Key Features

Built on qpdf -- backed by a mature, battle-tested C++ PDF library
Automatic PDF repair -- silently fixes many types of PDF damage on open
PDF/A compliance -- modify PDFs without breaking PDF/A conformance
XMP metadata editing -- full read/write support for XMP and DocumentInfo
Encryption support -- open and save password-protected PDFs (AES-256, AES-128, RC4)
Linearization -- create "fast web view" PDFs for efficient streaming
Pythonic API -- dictionary-style access to PDF objects, list-style page access
Lossless image extraction -- extract and replace images without re-encoding
Content stream inspection -- parse and manipulate page content at the operator level
Object-level manipulation -- work directly with PDF objects per the specification
Jupyter integration -- render PDF and page previews inline in notebooks
Binary wheels everywhere -- pre-built for Linux, macOS, Windows (x86-64 and ARM64)
Liberal license -- MPL-2.0, compatible with most open and closed source projects
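
As an illustration of the content stream inspection feature listed above, the following sketch tallies which drawing operators a page uses. The helper names `count_operators` and `page_operator_counts` are assumptions for this example. `pikepdf.parse_content_stream` is the library call that yields the (operands, operator) pairs.

```python
from collections import Counter


def count_operators(instructions):
    """Tally operator names from an iterable of (operands, operator) pairs."""
    return Counter(str(operator) for _operands, operator in instructions)


def page_operator_counts(pdf_path, page_index=0):
    """Count the content stream operators used on one page of pdf_path."""
    # Imported here so count_operators() stays usable without pikepdf installed.
    import pikepdf

    with pikepdf.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        # parse_content_stream() decodes the page content into instructions,
        # each of which unpacks as (operands, operator)
        return count_operators(pikepdf.parse_content_stream(page))
```

A page that draws mostly text would typically show high counts for text operators such as `Tj` or `TJ`. A scanned page would usually show little more than a single `Do` that paints an image.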

When to Use pikepdf

pikepdf is a great fit when you need to:

Repair, sanitize, or normalize damaged or malformed PDFs
Merge, split, rotate, crop, or rearrange pages
Edit PDF metadata (XMP, DocumentInfo) programmatically
Build tools or libraries that operate on existing PDFs
Preserve PDF/A or other standard compliance while modifying documents
Work with encrypted PDFs
Perform low-level PDF surgery (object and stream manipulation)
Optimize PDFs for web delivery (linearization)

pikepdf is probably not what you want if you need to:

Generate PDFs from HTML or templates -- consider weasyprint or reportlab
Render PDFs to images -- consider PyMuPDF or pypdfium2
Extract text or tables from PDFs -- consider pdfminer.six or pdfplumber

PDF Libraries in Python

Python has several PDF libraries, each with different strengths. pypdf is pure Python and well suited to straightforward PDF tasks without compiled dependencies. pypdfium2 provides permissively licensed PDF rendering. PyMuPDF offers comprehensive rendering and text extraction. pikepdf focuses on correctness, repair, and low-level manipulation through qpdf, under the permissive MPL-2.0 license.

Testimonials

I decided to try writing a quick Python program with pikepdf to automate [something] and it "just worked". --Jay Berkenbilt, creator of qpdf

"Thanks for creating a great pdf library, I tested out several and this is the one that was best able to work with whatever I threw at it." --@cfcurtis

Used By

OCRmyPDF uses pikepdf to graft OCR text layers onto existing PDFs, to examine the contents of input PDFs, and to optimize PDFs.
PDF Arranger is a small Python application that provides a graphical user interface to rotate, crop and rearrange PDFs.
PDFStitcher is a utility for stitching PDF pages into a single document (i.e. N-up or page imposition).

Documentation

Full documentation is available at pikepdf.readthedocs.io. For the latest changes, see the release notes.

Contributing

Contributions are welcome! If you'd like to make a contribution, see the Contributing Guidelines.

License

pikepdf is licensed under the Mozilla Public License 2.0 (MPL-2.0), which can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license. MPL-2.0 permits you to combine the software with other work, including commercial and closed source software, but asks you to publish source-level modifications you make to pikepdf itself.

Some components of the project may be under other license agreements, as indicated in their SPDX license header or the REUSE.toml file.

