How to Extract PDF Invoices to JSON with PDF Vector

Learn how to automatically extract invoice data from PDF files and convert it to structured JSON using PDF Vector's Ask API.

August 27, 2025

4 min read

Duy Bui

Extract Invoice Data from PDF to JSON

Need to pull invoice data from PDFs into your system? This tutorial shows you exactly how to use PDF Vector’s Ask API to extract invoices and convert them to JSON format with TypeScript.

What You’ll Learn:

  • Extract invoice fields like vendor details, line items, and totals
  • Convert PDF data to structured JSON using custom schemas
  • Process both online PDFs and local files
  • Handle complex invoice formats with line items and taxes

PDF Vector’s Ask API interprets the invoice layout for you: you describe the fields you want as a JSON schema, and it returns those fields from the invoice. Let’s dive into the code.

Getting Started with PDF Vector

Step 1: Create Your PDF Vector Account

Sign up for a free account at the PDF Vector Dashboard and get your API key. Free accounts include 100 credits. At 3 credits per page, that is enough to test invoice extraction on about 33 pages.

Step 2: Install the TypeScript SDK

Add PDF Vector to your project with npm:

npm install pdfvector

Step 3: Initialize the Client

Set up the PDF Vector client with your API key:

import { PDFVector } from 'pdfvector';

const client = new PDFVector({ 
  apiKey: 'pdfvector_your_api_key_here' 
});
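Hardcoding the key is fine for a quick test, but in a real project you would usually load it from the environment. The helper below is a minimal sketch of that idea; the variable name `PDFVECTOR_API_KEY` and the function `requireApiKey` are assumptions made for illustration, not part of the SDK.

```typescript
// Hypothetical environment variable name; pick whatever fits your project.
const API_KEY_VAR = 'PDFVECTOR_API_KEY';

// Reads the API key from an environment-like object and fails fast if it is missing.
// The environment is passed in as a parameter so the function is easy to test.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env[API_KEY_VAR]?.trim();
  if (!key) {
    throw new Error(`Set the ${API_KEY_VAR} environment variable before running.`);
  }
  return key;
}

// Usage in Node.js (assumption: the client is constructed as shown above):
//   const client = new PDFVector({ apiKey: requireApiKey(process.env) });
```

Failing at startup with a clear message is easier to debug than an authentication error returned later by the first API call.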

Extract Your First Invoice to JSON

Basic Invoice Extraction

Let’s start with a simple example that extracts the essential invoice fields:

import { PDFVector } from 'pdfvector';

const client = new PDFVector({ 
  apiKey: 'pdfvector_your_api_key_here' 
});

async function extractInvoice() {
  const result = await client.ask({
    url: 'https://example.com/invoice.pdf',
    prompt: 'Extract invoice information',
    mode: 'json',
    schema: {
      type: 'object',
      properties: {
        invoiceNumber: { type: 'string' },
        invoiceDate: { type: 'string' },
        vendorName: { type: 'string' },
        customerName: { type: 'string' },
        totalAmount: { type: 'number' },
        currency: { type: 'string' }
      }
    }
  });

  console.log('Extracted Invoice:', result.json);
  return result.json;
}

// Run the extraction
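extractInvoice().then(data => {
  console.log(`Invoice ${data.invoiceNumber} processed`);
  console.log(`Total: ${data.currency} ${data.totalAmount}`);
}).catch(error => {
  // Network failures, invalid keys, or unreadable PDFs end up here
  console.error('Invoice extraction failed:', error);
});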

You can also process local PDF files by reading them as buffers:

import * as fs from 'fs/promises';

// For local files, read the file buffer first
const fileBuffer = await fs.readFile('./invoices/invoice.pdf');

const result = await client.ask({
  data: fileBuffer,
  contentType: 'application/pdf',
  prompt: 'Extract invoice information',
  mode: 'json',
  schema: {
    // ... same schema as above
  }
});
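If your code handles both remote and local invoices, it can help to build the input part of the request in one place. The sketch below mirrors the two input shapes used in this tutorial (`url`, or `data` plus `contentType`). The `AskSource` type and the `toAskSource` function are hypothetical names, and typing the bytes as `Uint8Array` (which a Node.js `Buffer` satisfies) is an assumption about what the SDK accepts.

```typescript
// The two input shapes shown in this tutorial: a remote URL, or raw bytes plus a content type.
type AskSource =
  | { url: string }
  | { data: Uint8Array; contentType: 'application/pdf' };

// Turns either an http(s) URL string or already-loaded PDF bytes into request fields.
function toAskSource(input: string | Uint8Array): AskSource {
  if (typeof input === 'string') {
    if (!/^https?:\/\//i.test(input)) {
      throw new Error(`Expected an http(s) URL, got: ${input}`);
    }
    return { url: input };
  }
  if (input.byteLength === 0) {
    throw new Error('PDF data is empty');
  }
  return { data: input, contentType: 'application/pdf' };
}

// Usage (assumption: spread into the same ask() call shown above):
//   const fileBuffer = await fs.readFile('./invoices/invoice.pdf');
//   await client.ask({ ...toAskSource(fileBuffer), prompt, mode: 'json', schema });
```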

The JSON Output

Here’s what you’ll receive from PDF Vector:

{
  "invoiceNumber": "INV-2025-1234",
  "invoiceDate": "2025-01-15",
  "vendorName": "Acme Corporation",
  "customerName": "Tech Solutions Inc.",
  "totalAmount": 2499.99,
  "currency": "USD"
}

Clean, structured data ready for your database or accounting system!
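Before writing the result to a database, it is worth checking that every field you asked for actually came back with the expected type. The following is a minimal sketch of such a check for the basic schema above; `BasicInvoice`, `findInvoiceProblems`, and `isBasicInvoice` are illustrative names, not SDK features.

```typescript
// Mirrors the six fields requested in the basic schema.
interface BasicInvoice {
  invoiceNumber: string;
  invoiceDate: string;
  vendorName: string;
  customerName: string;
  totalAmount: number;
  currency: string;
}

// Returns a list of problems; an empty list means the value matches BasicInvoice.
function findInvoiceProblems(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) {
    return ['result is not an object'];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];

  const stringFields = ['invoiceNumber', 'invoiceDate', 'vendorName', 'customerName', 'currency'];
  for (const field of stringFields) {
    const fieldValue = record[field];
    if (typeof fieldValue !== 'string' || fieldValue.trim() === '') {
      problems.push(`${field} is missing or empty`);
    }
  }

  const total = record['totalAmount'];
  if (typeof total !== 'number' || !Number.isFinite(total)) {
    problems.push('totalAmount is missing or not a finite number');
  }
  return problems;
}

// Type guard so the rest of your code can work with a typed invoice.
function isBasicInvoice(value: unknown): value is BasicInvoice {
  return findInvoiceProblems(value).length === 0;
}
```

You could call `isBasicInvoice(result.json)` right after the extraction and route anything that fails to manual review instead of inserting it.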

Complete Invoice Schema with Line Items

Extracting Full Invoice Details

For complete invoice processing, use a comprehensive schema that captures all details including line items:

import { PDFVector } from 'pdfvector';

const client = new PDFVector({ 
  apiKey: 'pdfvector_your_api_key_here' 
});

async function extractCompleteInvoice() {
  const result = await client.ask({
    url: 'https://example.com/invoice.pdf',
    prompt: 'Extract complete invoice with all line items and details',
    mode: 'json',
    schema: {
      type: 'object',
      properties: {
        // Invoice identifiers
        invoiceNumber: { type: 'string' },
        invoiceDate: { type: 'string' },
        dueDate: { type: 'string' },
        
        // Vendor information
        vendor: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            address: { type: 'string' },
            taxId: { type: 'string' }
          }
        },
        
        // Customer information
        customer: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            address: { type: 'string' }
          }
        },
        
        // Line items
        lineItems: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              description: { type: 'string' },
              quantity: { type: 'number' },
              unitPrice: { type: 'number' },
              amount: { type: 'number' }
            }
          }
        },
        
        // Totals
        subtotal: { type: 'number' },
        taxAmount: { type: 'number' },
        totalAmount: { type: 'number' },
        currency: { type: 'string' }
      }
    }
  });

  return result.json;
}

// Process and display the results
extractCompleteInvoice().then(invoice => {
  console.log('=== Invoice Processed ===');
  console.log(`Invoice #: ${invoice.invoiceNumber}`);
  console.log(`Vendor: ${invoice.vendor.name}`);
  console.log(`Customer: ${invoice.customer.name}`);
  
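  // Use the extracted currency code rather than assuming dollars
  console.log('\n=== Line Items ===');
  invoice.lineItems.forEach(item => {
    console.log(`- ${item.description}: ${item.quantity} x ${invoice.currency} ${item.unitPrice} = ${invoice.currency} ${item.amount}`);
  });
  
  console.log('\n=== Totals ===');
  console.log(`Subtotal: ${invoice.currency} ${invoice.subtotal}`);
  console.log(`Tax: ${invoice.currency} ${invoice.taxAmount}`);
  console.log(`Total: ${invoice.currency} ${invoice.totalAmount}`);
}).catch(error => {
  console.error('Invoice extraction failed:', error);
});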

The AI-powered extraction identifies and structures the invoice components defined in your schema, from vendor details to individual line items. As with any automated extraction, verify critical values such as totals before posting them to an accounting system.
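One inexpensive sanity check on a full extraction is arithmetic: line items should add up to the subtotal, and subtotal plus tax should equal the total. The sketch below assumes a currency with two decimal places and an invoice with no discounts or shipping lines outside the schema above; `checkTotals` and its types are hypothetical helpers.

```typescript
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  amount: number;
}

interface InvoiceTotals {
  lineItems: LineItem[];
  subtotal: number;
  taxAmount: number;
  totalAmount: number;
}

// Compare in integer cents to avoid floating-point drift (assumes 2-decimal currencies).
const toCents = (value: number): number => Math.round(value * 100);

// Returns a list of inconsistencies; an empty list means the numbers agree.
function checkTotals(invoice: InvoiceTotals, toleranceCents = 1): string[] {
  const problems: string[] = [];

  for (const item of invoice.lineItems) {
    const expected = toCents(item.quantity * item.unitPrice);
    if (Math.abs(expected - toCents(item.amount)) > toleranceCents) {
      problems.push(`line "${item.description}": quantity x unitPrice does not match amount`);
    }
  }

  const itemsSum = invoice.lineItems.reduce((sum, item) => sum + toCents(item.amount), 0);
  if (Math.abs(itemsSum - toCents(invoice.subtotal)) > toleranceCents) {
    problems.push('line items do not add up to subtotal');
  }

  const expectedTotal = toCents(invoice.subtotal) + toCents(invoice.taxAmount);
  if (Math.abs(expectedTotal - toCents(invoice.totalAmount)) > toleranceCents) {
    problems.push('subtotal + taxAmount does not equal totalAmount');
  }
  return problems;
}
```

A mismatch does not always mean the extraction is wrong, since some invoices include discounts or fees, but it is a useful signal for flagging documents that need a second look.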

Credit Usage for Ask API

The Ask API uses 3 credits per page of your PDF document.

Examples:

  • 1-page invoice = 3 credits
  • 2-page invoice = 6 credits
  • 10-page document = 30 credits

Free accounts get 100 credits monthly, which lets you process around 33 single-page invoices or 16 two-page invoices for testing.
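The arithmetic above is easy to fold into a small helper if you want to estimate cost before submitting a batch. This is a sketch based only on the 3-credits-per-page rate stated in this article; the function names are illustrative.

```typescript
// Rate quoted in this article for the Ask API.
const CREDITS_PER_PAGE = 3;

// Credits needed for a document with the given number of pages.
function creditsForPages(pageCount: number): number {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError('pageCount must be a non-negative integer');
  }
  return pageCount * CREDITS_PER_PAGE;
}

// How many invoices of a given length fit into a credit budget.
function invoicesWithinBudget(creditBudget: number, pagesPerInvoice: number): number {
  const perInvoice = creditsForPages(pagesPerInvoice);
  if (perInvoice === 0) {
    throw new RangeError('pagesPerInvoice must be at least 1');
  }
  return Math.floor(creditBudget / perInvoice);
}

console.log(invoicesWithinBudget(100, 1)); // 33 single-page invoices
console.log(invoicesWithinBudget(100, 2)); // 16 two-page invoices
```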

Next Steps

You now have everything you need to extract invoice data from PDFs using PDF Vector’s Ask API. The code examples above work with any invoice format – just adjust the schema to match your specific needs.

Get your API key from PDF Vector to start extracting invoices.

Last updated on August 27, 2025