How to import external data into Beancount?

Importing external data is often a source of confusion for folks new to Beancount. In this article we'll give you a high-level overview of how to import external data (for example, bank statements as PDF documents or CSV exports) into your Beancount ledger. Our aim is to get you up and running as quickly as possible.

How Importing Works in Beancount

Beancount itself doesn't directly parse CSV files or bank statements. Instead, you write importers, which are Python classes that:

  1. Read a file from your bank (CSV, PDF, OFX, etc.)
  2. Extract transaction data and account information
  3. Output Beancount entries in the correct format
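
The three steps above can be sketched without any Beancount dependency. The snippet below is only an illustration of the idea, not how beangulp importers are implemented: the column names (Date, Payee, Description, Amount), the EUR currency, and the Assets:Checking account are assumptions made for the example.

```python
import csv
import io
from decimal import Decimal

# Hypothetical CSV export; real column names vary from bank to bank.
SAMPLE_CSV = """Date,Payee,Description,Amount
2024-01-15,Coffee Shop,Flat white,-4.50
2024-01-16,Employer,Salary,2500.00
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Step 1: read the bank file (here, CSV text) into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def render_entry(row: dict[str, str], account: str, currency: str = "EUR") -> str:
    """Steps 2 and 3: pull out the fields and format one transaction."""
    amount = Decimal(row["Amount"])
    header = f'{row["Date"]} * "{row["Payee"]}" "{row["Description"]}"'
    posting = f"  {account}  {amount} {currency}"
    return f"{header}\n{posting}\n"


entries = [render_entry(row, "Assets:Checking") for row in read_rows(SAMPLE_CSV)]
print("\n".join(entries))
```

Each rendered transaction carries a single posting, so it is not balanced yet; a real importer would return Beancount directive objects rather than strings.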

In Beancount 3, the maintainers left the choice of import setup to the user, so there are several ways to set it up. That said, the beangulp examples suggest an import.py script that takes your importer classes (either defined in the local directory or installed as a package from PyPI) and wraps them in beangulp.Ingest, which lets you extract transactions and add them to your ledger.

The Import Workflow

Here's a high-level overview of the import workflow:

  1. Run the import script: As described in the previous section, the first step is to run the import.py script directly (we'll show what it looks like later in this article), giving it an operation (e.g. extract or identify) and the path to the file where your bank statement is stored.
  2. Extract transactions: Given the file, the matching importer extracts the transactions and writes the resulting entries to standard output.
  3. Review and append: You review the extracted entries (typically transactions) and append them to your Beancount ledger.
  4. Balance: You go through the newly added entries and balance the transactions that are not balanced already. Balancing is a separate topic but we'll cover it in another blog post!

This process is typically repeated whenever you have new bank statements to import.
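
To make the balancing step a little more concrete, here is a small sketch of what it means for a single imported transaction: the importer produced one posting, and a second posting with the opposite amount is added so that the legs sum to zero. The Posting tuple and the Expenses:Uncategorized account are made up for illustration; they are not Beancount's own data types.

```python
from decimal import Decimal
from typing import NamedTuple


class Posting(NamedTuple):
    """Simplified stand-in for a posting (not beancount.core.data.Posting)."""
    account: str
    number: Decimal
    currency: str


def balance(postings: list[Posting], counter_account: str) -> list[Posting]:
    """Return the postings plus one extra leg that brings the total to zero.

    Assumes every posting uses the same currency.
    """
    total = sum((p.number for p in postings), Decimal("0"))
    if total == 0:
        return list(postings)  # already balanced, nothing to add
    currency = postings[0].currency
    return [*postings, Posting(counter_account, -total, currency)]


imported = [Posting("Assets:Checking", Decimal("-4.50"), "EUR")]
balanced = balance(imported, "Expenses:Uncategorized")
for p in balanced:
    print(f"  {p.account}  {p.number} {p.currency}")
```

In practice the counter account is chosen per transaction (groceries, rent, salary, and so on) rather than being a single catch-all.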

Defining an Importer

An importer is nothing more than a Python class that inherits from beangulp.Importer and implements a few key methods. Here's an example:

import csv
from decimal import Decimal

from beancount.core import amount, data, flags
from beangulp.importer import Importer
from dateutil import parser

class MyBankImporter(Importer):
    def __init__(self, account_name: str):
        self.account_name = account_name

    def identify(self, filepath: str) -> bool:
        # Return True if this importer should process this file
        return filepath.endswith(".csv") and "mybank" in filepath.lower()

    def account(self, filepath: str) -> str:
        # The ledger account that this importer's entries belong to
        return self.account_name

    def extract(self, filepath: str, existing=None):
        # Parse the file and return a list of Beancount directives
        # (mainly transactions that contain one or more postings).
        # `existing` receives the ledger's current entries when beangulp
        # supplies them; this simple importer does not use them.
        entries = []

        with open(filepath, newline="") as f:
            reader = csv.DictReader(f)
            for index, row in enumerate(reader):
                posting = data.Posting(
                    account=self.account(filepath),
                    # Amount expects a Decimal, not the raw CSV string
                    units=amount.Amount(Decimal(row["Amount"]), "EUR"),
                    cost=None,
                    price=None,
                    flag=None,
                    meta=None
                )
                entry = data.Transaction(
                    data.new_metadata(filepath, index),
                    parser.parse(row['Date']).date(),
                    flags.FLAG_OKAY,
                    row['Payee'],
                    row['Description'],
                    data.EMPTY_SET,
                    data.EMPTY_SET,
                    [posting],
                )

                entries.append(entry)
        return entries

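The three methods that an importer class should define are:

  1. identify: Returns True if the importer should process the given file.
  2. account: Returns the name of the account that the importer is associated with.
  3. extract: Parses the file and returns a list of Beancount directives.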

Before you write such an importer, it's worth searching PyPI or Awesome Beancount to see whether someone else has already published an importer for your bank. If so, you can pip-install the package and you're ready to go. If not, you'll need to write an importer yourself.

Creating the import.py Script

Now that the importer class is in place, create an import.py file at the root of your finances directory. This script is the entry point you'll run for every import:

from importers.mybank import CheckingImporter, CreditCardImporter

from beangulp import Ingest

IMPORTERS = [
    CheckingImporter("Assets:Checking"),
    CreditCardImporter("Liabilities:CreditCard"),
]

if __name__ == "__main__":
    ingest = Ingest(IMPORTERS)
    ingest()

The IMPORTERS list contains all your importers. Each importer is initialized with its corresponding account name.
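
Conceptually, the script asks each importer in the list whether it recognises a file and hands the file to the one that says yes. The sketch below mimics that idea with plain Python classes. It is a simplified stand-in, not beangulp's actual dispatch code, and the KeywordImporter class, the find_importer helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeywordImporter:
    """Toy importer that recognises CSV files by a keyword in their name."""
    account_name: str
    keyword: str

    def identify(self, filepath: str) -> bool:
        name = filepath.lower()
        return name.endswith(".csv") and self.keyword in name


IMPORTERS = [
    KeywordImporter("Assets:Checking", "checking"),
    KeywordImporter("Liabilities:CreditCard", "creditcard"),
]


def find_importer(filepath: str) -> Optional[KeywordImporter]:
    """Return the single importer that claims the file, or None."""
    matches = [imp for imp in IMPORTERS if imp.identify(filepath)]
    if len(matches) > 1:
        # This sketch simply refuses ambiguous files.
        raise ValueError(f"{filepath} matched more than one importer")
    return matches[0] if matches else None


print(find_importer("downloads/mybank-checking-2024-01.csv"))
print(find_importer("downloads/notes.txt"))
```

The practical takeaway is that each identify method should be specific enough that only one importer claims any given file.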

Running an Import

Now that the import.py script is in place, we can invoke it directly to convert the transactions in your bank statement to a format that Beancount understands.

# Extract transactions and review them
python import.py extract /path/to/statement.csv

The output from extract should show all the Beancount directives returned by your importer's extract method, in the plain-text format meant to be appended to your Beancount ledger. If you're happy with the results, you can redirect the output and append it to your .beancount file (for example with the shell's >> operator), then move on to the balancing step.
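
If you would rather do the append step from Python than from the shell, a helper along these lines would work. It is only a sketch: the append_entries function and the main.beancount file name are hypothetical, and it assumes you have already reviewed the extracted text.

```python
import tempfile
from pathlib import Path


def append_entries(ledger: Path, extracted: str) -> None:
    """Append reviewed extract output to the ledger, after a blank line."""
    existing = ledger.read_text(encoding="utf-8") if ledger.exists() else ""
    # Pick a separator so the new entries start after exactly one blank line.
    if not existing or existing.endswith("\n\n"):
        separator = ""
    elif existing.endswith("\n"):
        separator = "\n"
    else:
        separator = "\n\n"
    with ledger.open("a", encoding="utf-8") as f:
        f.write(separator + extracted.rstrip("\n") + "\n")


# Demonstration against a throwaway ledger in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ledger = Path(tmp) / "main.beancount"
    ledger.write_text("2024-01-01 open Assets:Checking EUR\n", encoding="utf-8")
    append_entries(
        ledger,
        '2024-01-15 * "Coffee Shop" "Flat white"\n  Assets:Checking  -4.50 EUR\n',
    )
    final_text = ledger.read_text(encoding="utf-8")

print(final_text)
```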

Common Import Formats

Now that we know what the import workflow looks like, let's have a look at the common formats that your bank may make available for you to download your financial transactions.

CSV Files

CSV is the easiest format to import. Most banks (at least the good ones) offer CSV exports. Parse them with Python's csv module and map columns to Beancount fields.

OFX / QFX Files

OFX (Open Financial Exchange) is a standard format for exchanging financial data. At the time of writing, the ofxparse library on PyPI can be used to parse such files.
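
To give a feel for what an OFX importer has to deal with, the sketch below pulls transactions out of an SGML-style OFX fragment using only the standard library. It is deliberately naive (regular expressions rather than a real parser), the sample data is invented, and the field and parse_transactions helpers are hypothetical. For real statements, a dedicated library such as ofxparse is the safer choice.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Invented OFX 1.x-style fragment; real files carry headers and more fields.
SAMPLE_OFX = """
<STMTTRN>
<TRNTYPE>DEBIT
<DTPOSTED>20240115
<TRNAMT>-4.50
<FITID>0001
<NAME>Coffee Shop
</STMTTRN>
<STMTTRN>
<TRNTYPE>CREDIT
<DTPOSTED>20240116120000
<TRNAMT>2500.00
<FITID>0002
<NAME>Employer
</STMTTRN>
"""


def field(block: str, tag: str) -> str:
    """Return the text after <TAG>, up to the next tag or line end."""
    match = re.search(rf"<{tag}>([^<\r\n]+)", block)
    if match is None:
        raise ValueError(f"missing <{tag}>")
    return match.group(1).strip()


def parse_transactions(text: str) -> list[tuple[date, str, Decimal]]:
    """Extract (date, name, amount) from every <STMTTRN> block."""
    results = []
    for block in re.findall(r"<STMTTRN>(.*?)</STMTTRN>", text, flags=re.DOTALL):
        # DTPOSTED starts with YYYYMMDD; any time-of-day suffix is ignored.
        posted = datetime.strptime(field(block, "DTPOSTED")[:8], "%Y%m%d").date()
        results.append((posted, field(block, "NAME"), Decimal(field(block, "TRNAMT"))))
    return results


transactions = parse_transactions(SAMPLE_OFX)
for posted, name, amount in transactions:
    print(posted, name, amount)
```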

PDF Statements

PDF is generally harder to parse. For simple, structured PDFs, you can use the pdfplumber package. Alternatively, you could also look into first converting PDF files into plain text (e.g. using pdftotext) and then parsing that plain text output.
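
Once a PDF has been converted to plain text, parsing usually comes down to matching the transaction lines with a regular expression. The sketch below assumes a made-up layout (ISO date, description, signed amount at the end of the line) and a hypothetical parse_statement helper. Your bank's statements will almost certainly look different, so treat the pattern as a starting point only.

```python
import re
from datetime import date
from decimal import Decimal

# Invented text, as it might look after converting a PDF statement.
STATEMENT_TEXT = """\
MyBank Statement            Page 1
2024-01-15  Coffee Shop               -4.50
2024-01-16  Employer Salary        2,500.00
Closing balance                    2,495.50
"""

# Date, then a description, then an amount with optional thousands separators.
LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<desc>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_statement(text: str) -> list[tuple[date, str, Decimal]]:
    """Return (date, description, amount) for lines that look like transactions."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # headers, footers, and balance lines are skipped
        amount = Decimal(match["amount"].replace(",", ""))
        rows.append((date.fromisoformat(match["date"]), match["desc"], amount))
    return rows


rows = parse_statement(STATEMENT_TEXT)
for row in rows:
    print(*row)
```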

For complex PDFs, you may need to manually extract data or use OCR. In those cases, it's often easier to export as CSV from your bank instead.

Conclusion

You now have everything you need to import financial data into your Beancount ledger. The key takeaway is this: Beancount importers are just Python classes that extract transactions from your bank statements and output them in Beancount's format.

If you have an Importer class, you can plug it into your import.py script and you should be off to the races. If your bank doesn't already have an open-source importer on PyPI, consider publishing yours. You'll help others in the Beancount community save time and reduce friction in their own personal accounting workflows.

The next step is to explore Awesome Beancount for pre-built importers for your bank, or start writing your own. Happy importing!