Importing external data into Beancount is often a source of confusion for folks new to Beancount. In this article we'll give you a high-level overview of how you can import external data (e.g. your bank statements including PDF documents or CSV exports) into your Beancount ledger. Our aim is to get you up and running as quickly as possible.
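Beancount itself doesn't directly parse CSV files or bank statements. Instead, you write importers, which are Python classes that identify the files they can handle, report the account those files belong to, and extract Beancount directives from them.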
In Beancount 3, the maintainers decided to leave the decision of how to import data
up to the user, which means there are multiple ways of setting it up. That being said,
the beangulp examples suggest having an import.py script for extracting
transactions, that uses your importer classes (either defined in the local directory or
in a package installed from PyPI) and wraps them up inside beangulp.Ingest to allow
you to extract transactions and add them to your ledger.
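Here's a high-level overview of the import workflow:
You download a statement (e.g. a CSV export or a PDF document) from your bank.
You invoke the import.py script directly (we'll describe what it looks like later in this article), giving it an operation (e.g. extract or identify) and the path to the file where your bank statement is stored.
You review the output and add the resulting transactions to your ledger.
This process is typically repeated whenever you have new bank statements to import.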
An importer is nothing more than a Python class that inherits from beangulp.Importer
and implements a few key methods. Here's an example:
from decimal import Decimal
import csv

from beancount.core import amount, data, flags
from beangulp.importer import Importer
from dateutil import parser


class MyBankImporter(Importer):
    def __init__(self, account_name: str):
        self.account_name = account_name

    def identify(self, filepath: str) -> bool:
        # Return True if this importer should process this file
        return filepath.endswith(".csv") and "mybank" in filepath.lower()

    def account(self, filepath: str) -> str:
        return self.account_name

    def extract(self, filepath: str, existing=None):
        # Parse the file and return a list of beancount directives
        # (mainly transactions that contain one or more postings).
        # beangulp may pass the ledger's existing entries as a second
        # argument; this example ignores them.
        entries = []
        with open(filepath, newline="") as f:
            reader = csv.DictReader(f)
            for index, row in enumerate(reader):
                posting = data.Posting(
                    account=self.account(filepath),
                    # Convert the CSV string to a Decimal before building the Amount
                    units=amount.Amount(Decimal(row["Amount"]), "EUR"),
                    cost=None,
                    price=None,
                    flag=None,
                    meta=None,
                )
                entry = data.Transaction(
                    data.new_metadata(filepath, index),
                    parser.parse(row["Date"]).date(),
                    flags.FLAG_OKAY,
                    row["Payee"],
                    row["Description"],
                    data.EMPTY_SET,
                    data.EMPTY_SET,
                    [posting],
                )
                entries.append(entry)
        return entries
The three methods that an importer class should define are:
identify(filepath): Return True if this importer should handle this file. Check the filename, headers, or other identifiers to match your bank's files.
account(filepath): Return the account associated with the given file.
extract(filepath): Read the file and return a list of Beancount entries (beancount.core.data.Entries), also known as directives in Beancount.
Before you define such an importer, it may be worth searching PyPI or Awesome Beancount to see whether someone else has already written an importer for your bank and made it available there. If that is the case, you can pip-install the package directly and you're ready to go. If not, you'll need to write an importer yourself.
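As noted above, identify can look at more than the filename. Here is a minimal sketch of a header check that an identify method could delegate to. The function name looks_like_mybank_csv and the column names in EXPECTED_COLUMNS are hypothetical; adjust them to whatever your bank actually exports.

```python
import csv
from pathlib import Path

# Hypothetical column names; replace them with the header your bank uses.
EXPECTED_COLUMNS = {"Date", "Payee", "Description", "Amount"}


def looks_like_mybank_csv(filepath: str) -> bool:
    """Return True if the file is a CSV whose header has the expected columns."""
    path = Path(filepath)
    if path.suffix.lower() != ".csv":
        return False
    try:
        with path.open(newline="", encoding="utf-8") as f:
            # Read only the first row; an empty file yields an empty header.
            header = next(csv.reader(f), [])
    except (OSError, UnicodeDecodeError):
        # Unreadable or non-text files are simply not ours.
        return False
    return EXPECTED_COLUMNS.issubset(column.strip() for column in header)
```

Checking the header rather than the filename means the importer keeps working even if you rename the downloaded file.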
Now that the importer class is in place, create an import.py file at the root of your
finances directory. This is the entry point that beangulp uses:
from importers.mybank import CheckingImporter, CreditCardImporter
from beangulp import Ingest

IMPORTERS = [
    CheckingImporter("Assets:Checking"),
    CreditCardImporter("Liabilities:CreditCard"),
]

if __name__ == "__main__":
    ingest = Ingest(IMPORTERS)
    ingest()
The IMPORTERS list contains all your importers. Each importer is initialized with its
corresponding account name.
Now that the import.py script is in place, we can invoke it directly to convert the
transactions in your bank statement to a format that Beancount understands.
# Extract transactions and review them
python import.py extract /path/to/statement.csv
The output from extract should show all the Beancount directives that the extract
method (defined inside the Importer class) returns, in the plaintext format meant to
be appended to your Beancount ledger. If you're happy with the results, you can append
the extract output to your .beancount file (for example by redirecting it with >>) and
move on to the balancing step.
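To give a rough idea of what that plaintext looks like, here is a hand-rolled sketch that renders a single one-posting transaction as text. It is not Beancount's own printer, and render_transaction is a hypothetical helper; the real output is aligned differently and may carry extra metadata, but the overall shape is similar.

```python
import datetime
from decimal import Decimal


def render_transaction(date, payee, narration, account, number, currency) -> str:
    """Render one transaction with a single posting as Beancount-style text."""
    # "*" is the flag Beancount uses for a completed transaction.
    header = f'{date.isoformat()} * "{payee}" "{narration}"'
    posting = f"  {account}  {number} {currency}"
    return f"{header}\n{posting}\n"


text = render_transaction(
    datetime.date(2024, 1, 15),
    "Corner Cafe",
    "Espresso",
    "Assets:Checking",
    Decimal("-3.50"),
    "EUR",
)
print(text)
```

A transaction with only one posting does not balance yet, which is why a balancing step follows the import.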
Now that we know what the import workflow looks like, let's have a look at the common formats that your bank may make available for you to download your financial transactions.
CSV is the easiest format to import. Most banks (at least the good ones) offer CSV
exports. Parse them with Python's csv module and map columns to Beancount fields.
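As a minimal sketch of that column mapping, the snippet below turns each CSV record into a typed row that is easy to convert into a Beancount transaction afterwards. The Row type, the read_rows function, the column names, and the date format are all assumptions made for illustration; your bank's export will likely differ.

```python
import csv
import datetime
import io
from decimal import Decimal
from typing import Iterator, NamedTuple, TextIO


class Row(NamedTuple):
    date: datetime.date
    payee: str
    narration: str
    amount: Decimal


def read_rows(csv_file: TextIO, date_format: str = "%Y-%m-%d") -> Iterator[Row]:
    """Yield one typed Row per CSV record."""
    for record in csv.DictReader(csv_file):
        yield Row(
            date=datetime.datetime.strptime(record["Date"], date_format).date(),
            payee=record["Payee"].strip(),
            narration=record["Description"].strip(),
            # Decimal avoids the rounding errors that float would introduce.
            amount=Decimal(record["Amount"]),
        )


# A made-up, one-line statement to show the mapping in action.
sample = io.StringIO(
    "Date,Payee,Description,Amount\n"
    "2024-01-15,Corner Cafe,Espresso,-3.50\n"
)
rows = list(read_rows(sample))
```

Keeping the parsing separate from the Beancount-specific code makes it easier to test against a few sample lines from your own statements.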
OFX (Open Financial Exchange) is a standard bank format. At the time of this writing,
the ofxparse library, available on PyPI, can be used to parse such files.
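Here is a rough sketch of reading an OFX file with ofxparse. It assumes the parsed result exposes account.statement.transactions, and that each transaction has date (a datetime), payee, memo, and amount attributes; check this against the version of ofxparse you install. The function names load_ofx_transactions and summarize are hypothetical.

```python
import datetime
from decimal import Decimal
from typing import Any, Iterable, List, Tuple


def load_ofx_transactions(filepath: str) -> List[Any]:
    """Parse an OFX file with ofxparse and return its transactions."""
    # Imported lazily so the rest of the sketch works without ofxparse installed.
    from ofxparse import OfxParser

    with open(filepath, "rb") as f:
        ofx = OfxParser.parse(f)
    return list(ofx.account.statement.transactions)


def summarize(
    transactions: Iterable[Any],
) -> List[Tuple[datetime.date, str, str, Decimal]]:
    """Reduce transactions to (date, payee, memo, amount) tuples."""
    return [
        # Assumes txn.date is a datetime; keep only the calendar date.
        (txn.date.date(), txn.payee, txn.memo, Decimal(txn.amount))
        for txn in transactions
    ]
```

Inside an importer's extract method, you would call something like summarize(load_ofx_transactions(filepath)) and build one transaction per tuple.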
PDF is generally harder to parse. For simple, structured PDFs, you can use the
pdfplumber package. Alternatively, you could also look into first converting PDF files
into plain text (e.g. using pdftotext) and then parsing that plain text output.
For complex PDFs, you may need to manually extract data or use OCR. In those cases, it's often easier to export as CSV from your bank instead.
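If you go the plain text route, the parsing step could look roughly like the sketch below. It assumes you already have the text output (e.g. from pdftotext) and that each transaction sits on one line in a made-up layout of date, description, and amount. The pattern and the parse_statement_text function are hypothetical and will need adapting to your bank's statements.

```python
import datetime
import re
from decimal import Decimal
from typing import List, Tuple

# Hypothetical line layout: "2024-01-15  Corner Cafe  -3.50".
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<text>.+?)\s+(?P<amount>-?\d+\.\d{2})$"
)


def parse_statement_text(text: str) -> List[Tuple[datetime.date, str, Decimal]]:
    """Return (date, description, amount) for every line that matches."""
    results = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # Skip headers, footers, and blank lines.
        results.append(
            (
                datetime.date.fromisoformat(match["date"]),
                match["text"],
                Decimal(match["amount"]),
            )
        )
    return results


# Made-up sample of what converted statement text might contain.
sample_text = (
    "Statement for January\n"
    "2024-01-15  Corner Cafe  -3.50\n"
    "\n"
    "2024-01-16  Salary ACME  2500.00\n"
    "Page 1 of 1\n"
)
parsed = parse_statement_text(sample_text)
```

Lines that don't match the pattern are silently skipped here, so it is worth comparing the number of parsed transactions against the statement before trusting the result.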
You now have everything you need to import financial data into your Beancount ledger. The key takeaway is this: Beancount importers are just Python classes that extract transactions from your bank statements and output them in Beancount's format.
If you have an Importer class, you can plug it in to your import.py script and you
should be off to the races. If your bank doesn't already have an open-source importer on
PyPI, consider publishing yours. You'll help others in the Beancount community save time
and reduce friction in their own personal accounting workflows.
The next step is to explore Awesome Beancount for pre-built importers for your bank, or start writing your own. Happy importing!