Writing a Pandoc filter to convert Org to Things

Posted: Tue 16 August 2022
Filed under Haskell
Tags: haskell pandoc emacs

I have of late begun using Things as a planner app on my mobile devices, to supplement the usage of Orgmode on my computers. Ideally I would have liked to use an Orgmode based app on my mobile devices, and beorg seemed like a fairly good choice. However, I wanted the iOS/iPadOS app to have a few extra qualities.

The organizational hierarchy should be a strict subset of Orgmode's hierarchy. Orgmode is almost too flexible, and the excessive flexibility means that my main.org file is very disorganized. The constraints in the app should hopefully lead to a more maintainable system.
The app should be well maintained (i.e. not abandonware), with an iPad version that supports keyboard shortcuts.
The app also should have an API for importing data, as well as exporting it.

Beorg does fairly well on the third front, since it can work directly with .org files, and supports various cloud backends, but the fact that it supports most of Orgmode's hierarchy meant that it wouldn't really help with my goal of making my Orgmode agenda less chaotic. And the iPad version of beorg is also not too great, which pretty much ruled it out.

The first requirement really narrowed down the options, and I decided to go with Things. It fares quite well on the first and second requirements: it only supports hierarchical trees of depth at most 3, and it groups Todos into Projects, which are in turn grouped by Area. That is simple enough to maintain, but flexible enough to work for most situations. The actual apps are also very well made, with a clean and clutter-free interface, and no in-app purchases or subscriptions.

The only requirement where Things falls short is the third one. One way to import Projects and Todos into Things is through specially crafted URLs, but those need to be generated and then manually clicked. Exporting data involves exporting an SQLite database file and parsing it, which is not very ergonomic. Since I mostly plan to import Org agenda items into Things, and not the other way around, I decided to just implement an Org-to-Things pipeline, and worry about a Things-to-Org pipeline later if it's necessary.
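To give a sense of what "specially crafted URLs" means, here is a minimal sketch of a URL builder. The command names (add, add-project) and parameter names (title, list, area) reflect my reading of the Things URL scheme and should be checked against its documentation; the helper names (percentEncode, thingsUrl, addTodoUrl, addProjectUrl) are made up for this illustration and are not taken from the actual filter.

```haskell
import Data.Char (isAlphaNum, isAscii, ord)
import Data.List (intercalate)

-- | UTF-8 encode a single character as a list of byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 + n `div` 64, 0x80 + n `mod` 64]
  | n < 0x10000 = [0xE0 + n `div` 4096, 0x80 + (n `div` 64) `mod` 64, 0x80 + n `mod` 64]
  | otherwise   = [ 0xF0 + n `div` 262144, 0x80 + (n `div` 4096) `mod` 64
                  , 0x80 + (n `div` 64) `mod` 64, 0x80 + n `mod` 64 ]
  where n = ord c

-- | Percent-encode a string for use as a URL query value.
-- Unreserved ASCII characters are kept; everything else becomes %XX bytes.
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isAscii c && (isAlphaNum c || c `elem` "-._~") = [c]
      | otherwise = concatMap encodeByte (utf8Bytes c)
    encodeByte b = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
    hexDigit d = "0123456789ABCDEF" !! d

-- | Build a things:/// URL from a command and key/value parameters.
-- (Assumed URL shape; verify against the Things URL scheme documentation.)
thingsUrl :: String -> [(String, String)] -> String
thingsUrl command params =
  "things:///" ++ command ++ "?"
    ++ intercalate "&" [k ++ "=" ++ percentEncode v | (k, v) <- params]

-- | Link that adds a Todo with the given title to the named Project.
addTodoUrl :: String -> String -> String
addTodoUrl project title = thingsUrl "add" [("title", title), ("list", project)]

-- | Link that adds a Project with the given title to the named Area.
addProjectUrl :: String -> String -> String
addProjectUrl area title = thingsUrl "add-project" [("title", title), ("area", area)]
```

For example, addTodoUrl "Blog" "Write post" produces a link whose title parameter is Write%20post and whose list parameter is Blog.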

My Org mode export setup

Before I started using Things, I would visit a webpage I autogenerated from my Org file using Pandoc. Pandoc does an excellent job of parsing Org and turning it into HTML that retains all the information: tags look visually different, TODOs are prominent, the hierarchical structure is preserved, and a table of contents is generated as well. The plan was to write some code that hooked into the Org-to-HTML pipeline and inserted the custom Things links at the end of each Todo and Project. Pandoc makes doing that very easy through Pandoc filters. Pandoc filters are programs that transform the abstract syntax tree (AST) that Pandoc internally uses to represent documents, and these filters are easily applied to any Pandoc conversion.

However, Pandoc's native AST does not really encode the hierarchy of an Org file in a tree-like form, but rather as [Block], i.e. a list of blocks, each of which roughly corresponds to a line of the Org file, with punctuation and tags parsed. It makes sense that the AST does not capture the Org syntax tree, since the Pandoc AST is quite general, and needs to be able to read from and write to several different formats. But that did mean that I would have to write a parser to parse [Block], turn it into a "real" AST, then insert the Things links, and then output the new [Block].
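The gap between the flat list and the tree can be illustrated with a small sketch. The Block type below is a simplified stand-in, not Pandoc's real Block from Text.Pandoc.Definition, and the mapping of heading levels 1, 2 and 3 to Area, Project and Todo is an assumption made for the example. All type and function names here are hypothetical.

```haskell
-- Simplified stand-in for Pandoc's Block (assumption for illustration only).
data Block
  = Header Int String  -- heading level and title
  | Para String        -- body text
  deriving (Eq, Show)

-- The "real" tree: Areas contain Projects, Projects contain Todos.
data Todo    = Todo String [Block]      deriving (Eq, Show)
data Project = Project String [Todo]    deriving (Eq, Show)
data Area    = Area String [Project]    deriving (Eq, Show)

-- | True for headings at the given level or shallower.
isHeaderAtMost :: Int -> Block -> Bool
isHeaderAtMost n (Header l _) = l <= n
isHeaderAtMost _ _            = False

-- | True for any heading.
isHeader :: Block -> Bool
isHeader = isHeaderAtMost maxBound

-- | Level-1 headings open an Area that runs until the next level-1 heading.
toAreas :: [Block] -> [Area]
toAreas (Header 1 t : rest) =
  let (inside, after) = break (isHeaderAtMost 1) rest
  in Area t (toProjects inside) : toAreas after
toAreas (_ : rest) = toAreas rest  -- skip blocks outside any Area
toAreas []         = []

-- | Level-2 headings open a Project within an Area.
toProjects :: [Block] -> [Project]
toProjects (Header 2 t : rest) =
  let (inside, after) = break (isHeaderAtMost 2) rest
  in Project t (toTodos inside) : toProjects after
toProjects (_ : rest) = toProjects rest
toProjects []         = []

-- | Level-3 headings are Todos; their body runs until the next heading.
toTodos :: [Block] -> [Todo]
toTodos (Header 3 t : rest) =
  let (body, after) = break isHeader rest
  in Todo t body : toTodos after
toTodos (_ : rest) = toTodos rest
toTodos []         = []
```

In the flat list, the nesting is only implied by the heading levels; toAreas makes it explicit by cutting the list at each heading of the same or a shallower level.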

Writing the parser

Rather than using an existing parser combinator library like Megaparsec, I decided it would be more fun to implement monadic parsing from scratch, following Monadic parsing in Haskell. The parser itself turned out to be fairly simple to write: it parsed the Blocks and kept track of whether it had descended into an 'Area' (which is a collection of 'Project's) or a 'Project' (which is a collection of 'Todo's). As soon as the parser parses a 'Todo' (which is the leaf node), it outputs the Blocks it read, with an additional Block appended, which contains the specially crafted link.

The parsing is really a single pass compilation process: it's a wrapper around a StateT where the state is [Block].

type BlockParser e = StateT [Block] (Either e) [Block]
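As a rough sketch of how a parser of this shape can work, the example below consumes blocks from the state and returns the blocks to output. It is not the actual parser: Block is a simplified stand-in for Pandoc's type, ThingsLink is a hypothetical placeholder for the appended link block, the choice of level-3 headings as Todos is an assumption, and it skips the Area/Project bookkeeping. It uses StateT from the transformers package.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, put)

-- Simplified stand-in for Pandoc's Block (assumption for illustration only).
data Block
  = Header Int String  -- heading level and title
  | Para String        -- body text
  | ThingsLink String  -- hypothetical placeholder for the appended link block
  deriving (Eq, Show)

-- State: the blocks still to be read. Result: the blocks to output.
type BlockParser e = StateT [Block] (Either e) [Block]

isHeader :: Block -> Bool
isHeader (Header _ _) = True
isHeader _            = False

-- | Parse one Todo (assumed to be a level-3 heading plus its body) and
-- return it with a link block appended.
todo :: BlockParser String
todo = do
  blocks <- get
  case blocks of
    Header 3 title : rest -> do
      let (body, remaining) = break isHeader rest
      put remaining
      pure (Header 3 title : body ++ [ThingsLink title])
    _ -> lift (Left "expected a level-3 heading")

-- | Single pass over the whole input: Todos get a link, headings deeper
-- than level 3 are rejected, everything else is passed through unchanged.
document :: BlockParser String
document = do
  blocks <- get
  case blocks of
    []                     -> pure []
    Header 3 _ : _         -> (++) <$> todo <*> document
    Header n _ : _ | n > 3 -> lift (Left ("heading nested too deep: level " ++ show n))
    b : rest               -> put rest >> ((b :) <$> document)

-- | Run the parser on a list of blocks.
runDocument :: [Block] -> Either String [Block]
runDocument = evalStateT document
```

The Either layer is what lets the parser reject input that does not fit the three-level hierarchy instead of silently passing it through.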

One way to possibly improve the single-pass compiler/parser would have been to move the output [Block] (the parser's result above) into the state, as in the following definition.

type BlockParser e a = StateT ([Block], [Block]) (Either e) a

Here, the first element of the state tuple represents the Blocks left to read, and the second element represents the parsed and modified Blocks. This is conceptually a cleaner way of writing the parser, although in practice it led to slightly longer code. If I do something like this in the future, I will probably go with this version.
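A minimal sketch of what the two-component state might look like in use, again with a simplified stand-in for Block and a hypothetical ThingsLink placeholder. The primitive names (next, emit, runBlockParser, annotate) are invented for this illustration. Unlike the description above, this toy version emits the link directly after the heading rather than after the Todo's body, to keep the example short.

```haskell
import Control.Monad.Trans.State (StateT, execStateT, get, modify, put)

-- Simplified stand-in for Pandoc's Block (assumption for illustration only).
data Block
  = Header Int String  -- heading level and title
  | Para String        -- body text
  | ThingsLink String  -- hypothetical placeholder for the appended link block
  deriving (Eq, Show)

-- State: (blocks left to read, blocks emitted so far in reverse order).
type BlockParser e a = StateT ([Block], [Block]) (Either e) a

-- | Read the next input block, if any.
next :: BlockParser e (Maybe Block)
next = do
  (input, output) <- get
  case input of
    []       -> pure Nothing
    b : rest -> put (rest, output) >> pure (Just b)

-- | Append a block to the output (stored reversed, so this is a cons).
emit :: Block -> BlockParser e ()
emit b = modify (\(input, output) -> (input, b : output))

-- | Copy the input to the output, adding a link after each level-3 heading.
annotate :: BlockParser e ()
annotate = do
  mb <- next
  case mb of
    Nothing                 -> pure ()
    Just b@(Header 3 title) -> emit b >> emit (ThingsLink title) >> annotate
    Just b                  -> emit b >> annotate

-- | Run a parser and return the emitted blocks in document order.
runBlockParser :: BlockParser e a -> [Block] -> Either e [Block]
runBlockParser p input = reverse . snd <$> execStateT p (input, [])
```

With the output in the state, parsers no longer have to thread and concatenate result lists by hand, at the cost of a couple of extra primitives.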

Once I had the single-pass compiler/parser, turning it into a Pandoc filter was fairly straightforward, following the pandoc-types documentation. The real upshot of this effort is that I now have a parser for the subset of Org I use, which I can reuse for other projects, like a web frontend for my agenda system.

The final parser and filter are now hosted on GitHub.