
You want to write a custom org backend? Let's write onlybold backend together to get you started

2022-02-20
/Tony Aldon/
org-mode revision: 96d91bea658c

Hi Emacsers,

Recently I've been playing with org-element and org-export.

Specifically, I was interested in the mechanism of the org exporter system and its flexibility.

The goal of this post is to get you started with the creation of org backends.

To do so, we build an org backend that:

  1. keeps only bold elements,

  2. surrounds bold elements with *** before and after,

  3. surrounds paragraph elements with :: before and after,

  4. surrounds section elements with <-- before and --> after (removing the last newline).

We call it onlybold.

Before we start, if you are interested, I recommend reading the following files in org-mode's source code:

You can get org-mode's source code by running the following command:

git clone https://git.savannah.gnu.org/git/emacs/org-mode.git

Let's get started.

what we want to achieve

We want to export this org buffer:

I like *bold-1* and *bold-2* and you?
I don't.  I prefer *bold-3*.


I've loved *bold-4* since I was a child.

I'm /italic/.

into another buffer like this:

<--::***bold-1*** ***bold-2*** ***bold-3***::


::***bold-4*** ::-->

org export mechanism

When org exports an org buffer, basically it does two things:

  1. parse the org buffer producing a tree (a nested elisp list) representing the org buffer and,

  2. recursively build a string by traversing the tree and choosing for each node what to do with it by looking up its associated transcode function defined by the org backend.

This means that org does the hard work of parsing and traversing for us.

To build our onlybold org backend and any other org backends, in the simplest case, we just have to provide the transcode functions (or simply transcoders).
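To make the traversal step concrete, here is a toy model of it. This is only a sketch and not org's actual implementation: the tree format (a string, or a list of the form (TYPE CHILD...)), the function toy-export, the variable toy-transcoders and the one-argument transcoders are all made up for illustration.

```elisp
;; Toy model of the export walk; NOT how org implements it.
;; A node is either a string (a leaf) or a list (TYPE CHILD...).

(defvar toy-transcoders
  (list (cons 'paragraph (lambda (contents) (concat "::" contents "::")))
        (cons 'bold (lambda (contents) (concat "***" contents "***"))))
  "Hypothetical alist mapping node types to one-argument transcoders.")

(defun toy-export (node transcoders)
  "Return NODE exported as a string using the alist TRANSCODERS.
Return nil when NODE's type has no transcoder."
  (if (stringp node)
      node
    (let ((transcoder (cdr (assq (car node) transcoders))))
      ;; A node whose type has no transcoder is skipped, children included.
      (when transcoder
        (funcall transcoder
                 (mapconcat (lambda (child)
                              (or (toy-export child transcoders) ""))
                            (cdr node)
                            ""))))))

(toy-export '(paragraph "I like " (bold "bold-1") (italic "x") ".")
            toy-transcoders)
;; ⇒ "::I like ***bold-1***.::"
```

The italic node disappears from the result because toy-transcoders has no entry for it, which mirrors what we will observe with the real exporter.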

transcoders, org-export-define-backend and org-export-to-buffer

The function org-export-define-backend takes as arguments:

  1. the backend's name we want to define and

  2. an alist of transcoders.

A transcoder (or transcode function) is a function that handles an org element when it is being exported.

For instance, our backend onlybold must define a transcoder for bold elements that surrounds bold texts with 3 stars *** like this:

*bold text* -> ***bold text***

Most transcoders take three arguments:

  1. the element as it appears in the parsed tree,

  2. a content string corresponding to the children of the element, already "transcoded",

  3. the communication channel that contains all the information the export system needs to correctly export the document (the obvious ones are the title, date and author of the document, which can be defined inside the document using lines starting with #+TITLE:, #+DATE: or #+AUTHOR:).
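To see these three arguments at work, here is a small sketch of a transcoder that only reports what it receives. The backend demo-args and the functions demo-args-bold and demo-args-pass are hypothetical names used for this illustration. The pass-through transcoders for paragraph and section are there only so that the export reaches the bold object, and org-export-string-as is used to export a string instead of a buffer.

```elisp
(require 'ox)

;; Hypothetical transcoder: it only reports what it receives.
(defun demo-args-bold (bold contents info)
  "Show the three arguments a transcoder gets for a BOLD object."
  (format "[type=%s contents=%s title=%s]"
          (org-element-type bold)                          ; 1. the element
          contents                                         ; 2. transcoded children
          (org-export-data (plist-get info :title) info))) ; 3. the channel

(defun demo-args-pass (_element contents _info)
  "Return CONTENTS unchanged."
  contents)

(org-export-define-backend 'demo-args
  '((bold . demo-args-bold)
    (paragraph . demo-args-pass)
    (section . demo-args-pass)))

;; The returned string should contain
;; "[type=bold contents=strong title=Demo]".
(org-export-string-as "#+TITLE: Demo\n\nSome *strong* words." 'demo-args)
```

The title is read from the communication channel with plist-get and turned into a string with org-export-data, because the channel stores it in parsed form.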

Let's define onlybold-bold, the transcoder of bold elements:

(defun onlybold-bold (bold contents info)
  (concat "***" contents "***"))

Now, we can define the first version of onlybold backend, which transcodes only bold elements:

(require 'ox) ; provides org-export-define-backend

(org-export-define-backend 'onlybold
  '((bold . onlybold-bold)))

Then we define the command onlybold-export that pops up the buffer *onlybold*, which contains the content of the current buffer exported with the onlybold backend:

(defun onlybold-export ()
  "Export the current buffer with the onlybold backend to *onlybold*."
  (interactive)
  (org-export-to-buffer 'onlybold "*onlybold*"))

Now, if we call the command onlybold-export inside our org buffer, the buffer *onlybold* pops up with nothing in it.

We might be disappointed, but we aren't. This is totally normal.

In a given backend, when an element doesn't have a transcoder to handle it, the element is skipped. (In the same vein, if a transcoder returns nil for an element, the element is also skipped.)
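The same observation can be reproduced without popping up a buffer. The following is a minimal sketch that uses org-export-string-as on a one-line org string. The names demo-v1 and demo-v1-bold are hypothetical and stand in for the first version of onlybold.

```elisp
(require 'ox)

(defun demo-v1-bold (_bold contents _info)
  "Surround CONTENTS with three stars."
  (concat "***" contents "***"))

;; Only bold objects have a transcoder in this backend.
(org-export-define-backend 'demo-v1
  '((bold . demo-v1-bold)))

;; The result has no visible text: the bold object is never reached
;; because the elements that enclose it are skipped.
(org-export-string-as "I like *bold-1* and you?" 'demo-v1)
```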

parsed tree, section elements and paragraph elements

In our org buffer, the bold elements belong to paragraphs that belong to a section. We can see this by looking at the parsed tree in the buffer *Pp Eval Output* after running the following command from within the org buffer:

M-x pp-eval-expression RET (org-element-parse-buffer)

We get the following tree ( ... represents information that is not related to the shape of the tree):

(org-data
 nil
 (section
  (...)
  (paragraph
   (...)
   #("I like " ...)
   (bold
    (...)
    #("bold-1" ...))
   #("and " ...)
   (bold
    (...)
    #("bold-2" ...))
   #("and you?\nI don't.  I prefer " ...)
   (bold
    (...)
    #("bold-3" ...))
   #(".\n" ...))
  (paragraph
   (...)
   #("I've loved " ...)
   (bold
    (...)
    #("bold-4" ...))
   #("since I was a child.\n" ...))
  (paragraph
   (...)
   #("I'm " ...)
   (italic
    (...)
    #("italic" ...))
   #("." ...))))

Indeed, bold elements belong to paragraph elements that belong to a section element.
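If you only care about the shape of the tree, a small helper can strip everything else. This is a sketch under the assumption that strings are the leaves and that org-element-type and org-element-contents are enough to walk the tree. The function demo-tree-shape is a hypothetical name, not part of org.

```elisp
(require 'org)
(require 'org-element)

(defun demo-tree-shape (node)
  "Return the shape of NODE.
Strings lose their text properties, and every other node becomes
a list of the form (TYPE CHILD...)."
  (if (stringp node)
      (substring-no-properties node)
    (cons (org-element-type node)
          (mapcar #'demo-tree-shape (org-element-contents node)))))

(with-temp-buffer
  (org-mode)
  (insert "I like *bold-1* and you?\n")
  (demo-tree-shape (org-element-parse-buffer)))
;; ⇒ (org-data (section (paragraph "I like " (bold "bold-1") "and you?\n")))
```

Note that the space after the bold object does not show up in the following string. It is stored as a property of the bold object, which is why the tree above contains "and " rather than " and ".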

And as we have just seen, if a backend doesn't provide a transcoder for an element, this element will be ignored in the exported result.

So let's write onlybold-section, the transcoder of section elements which surrounds their content with <-- and -->:

(defun onlybold-section (section contents info)
  (concat "<--" contents "-->"))

and onlybold-paragraph, the transcoder of paragraph elements, which surrounds their content with :: on both sides:

(defun onlybold-paragraph (paragraph contents info)
  (concat "::" contents "::"))

Then, we modify onlybold backend like this:

(org-export-define-backend 'onlybold
  '((bold . onlybold-bold)
    (section . onlybold-section)
    (paragraph . onlybold-paragraph)))

Now, if we call the command onlybold-export inside our org buffer, the buffer *onlybold* pops up with this content:

<--::I like ***bold-1*** and ***bold-2*** and you?
I don't.  I prefer ***bold-3***.
::


::I've loved ***bold-4*** since I was a child.
::

::I'm .
::
-->

This is better:

  1. The bold elements have been transcoded as we expected,

  2. The "normal" text remains the same as in our org buffer and,

  3. The italic element has been ignored (which was expected because we didn't provide a transcoder for italic elements).

only keep bold elements

plain-text elements are the leaves of the parsed tree and they are strings. This is the right level to operate at in order to keep only bold elements.

So now, let's handle the plain-text elements and keep only bold elements.

There are at least two ways to do it:

  1. using the filter system provided by the org export system (and so provide a filter that applies to plain-text elements) or,

  2. providing a specific transcoder for plain-text elements.

We implement the latter.
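For comparison, here is a rough sketch of what the first option could look like. It assumes that a plain-text filter receives the string with its :parent property still attached (which should be the case when the backend has no plain-text transcoder) and that a filter drops a value by returning the empty string. The backend demo-filter and the demo-filter-* functions are hypothetical names used only for this illustration.

```elisp
(require 'ox)

(defun demo-filter-keep-bold (text _backend _info)
  "Keep TEXT only when its parent is a bold object.
A filter drops a value by returning the empty string."
  (if (eq 'bold (org-element-type (org-element-property :parent text)))
      text
    ""))

(defun demo-filter-bold (_bold contents _info)
  "Surround CONTENTS with three stars."
  (concat "***" contents "***"))

(defun demo-filter-pass (_element contents _info)
  "Return CONTENTS unchanged."
  contents)

;; Filters are attached to a backend with the :filters-alist keyword.
(org-export-define-backend 'demo-filter
  '((bold . demo-filter-bold)
    (paragraph . demo-filter-pass)
    (section . demo-filter-pass))
  :filters-alist '((:filter-plain-text . demo-filter-keep-bold)))

;; The result should contain "***bold-1***" but not "I like".
(org-export-string-as "I like *bold-1* and you?" 'demo-filter)
```

Unlike a transcoder, a filter takes the value, the backend name and the communication channel as arguments, and returning nil from a filter leaves the value unchanged instead of dropping it.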

Let's write the transcoder onlybold-plain-text which checks if the parent of the plain-text element (the string) is a bold element. If this is the case, we return the string and if not we return nil:

(defun onlybold-plain-text (text info)
  "Return TEXT if its parent is a bold element, nil otherwise."
  (when (eq 'bold (org-element-type (org-element-property :parent text)))
    text))

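Note that the arity (number of arguments) of onlybold-plain-text is different from the transcoders that we've seen so far: it takes only the text and the communication channel, because a string has no children and therefore no contents.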

Then we add it to onlybold backend:

(org-export-define-backend 'onlybold
  '((bold . onlybold-bold)
    (section . onlybold-section)
    (paragraph . onlybold-paragraph)
    (plain-text . onlybold-plain-text)))

Now, if we call the command onlybold-export inside our org buffer, the buffer *onlybold* pops up with this content:

<--::***bold-1*** ***bold-2*** ***bold-3***::


::***bold-4*** ::

::::
-->

We have filtered the text to keep only bold elements.

remove empty paragraphs and the last newline of the section

Let's go further and remove the last empty paragraph.

To do so, we can "ask" the transcoder onlybold-paragraph to return nil when its contents are "empty", specifically when they are the empty string "" or a single newline "\n". Here is the new implementation:

(defun onlybold-paragraph (paragraph contents info)
  (unless (member contents '("" "\n"))
    (concat "::" contents "::")))

Now, if we call the command onlybold-export inside our org buffer, the buffer *onlybold* pops up with this content:

<--::***bold-1*** ***bold-2*** ***bold-3***::


::***bold-4*** ::
-->

We are almost happy :)

Only one thing remains...

The end-of-section marker --> sitting alone on the last line is "quite ugly".

Let's put it just after the :: that closes the last paragraph.

We can do this by modifying onlybold-section and "asking" it to remove the last newline of its content which is matched by the regexp "\n\\'":

(defun onlybold-section (section contents info)
  (let ((cts (replace-regexp-in-string "\n\\'" "" contents)))
    (concat "<--" cts "-->")))
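To check what the regexp "\n\\'" does in isolation, it can be tried on a few plain strings. In Emacs regexps, \\' matches only at the end of the string, so at most one trailing newline is removed. The sample strings below are made up for illustration.

```elisp
;; One trailing newline: it is removed.
(replace-regexp-in-string "\n\\'" "" "a\nb\n")   ; ⇒ "a\nb"

;; Two trailing newlines: only the last one is removed.
(replace-regexp-in-string "\n\\'" "" "a\nb\n\n") ; ⇒ "a\nb\n"

;; No trailing newline: the string is returned unchanged.
(replace-regexp-in-string "\n\\'" "" "a\nb")     ; ⇒ "a\nb"
```

This is why the two blank lines between the paragraphs survive in the exported section: only the very last newline of the contents is touched.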

Now, if we call the command onlybold-export inside our org buffer, the buffer *onlybold* pops up with this content:

<--::***bold-1*** ***bold-2*** ***bold-3***::


::***bold-4*** ::-->

We are done ;)

I hope that this toy example helps you get started with the creation of org backends.

acknowledgments

I want to take the opportunity of this post to thank:

  1. Nicolas Goaziou, who is the author and maintainer of org-export-define-backend and org-element-at-point,

  2. All the people who work and contribute to org-mode (built-in and external packages),

  3. All the people who work and contribute to Emacs (built-in and external packages).

And I want to tell you that:

Each time a piece of your code is heavy, I know that:

  1. this piece of code fixes a bug or,

  2. this piece of code handles an edge case or,

  3. this piece of code provides flexibility (via options) to the end user.

Each time your code is simple, I know that you worked hard to make it simple.

And most importantly, each time I read a piece of your code I feel closer to you.

Emacs is pure joy and it is thanks to you.