Nanoc, Pandoc, and Pygments ― Oh my!

February 5, 2015

I wanted to add syntax highlighting to my blog, but I was at a loss as to where to begin. As of writing, I’m using nanoc to compile my site, but my posts are written in pandoc’s markdown.

For my initial iteration, I simply enabled syntax highlighting in pandoc, which uses the highlighting-kate package. Both pandoc and highlighting-kate are developed by the extraordinary John MacFarlane.

Unfortunately, while highlighting-kate supports a large number of languages (112 as of writing), I find pygments still outperforms other highlighters, both in the number of supported languages (over 300) and in the richness and accuracy of its syntax definitions.

It turns out nanoc has support for pygments through its :colorize_syntax filter.

Unfortunately, when pandoc generates code blocks, it nests a <code> element inside a <pre> element, and places a CSS class with the language’s name on the <pre> element. :colorize_syntax, however, requires that the CSS class be on the <code> element, and that it be prefixed with language- [1].
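To make the mismatch concrete, here is a minimal sketch of the class-name mapping in plain Ruby. The markup in the comments is only illustrative, the method name code_classes is my own invention, and KNOWN_LEXERS is a hypothetical stand-in for asking pygments whether a lexer exists.

```ruby
require "set"

# Pandoc emits (roughly):      <pre class="ruby"><code>puts 1</code></pre>
# :colorize_syntax looks for:  <pre><code class="language-ruby">puts 1</code></pre>

# Hypothetical stand-in for a real lexer lookup; not an actual pygments list.
KNOWN_LEXERS = Set["ruby", "python", "haskell"].freeze

# Given the class attribute found on <pre>, build the one to put on <code>.
# Names matching a known lexer get the language- prefix; others pass through.
def code_classes(pre_class)
    mapped = pre_class.split.map do |cl|
        KNOWN_LEXERS.include?(cl) ? "language-#{cl}" : cl
    end
    mapped.join(" ")
end

puts code_classes("ruby")              # language-ruby
puts code_classes("numberLines ruby")  # numberLines language-ruby
```

Classes that are not language names (such as numberLines here) are left alone, so they survive the move from <pre> to <code>.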

To work around this, I’ve devised a simple nanoc filter that uses nokogiri to transform pandoc’s output into markup that :colorize_syntax supports.

# You may use this snippet under the WTFPL <http://www.wtfpl.net/>
#
# Converts pandoc-generated HTML to the format the :colorize_syntax filter
# understands, with the goal of using pygments with pandoc output.
#
# There are two issues:
# - Pandoc puts the classes on the <pre> tag, but nanoc wants them on <code>
# - Pandoc doesn't add the language- prefix that nanoc wants on those classes

require "nokogiri"
require "pygments"

class PandocToColorize < Nanoc::Filter
    identifier :pandoc_to_colorize
    type :text

    def run(content, params={})
        doc = Nokogiri::HTML.parse(content)
        doc.css("pre > code").each do |code|
            pre = code.parent
            next unless pre["class"]

            # Keep any classes already on <code>, then move over the <pre> ones.
            classes = code["class"].to_s.split
            pre["class"].split.each do |cl|
                # Only names that pygments knows as a lexer get the prefix.
                classes << (Pygments::Lexer.find(cl) ? "language-#{cl}" : cl)
            end
            pre.delete("class")
            code["class"] = classes.join(" ")
        end
        doc.to_html
    end
end

I apologize if my Ruby isn’t idiomatic. I’m unfamiliar with the language.

You can then simply drop the filter into lib/pandoc_to_colorize.rb, and add filter :pandoc_to_colorize to your Rules file, after the pandoc filter and before :colorize_syntax.
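As a rough sketch, the relevant rule might look something like this. The glob pattern and the choice of :pygmentsrb as the default colorizer are assumptions about your setup (the pattern syntax also assumes nanoc 4), so adjust them to match your own site.

```ruby
# Rules (sketch; the pattern below is an assumption)
compile "/posts/**/*.md" do
    filter :pandoc                # markdown -> HTML, classes end up on <pre>
    filter :pandoc_to_colorize    # move classes to <code>, add language- prefix
    filter :colorize_syntax, default_colorizer: :pygmentsrb
    # ... layout and any other filters as usual
end
```

The only part that matters is the order: the filter has to see pandoc’s HTML, and :colorize_syntax has to see the filter’s output.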

Update (November 2016)

It looks like pygments.rb isn’t actively maintained anymore. However, rouge is, and it’s pretty easy to convert this filter to one that works with Rouge. Simply replace require "pygments" with require "rouge" and Pygments::Lexer with Rouge::Lexer.


  1. This is probably for the best anyways, as it prevents naming conflicts.
