I've fairly often seen the complaint that Haskell is missing practical examples for new developers to start with. I can relate to that, and since I'm now at the point where I can build "real world" applications, I will try to share my experience with Haskell as I progress.

Why exactly this example as a first post?

As you might guess from this blog, all the articles are static HTML files; following the trend of static website generators, I too have hacked together my own. The first thing I did in that regard was converting Markdown posts to HTML.

There's a library for that

A quick web search instantly turns up Pandoc, which is both a Haskell library and a standalone command line application. We won't use the application, for obvious reasons, and will instead install the library with cabal install pandoc. [10 minutes later...]

From a quick glance at the Hackage documentation of Pandoc we can infer that there are two submodules we will use: Text.Pandoc.Readers.Markdown and Text.Pandoc.Writers.HTML.

Reading markdown files

First off, we'll write code that reads the Markdown content into Pandoc's intermediary representation, from which we'll later be able to generate HTML markup.

-- this file is named readM.hs
-- our executables always reside under the Main module
module Main where

  -- Function to get command line arguments
  -- type: IO [String]
  import System.Environment (getArgs) 

  -- Function that given a String will return a Pandoc document
  -- which later on we can convert to HTML. But according to
  -- its documentation the String is the second argument, and
  -- the first argument is some kind of ParserState
  -- type: ParserState -> String -> Pandoc
  import Text.Pandoc.Readers.Markdown (readMarkdown)

  -- Most parsing libraries provide some convenient defaults
  -- for us; so rather than fiddling around, we directly import
  -- defaultParserState
  -- type: ParserState
  import Text.Pandoc.Parsing (defaultParserState)


  main :: IO ()
  main = do
    -- at the moment we care only for the first element of the list
    -- which we'll name file, and not f or any other short letter variant
    (file:_) <- getArgs 

    -- standard Prelude readFile function will return the content of the given file
    file_content <- readFile file

    -- with the content of the file available we can get our intermediary Pandoc document
    let pandoc_document = readMarkdown defaultParserState file_content

    -- display its guts in the terminal
    putStrLn (show pandoc_document)

Next, we'll run this code on the Markdown that Pandoc itself uses for the README file in its GitHub repository.

$ wget -O pandoc.markdown https://raw.github.com/jgm/pandoc/master/README
$ ghc readM.hs
$ ./readM pandoc.markdown
[big output cut out]

Intermediary refactoring #1

There's a useful utility we'll want to run as often as possible: hlint (cabal install hlint), a Haskell code linter with nice refactoring suggestions.

Note that hlint will be installed in ~/.cabal/bin, which I'd suggest you add to your $PATH environment variable (for instance with export PATH=$HOME/.cabal/bin:$PATH in your shell profile); otherwise, refer to it by its full path.

$ hlint readM.hs
readM.hs:35:5: Error: Use print
Found:
  putStrLn (show pandoc_document)
Why not:
  print pandoc_document

1 suggestion

That makes perfect sense if we inspect the types in ghci.

Prelude> :t putStrLn . show
putStrLn . show :: Show a => a -> IO ()
Prelude> :t print
print :: Show a => a -> IO ()

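Both have exactly the same type, and in fact the Prelude defines print x as putStrLn (show x). The Show class constraint is what guarantees that the received value can be passed to show before printing.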
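As a quick check, here is print next to a hand-rolled putStrLn . show (the name myPrint is only for illustration). One visible consequence of print going through show: printing a String includes its quotes, because show adds them.

```haskell
-- A hand-rolled equivalent of print, written as in the ghci session above.
myPrint :: Show a => a -> IO ()
myPrint = putStrLn . show

main :: IO ()
main = do
  print [1, 2, 3 :: Int]    -- [1,2,3]
  myPrint [1, 2, 3 :: Int]  -- [1,2,3], identical to the line above
  print "pandoc"            -- "pandoc" (quotes included, added by show)
  putStrLn "pandoc"         -- pandoc
```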

Assume this refactoring has been applied, and let's move on.

Writing HTML files

I've removed the previous comments and updated the code to generate and write HTML content.

module Main where

  import System.Environment (getArgs) 

  import Text.Pandoc.Readers.Markdown (readMarkdown)
  import Text.Pandoc.Parsing (defaultParserState)

  -- the function that will return the HTML string from our
  -- intermediary Pandoc data. As with the parser/reader
  -- the first parameter is a configuration option for the
  -- writer.
  -- type: WriterOptions -> Pandoc -> String
  import Text.Pandoc.Writers.HTML (writeHtmlString)
  -- Assume defaults exist for anything until proven otherwise.
  -- type: WriterOptions
  import Text.Pandoc.Shared (defaultWriterOptions)


  main :: IO ()
  main = do
    -- this time we care for both the source file and the destination file
    (markdown_file:html_file:_) <- getArgs
    markdown_file_content <- readFile markdown_file
    let pandoc_document = readMarkdown defaultParserState markdown_file_content
    -- strangely, even though its name starts with write, the function doesn't actually
    -- write anything; htmlString would have been a more fitting name in my opinion
    let html_file_content = writeHtmlString defaultWriterOptions pandoc_document
    -- standard Prelude writeFile function
    -- type: FilePath -> String -> IO ()
    writeFile html_file html_file_content

A note about the FilePath type: most often it is a simple alias for String, as is the case with the Prelude functions; but don't take that for granted, and always look up its definition in the package you are about to use.
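In the Prelude the alias is literally type FilePath = String, and the functions in System.FilePath (from the filepath package, which ships with GHC) take that same FilePath. The sketch below shows a FilePath going through a plain String function, plus a hypothetical htmlNameFor helper that derives the HTML destination name from the Markdown source; it is an illustration, not something the program above needs.

```haskell
import System.FilePath (replaceExtension, takeExtension)

-- The Prelude declares: type FilePath = String
-- so a FilePath can be handed to any String function, such as length.
nameLength :: FilePath -> Int
nameLength = length

-- Hypothetical helper: derive the HTML destination from the Markdown source.
htmlNameFor :: FilePath -> FilePath
htmlNameFor source = replaceExtension source "html"

main :: IO ()
main = do
  print (nameLength "pandoc.markdown")        -- 15
  putStrLn (takeExtension "pandoc.markdown")  -- .markdown
  putStrLn (htmlNameFor "pandoc.markdown")    -- pandoc.html
```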

Let us rename our program and test it on the same data.

$ mv readM.hs readMwriteH.hs
$ ghc readMwriteH.hs
$ ./readMwriteH pandoc.markdown pandoc.html

Voilà: a Markdown to HTML converter in under 5 lines of code (excluding comments, imports and the main declaration).

Intermediary refactoring #2

Let's whip out hlint again and run it over our latest code.

$ hlint readMwriteH.hs
No suggestions

That just shows how awesome our code is, unless...

We have to deal with input validation

Because we coded the optimistic approach first, our program doesn't fail gracefully when the required arguments are missing:

$ ./readMwriteH
readMwriteH: user error (Pattern match failure in do expression at readMwriteH.hs:22:5-31)
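The crash comes from the pattern on the left of <- in main: when (markdown_file:html_file:_) doesn't match, the do block calls fail, which in IO throws the user error shown above. One alternative, sketched here with a hypothetical parseArgs helper, is to match the argument list explicitly so that every shape of input has a defined outcome. The conversion itself is left out so the sketch runs without Pandoc.

```haskell
import System.Environment (getArgs)

-- Hypothetical helper: turn the raw argument list into either an error
-- message or the (source, destination) pair, with no partial patterns.
parseArgs :: [String] -> Either String (FilePath, FilePath)
parseArgs [source, destination] = Right (source, destination)
parseArgs _ = Left "Two arguments expected: source markdown file, and destination html file"

main :: IO ()
main = do
  arguments <- getArgs
  case parseArgs arguments of
    Left message -> putStrLn message
    Right (source, destination) ->
      -- the Pandoc conversion would go here
      putStrLn ("Would convert " ++ source ++ " into " ++ destination)
```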

Let's then add some sane validation checks, the kind an application like this should have. But first, you must install two more packages to follow along with the code below: directory and split. Install them as you did with the previous packages, and continue once they're done.

module Main where

  import System.Environment (getArgs)

  import Text.Pandoc.Readers.Markdown (readMarkdown)
  import Text.Pandoc.Parsing (defaultParserState)
  import Text.Pandoc.Writers.HTML (writeHtmlString)
  import Text.Pandoc.Shared (defaultWriterOptions)

  -- Spot on the name this time, split a list by a given sequence of elements
  -- type: Eq a => [a] -> [a] -> [[a]]
  import Data.List.Split (splitOn)

  -- You may know this function in other languages under the name `join`
  -- type: [a] -> [[a]] -> [a]
  import Data.List (intercalate)

  -- A function that you'd be tempted to search for under the name `isFile`
  import System.Directory ( doesFileExist
      -- In order to check whether a file is readable or writable we'll have to rely on
      -- the system permission attributes. Here are the functions with their types, to
      -- give an intuition of how they will be used
      -- getPermissions :: FilePath -> IO Permissions
      -- readable :: Permissions -> Bool
      -- writable :: Permissions -> Bool
                          , getPermissions
                          , readable
                          , writable )


  -- We don't have to alias the FilePath type because it is provided via Prelude
  isReadable :: FilePath -> IO Bool
  isReadable file = do permissions <- getPermissions file
                       return (readable permissions)


  isWritable :: FilePath -> IO Bool
  isWritable file = do
    file_exists <- doesFileExist file
    if file_exists
      then do permissions <- getPermissions file
              return (writable permissions)
      -- split the file path into its distinct parts on the / separator
      else do let file_path_split = splitOn "/" file
              -- if the path has exactly 1 part, the file lives in the current
              -- directory, so we check the permissions of "." instead. Otherwise we
              -- drop the last part of the path (init returns all but the last element
              -- of a list) and join the remaining parts back together with /
              let file_directory  = if (length file_path_split == 1) then "." else (intercalate "/" (init file_path_split))
              -- the destination file doesn't exist yet, so file_directory points to the
              -- directory it would be created in, and that is what has to be writable
              permissions <- getPermissions file_directory
              return (writable permissions)


  main :: IO ()
  main = do
    -- defer arguments destructuring until we validate the correct form
    arguments <- getArgs
    -- I generally write the if with the false case first so that my code doesn't
    -- turn into a boomerang shape. Switch the conditions to see for yourself how awful that looks
    if (length arguments /= 2)
      then putStrLn "Two arguments expected: source markdown file, and destination html file"
      else do
        -- destructuring arguments here with 100% certainty that the list has only 2 elements
        let (markdown_file:html_file:[]) = arguments
        existence_confirmed <- doesFileExist markdown_file
        if (not existence_confirmed)
          then putStrLn "The source markdown file doesn't seem to exist"
          else do
            is_readable <- isReadable markdown_file
            if (not is_readable)
              then putStrLn "The source markdown file is not readable"
              else do
                is_writable <- isWritable html_file
                if (not is_writable)
                  then putStrLn "Destination HTML file, or destination directory is not writable"
                  else do
                    markdown_file_content <- readFile markdown_file
                    let pandoc_document = readMarkdown defaultParserState markdown_file_content 
                    let html_file_content = writeHtmlString defaultWriterOptions pandoc_document
                    writeFile html_file html_file_content

At this point you may find the code horrific, but that's what we'll focus on next, so don't close this page just yet.
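Before refactoring, it's worth checking the directory computation from isWritable in isolation. The sketch below mirrors it using a hypothetical splitOnSlash helper (a base-only stand-in for splitOn "/", so the split package isn't required) and compares it with takeDirectory from the filepath package. Both agree on relative paths, but for a file in the root directory, such as /pandoc.html, the split-based version yields an empty string rather than "/".

```haskell
import Data.List (intercalate)
import System.FilePath (takeDirectory)

-- Hypothetical stand-in for Data.List.Split.splitOn "/" so this sketch
-- needs only base and filepath: it splits on every '/' character.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnSlash rest

-- Mirrors the directory computation from isWritable above.
articleDirectory :: FilePath -> FilePath
articleDirectory file =
  let parts = splitOnSlash file
  in if length parts == 1 then "." else intercalate "/" (init parts)

-- The same idea, delegated to the filepath package.
filepathDirectory :: FilePath -> FilePath
filepathDirectory = takeDirectory

main :: IO ()
main = mapM_ report ["pandoc.html", "site/pandoc.html", "/pandoc.html"]
  where
    -- show makes an empty result visible as ""
    report path = putStrLn (path ++ ": " ++ show (articleDirectory path)
                            ++ " vs " ++ show (filepathDirectory path))
```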

Intermediary refactoring #3

I don't want to scare you with a big dump of output, but hlint warned me that all my ifs have redundant brackets around the tested expressions.

Major? refactoring

We can do better than that: at the very least, we can move similar code, like the file validation functions, into a single parameterized function. With that in mind, have a look at the following result of refactoring.

module Main where

  import System.Environment (getArgs)

  import Text.Pandoc.Readers.Markdown (readMarkdown)
  import Text.Pandoc.Parsing (defaultParserState)
  import Text.Pandoc.Writers.HTML (writeHtmlString)
  import Text.Pandoc.Shared (defaultWriterOptions)
  import Data.List.Split (splitOn)
  import Data.List (intercalate)
  import System.Directory (doesFileExist, getPermissions, readable, writable)


  -- Encode both types of files as datatypes in our application
  data FileType = Source | Destination


  isValid :: FileType -> FilePath -> IO Bool
  -- Data type based validation. For source files, the file existence check moves here as well
  isValid Source file = do file_exists <- doesFileExist file
                           -- Case expressions almost always look better than ifs
                           case file_exists of False -> return False
                                               True  -> do permissions <- getPermissions file
                                                           return (readable permissions)

  -- Same logic as before for the destination file, but at least encoded under a common validation
  -- function, instead of the more generic isReadable/isWritable
  isValid Destination file = do
    file_exists <- doesFileExist file
    permissions <- getPermissions (item_to_check file_exists file)
    return (writable permissions)
    -- Why abuse let in monadic code when we can just as well define functions outside of
    -- the block? Cleaner this way and a better reading flow, IMO
    where item_to_check exists file | exists    = file
                                    | otherwise = let file_path_split = splitOn "/" file
                                                  in if length file_path_split == 1 then "."
                                                        else (intercalate "/" (init file_path_split))


  -- For these two functions I'll let you fill in the type signatures.
  -- Hint: load the file in ghci and ask for the associated types
  readSource file = readFile file >>= (return . readMarkdown defaultParserState)
  writeDestination file source = writeFile file (writeHtmlString defaultWriterOptions source)


  main :: IO ()
  main = do
    arguments <- getArgs
    if length arguments /= 2
      then putStrLn "Two arguments expected: source markdown file, and destination html file"
      else do
        let (source_file:destination_file:[]) = arguments
        -- now the entire validation is enclosed in a single function, instead of scattered around
        -- in this main
        is_valid_source <- isValid Source source_file
        if not is_valid_source
          then putStrLn "The source markdown file doesn't seem to exist, or it is unreadable"
          else do
            is_valid_destination <- isValid Destination destination_file
            if not is_valid_destination
              then putStrLn "Destination HTML file, or destination directory is not writable"
              else do source <- readSource source_file
                      writeDestination destination_file source

Not a radical revamp, but considerably better than the last version we had; at least from a design point of view.

Intermediary refactoring #4

And a final round of hlint suggestions to wrap it up.

$ hlint readMwriteH.hs
Found:
  case file_exists of
      False -> return False
      True -> do permissions <- getPermissions file
                 return (readable permissions)
Why not:
  if file_exists then
    (do permissions <- getPermissions file
        return (readable permissions))
    else return False

readMwriteH.hs:36:54: Warning: Redundant bracket
Found:
  if length file_path_split == 1 then "." else
    (intercalate "/" (init file_path_split))
Why not:
  if length file_path_split == 1 then "." else
    intercalate "/" (init file_path_split)

readMwriteH.hs:42:21: Warning: Use liftM
Found:
  readFile file >>= (return . readMarkdown defaultParserState)
Why not:
  Control.Monad.liftM (readMarkdown defaultParserState)
    (readFile file)

3 suggestions
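The liftM suggestion relies on the fact that m >>= (return . f) and liftM f m mean the same thing for any lawful monad; on GHC 7.10 and later, where Functor is a superclass of Monad, fmap f m is a third spelling. A small sketch, using map toUpper as a stand-in for readMarkdown defaultParserState so it runs without Pandoc:

```haskell
import Control.Monad (liftM)
import Data.Char (toUpper)

-- Three spellings of "apply a pure function to the result of an action".
-- Assumes GHC 7.10+, so fmap is usable with only a Monad constraint.
viaBind, viaLiftM, viaFmap :: Monad m => m String -> m String
viaBind action = action >>= (return . map toUpper)
viaLiftM action = liftM (map toUpper) action
viaFmap action = fmap (map toUpper) action

main :: IO ()
main = do
  a <- viaBind (return "markdown")
  b <- viaLiftM (return "markdown")
  c <- viaFmap (return "markdown")
  print [a, b, c]  -- ["MARKDOWN","MARKDOWN","MARKDOWN"]
```

The same functions work with any monad, which makes them easy to check with Maybe instead of IO.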

With a touch of personal preference

Everybody has their own little style of writing code. While it's mostly irrelevant now that our example program is done, I'd still like to share my personal touch on the code :)

module Main where

  import System.Environment (getArgs)
  import Text.Pandoc.Readers.Markdown (readMarkdown)
  import Text.Pandoc.Parsing (defaultParserState)
  import Text.Pandoc.Writers.HTML (writeHtmlString)
  import Text.Pandoc.Shared (defaultWriterOptions)
  import Data.List.Split (splitOn)
  import Data.List (intercalate)
  import System.Directory (doesFileExist, getPermissions, readable, writable)
  import Control.Monad (liftM)

  data FileType = Source | Destination

  isValid :: FileType -> FilePath -> IO Bool
  isValid Source file = doesFileExist file >>= (\e -> if e then liftM readable $ getPermissions file
                                                           else return False)

  isValid Destination file = liftM writable $ doesFileExist file >>= (getPermissions . item file)
    where item file True  = file
          item file False = let s = splitOn "/" file
                            in if length s == 1 then "." else intercalate "/" $ init s

  readSource file = liftM (readMarkdown defaultParserState) $ readFile file

  writeDestination file source = writeFile file $ writeHtmlString defaultWriterOptions source

  main :: IO ()
  main = do
    arguments <- getArgs
    if length arguments == 2 then do
        let (sf:df:[]) = arguments
        is_valid_source <- isValid Source sf
        if is_valid_source then do
            is_valid_destination <- isValid Destination df
            if is_valid_destination then readSource sf >>= writeDestination df

              else putStrLn "Destination HTML file, or destination directory is not writable"
          else putStrLn "The source markdown file doesn't seem to exist, or it is unreadable"
      else putStrLn "Two arguments expected: source markdown file, and destination html file"
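If the remaining nesting in main still bothers you, one alternative (not the approach taken above) is to describe the checks as a list and stop at the first one that fails. The sketch below uses a hypothetical firstFailure helper and, for brevity, takes the destination directory with takeDirectory from the filepath package instead of splitOn. The Pandoc conversion is left out so it compiles on its own.

```haskell
import System.Directory (doesFileExist, getPermissions, readable, writable)
import System.Environment (getArgs)
import System.FilePath (takeDirectory)

-- Hypothetical helper: run the checks in order and return the message of
-- the first one that fails. Later checks never run, which matters here:
-- asking for the permissions of a missing file would throw an exception.
firstFailure :: Monad m => [(m Bool, String)] -> m (Maybe String)
firstFailure [] = return Nothing
firstFailure ((check, message) : rest) = do
  ok <- check
  if ok then firstFailure rest else return (Just message)

-- Hypothetical destination check: the file itself if it exists,
-- otherwise the directory it would be created in.
destinationWritable :: FilePath -> IO Bool
destinationWritable file = do
  exists <- doesFileExist file
  permissions <- getPermissions (if exists then file else takeDirectory file)
  return (writable permissions)

main :: IO ()
main = do
  arguments <- getArgs
  case arguments of
    [sf, df] -> do
      failure <- firstFailure
        [ (doesFileExist sf, "The source markdown file doesn't seem to exist")
        , (fmap readable (getPermissions sf), "The source markdown file is not readable")
        , (destinationWritable df, "Destination HTML file, or destination directory is not writable")
        ]
      -- on success, readSource/writeDestination from above would run here;
      -- this sketch only reports the outcome of the checks
      maybe (putStrLn "All checks passed") putStrLn failure
    _ -> putStrLn "Two arguments expected: source markdown file, and destination html file"
```

Because firstFailure only needs a Monad, its short-circuiting can be checked with Maybe as well as IO.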

A request for comments

This has been a rather fun article to write, and I'd like to hear from others whether it has been helpful, and whether it was comprehensive enough not to leave visible gaps between the implementations presented.

If you'd like to see other posts of this type and have a particular small application (or functionality) you'd like to see implemented in this step-by-step form, let me know. It could be beneficial for both of us: I'd dip my toes even deeper into Haskell if I had other things to hack on besides a blog generator and some secret project.

Haskell web frameworks are out of the question at the moment though.


You are reading the blog of ; web developer, terminal hacker and functional programming apprentice.