How List Fusion Can Change Haskell Program Semantics

Posted on January 20, 2022

There is a page on the Haskell Wiki that is dedicated to notes on the correctness of short cut fusion. The page covers two types of short cut fusion, specifically foldr/build and destroy/unfoldr, both of which are implemented using GHC rewrite rules, and both of which are potentially unsafe in programs that use seq to force evaluation.

In this article, I will:

  1. Outline what GHC rewrite rules are;
  2. Cover the basics of the foldr/build rewrite rule; and
  3. Show an example where the foldr/build rewrite rule is not a correct program optimization, meaning that it actually changes the semantics of the program, causing a function that would ordinarily return (in the absence of the rewrite) to fail.

What are GHC Rewrite Rules?

GHC rewrite rules allow programmers to specify that certain function applications in the source code should be rewritten automatically by the simplifier at compile-time. These rules, combined with inlining, can significantly improve program performance. However, the compiler does not check that the rules preserve the semantics of the program, so the programmer must ensure that the rules are correct. In case the rules are incorrect, the program may act in unexpected ways.

What is the foldr/build rewrite rule?

The foldr/build rewrite rule rewrites foldr calls when the list that is being folded is actually produced by a call to build. To help understand how this rewrite rule works, let us first look at the types of the functions:

build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
foldr :: (a -> b -> b) -> b -> [a] -> b

As you can see, the first argument of build’s first argument has the same type as the first argument of foldr. Because foldr is essentially the inverse of build, instead of passing a [a] to foldr, we can just pass the first argument of foldr to the function that would ordinarily be passed as the first argument to build. Essentially, we are linking the two functions together directly, bypassing the intermediate list. This has the potential to significantly improve performance.

More concisely, the foldr/build rewrite rule says that foldr c n (build g) should always be replaced by g c n. (While it may be unlikely to have this specific starting function application in the program source code, it can easily be arrived at through inlining list-based code.) You can see the implementation of this rule in GHC.Base:

"fold/build"    forall k z (g::forall b. (a->b->b) -> b -> b) .
                foldr k z (build g) = g k z

The rewrite rule is not always correct.

As the wiki page explains, this rewrite rule unfortunately is not always correct. Sometimes, making this replacement actually changes the meaning (i.e., the semantics) of the program.

The program semantics may change as a result of this rewrite when the g function forces a value that would not have been forced in the original program.

For example, in the code below, we force the result of two head functions implemented in terms of foldr. In one case, the fold/build rewrite rule is applied, and in the other it is not. Both cases attempt to take the first element from a single-item list. If you compile the program with optimizations (try ghc -O2) and run it, the first (not rewritten) head function returns the value without an error, but the second one forces the error "empty list" value, resulting in abnormal program termination.

module Main where

-- The 'foldr` in the Prelude works with Foldables.
import Prelude hiding ( foldr )
-- We want the simple one from GHC.Base for [a]s.
import GHC.Base ( foldr, build )

import System.IO

main :: IO ()
main =
  -- Force the head function that is not rewritten: this will not raise an
  -- error.
  headNoRewrite `seq`
  -- Force the head function that is rewritten: this will raise an error.
  headRewrite `seq`
  -- return
  return ()

headRewrite :: Integer
headRewrite = foldr (\x acc -> x) (error "empty list") (build g)
-- rewritten to:
--   g (\x acc -> x) (error "empty list")

headNoRewrite :: Integer
headNoRewrite = foldr (\x acc -> x) (error "empty list") mkList

-- Prevent this function from being inlined. That way, the rewrite rule cannot
-- be applied.
{-# NOINLINE mkList #-}
mkList :: [Integer]
mkList = build g

-- Produce a sequence of one integer.
g :: (Integer -> b -> b) -> b -> b
g f x0 = x0 `seq` f 1 x0

This is a contrived example, but it shows that programmers need to be careful when mixing strictness with short cut fusion. The Haskell GitLab wiki has some notes about why list functions and rewrite rules are written the way they are. This information, combined with a look at the Core output from different passes of the compiler’s optimization pipeline, can help ensure that you get expected results even with the pitfalls of this rewrite rule.