Correcting the warnings of Pattern Matching in GHC 6

I've quited doing this. The motivations are in the bug report.

I'm Writing about the problem of Pattern Matching.

The warnings for pattern matching in GHC are not being correctly shown. There're a lot of bugs reported on this theme, and they're grouped in Ticket 595.

There's a solution for it, and the GHC developers suggested that this paper should be implemented in GHC. I've written a sketch of how the code should work:

Theoretical code

This is just the direct implementation of the theory in the referred article.

-- base
import Control.Arrow (first)
import Data.Maybe (mapMaybe, fromJust)
 
data C = C0 | C1 P | C2 P P
  deriving (Eq, Show) -- for debugging
 
data P = Wildcard | C C
  deriving (Eq, Show) -- for debugging
 
newtype V = V [P] deriving (Eq, Show) -- for debugging
 
newtype M = M [V] deriving (Eq, Show) -- for debugging
 
-- Get the constructors present on a list of patterns
catCs :: [P] -> [C]
catCs ps = [c | C c <- ps]
 
same :: C -> C -> Bool
same C0 C0 = True
same (C1 _p1) (C1 _p2) = True
same (C2 _p11 _p12) (C2 _p21 _p22) = True
same _c1 _c2 = False
 
uRec :: M -> V -> Bool
-- Nothing is matched, so the new vector is useful.
uRec (M []) _q = True
-- The vector is not useful, since it matches nothing.
uRec _p (V []) = False
uRec (M (V [] : _rest)) _q = False
uRec p @ (M ps) q @ (V (q1 : qs))
  = case q1 of
  Wildcard
    -> if isComplete σ then any specialized σ else uRec (d p) $ V qs
  -- In case the first column of the vector is a Constructor, we need to
  -- extract it from the vector and from the m.
  C c -> uRec (s c p) $ s c q
  where
    σ :: [C]
    σ = catCs $ map headV ps
 
    specialized :: C -> Bool
    specialized c = uRec (s c p) $ s c q
 
headV :: V -> P
headV (V v) = head v
 
class S a where
  s :: C -> a -> a
 
instance S M where
  s c (M m) = M $ mapMaybe (sV c) m
 
instance S V where
  s = (.)(.)(.) fromJust sV
 
sV :: C -> V -> Maybe V
sV _c (V []) = error "sV _c []"
sV c (V (Wildcard : ps)) = Just $ V $ map (const Wildcard) (extract c) ++ ps
sV c (V (C c_ : cs))
  | same c c_ = Just $ V $ extract c_ ++ cs
  | otherwise = Nothing
 
extract :: C -> [P]
extract C0 = []
extract (C1 p) = [p]
extract (C2 p1 p2) = [p1, p2]
 
isComplete :: [C] -> Bool
isComplete σ
  = all present complete
  where
    present :: C -> Bool
    present cons = any (same cons) σ
 
complete :: [C]
complete = [C0, C1 undefined, C2 undefined undefined]
 
d :: M -> M
d (M m) = M $ mapMaybe dV m
 
dV :: V -> Maybe V
dV (V (Wildcard : ps)) = Just $ V ps
dV _ = Nothing
 
exhaustive :: M -> Bool
exhaustive (M []) = False
exhaustive m @ (M (V v : _vs))
  = not $ uRec m $ V $ replicate (length v) Wildcard
 
useful :: M -> Bool
useful (M v) = all (uncurry uRec . first M) $ individuals v
 
individuals :: [a] -> [([a], a)]
individuals xs = map (remove xs) [0 .. pred $ length xs]
 
remove :: [a] -> Int -> ([a], a)
remove = remove_ []
 
remove_ :: [a] -> [a] -> Int -> ([a], a)
remove_ _ [] _ = error "remove_ _ [] _"
remove_ ys (x : _) 0 = (ys, x)
remove_ ys (x : xs) i = remove_ (ys ++ [x]) xs $ pred i

This is working for the tests I've made. Now it's necessary to adapt this code to the GHC data types.

GHC code

The function I have to make, I think, is:

check :: [EquationInfo] -> ([ExhaustivePat], [EquationInfo])
  -- Second result is the shadowed equations
  -- if there are view patterns, just give up - don't know what the function is

This function is in the deSugar/Check.lhs file in the compiler directory of the source code of GHC.

This function takes the equations of a pattern and returns:
\begin{itemize}
\item The patterns that are not recognized
\item The equations that are not overlapped
\end{itemize}

Data types

EquationInfo

From deSugar/DsMonad.hs:

data EquationInfo
  = EqnInfo { eqn_pats :: [Pat Id],    	-- The patterns for an eqn
	      eqn_rhs  :: MatchResult }	-- What to do after match
Pat

From hsSyn/HsPat.lhs:

data Pat id
  =	------------ Simple patterns ---------------
    WildPat	PostTcType		-- Wild card
	-- The sole reason for a type on a WildPat is to
	-- support hsPatType :: Pat Id -> Type
 
  | VarPat	id			-- Variable
  | VarPatOut	id (DictBinds id)	-- Used only for overloaded Ids; the 
					-- bindings give its overloaded instances
  | LazyPat	(LPat id)		-- Lazy pattern
  | AsPat	(Located id) (LPat id)  -- As pattern
  | ParPat      (LPat id)		-- Parenthesised pattern
  | BangPat	(LPat id)		-- Bang pattern
 
	------------ Lists, tuples, arrays ---------------
  | ListPat	[LPat id]		-- Syntactic list
		PostTcType		-- The type of the elements
 
  | TuplePat	[LPat id]		-- Tuple
		Boxity			-- UnitPat is TuplePat []
		PostTcType
	-- You might think that the PostTcType was redundant, but it's essential
	--	data T a where
	--	  T1 :: Int -> T Int
	--	f :: (T a, a) -> Int
	--	f (T1 x, z) = z
	-- When desugaring, we must generate
	--	f = /\a. \v::a.  case v of (t::T a, w::a) ->
	--			 case t of (T1 (x::Int)) -> 
	-- Note the (w::a), NOT (w::Int), because we have not yet
	-- refined 'a' to Int.  So we must know that the second component
	-- of the tuple is of type 'a' not Int.  See selectMatchVar
 
  | PArrPat	[LPat id]		-- Syntactic parallel array
		PostTcType		-- The type of the elements
 
	------------ Constructor patterns ---------------
  | ConPatIn	(Located id)
		(HsConPatDetails id)
 
  | ConPatOut {
	pat_con   :: Located DataCon,
	pat_tvs   :: [TyVar],		-- Existentially bound type variables (tyvars only)
	pat_dicts :: [id],		-- Ditto *coercion variables* and *dictionaries*
					-- One reason for putting coercion variable here, I think,
					-- 	is to ensure their kinds are zonked
	pat_binds :: DictBinds id,	-- Bindings involving those dictionaries
	pat_args  :: HsConPatDetails id,
	pat_ty	  :: Type   		-- The type of the pattern
    }
 
	------------ View patterns ---------------
  | ViewPat       (LHsExpr id)      
                  (LPat id)
                  PostTcType        -- The overall type of the pattern
                                    -- (= the argument type of the view function)
                                    -- for hsPatType.
 
	------------ Quasiquoted patterns ---------------
	-- See Note [Quasi-quote overview] in TcSplice
  | QuasiQuotePat   (HsQuasiQuote id)
 
	------------ Literal and n+k patterns ---------------
  | LitPat	    HsLit		-- Used for *non-overloaded* literal patterns:
					-- Int#, Char#, Int, Char, String, etc.
 
  | NPat	    (HsOverLit id)		-- ALWAYS positive
		    (Maybe (SyntaxExpr id))	-- Just (Name of 'negate') for negative
						-- patterns, Nothing otherwise
		    (SyntaxExpr id)		-- Equality checker, of type t->t->Bool
 
  | NPlusKPat	    (Located id)	-- n+k pattern
		    (HsOverLit id)	-- It'll always be an HsIntegral
		    (SyntaxExpr id)	-- (>=) function, of type t->t->Bool
		    (SyntaxExpr id)	-- Name of '-' (see RnEnv.lookupSyntaxName)
 
	------------ Generics ---------------
  | TypePat	    (LHsType id)	-- Type pattern for generic definitions
                                        -- e.g  f{| a+b |} = ...
                                        -- These show up only in class declarations,
                                        -- and should be a top-level pattern
 
	------------ Pattern type signatures ---------------
  | SigPatIn	    (LPat id)		-- Pattern with a type signature
		    (LHsType id)
 
  | SigPatOut	    (LPat id)		-- Pattern with a type signature
		    Type
 
	------------ Pattern coercions (translation only) ---------------
  | CoPat 	HsWrapper		-- If co::t1 -> t2, p::t2, 
					-- then (CoPat co p) :: t1
		(Pat id)		-- Why not LPat?  Ans: existing locn will do
	    	Type			-- Type of whole pattern, t1
	-- During desugaring a (CoPat co pat) turns into a cast with 'co' on 
	-- the scrutinee, followed by a match on 'pat'
Id

From basicTypes/Var.lhs:

type Id = Var
Every @Var@ has a @Unique@, to uniquify it and for fast comparison, a
@Type@, and an @IdInfo@ (non-essential info about it, e.g.,
strictness).  The essential info about different kinds of @Vars@ is
in its @VarDetails@.
-- | Essentially a typed 'Name', that may also contain some additional information
-- about the 'Var' and it's use sites.
data Var
  = TyVar {
	varName    :: !Name,
	realUnique :: FastInt,		-- Key for fast comparison
					-- Identical to the Unique in the name,
					-- cached here for speed
	varType       :: Kind,          -- ^ The type or kind of the 'Var' in question
        isCoercionVar :: Bool
 }
 
  | TcTyVar { 				-- Used only during type inference
					-- Used for kind variables during 
					-- inference, as well
	varName        :: !Name,
	realUnique     :: FastInt,
	varType        :: Kind,
	tcTyVarDetails :: TcTyVarDetails }
 
  | Id {
	varName    :: !Name,
	realUnique :: FastInt,
   	varType    :: Type,
	idScope    :: IdScope,
	id_details :: IdDetails,	-- Stable, doesn't change
	id_info    :: IdInfo }		-- Unstable, updated by simplifier
MatchResult

ExhaustivePat

From deSugar/Check.lhs:

type ExhaustivePat = ([WarningPat], [(Name, [HsLit])])
WarningPat
type WarningPat = InPat Name

From hsSyn/HsPat.lhs:

type InPat id  = LPat id	-- No 'Out' constructors
type LPat id = Located (Pat id)

From basicTypes/SrcLoc.lhs:

-- | We attach SrcSpans to lots of things, so let's have a datatype for it.
data Located e = L SrcSpan e
Name

From basicTypes/Name.lhs:

-- | A unique, unambigious name for something, containing information about where
-- that thing originated.
data Name = Name {
		n_sort :: NameSort,	-- What sort of name it is
		n_occ  :: !OccName,	-- Its occurrence name
		n_uniq :: FastInt,      -- UNPACK doesn't work, recursive type
--(note later when changing Int# -> FastInt: is that still true about UNPACK?)
		n_loc  :: !SrcSpan	-- Definition site
	    }
 
-- NOTE: we make the n_loc field strict to eliminate some potential
-- (and real!) space leaks, due to the fact that we don't look at
-- the SrcLoc in a Name all that often.

Understanding the current implementation

It's important to see what's currently done to see the problems that have already been solved, and that maybe are not solved by the new proposal.

check is a wrapper for check'

The check function is just a wrapper to a check' function, which receives the patterns in a canonical form. Notice that the original form of the patterns is preserved to inform the user if there's anything wrong in the same way it's written in the code.

From deSugar/Check.lhs:

It simplify the patterns and then call @check'@ (the same semantics), and it 
needs to reconstruct the patterns again ....
 
The problem appear with things like:
\begin{verbatim}
  f [x,y]   = ....
  f (x:xs)  = .....
\end{verbatim}
We want to put the two patterns with the same syntax, (prefix form) and 
then all the constructors are equal:
\begin{verbatim}
  f (: x (: y []))   = ....
  f (: x xs)         = .....
\end{verbatim}
(more about that in @tidy_eqns@)
 
We would prefer to have a @WarningPat@ of type @String@, but Strings and the 
Pretty Printer are not friends.
 
We use @InPat@ in @WarningPat@ instead of @OutPat@
nbecause we need to print the 
warning messages in the same way they are introduced, i.e. if the user 
wrote:
\begin{verbatim}
	f [x,y] = ..
\end{verbatim}
He don't want a warning message written:
\begin{verbatim}
        f (: x (: y [])) ........
\end{verbatim}
Then we need to use InPats.
\begin{quotation}
     Juan Quintela 5 JUL 1998\\
	  User-friendliness and compiler writers are no friends.
\end{quotation}
This equation is the same that check, the only difference is that the
boring work is done, that work needs to be done only once, this is
the reason top have two functions, check is the external interface,
@check'@ is called recursively.
check' :: [(EqnNo, EquationInfo)] 
	-> ([ExhaustivePat], 	-- Pattern scheme that might not be matched at all
	    EqnSet)  		-- Eqns that are used (others are overlapped)

A deeper look to check

The second return

The second return it returns is all the equation which are overlapped, that is, that are not possible to be matched. This can be seen by the comment in the check' function and the code of check, in deSugar/Check.lhs:

check qs = (untidy_warns, shadowed_eqns)
      where
	(warns, used_nos) = check' ([1..] `zip` map tidy_eqn qs)
	untidy_warns = map untidy_exhaustive warns 
	shadowed_eqns = [eqn | (eqn,i) <- qs `zip` [1..], 
				not (i `elementOfUniqSet` used_nos)]

check' returns the used equations, and check the ones that were not returned by check'.

The first return

The first return are a representation of the values that are not recognized, or that don't match any of the equations. This can be infered by the comment on the function check.

It seems that the first component of the tuple at ExaustivePat is the list of the related Patterns, and the second a list of the field and the values that are not recognized by the field. It's not clear yet what the Name is related to.

Trying to build with the rest of GHC

The file HACKING in the source code has good links for working with the GHC source.

Building the current GHC

A good tip in http://hackage.haskell.org/trac/ghc/wiki/Building/Hacking is to check for mk/build.mk. On this file there're some flavours of builds, and I chose the fastest one.

Another good tip was to use more processes with -j than the number of processors, since a lot of time spent on building is IO.

Now I'm trying to build the darcs version of ghc.

Building a version that does nothing but compiles correctly

module Check ( check , ExhaustivePat ) where
 
import HsSyn
import DsUtils
import Name
 
type WarningPat = InPat Name
type ExhaustivePat = ([WarningPat], [(Name, [HsLit])])
 
check :: [EquationInfo] -> ([ExhaustivePat], [EquationInfo])
check = undefined

Building

It was not trivial to build it. I first build the compiler as it was, and it worked. But when I changed this code and asked for the rebuild of this module only, I got errors:

Then I noticed that the code was being used in the stage 1 of the compiler, which is used to build the stage 2. As the build uses -Wall, the undefined was being reached. I'm not sure why, but even with -w the undefined is being reached, so I'll have to change the test.

Second try of building something

I'll just return two empty lists, meaning that there should never be a printed warning.

check = ([], [])

Which, of course, caused:

compiler/deSugar/Check.lhs:12:8:
    Couldn't match expected type `[EquationInfo]
                                  -> ([ExhaustivePat], [EquationInfo])'
           against inferred type `([a], [a1])'
    In the expression: ([], [])
    In the definition of `check': check = ([], [])

Duh! I corrected it to:

check qs = ([], [])

And it worked.

Talking to the GHC guys

Problem in Trac

I've received the e-mail with my lost password, but when I try to login I get this message:

The page isn't redirecting properly.

Iceweasel has detected that the server is redirecting the request for this
address in a way that will never complete.

This problem can sometimes be caused by disabling or refusing to accept
cookies.

I've mailed Igloo about it, but he said he had no time yet to check on this.

Owning the bug

I've read that I should “Take ownership of the bug in Trac”. So I changed the ownership of the bug, and Simon Peyton-Jones told me to read a paper and to study about View Patterns. I'll sure do it.

Test case

Now I should create a test case for the bug. I'll work on this.

Open bugs

The bugs I could find open in the GHC Trac are:

Task 595
322

Link to the bug

I could reproduce this bug and it seems to be a complicated bug to deal with, since it's related to the implicit conversion of types in the type class Num done by the compiler. To fix this, it would be necessary to have the already converted code passed to the Pattern Matching checker.

The code in it can be used to generate a test case.

I posted a message about this bug in it. I'll keep with the discussion there.

1307

Link to the bug

The only important thing I see in this bug is correcting the message to the user with a better syntax. I think it's unrelated with the rewrite task and could be done immediately, and it doesn't worth a test case.

2204

Link to the bug

This is the same bug as 1307.

Other Free Software Projects

Parent page: marco

 
pattern_matching_in_ghc.txt · Last modified: 2012/10/28 02:37 by newacct