I've quited doing this. The motivations are in the bug report.
I'm Writing about the problem of Pattern Matching.
The warnings for pattern matching in GHC are not being correctly shown. There're a lot of bugs reported on this theme, and they're grouped in Ticket 595.
There's a solution for it, and the GHC developers suggested that this paper should be implemented in GHC. I've written a sketch of how the code should work:
This is just the direct implementation of the theory in the referred article.
-- base import Control.Arrow (first) import Data.Maybe (mapMaybe, fromJust) data C = C0 | C1 P | C2 P P deriving (Eq, Show) -- for debugging data P = Wildcard | C C deriving (Eq, Show) -- for debugging newtype V = V [P] deriving (Eq, Show) -- for debugging newtype M = M [V] deriving (Eq, Show) -- for debugging -- Get the constructors present on a list of patterns catCs :: [P] -> [C] catCs ps = [c | C c <- ps] same :: C -> C -> Bool same C0 C0 = True same (C1 _p1) (C1 _p2) = True same (C2 _p11 _p12) (C2 _p21 _p22) = True same _c1 _c2 = False uRec :: M -> V -> Bool -- Nothing is matched, so the new vector is useful. uRec (M []) _q = True -- The vector is not useful, since it matches nothing. uRec _p (V []) = False uRec (M (V [] : _rest)) _q = False uRec p @ (M ps) q @ (V (q1 : qs)) = case q1 of Wildcard -> if isComplete σ then any specialized σ else uRec (d p) $ V qs -- In case the first column of the vector is a Constructor, we need to -- extract it from the vector and from the m. C c -> uRec (s c p) $ s c q where σ :: [C] σ = catCs $ map headV ps specialized :: C -> Bool specialized c = uRec (s c p) $ s c q headV :: V -> P headV (V v) = head v class S a where s :: C -> a -> a instance S M where s c (M m) = M $ mapMaybe (sV c) m instance S V where s = (.)(.)(.) fromJust sV sV :: C -> V -> Maybe V sV _c (V []) = error "sV _c []" sV c (V (Wildcard : ps)) = Just $ V $ map (const Wildcard) (extract c) ++ ps sV c (V (C c_ : cs)) | same c c_ = Just $ V $ extract c_ ++ cs | otherwise = Nothing extract :: C -> [P] extract C0 = [] extract (C1 p) = [p] extract (C2 p1 p2) = [p1, p2] isComplete :: [C] -> Bool isComplete σ = all present complete where present :: C -> Bool present cons = any (same cons) σ complete :: [C] complete = [C0, C1 undefined, C2 undefined undefined] d :: M -> M d (M m) = M $ mapMaybe dV m dV :: V -> Maybe V dV (V (Wildcard : ps)) = Just $ V ps dV _ = Nothing exhaustive :: M -> Bool exhaustive (M []) = False exhaustive m @ (M (V v : _vs)) = not $ uRec m $ V $ replicate (length v) Wildcard useful :: M -> Bool useful (M v) = all (uncurry uRec . first M) $ individuals v individuals :: [a] -> [([a], a)] individuals xs = map (remove xs) [0 .. pred $ length xs] remove :: [a] -> Int -> ([a], a) remove = remove_ [] remove_ :: [a] -> [a] -> Int -> ([a], a) remove_ _ [] _ = error "remove_ _ [] _" remove_ ys (x : _) 0 = (ys, x) remove_ ys (x : xs) i = remove_ (ys ++ [x]) xs $ pred i
This is working for the tests I've made. Now it's necessary to adapt this code to the GHC data types.
The function I have to make, I think, is:
check :: [EquationInfo] -> ([ExhaustivePat], [EquationInfo]) -- Second result is the shadowed equations -- if there are view patterns, just give up - don't know what the function is
This function is in the deSugar/Check.lhs file in the compiler directory of the source code of GHC.
This function takes the equations of a pattern and returns: \begin{itemize} \item The patterns that are not recognized \item The equations that are not overlapped \end{itemize}
From deSugar/DsMonad.hs:
data EquationInfo = EqnInfo { eqn_pats :: [Pat Id], -- The patterns for an eqn eqn_rhs :: MatchResult } -- What to do after match
From hsSyn/HsPat.lhs:
data Pat id = ------------ Simple patterns --------------- WildPat PostTcType -- Wild card -- The sole reason for a type on a WildPat is to -- support hsPatType :: Pat Id -> Type | VarPat id -- Variable | VarPatOut id (DictBinds id) -- Used only for overloaded Ids; the -- bindings give its overloaded instances | LazyPat (LPat id) -- Lazy pattern | AsPat (Located id) (LPat id) -- As pattern | ParPat (LPat id) -- Parenthesised pattern | BangPat (LPat id) -- Bang pattern ------------ Lists, tuples, arrays --------------- | ListPat [LPat id] -- Syntactic list PostTcType -- The type of the elements | TuplePat [LPat id] -- Tuple Boxity -- UnitPat is TuplePat [] PostTcType -- You might think that the PostTcType was redundant, but it's essential -- data T a where -- T1 :: Int -> T Int -- f :: (T a, a) -> Int -- f (T1 x, z) = z -- When desugaring, we must generate -- f = /\a. \v::a. case v of (t::T a, w::a) -> -- case t of (T1 (x::Int)) -> -- Note the (w::a), NOT (w::Int), because we have not yet -- refined 'a' to Int. So we must know that the second component -- of the tuple is of type 'a' not Int. See selectMatchVar | PArrPat [LPat id] -- Syntactic parallel array PostTcType -- The type of the elements ------------ Constructor patterns --------------- | ConPatIn (Located id) (HsConPatDetails id) | ConPatOut { pat_con :: Located DataCon, pat_tvs :: [TyVar], -- Existentially bound type variables (tyvars only) pat_dicts :: [id], -- Ditto *coercion variables* and *dictionaries* -- One reason for putting coercion variable here, I think, -- is to ensure their kinds are zonked pat_binds :: DictBinds id, -- Bindings involving those dictionaries pat_args :: HsConPatDetails id, pat_ty :: Type -- The type of the pattern } ------------ View patterns --------------- | ViewPat (LHsExpr id) (LPat id) PostTcType -- The overall type of the pattern -- (= the argument type of the view function) -- for hsPatType. ------------ Quasiquoted patterns --------------- -- See Note [Quasi-quote overview] in TcSplice | QuasiQuotePat (HsQuasiQuote id) ------------ Literal and n+k patterns --------------- | LitPat HsLit -- Used for *non-overloaded* literal patterns: -- Int#, Char#, Int, Char, String, etc. | NPat (HsOverLit id) -- ALWAYS positive (Maybe (SyntaxExpr id)) -- Just (Name of 'negate') for negative -- patterns, Nothing otherwise (SyntaxExpr id) -- Equality checker, of type t->t->Bool | NPlusKPat (Located id) -- n+k pattern (HsOverLit id) -- It'll always be an HsIntegral (SyntaxExpr id) -- (>=) function, of type t->t->Bool (SyntaxExpr id) -- Name of '-' (see RnEnv.lookupSyntaxName) ------------ Generics --------------- | TypePat (LHsType id) -- Type pattern for generic definitions -- e.g f{| a+b |} = ... -- These show up only in class declarations, -- and should be a top-level pattern ------------ Pattern type signatures --------------- | SigPatIn (LPat id) -- Pattern with a type signature (LHsType id) | SigPatOut (LPat id) -- Pattern with a type signature Type ------------ Pattern coercions (translation only) --------------- | CoPat HsWrapper -- If co::t1 -> t2, p::t2, -- then (CoPat co p) :: t1 (Pat id) -- Why not LPat? Ans: existing locn will do Type -- Type of whole pattern, t1 -- During desugaring a (CoPat co pat) turns into a cast with 'co' on -- the scrutinee, followed by a match on 'pat'
From basicTypes/Var.lhs:
type Id = Var
Every @Var@ has a @Unique@, to uniquify it and for fast comparison, a @Type@, and an @IdInfo@ (non-essential info about it, e.g., strictness). The essential info about different kinds of @Vars@ is in its @VarDetails@.
-- | Essentially a typed 'Name', that may also contain some additional information -- about the 'Var' and it's use sites. data Var = TyVar { varName :: !Name, realUnique :: FastInt, -- Key for fast comparison -- Identical to the Unique in the name, -- cached here for speed varType :: Kind, -- ^ The type or kind of the 'Var' in question isCoercionVar :: Bool } | TcTyVar { -- Used only during type inference -- Used for kind variables during -- inference, as well varName :: !Name, realUnique :: FastInt, varType :: Kind, tcTyVarDetails :: TcTyVarDetails } | Id { varName :: !Name, realUnique :: FastInt, varType :: Type, idScope :: IdScope, id_details :: IdDetails, -- Stable, doesn't change id_info :: IdInfo } -- Unstable, updated by simplifier
From deSugar/Check.lhs:
type ExhaustivePat = ([WarningPat], [(Name, [HsLit])])
type WarningPat = InPat Name
From hsSyn/HsPat.lhs:
type InPat id = LPat id -- No 'Out' constructors
type LPat id = Located (Pat id)
From basicTypes/SrcLoc.lhs:
-- | We attach SrcSpans to lots of things, so let's have a datatype for it. data Located e = L SrcSpan e
From basicTypes/Name.lhs:
-- | A unique, unambigious name for something, containing information about where -- that thing originated. data Name = Name { n_sort :: NameSort, -- What sort of name it is n_occ :: !OccName, -- Its occurrence name n_uniq :: FastInt, -- UNPACK doesn't work, recursive type --(note later when changing Int# -> FastInt: is that still true about UNPACK?) n_loc :: !SrcSpan -- Definition site } -- NOTE: we make the n_loc field strict to eliminate some potential -- (and real!) space leaks, due to the fact that we don't look at -- the SrcLoc in a Name all that often.
It's important to see what's currently done to see the problems that have already been solved, and that maybe are not solved by the new proposal.
The check function is just a wrapper to a check' function, which receives the patterns in a canonical form. Notice that the original form of the patterns is preserved to inform the user if there's anything wrong in the same way it's written in the code.
From deSugar/Check.lhs:
It simplify the patterns and then call @check'@ (the same semantics), and it needs to reconstruct the patterns again .... The problem appear with things like: \begin{verbatim} f [x,y] = .... f (x:xs) = ..... \end{verbatim} We want to put the two patterns with the same syntax, (prefix form) and then all the constructors are equal: \begin{verbatim} f (: x (: y [])) = .... f (: x xs) = ..... \end{verbatim} (more about that in @tidy_eqns@) We would prefer to have a @WarningPat@ of type @String@, but Strings and the Pretty Printer are not friends. We use @InPat@ in @WarningPat@ instead of @OutPat@ nbecause we need to print the warning messages in the same way they are introduced, i.e. if the user wrote: \begin{verbatim} f [x,y] = .. \end{verbatim} He don't want a warning message written: \begin{verbatim} f (: x (: y [])) ........ \end{verbatim} Then we need to use InPats. \begin{quotation} Juan Quintela 5 JUL 1998\\ User-friendliness and compiler writers are no friends. \end{quotation}
This equation is the same that check, the only difference is that the boring work is done, that work needs to be done only once, this is the reason top have two functions, check is the external interface, @check'@ is called recursively.
check' :: [(EqnNo, EquationInfo)] -> ([ExhaustivePat], -- Pattern scheme that might not be matched at all EqnSet) -- Eqns that are used (others are overlapped)
The second return it returns is all the equation which are overlapped, that is, that are not possible to be matched. This can be seen by the comment in the check' function and the code of check, in deSugar/Check.lhs:
check qs = (untidy_warns, shadowed_eqns) where (warns, used_nos) = check' ([1..] `zip` map tidy_eqn qs) untidy_warns = map untidy_exhaustive warns shadowed_eqns = [eqn | (eqn,i) <- qs `zip` [1..], not (i `elementOfUniqSet` used_nos)]
check' returns the used equations, and check the ones that were not returned by check'.
The first return are a representation of the values that are not recognized, or that don't match any of the equations. This can be infered by the comment on the function check.
It seems that the first component of the tuple at ExaustivePat is the list of the related Patterns, and the second a list of the field and the values that are not recognized by the field. It's not clear yet what the Name is related to.
The file HACKING in the source code has good links for working with the GHC source.
A good tip in http://hackage.haskell.org/trac/ghc/wiki/Building/Hacking is to check for mk/build.mk. On this file there're some flavours of builds, and I chose the fastest one.
Another good tip was to use more processes with -j than the number of processors, since a lot of time spent on building is IO.
Now I'm trying to build the darcs version of ghc.
module Check ( check , ExhaustivePat ) where import HsSyn import DsUtils import Name type WarningPat = InPat Name type ExhaustivePat = ([WarningPat], [(Name, [HsLit])]) check :: [EquationInfo] -> ([ExhaustivePat], [EquationInfo]) check = undefined
It was not trivial to build it. I first build the compiler as it was, and it worked. But when I changed this code and asked for the rebuild of this module only, I got errors:
Then I noticed that the code was being used in the stage 1 of the compiler, which is used to build the stage 2. As the build uses -Wall, the undefined was being reached. I'm not sure why, but even with -w the undefined is being reached, so I'll have to change the test.
I'll just return two empty lists, meaning that there should never be a printed warning.
check = ([], [])
Which, of course, caused:
compiler/deSugar/Check.lhs:12:8:
Couldn't match expected type `[EquationInfo]
-> ([ExhaustivePat], [EquationInfo])'
against inferred type `([a], [a1])'
In the expression: ([], [])
In the definition of `check': check = ([], [])
Duh! I corrected it to:
check qs = ([], [])
And it worked.
I've received the e-mail with my lost password, but when I try to login I get this message:
The page isn't redirecting properly. Iceweasel has detected that the server is redirecting the request for this address in a way that will never complete. This problem can sometimes be caused by disabling or refusing to accept cookies.
I've mailed Igloo about it, but he said he had no time yet to check on this.
I've read that I should “Take ownership of the bug in Trac”. So I changed the ownership of the bug, and Simon Peyton-Jones told me to read a paper and to study about View Patterns. I'll sure do it.
Now I should create a test case for the bug. I'll work on this.
The bugs I could find open in the GHC Trac are:
I could reproduce this bug and it seems to be a complicated bug to deal with, since it's related to the implicit conversion of types in the type class Num done by the compiler. To fix this, it would be necessary to have the already converted code passed to the Pattern Matching checker.
The code in it can be used to generate a test case.
I posted a message about this bug in it. I'll keep with the discussion there.
The only important thing I see in this bug is correcting the message to the user with a better syntax. I think it's unrelated with the rewrite task and could be done immediately, and it doesn't worth a test case.
This is the same bug as 1307.
Parent page: marco