A Cost-Augmented Partial Evaluator for the Multi-Paradigm Declarative Language Curry

(Version v1.3 of 01/06/2004)

This is an extension of the partial evaluator for the multi-paradigm language Curry. It has been implemented in Curry itself. The original narrowing-driven partial evaluator has been enhanced with the computation of symbolic costs. They are "symbolic" in the sense that they measure the number of basic operations performed during a computation rather than actual execution times. We also incorporate a simple speedup analysis which allows us to determine the global improvement achieved by the specialization process.

The enhanced specializer is able to deal with programs containing local declarations, higher-order functions, and several built-in functions. The system considers programs written in Curry and translates them to an intermediate language, FlatCurry, in order to specialize them. This is essential to express the above language features at an appropriate level of abstraction. In this way, the resulting tool can be used to specialize "real" Curry programs.

Let us consider a simple example to illustrate the behavior of the developed tool. Consider the well-known benchmark "all_ones" (a typical example for deforestation techniques):

allones x = case x of 
               { Z     -> [] ;
                 (S y) -> 1 : allones y }    

length x = case x of 
              { []     -> Z ;
                (y:ys) -> S (length ys) }    
The partial evaluation of this program w.r.t. the initial call
  allones (length x)
returns the following residual program:
allones_pe x = case x of
                  { []     -> [] ;
                    (y:ys) -> 1 : allones_pe ys }    
which is clearly more efficient than the original one. Here, "allones_pe x" is a renaming for "allones (length x)". The enhanced partial evaluator additionally returns the following associated costs for this residual rule:
Original: Alt (Cost 2 2 1 0 1) (Cost 2 2 8 0 1)
Residual: Alt (Cost 1 1 1 0 1) (Cost 1 1 6 0 1)
where the "Alt" construct is used to structure the costs of the different case branches, and the "Cost" construct contains five elements which represent the number of function unfoldings, the number of case evaluations (to measure the pattern matching effort), the number of (first-order) applications, the number of higher-order applications, and the number of non-deterministic branching points, respectively.

Thus, for each recursive call of the residual function "allones_pe" we perform: one function unfolding, one case evaluation, 6 applications, and one non-deterministic branching, while the equivalent computation in the original program requires: two function unfoldings, two case evaluations, 8 applications, and one non-deterministic branching.

Apart from its ability to perform deforestation, a powerful optimization achieved by the partial evaluator is the transformation of higher-order functions into first-order functions. This reduces the execution time and space requirements w.r.t. existing implementations. Let us consider, for instance, the following (higher-order) program:

map f xs = case xs of { []     -> [] ;
                        (y:ys) -> apply f y : map f ys }    

foldr f z xs = case xs of { []     -> z ;
                            (y:ys) -> apply (apply f y) (foldr f z ys) }    
implementing the usual higher-order functions "map" and "foldr". Cost-augmented partial evaluation w.r.t. the call "foldr (+) 0 (map (+1) xs)" returns the following residual program:
foldr_pe xs = case xs of { []     -> 0 ;
                           (y:ys) -> (y + 1) + foldr_pe ys }
with associated costs:
Original: Alt (Cost 2 2 1 0 1) (Cost 3 2 11 3 1)
Residual: Alt (Cost 1 1 1 0 1) (Cost 1 1 8 0 1)
where "foldr_pe xs" is a renaming for "foldr (+) 0 (map (+1) xs)a". Observe that the residual program is now first-order. In particular, if we consider the recursive branch of function foldr_pe, we can see that there are several improvements: 3/1 = 3 for the number of function unfoldings, 2/1 = 2 for case evaluations, 11/8 = 1.37 for applications, and 3/0 for the number of higher-order applications (i.e., all higher-order applications have been removed). In these examples, it is fairly easy to analyze the cost improvement achieved by narrowing-driven partial evaluation. In more complex examples, though, it is not so easy to analyze the global improvement from the cost variation achieved by each residual rule. To overcome this problem, we have included a simple speedup analysis in our partial evaluation tool. It computes all the possible loops in the residual program, starting from the outermost function symbol of the initial call, and then sums up their associated costs. It proceeds, basically, by constructing a dependency graph for the function symbol of the partially evaluated call, thus it gives only approximate results.

A complete description of cost-augmented narrowing-driven specialization can be found in [Vid04].

The source files of the enhanced partial evaluator are in pevalcost.tar.gz. It contains pevalcost.curry, a modified file from the PAKCS library, Flat.curry, which should be in the same directory as pevalcost.curry, a reduced version of the Curry prelude, xprelude.curry, and a number of (annotated) examples. PAKCS version 1.6.0 is required. As in the original partial evaluator, expressions to be partially evaluated must be marked in the source program by (PEVAL ...), where "PEVAL" is the identity function (defined in the standard prelude). For instance, you can load in the enhanced partial evaluator as follows:

prelude> :l pevalcost
Then, to partially evaluate the above example "allones", you can type
pevalcost> main "allones"
where allones.curry is the source file (included in pevalcost.tar.gz). For this example, the output of the specialization process is (with some pretty-printing) as follows:
pevalcost> Cost-Augmented Partial Evaluator for Curry (v1.3 - 01/06/2004)
(TU Valencia)

Annotated expressions to be partially evaluated:

allones (length x)

Starting pevalcost for program "allones"...

Independent Renaming:

allones (length x)                  -> allones_pe0 x

Case (length x) of
   (Z    -> [] 
   (S y) -> 1 : allones_allones y   -> case_pe1 x

Specialized Program:

allones_pe0 x = case_pe1 x

Original: (Cost 1 0 0 0 0)
Residual: (Cost 1 0 0 0 0)

case_pe1 x = Case x of
               []     -> []
               (y:ys) -> 1 : allones_pe0 ys

Original: (Alt (Cost 1 2 1 0 1) (Cost 1 2 8 0 1) )
Residual: (Alt (Cost 1 1 1 0 1) (Cost 1 1 6 0 1) )

After Post-unfolding:

allones_pe0 x = Case x of
                  []     -> []
                  (y:ys) -> 1: allones_pe0 ys

Original: (Alt (Cost 2 2 1 0 1) (Cost 2 2 8 0 1) )
Residual: (Alt (Cost 1 1 1 0 1) (Cost 1 1 6 0 1) )

Speedup Analysis:

Loop1(orig): (Cost 2 2 8 0 1)
     (spec): (Cost 1 1 6 0 1)

Writing specialized program into "allones_pe.flc"...

Please report any bug or comment to gvidal@dsic.upv.es.


Last update June 2004 # gvidal (at) dsic (dot) upv (dot) es