Data-Oblivious Stream Productivity

Abstract

We are concerned with demonstrating productivity of specifications of infinite streams of data, based on orthogonal rewrite rules. In general, this property is undecidable, but for restricted formats computable sufficient conditions can be obtained. The usual analysis, also adopted here, disregards the identity of data, thus leading to approaches that we call data-oblivious. We present a method that is provably optimal among all such data-oblivious approaches. This means that in order to improve on the algorithm in this paper one has to proceed in a data-aware fashion.

Papers

  • Complexity of Fractran and Productivity
    Technical report, 2009.
    Accepted for publication in CADE-22.
    (arxiv)
  • Data-Oblivious Stream Productivity
    Logic for Programming, Artificial Intelligence, and Reasoning - LPAR 2008.
    Volume 5330 of Lecture Notes in Computer Science, pages 79-96.
    Springer, 2008.
    (PDF, PS)
  • Data-Oblivious Stream Productivity
    Technical report, 2008.
    (arxiv)
  • Productivity of Stream Definitions
    Preprint 268, Logic Group Preprint Series.
    Department of Philosophy, Utrecht University, 2008.
    Accepted for publication in a forthcoming special issue of TCS.
    (PDF)
  • Productivity of Stream Definitions
    16th Fundamentals of Computation Theory, 2007.
    (PDF, PS)

Research Proposals

  • Infinity, computing, reasoning and modeling with infinite objects
    2005
    (PDF).
  • Proving Equations for Cyclic Objects
    2005
    (PDF).
  • Lazy Productivity
    2009
    (PDF).

Data-Oblivious Optimality

For a large class of stream specifications our method is provably optimal among all data-oblivious approaches. Below we will elaborate on this class, the so-called `flat' stream specifications. Note that previous automatable criteria for productivity were data-oblivious, in particular Buchholz [1], Hughes, Pareto and Sabry [2], Telford and Turner [4], and Wadge [5].

For stream specifications we distinguish 3 layers:
  • a stream layer containing rules for stream constants
  • a function layer containing rules for stream functions
  • a data layer containing rules for data functions
For example:
Q = diff(M)
M = h(0:tail(M))
stream layer
tail(x:s) = s

h(0:s) = 0:1:h(s)
h(1:s) = 1:0:h(s)

diff(x:y:s) = abc(x,y):diff(y:s)
function layer
abc(0,0) = a
abc(0,1) = b
abc(1,0) = c
abc(1,1) = a
data layer
As usual in functional programming we consider only constructor TRSs. Furthermore we assume that the defining rules for every defined function symbol are exhaustive and that the data layer is terminating (strongly normalizing).

Definition: A stream specification is called flat if in the function layer there are no nested occurrences of stream function symbols.

Note that the format allows for a large class of stream functions and imposes no restrictions at all on how the stream layer makes use of the stream functions. In particular, the format allows to define all "automatic sequences" (see [Allouche, Shallit, 2003]).

Comparison with Previous Approaches

Even simple function specifications can exhibit a complex data-oblivious consumption/production behaviour:
f(s) = g(s,s)                             (1)
g(0:y:s, x:t) = 0:0:g(s,t)                (2)
g(1:s, x1:x2:x3:x4:t) = 1:1:1:1:1:g(s,t)  (3)
The following picture shows a selection of function call traces for f. In particular it shows the traces that contribute to the lower bound of the quantitative consumption/production behaviour of f.

Call traces for f that contribute to the lower bound. [(2) green, (3) blue]
Let us for example consider the situation where f is supplied with 8 input elements. The graph suggests that there is a (data-oblivious) call trace which produces not more than 7 elements and the trace consists of rule applications (1),(2),(3) and (3). Indeed we find a rewrite sequence:
f(0:1:1:1:0:0:0:0)
= g(0:1:1:1:0:0:0:0, 0:1:1:1:0:0:0:0)
= 0:0:g(1:1:0:0:0:0, 1:1:1:0:0:0:0)
= 0:0:1:1:1:1:1:g(1:0:0:0:0, 0:0:0)
producing only 7 elements and g(1:0:0:0:0, 0:0:0) being a normal form.

For proving productivity we are interested in the lower bound of the traces in such graphs, that is, the quantitative worst case consumption/production behaviour of stream functions.

In contrast to previous approaches which used (under-)approximations of the lower bound, we calculate and work with the exact data-oblivious lower bounds, which enables us to decide data-oblivious productivity. We have shown that the exact data-oblivious lower bounds of flat stream functions (involving pattern matching on data) can be computed and are periodically increasing.

Definition: A function f : N -> N is called eventually periodic if the sequence [f(0), f(1), f(2), ...] is eventually periodic. A function g : N -> N is called periodically increasing if it is non-decreasing and the derivative of g, the function g' : N -> N with g'(n) = g(n+1) - g(n), is eventually periodic.

Representing periodically increasing functions: We represent periodically increasing functions as pairs (prefix,cycle) of lists of {+,-} with non-empty cycle, where + stands for output and - for input. The production function π(p,c) associated to such a pair (p,c) is defined as follows: let l = pccc..., then π(p,c)(n) is the number of plus signs (from the start of l) to the (n+1)-th minus sign in l (or infinity if there are less than n+1 minus signs in l).

For convenience we write pairs (p,c) as pc. For example the identity function is represented by -+.

The class of periodically increasing functions generalizes the production moduli employed by Hughes, Pareto and Sabry [2], Telford and Turner [4], and Wadge [5]. The approaches have in common that min(...) is taken for r-ary functions with r > 1. Therefore it is sufficient to compare the unary production moduli:
  • Telford and Turner [4] use production moduli of the form
    • f(n) = n + a with a an integer
    These simple functions are obviously periodically increasing.
    In our representation they are of the form +a -+ for positive a or -(-a) -+ for negative a, respectively.
    These production moduli are not suitable to capture the behaviour of simple stream functions like:
    dup(x:s) = x:x:dup(s)
    even(x:y:s) = x:even(s)
    The quantitative behaviour of dup is f(n) = 2n, but the best bound that [4] can assign is f(n) = n.
    For even the situation is even worse. The behaviour of even is f(n) = n/2, but the best bound that [4] can assign is f(n) = -infinity, clearly a non-usable lower bound for even.

  • Hughes, Pareto and Sabry [2] employ production moduli of the form
    • f(n) = max {c*x + d | x in N, n >= a*x + b} with a,b,c,d in N
    These functions are strictly contained in the periodically increasing functions.
    In our representation they are of the form ps with p = -m+d and s = -a+c.
    Observe that their class of production moduli is not closed under composition. Consider for example division by 3 ---+ composed with muliplication by 2 -++, which yields ---+*-++ = --+-+.
    div3(x:y:z:s) = x:div3(s)
    mul2(x:s) = x:x:mul2(s)
    The result of the composition div3(mul2(...)), that is --+-+, falls out of their class, which implies the need of approximations and a loss of power.

  • Wadge [5] uses basically the same production moduli as Telford and Turner [4].
The following picture shows a comparison of lower bounds of the function f defined above.

Comparison of lower bounds.
Note that the class of production moduli of Hughes, Pareto and Sabry [2] allows for different (uncomparable) lower bounds on f, none of which coincides with the exact bound. For the picture we have chosen the bound of [2] that we considered closest to the exact bound.

Press "GO" to see that our algorithm indeed calculates the exact data-oblivious lower bound:

----++-++-+--++-+-++-


Productivity Tool

We have implemented the decision algorithm for data-oblivious productivity as a Haskell program. The program reads a given stream specification, automatically recognizes to class of the specification (pure/flat/other), and then attempts to prove or disprove productivity. As output a .pdf file is generated containing the (dis)proof of productivity or a message in case the attempt failed.


For further information about the Productivity Tool click here.

Thue-Morse (M) and Ternary Thue-Morse (Q)

As simple examples and demonstration of the online-version of the Productivity Tool we consider the Thue-Morse seqence M and the Ternary Thue-Morse sequence Q.

The (strongly) cube-free Thue-Morse seqence M:


An alternative definition of the Thue-Morse seqence M (derived from the D0L sequence definition):


The square-free Ternary Thue-Morse sequence Q:


References

  1. W. Buchholz. A Term Calculus for (Co-)Recursive Definitions on Streamlike Data Structures.
    Annals of Pure and Applied Logic, 136(1-2):75–90, 2005.
  2. J. Hughes, L. Pareto, and A. Sabry. Proving the Correctness of Reactive Systems using Sized Types.
    In POPL ’96, pages 410–423, 1996.
  3. B.A. Sijtsma. On the Productivity of Recursive List Definitions.
    ACM Transactions on Programming Languages and Systems, 11(4):633–649, 1989.
  4. A. Telford and D. Turner. Ensuring Streams Flow.
    In AMAST, pages 509-523, 1997.
  5. W.W. Wadge. An Extensional Treatment of Dataflow Deadlock.
    TCS, 13:3–15, 1981.