AND without pattern-matching
O(n^2) time. A more sophisticated code has O(n*logn) time complexity. The article contains the full code to
arrange a sequence in a complete binary tree and to extract an
arbitrary element from the tree maintaining the completeness of the
remaining tree.perfect-shuffle.txt [plain text file]
The article with the full source code for the two
implementations, including the code for building and deconstructing a
complete binary tree. It was originally posted as Provably
perfect shuffle algorithms on a newsgroup comp.lang.functional on Mon, 3 Sep 2001 13:59:46 -0700
Dave Bayer, Persi Diaconis: Trailing the dovetail shuffle to its lair
Ann. Appl. Probab. 2 (1992), no. 2, pp. 294-313.
In the context of card shuffling, `perfect shuffle' means
a deterministic interlacing of the two halves of the deck.
The paper studies the perfect and other card shuffles in detail.
Cycles certainly make it difficult to transform graphs in a pure non-strict language. Cycles in a source graph require us to devise a way to mark traversed nodes -- however we cannot mutate nodes and cannot even compare nodes with a generic ('derived') equality operator. Cycles in a destination graph call for keeping track of already constructed nodes so we can make back-references. An obvious solution is to resort to a state monad or IORefs. There is also a monad-less solution, which involves maintaining a dictionary of already constructed nodes. This solution is less obvious: seemingly we cannot add a node to the dictionary until we have built the node. This fact means that we cannot use the updated dictionary when building the descendants of the node -- which need the updated dictionary to link back. The problem can be overcome however with a credit card transform (a.k.a. "buy now, pay later" transform). To avoid hitting the bottom, we just have to "pay" by the "due date".
For illustration, the article below considers the problem of printing out a non-deterministic finite automaton (NFA) and transforming it into a deterministic finite automaton (DFA). Both NFA and DFA are represented as cyclic graphs.
CCard-transform-DFA.lhs [10K]
A literate Haskell code expounding the technique. The code was first posted in a message Transformations of cyclic graphs [Was: Efficiency of list indices in definitions] on Thu, 08 Aug 2002 16:11:30 -0700 on the Haskell-Cafe mailing list.
Tying the knot: How to build a cyclic data structure
<http://haskell.org/wiki/wiki?TyingTheKnot>
A CommonHaskellIdioms Haskell Wiki page.
Trying to tie the knot
<http://www.mail-archive.com/haskell@haskell.org/msg10687.html>
A message on the Haskell mailing list about a fixed-point combinator helping to tie a knot with forward links. Posted on Fri, 12 Apr 2002 18:44:07 -0700.
The fold-stream article below is about traversing collections, including files, databases and generating functions. The article introduces a non-recursive left-fold and argues that it is an optimal traversal interface in a language without first-class continuations. Indeed, given the non-recursive left-fold we can:
Both instantiation procedures are generic, as evidenced by their polymorphic types.
The article arose during the discussion of a good database
interface for Oracle. For portability, the article illustrates the
enumerator inversion technique on an example of a file considered a
collection of characters. Haskell provides a stream interface to that
collection: hGetChar. We implement a left fold
enumerator. We then turn that enumerator back into a stream: we
implement a function myhgetchar only in terms
of the left fold enumerator.
All the code in the article is in Haskell98. No unsafe operations are used.
fold-stream.lhs [14K]
The article as a well-commented literate Haskell code. An earlier version was posted on the Haskell mailing list
on Tue, 23 Sep 2003 23:59:45 -0700 (PDT).
Towards the best collection traversal interface
The extended abstract of a LL3 presentation. It lists
advantages of enumerators for traversing collections and discusses the
enumerator inversion procedure in a broad context.
Deforestation is usually defined as the elimination of an intermediate data structure, often a collection of items, sent from a single producer to a single consumer. Jorge Adriano showed an interesting example of one producer feeding two independent consumers. The two streams of data are mutually dependent. Furthermore, the rate of production is non-uniform: Generating an element in one stream may require generating trillion items in the other stream. If the first consumer is being evaluated, that consumer causes the evaluation of the producer until the latter yields an item. The trillion of items incidentally produced in the other stream have to be stored somewhere. In more OS-centric terms, the evaluator can't generally reduce two expressions in parallel. If one stream consumer is running, the other consumer sits in the ready queue. Therefore, data sent to the second consumer must be buffered, sometimes at great expense.
We will consider the problem of one producer and two consumers in the general setting. We derive a deforested version, which no longer needs to buffer any produced items. When we run that version, the memory retaining profile shows no space leaks. We also discuss a ``parallel'' writing of several streams into two distinct files. Our solution is safe and yet the i/o is effectively interleaved. The deforested solution is derived, via a sequence of equivalent transformations. In fact, the derived code worked on the first try.
de-biforestation.lhs [12K]
The article as a well-commented literate Haskell code. An earlier version was posted on the Haskell-Cafe mailing list
on Tue, 3 Dec 2002 22:05:05 -0800 (PST).
Jorge Adriano: producing and consuming lists
A message posted on the Haskell-Cafe mailing list
on Tue, 5 Nov 2002 16:04:58 +0000. The message describes the problem
of consuming two inter-dependent streams while avoiding space leaks.
We consider three examples of such transformations, including two pure functional implementations of a bubble-sort. The imperative code is actually quite complex: nested loops, data dependencies across iterations, reading from and writing into the array, conditional branches that depend on data and the iteration history, control dependencies across nested loops. We show two solutions in Haskell. Both use fast, imperative STArrays; both try to separate the imperative part from a functional part. The imperative part is performed "in a monad" while the functional part is written in ordinary Haskell. The two solutions are closely related, and written to emphasize their similarity. One of the solution is an interpreter for a trivial "virtual machine".
The analysis of data and control dependencies across loop iterations, scheduling of instructions and interlocking of their results, pre-fetching of data -- is the job of an advanced compiler or the CPU. On the other hand, we have to do the same kind of analyses to separate the functional part of a computation from its imperative part. It appears functional programming is closer to "the bare metal" as one would have thought.
The assertion that a dependency analysis of imperative array code is related to extracting a functional program from an imperative one has been stated by Andrew W. Appel, in his paper "SSA is Functional Programming". The present article has shown several illustrations of the thesis. We have done manual dependency analyses of several code snippets that update arrays in place. The analyses let us extract pure-functional parts of imperative programs. The rest requires serialization and can be handled by an ST monad.
The article with the full source code [plain text file]
The article itself. It was originally posted as FP vs assembly programming for super-scalar CPUs [Was: Nested Loops in Clean] on a newsgroup comp.lang.functionalon Thu, 16 Aug 2001 18:24:24 -0700
Andrew W. Appel SSA is Functional Programming
ACM SIGPLAN Notices v. 33, no. 4, pp. 17-20, April 1998.
Takusen is a joint project with Alistair Bayley. Takusen is a DBMS access library. It has a backend for Oracle on Unix, Linux or Windows via OCI, and a stub to test the library without any database. The infrastructure and the stub let one interface natively with other databases.
The distinguished feature of Takusen is processing query
results using a left-fold enumerator. The user supplies an iteratee
function, which receives rows one-at-a-time from the result-set. The
number of the arguments to the iteratee is the number of the columns in the
result-set, plus the seed. Each column in the result-set has
its own Haskell type. The latter could be a Maybe type if the
particular iteratee wishes to process NULLs.
The benefits are: more convenient and intuitive enumeration, iteration, and accumulation (see tests for examples); the retrieved data are not merely strings but have Haskell types: Int, Float, Date, etc.; buffer preallocation; pre-fetching; support for both enumerators and cursors, proper handling of all errors including various IO errors. No unsafe operations are used.
Takusen is a part of the Haskell-libs project at SourceForge.
The problem is adding memoization to a recursive function
Int -> Int. That seems simple, until we realize that
finding the value of that particular function at the argument
x requires evaluating the function for smaller arguments
and may require evaluating the function for larger arguments. Thus
straight-forward memoization leads to a large and sparsely populated
memoization table. As the original poster observed, the memoized
function begins thrashing sooner than the original code.
The solution is to balance the cost of recomputing the value
versus the cost of maintaining the memoization table and doing the
lookup. It is worth memoize the values of the function for some small
x -- which are being used frequently. For larger x, re-computation is better. The parameter sc_upb in
the code determines the balance, the largest argument value for which
we still memoize. The simple benchmark shows that this controlled
memoization does improve both the space and time performances.
A notable feature of the code is the transparency of adding the controlled memoization. The body of the algorithm remains essentially the same.
This is a simple trick of making a function strict in some or all of its arguments, without affecting the body of the function.
iterate2 f x n | seq f $ seq x $ seq n $ False = undefined
iterate2 f x n = --- as before
That is, we add one simple line to the definition of the
function, without changing the proper body of the function at all. We
prepend an extra clause with the guard that always yields False. Before failing, however, the guard forces the evaluation
of the selected arguments of the function. If needed, seq
can be replaced with deepSeq.
Making (some of the) arguments of a function strict can make
the function run in constant space, and so an iteration (Runge-Kutta
iteration in the above example) can proceed without running out
of (stack) space. The function iterate2 (and a similar
function rk4Next) were applied to arguments like
(n-1) and y + a1/6 -- which is a red
flag. These are exactly the kinds of expressions that lead to memory
exhaustion. Perhaps it is because the size of an unevaluated thunk for
(n-1) is so much bigger than the size of the evaluated
thunk. It seems that arithmetic expressions are the best candidates
for some opportunistic evaluation...
Re: How do I get a long iteration to run in constant space
<http://www.haskell.org/pipermail/haskell-cafe/2004-October/006983.html>
Discussion thread on the Haskell-Cafe mailing list, where the
message was originally posted on Tue, 5 Oct 2004 00:44:40 -0700 (PDT)
Partial signatures
Another application of the trick of adding a clause with an
always failing guard
We demonstrate an executable model of the evaluation of normal logic programs, i.e., of resolving Horn clauses presented in the form of definitional trees. Our implementation, DefinitionTree, is yet another embedding of Prolog in Haskell. It is distinguished not by speed or convenience. Rather, it is explicitly designed to formalize evaluation strategies such as SLD and SLD-interleaving, to be easier to reason about and so help prove termination and other properties of the evaluation strategies. The main difference of DefinitionTree from other embeddings of Prolog in Haskell is the absence of name-generation effects. We need neither gensym nor the state monad to ensure the unique naming of logic variables. Since naming and evaluation are fully separated, the evaluation strategy is no longer concerned with fresh name generation and so is easier to reason about equationally.
DefinitionTree.hs [14K]
The commented source code
Pure, declarative, and constructive arithmetic relations
DefinitionTree has been used to describe SLD and
SLD-interleave evaluation strategies and to prove the basic properties
of solution sets.
We present an efficient library for writing CGI and FastCGI programs. Our goal is the least latency and the least memory consumption. The library underlies a production web application server Metcast. The server has been running since February 2007, sending large amounts of text and binary data in response to a continuous stream of requests. The server processes a request in constant and small memory imposing little latency, encoding and sending to the client chunks of data as soon as the database delivers them. The server handles hundreds of such requests without allocating memory; in fact, it uses only one 16KB buffer for all of its I/O. With the exception of an occasional rank-2 type and extended pattern guards, all of the code is in Haskell98. Not a single unsafeHaskell function occurs in the code.
The library is originally based on the code written by Jesse Tov, who in turn used NewCGI by Bjorn Bringert et al. Most of the code has been re-written. The biggest change is minimizing the amount of state. The library is centered around generalized input and output ports: a simple IO interface to read, write and copy via a single, large, once allocated buffer. The buffer and its dimensions are hidden, to discourage aliasing. We use this interface for the IO to and from the client, to and from files or the PostgreSQL database, and for reading from external servers.
A generalized input port is a procedure to read at most the
specified number of bytes into a given buffer. The procedure should
return the number of bytes actually read, or 0 on EOF. It may throw
various exceptions if reading didn't go well. The procedure is like
hGetBuf partially applied to a handle.
newtype Input = Input (forall m. EMonadIO m => Ptr Word8 -> Int -> m Int)
inputFd :: Fd -> Input
inputStr :: EMonadIO m => String -> m Input
inputCombined :: EMonadIO m => Input -> Input -> m Input
One can construct an Input from a Posix file descriptor
or a String. One can combine two Inputs into
one, which reads from the first generalized port until EOF and then
reads from the second. The frequently mentioned EMonadIO is a class of monads permitting IO along with throwing and catching of arbitrary errors. The IO monad and most of its transformations
are in that class. EMonadIO lets us write
gthrow, gcatch, gbracket, etc.
without even thinking of the current monad.
Dually, a generalized output port is a procedure to write the specified number of bytes from the given buffer. It may throw various exceptions if writing didn't go well.
newtype Output = Output (forall m. EMonadIO m => Ptr Word8 -> Int -> m ())
outputFd :: Fd -> Output
newtype BCopy = BCopy (forall m. EMonadIO m =>
Input -> Output -> Maybe Int -> m (Maybe Int))
The CGI monad exports (Fast)CGI output and error streams as Output, and lets us access request content as Input. We can use the generalized ports for reading and writing
strings. The ports are intended though to be connected via BCopy, which copies from Input to Output the desired number of bytes or till EOF.NewerCGI.hs [29K]
The commented source code of the library
for writing CGI and FastCGI programs
The code was first mentioned in a message Takusen and large PostgreSQL blobs [was: Handling custom types in Takusen] on the Haskell-Cafe mailing list on
Fri, 27 Jul 2007 20:34:29 -0700 (PDT)
FastCGI.hsc [3K]
Bindings to the FastCGI C client library
MySysOpen module offers a reliable, proven way of interacting
with another local or remote process via a unidirectional or
bidirectional channel. It supports pipes and Unix and TCP sockets.
MySysOpen is a simple and explicit alternative to
the multi-threaded IO processing of the GHC run-time
system. The module is the Haskell binding to
sys_open -- the extended, user-level file opening interface.
The second half of MySysOpen.hs contains several
bi-directional channel interaction tests. One checks repeated
sending and receiving of data; the amount of received data is
intentionally large, about 510K. Two other tests interact with
programs that are not specifically written for interactive use,
such as sort. The latter cannot produce any
output before it has read all of the input, accepting no input terminator
other than the EOF condition. One test uses shutdown to set
the EOF condition. The other test programs the handler
for a custom EOF indicator, literally in the file name of
the communication pipe.
MySysOpen.hs [7K]
The code for the sys_open interface and tests
sys_open.c [20K]
The complete, well-commented code for sys_open
The following code is a space-efficient implementation of the
game Mastermind. It is efficient not in terms of the heap or stack
space but in the source text space. Jan Rochel wrote the original
code and put it in his e-mail signature. He asked for a shorter
version. The present code relies on a different algorithm to compute
exact and inexact matches in two strings. The new algorithm made the
code a bit faster, but more importantly, a bit (7%) shorter. The new
code is also more obscure and exhibits some interesting patterns, in
particular f x=(x\\).(x\\).
A version to put in a .signature (285 chars):
import Random;import List;import Char;p=putStrLn;u=uncurry;f x=(x\\).(x\\)
main=mapM(\x->randomRIO(49,54))[1..4]>>=n 0.map chr>>=p.("Tries: "++).show
e=((partition$u(==)).).zip;h(p,q)=['*'|x<-p]++['+'|x<-(u f)$unzip q]
n a s=getLine>>=m where{m i|i==s=return a;m i=p(h$e i s)>>n(a+1)s}
An expanded version that is easier to read:
import Random;import List;import Char
main = mapM(\x->randomRIO(49,54))[1..4]
>>= n 0 . map chr >>= p . ("Tries: "++) . show
p=putStrLn
u=uncurry
e=((partition$u(==)).) . zip
f x=(x\\).(x\\)
h(p,q)=['*'|x<-p] ++ ['+'|x<-(u f) $ unzip q]
n a s=getLine>>=m a s
m a s i | i==s = return a
m a s i = p(h$e i s) >> n(a+1)s
The code runs on GHC(i).comp.lang.functional on Sat, 24 May 2003 00:04:40 -0700AND without pattern-matchingAND, i.e., (&&), without explicit or even implicit pattern-matching? Note that the functions not and (||) are defined in the Prelude with the use of pattern-matching. The if function may rely on pattern-matching as well. The article below gives several answers.
comp.lang.functional on Fri, 13 Jul 2001 17:45:22 -0700This site's top page is http://okmij.org/ftp/
Converted from SXML by SXML->HTML