Skip to content

Instantly share code, notes, and snippets.

@jobjo
Last active August 29, 2015 14:10
Show Gist options
  • Select an option

  • Save jobjo/fe31e10964ed5b1162ed to your computer and use it in GitHub Desktop.

Select an option

Save jobjo/fe31e10964ed5b1162ed to your computer and use it in GitHub Desktop.
Aggregating elements in F#
open System
open System.Collections.Generic
// The problem I'm addressing is aggregating elements from a tree.
// In my particular scenario I'm interested in collecting all leaf nodes
// into an array. The fundamental problem is captured by the different versions
// of aggregate, listed below. The functions are merely simulating the branching
// and aggregation steps. I'm not concerned about the order of the elements.
// In the comments are the total times in milliseconds
// running the functions with n = 1e6. That is generating an array of roughly
// a million elements.
// Using this for all testing
let n = int 1e6
// All times reported below are retrieved by a simple:
// #time
// ignore <| aggregateX
// #time
// Perhaps the most intuitive solution is to use an array workflow.
// Time: 1547 ms
// Producing a million elements is about 1.5 seconds.
let rec aggregate1 n =
if n <= 0 then
[||]
else
[|
yield n
yield! aggregate1 (n/2)
yield! aggregate1 (n/2)
|]
// What about a list workflow?
// Time: 3000 sm
// Twice as bad as the array workflow. Note the extra Lit.toArray traversal.
let aggregate2 n =
let rec go n =
if n <= 0 then
[]
else
[
yield n
yield! go (n/2)
yield! go (n/2)
]
List.toArray (go n)
// Perhaps a sequence expression is better since it can build the list lazily
// avoiding linear merging?
// Time: 770
// Speedup comparing with with list and array workflows.
let aggregate3 n =
let rec go n =
if n <= 0 then
Seq.empty
else
seq {
yield n
yield! go (n/2)
yield! go (n/2)
}
Seq.toArray <| go n
// Simple list aggregation. As expected slow due to linear concatenation behavior.
// Total time: 508 ms
// Still faster than any of the above versions!
let aggregate4 n =
let rec go n =
if n <= 0 then
[]
else
n :: go (n/2) @ go (n/2)
List.toArray <| go n
// Passing around an accumulator avoid linear merging.
// Time: 78 ms
// Indeed much faster.
let aggregate5 n =
let rec go acc n =
if n <= 0 then
acc
else
go (go (n :: acc) (n/2)) (n/2)
List.toArray <| go [] n
// Another attempt composing functions to avoid linerar merging.
// Time 169 ms
// Still much better than basic list concatenation but twice as slow as
// using an int accumulator.
let aggregate6 n =
let rec go n =
if n <= 0 then
id
else
fun xs -> n :: (xs |> go (n/2) |> go (n/2))
List.toArray <| go n []
// The above solution is not tail-recursive. Here is a tail recursive version.
// Time: 732
// Passing around the continuation seems to cause a lot of overhead.
let aggregate7 n =
let rec go n k =
if n <= 0 then
k []
else
go (n/2) <| fun left ->
go (n/2) <| fun right ->
k <| n :: left @ right
List.toArray <| go n id
// More hardcore. Using a (mutable) list to accumulate the elements.
// Time: 10 ms
// That's a serious improvement. About 8 times faster than using an list with an accumulator.
let aggregate8 n =
let v = List<_>()
let rec go n =
if n <= 0 then
()
else
ignore <| v.Add(n)
go (n/2)
go (n/2)
go n
v.ToArray()
// Can we do better? Maybe by cheating a little, pretending that we
// roughly know the size of the produced array.
// Time: 7ms
// Not a whole lot faster.
let aggregate9 n =
let arr = Array.zeroCreate (2 * n)
let ix = ref 0
let rec go n =
if n <= 0 then
()
else
arr.[!ix] <- n
incr ix
go (n/2)
go (n/2)
go n
Array.sub arr 0 !ix
// So, using a mutable list for accumulating the results and
// avoid having to pass around extra parameters in the recursive calls
// is by far the most efficeient option seen so far. Unfortunately it's the ugliest version.
// Isn't it possible to capture this in a work-flow (hiding the mutable list).
// The idea to work with functions from List<'T> -> unit (that's our workflow type).
// Yielding an element is adding to the list.
let inline result (x: 'T) (list: List<'T>) = list.Add(x)
// Empty value doesn't add anything.
let inline empty _ = ()
// Combine two computations.
let inline combine l1 l2 list =
l1 list
l2 list
// Create an array by running the computation on an empty list.
let toArray lg =
let list = List<_>()
lg list
list.ToArray()
// Creating the computation expression builder.
type ListGenBuilder() =
member inline this.Return(x) = result x
member inline this.Yield(x) = result x
member inline this.Combine(l1,l2) = combine l1 l2
member inline this.Zero() = empty
member inline this.Delay(f) = f ()
member inline this.YieldFrom(l) = l
let listGen = ListGenBuilder()
// Let's give it a spin.
// Time: 402 ms
// How disappointing. An order of magnitude slower than manual list manipulation.
let aggregate10 n =
let rec go n =
if n <= 0 then
empty
else
listGen {
yield n
yield! go (n/2)
yield! go (n/2)
}
toArray <| go n
// Does the work flow syntax create an overhead? What if we use the combinators directly?
// Time: 329 ms
// Not much faster. The culprit is creating the functions objects
// in each recursive call.
let aggregate11 n =
let rec go n =
if n <= 0 then
empty
else
combine (combine (result n) (go (n/2))) (go (n/2))
toArray <| go n
// To conclude - Substantial differences, sequence builders with yield! is slow.
// Using a mutable list for pushing elemeents outperforms any other attempt
// and unfortunately I was not able to abstract this pattern. Is there any better way?
@jobjo
Copy link
Author

jobjo commented Dec 4, 2014

Interesting. Keeping the state in the workflow object itself is not very transparent though since it requires the user to know about the special semantics, that is workflow objects may never be reused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment