Last active
August 29, 2015 14:10
-
-
Save jobjo/fe31e10964ed5b1162ed to your computer and use it in GitHub Desktop.
Aggregating elements in F#
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| open System | |
| open System.Collections.Generic | |
| // The problem I'm addressing is aggregating elements from a tree. | |
| // In my particular scenario I'm interested in collecting all leaf nodes | |
| // into an array. The fundamental problem is captured by the different versions | |
| // of aggregate, listed below. The functions are merely simulating the branching | |
| // and aggregation steps. I'm not concerned about the order of the elements. | |
| // In the comments are the total times in milliseconds | |
| // running the functions with n = 1e6. That is generating an array of roughly | |
| // a million elements. | |
| // Using this for all testing | |
| let n = int 1e6 | |
| // All times reported below are retrieved by a simple: | |
| // #time | |
| // ignore <| aggregateX | |
| // #time | |
| // Perhaps the most intuitive solution is to use an array workflow. | |
| // Time: 1547 ms | |
| // Producing a million elements is about 1.5 seconds. | |
| let rec aggregate1 n = | |
| if n <= 0 then | |
| [||] | |
| else | |
| [| | |
| yield n | |
| yield! aggregate1 (n/2) | |
| yield! aggregate1 (n/2) | |
| |] | |
| // What about a list workflow? | |
| // Time: 3000 sm | |
| // Twice as bad as the array workflow. Note the extra Lit.toArray traversal. | |
| let aggregate2 n = | |
| let rec go n = | |
| if n <= 0 then | |
| [] | |
| else | |
| [ | |
| yield n | |
| yield! go (n/2) | |
| yield! go (n/2) | |
| ] | |
| List.toArray (go n) | |
| // Perhaps a sequence expression is better since it can build the list lazily | |
| // avoiding linear merging? | |
| // Time: 770 | |
| // Speedup comparing with with list and array workflows. | |
| let aggregate3 n = | |
| let rec go n = | |
| if n <= 0 then | |
| Seq.empty | |
| else | |
| seq { | |
| yield n | |
| yield! go (n/2) | |
| yield! go (n/2) | |
| } | |
| Seq.toArray <| go n | |
| // Simple list aggregation. As expected slow due to linear concatenation behavior. | |
| // Total time: 508 ms | |
| // Still faster than any of the above versions! | |
| let aggregate4 n = | |
| let rec go n = | |
| if n <= 0 then | |
| [] | |
| else | |
| n :: go (n/2) @ go (n/2) | |
| List.toArray <| go n | |
| // Passing around an accumulator avoid linear merging. | |
| // Time: 78 ms | |
| // Indeed much faster. | |
| let aggregate5 n = | |
| let rec go acc n = | |
| if n <= 0 then | |
| acc | |
| else | |
| go (go (n :: acc) (n/2)) (n/2) | |
| List.toArray <| go [] n | |
| // Another attempt composing functions to avoid linerar merging. | |
| // Time 169 ms | |
| // Still much better than basic list concatenation but twice as slow as | |
| // using an int accumulator. | |
| let aggregate6 n = | |
| let rec go n = | |
| if n <= 0 then | |
| id | |
| else | |
| fun xs -> n :: (xs |> go (n/2) |> go (n/2)) | |
| List.toArray <| go n [] | |
| // The above solution is not tail-recursive. Here is a tail recursive version. | |
| // Time: 732 | |
| // Passing around the continuation seems to cause a lot of overhead. | |
| let aggregate7 n = | |
| let rec go n k = | |
| if n <= 0 then | |
| k [] | |
| else | |
| go (n/2) <| fun left -> | |
| go (n/2) <| fun right -> | |
| k <| n :: left @ right | |
| List.toArray <| go n id | |
| // More hardcore. Using a (mutable) list to accumulate the elements. | |
| // Time: 10 ms | |
| // That's a serious improvement. About 8 times faster than using an list with an accumulator. | |
| let aggregate8 n = | |
| let v = List<_>() | |
| let rec go n = | |
| if n <= 0 then | |
| () | |
| else | |
| ignore <| v.Add(n) | |
| go (n/2) | |
| go (n/2) | |
| go n | |
| v.ToArray() | |
| // Can we do better? Maybe by cheating a little, pretending that we | |
| // roughly know the size of the produced array. | |
| // Time: 7ms | |
| // Not a whole lot faster. | |
| let aggregate9 n = | |
| let arr = Array.zeroCreate (2 * n) | |
| let ix = ref 0 | |
| let rec go n = | |
| if n <= 0 then | |
| () | |
| else | |
| arr.[!ix] <- n | |
| incr ix | |
| go (n/2) | |
| go (n/2) | |
| go n | |
| Array.sub arr 0 !ix | |
| // So, using a mutable list for accumulating the results and | |
| // avoid having to pass around extra parameters in the recursive calls | |
| // is by far the most efficeient option seen so far. Unfortunately it's the ugliest version. | |
| // Isn't it possible to capture this in a work-flow (hiding the mutable list). | |
| // The idea to work with functions from List<'T> -> unit (that's our workflow type). | |
| // Yielding an element is adding to the list. | |
| let inline result (x: 'T) (list: List<'T>) = list.Add(x) | |
| // Empty value doesn't add anything. | |
| let inline empty _ = () | |
| // Combine two computations. | |
| let inline combine l1 l2 list = | |
| l1 list | |
| l2 list | |
| // Create an array by running the computation on an empty list. | |
| let toArray lg = | |
| let list = List<_>() | |
| lg list | |
| list.ToArray() | |
| // Creating the computation expression builder. | |
| type ListGenBuilder() = | |
| member inline this.Return(x) = result x | |
| member inline this.Yield(x) = result x | |
| member inline this.Combine(l1,l2) = combine l1 l2 | |
| member inline this.Zero() = empty | |
| member inline this.Delay(f) = f () | |
| member inline this.YieldFrom(l) = l | |
| let listGen = ListGenBuilder() | |
| // Let's give it a spin. | |
| // Time: 402 ms | |
| // How disappointing. An order of magnitude slower than manual list manipulation. | |
| let aggregate10 n = | |
| let rec go n = | |
| if n <= 0 then | |
| empty | |
| else | |
| listGen { | |
| yield n | |
| yield! go (n/2) | |
| yield! go (n/2) | |
| } | |
| toArray <| go n | |
| // Does the work flow syntax create an overhead? What if we use the combinators directly? | |
| // Time: 329 ms | |
| // Not much faster. The culprit is creating the functions objects | |
| // in each recursive call. | |
| let aggregate11 n = | |
| let rec go n = | |
| if n <= 0 then | |
| empty | |
| else | |
| combine (combine (result n) (go (n/2))) (go (n/2)) | |
| toArray <| go n | |
| // To conclude - Substantial differences, sequence builders with yield! is slow. | |
| // Using a mutable list for pushing elemeents outperforms any other attempt | |
| // and unfortunately I was not able to abstract this pattern. Is there any better way? |
Jand42
commented
Dec 4, 2014
Author
Interesting. Keeping the state in the workflow object itself is not very transparent though since it requires the user to know about the special semantics, that is workflow objects may never be reused.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment