The brigand implementation of remove_if is essentially a transform turning every element T into either a list<T> or a list<>, followed by a join which strips the lists away again, resulting in a list of only the elements we would like to keep.

    template <typename L, typename Pred>
    struct remove_if;

    template <template <class...> class L, typename... Ts, typename Pred>
    struct remove_if<L<Ts...>, Pred>
        : ::brigand::detail::append_impl<
              L<>,
              typename ::brigand::detail::empty_if_true<Pred, Ts, true>::type...> {};
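To make the transform-and-join shape concrete, here is a minimal self-contained sketch of the same idea. The `list`, `drop_if_true`, and `join_impl` names are my own simplifications, not brigand's actual `append_impl` machinery:

```cpp
#include <type_traits>

template <typename... Ts> struct list {};

// The transform: each T becomes list<> (drop) or list<T> (keep).
template <template <typename> class Pred, typename T>
using drop_if_true = std::conditional_t<Pred<T>::value, list<>, list<T>>;

// The join: append the contents of each inner list onto an accumulator.
template <typename Out, typename... Ls> struct join_impl;
template <typename Out> struct join_impl<Out> { using type = Out; };
template <typename... Os, typename... Is, typename... Ls>
struct join_impl<list<Os...>, list<Is...>, Ls...>
    : join_impl<list<Os..., Is...>, Ls...> {};

// remove_if = transform every element, then join the resulting lists.
template <typename L, template <typename> class Pred> struct remove_if;
template <typename... Ts, template <typename> class Pred>
struct remove_if<list<Ts...>, Pred>
    : join_impl<list<>, drop_if_true<Pred, Ts>...> {};
```

With `std::is_integral` as the predicate, `remove_if<list<int, float, double, char>, std::is_integral>::type` is `list<float, double>` — and note that a `list<float>` and a `list<double>` were instantiated along the way, which is exactly the intermediary-type overhead discussed next.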

Thinking about the speed of this algorithm, it is immediately apparent that we are creating a list<T> type for each distinct T we want to keep, plus roughly N divided by the fast-track size intermediary types in our join. This is a lot of intermediary-type overhead which we would like to eliminate.

A naïve implementation of remove_if would recursively iterate over the input list and conditionally add each element to an output list. This is not directly possible with meta monads because we only have one list, and unpacking it is expensive because that creates intermediary types. We can, however, use a trick: since we know how many elements were originally in the list, we can iterate over exactly that many, adding the ones we want to keep to the back of the list.

    template <>
    struct generator<13> { // true case
        template <typename Control, int I, template <typename...> class F,
                  typename T0, typename T1, typename... Ts>
        using f = typename generator<Control::select(I - 1, F<T1>::value)>::
            template f<Control, I - 1, F, T1, Ts..., T0>;
    };

    template <>
    struct generator<12> { // false case
        template <typename Control, int I, template <typename...> class F,
                  typename T0, typename T1, typename... Ts>
        using f = typename generator<Control::select(I - 1, F<T1>::value)>::
            template f<Control, I - 1, F, T1, Ts...>;
    };

    template <>
    struct generator<11> { // done, true case
        template <typename Control, int I, template <typename...> class F,
                  typename T0, typename... Ts>
        using f = typename Control::template done<Ts..., T0>;
    };

    template <>
    struct generator<10> { // done, false case
        template <typename Control, int I, template <typename...> class F,
                  typename T0, typename... Ts>
        using f = typename Control::template done<Ts...>;
    };
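Stripped of the control structure and fast tracks, the rotation trick boils down to the following minimal self-contained sketch (the names are mine, and there is no chunking here): we take exactly N steps, and each step pops the front element and re-appends it to the back only if we want to keep it.

```cpp
#include <type_traits>

template <typename... Ts> struct list {};

// Rotate N times; each step removes the front element T0 and re-appends
// it only when Pred<T0> is false (i.e. remove_if semantics).
template <int N, template <typename> class Pred, typename L>
struct rotate_remove_if;

// Done: after exactly N steps the list holds only the survivors, in order.
template <template <typename> class Pred, typename... Ts>
struct rotate_remove_if<0, Pred, list<Ts...>> { using type = list<Ts...>; };

// More specialized done case, so N == 0 on a non-empty list is unambiguous.
template <template <typename> class Pred, typename T0, typename... Ts>
struct rotate_remove_if<0, Pred, list<T0, Ts...>> { using type = list<T0, Ts...>; };

template <int N, template <typename> class Pred, typename T0, typename... Ts>
struct rotate_remove_if<N, Pred, list<T0, Ts...>>
    : std::conditional_t<
          Pred<T0>::value,
          rotate_remove_if<N - 1, Pred, list<Ts...>>,        // drop T0
          rotate_remove_if<N - 1, Pred, list<Ts..., T0>>> {}; // keep T0 at the back

// Entry point: the step count is simply the original size of the list.
template <typename L, template <typename> class Pred> struct remove_if;
template <typename... Ts, template <typename> class Pred>
struct remove_if<list<Ts...>, Pred>
    : rotate_remove_if<sizeof...(Ts), Pred, list<Ts...>> {};
```

Because only one `list` is ever alive at a time, no `list<T>` singletons and no join intermediaries are created; the survivors simply accumulate at the back in their original order.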

Since we do not want to hit the maximum recursion depth, we need two nested recursive loops: an outer one to bite off chunks of the input list and an inner one to actually do the removal. The chunk size is not trivial to choose. The maximum is obviously smaller than 256, because we do not want the recursive inner loop to exceed the default recursion depth. In total we will be doing N recursive steps no matter what the chunk size is; however, the size of the list our alias works on will differ. Smaller lists go faster, but the smaller the list, the more joins we have to do in order to put everything back together, and the more recursions our outer loop needs.
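The tradeoff can be made concrete with a little back-of-the-envelope arithmetic. The helper names and the cost model below are my own illustration, not part of the library:

```cpp
// Hypothetical cost model for choosing a chunk size C when processing
// N elements with the two nested recursive loops described above.

// The inner loop recurses once per element of a chunk, so its depth is C,
// which must stay comfortably below the default recursion depth (~256).
constexpr int inner_depth(int c) { return c; }

// The outer loop runs once per chunk: ceil(N / C) iterations.
constexpr int outer_steps(int n, int c) { return (n + c - 1) / c; }

// Each chunk produces one partial result, so we also pay one join per chunk.
constexpr int joins(int n, int c) { return outer_steps(n, c); }

// Regardless of C, the total number of inner recursive steps is always N.
static_assert(inner_depth(64) * outer_steps(1024, 64) == 1024, "total work is N");
```

A larger C means fewer outer recursions and fewer joins but a bigger working list per alias; a smaller C means faster per-step aliases but more joins, so the sweet spot has to be found empirically.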

Writing the control structure for filter is pretty painless:

    template <bool Invert>
    struct filter_rotate_control {
        template <typename... Ts>
        using done = list<Ts...>;

        static constexpr int select(const int i, bool b) {
            return (Invert ? !b : b) + (i == 0 ? 10 : 12);
        }
    };

    template <template <typename...> class F, bool Invert>
    struct filter_control {
        template <typename... Ts>
        using done = list<>;

        template <typename L, typename T0, typename... Ts>
        using step = pair_<
            typename generator<filter_rotate_control<Invert>::select(
                sizeof...(Ts), F<T0>::value)>::
                template f<filter_rotate_control<Invert>, sizeof...(Ts), F, T0, Ts...>,
            L>;

        static constexpr int select(const int i) {
            return ciel_log2(i) > 7 ? 7 : ciel_log2(i);
        }
    };

Calling it follows the same kind of pattern as we have seen previously:

    template <typename L, template <typename...> class F>
    using filter = typename linked_join<
        typename wrap_generator<L, size<L>::value, filter_control<F, false>>::type>::type;

    template <typename L, template <typename...> class F>
    using remove_if = typename linked_join<
        typename wrap_generator<L, size<L>::value, filter_control<F, true>>::type>::type;

    template <typename L, template <typename...> class F>
    using partition = pair_<filter<L, F>, remove_if<L, F>>;

Although the meta monad generator looks huge at first, with all of its many fast tracks, combined with a handful of control structures, aliases, and helper functions like join it can replace a whole metaprogramming library while providing order-of-magnitude faster compile times on many algorithms.

For my tests I used a chunk size of 64 and an 8x fast-tracked recursive join. This already yielded about a 10x performance increase when benchmarking filter against metal::copy_if. It is not entirely optimized, however: one could also add a “tree style” meta monad which bites off 256 elements but calls a 16x recursive remove_if and then does a fast-tracked 16x fixed-size join. My gut feeling is that this would be a good bit faster.

The “tree style” meta monad will also come in handy in the future to implement sort, which in my mind should be a hybrid of merge sort and partition, where the algorithm

- bites off 64 elements,
- sorts them using merge sort,
- merges them with the 64 elements of state from the previous recursion,
- divides the rest in two, using the middle of that merged list as a pivot element for a partition, and
- recurs on each half of the rest in the same way, using the corresponding half of the 64-element list as state.
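As a sanity check of the idea, the steps above can be sketched as an ordinary runtime function. This is purely a hypothetical analogue of the data flow (chunk size 4 instead of 64 for readability, `std::stable_sort` standing in for the merge sort), not the meta-programmed implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Runtime sketch of the hybrid sort: bite off a chunk, sort it, merge it
// with the state from the previous recursion, then partition the rest
// around the middle of the merged list and recurse on each half.
std::vector<int> hybrid_sort(std::vector<int> in, std::vector<int> state = {}) {
    constexpr std::size_t chunk = 4; // 64 in the text; 4 here for readability

    // Bite off up to `chunk` elements and sort them.
    std::size_t n = std::min(chunk, in.size());
    std::vector<int> head(in.begin(), in.begin() + n);
    std::vector<int> rest(in.begin() + n, in.end());
    std::stable_sort(head.begin(), head.end());

    // Merge the sorted chunk with the state handed down from the caller.
    std::vector<int> merged(head.size() + state.size());
    std::merge(head.begin(), head.end(), state.begin(), state.end(), merged.begin());
    if (rest.empty()) return merged;

    // Use the middle of the merged list as a pivot to partition the rest.
    std::size_t mid = merged.size() / 2;
    int pivot = merged[mid];
    std::vector<int> lo, hi;
    for (int x : rest) (x < pivot ? lo : hi).push_back(x);

    // Recurse on each half of the rest, passing the matching half of the
    // merged list along as state; concatenating the results stays sorted
    // because everything left of the pivot is <= everything right of it.
    std::vector<int> left  = hybrid_sort(std::move(lo),
                                         {merged.begin(), merged.begin() + mid});
    std::vector<int> right = hybrid_sort(std::move(hi),
                                         {merged.begin() + mid, merged.end()});
    left.insert(left.end(), right.begin(), right.end());
    return left;
}
```

For example, `hybrid_sort({5, 3, 1, 4, 2, 8, 7, 6, 0, 9})` yields `{0, 1, 2, ..., 9}`.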

This should beat current brigand, which is already heavily optimized.

The full code can be found here.