Even more efficient but not so easy segment trees

By sslotin, 3 years ago, In English

https://en.algorithmica.org/hpc/data-structures/segment-trees/

I wrote a SIMD-friendly segment tree (a "wide segment tree") that is up to 10x faster than both the Fenwick tree and the bottom-up segment tree.
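A heavily simplified scalar sketch of the idea: every node has B child slots, and each slot stores the sum of its preceding siblings. A prefix query then reads exactly one value per level, and an update adds to a contiguous run of slots inside one node per level, which is the loop a SIMD version would vectorize. B = 16, the per-level layout, and the names `WideSegTree`, `add`, and `prefix` are assumptions of this sketch, not the article's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch of a B-ary ("wide") prefix-sum tree.
// Assumption: B = 16 by default; a SIMD version would match B to the register width.
template <int B = 16>
struct WideSegTree {
    int levels = 1;                       // number of levels H, chosen so that n < B^H
    std::vector<std::vector<int64_t>> t;  // t[h][s]: sum of the siblings before slot s in its node

    explicit WideSegTree(int n) {
        int64_t cap = B;
        while (cap <= n) { cap *= B; ++levels; }
        // Level h has B^(H - h) slots: B^H leaves at the bottom, one node of B slots on top.
        for (int h = 0; h < levels; ++h) {
            t.emplace_back(static_cast<std::size_t>(cap));  // zero-initialized
            cap /= B;
        }
    }

    // a[i] += x: on every level, bump the slots to the right of i's slot within its node.
    void add(int64_t i, int64_t x) {
        for (int h = 0; h < levels; ++h, i /= B) {
            int64_t end = (i / B + 1) * B;         // one past the last slot of this node
            for (int64_t s = i + 1; s < end; ++s)  // the loop a SIMD version vectorizes
                t[h][s] += x;
        }
    }

    // Sum of a[0..k): one lookup per level, no data-dependent branches.
    int64_t prefix(int64_t k) const {
        int64_t sum = 0;
        for (int h = 0; h < levels; ++h, k /= B) sum += t[h][k];
        return sum;
    }

    int64_t sum(int64_t l, int64_t r) const { return prefix(r) - prefix(l); }
};
```

For example, `WideSegTree<> tree(1000000); tree.add(5, 3); tree.sum(0, 10);` reads five values, one per level, to answer the query.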

The article also explains in detail how all the other popular segment tree implementations work and how to optimize them (fun fact: the Fenwick tree can be made 3x faster on large arrays if you insert "holes" in the array layout and make it ~0.1% larger). The article is long but hopefully accessible to beginners.
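A sketch of what "holes" in a Fenwick tree could look like. The usual explanation is cache associativity: on large arrays, the power-of-two strides that Fenwick traversals take make many accessed elements map to the same cache sets, and shifting indices slightly breaks that pattern. The remap `k + (k >> 10)` (one unused slot every 1024 entries, about 0.1% extra memory) is an assumption chosen to match the ~0.1% figure, and `hole` and `HoleyFenwick` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical remap: logical Fenwick index k lives at physical index k + k / 1024.
inline int hole(int k) { return k + (k >> 10); }

struct HoleyFenwick {
    int n;
    std::vector<int64_t> t;  // physical storage, ~0.1% larger than n + 1

    explicit HoleyFenwick(int n) : n(n), t(static_cast<std::size_t>(hole(n)) + 1) {}

    // a[i] += x for a 0-indexed position i (the tree itself is 1-indexed).
    void add(int i, int64_t x) {
        for (int k = i + 1; k <= n; k += k & -k) t[hole(k)] += x;
    }

    // Sum of a[0..k).
    int64_t prefix(int k) const {
        int64_t s = 0;
        for (; k > 0; k -= k & -k) s += t[hole(k)];
        return s;
    }
};
```

Only the index mapping changes; the traversal logic is the standard Fenwick tree, so correctness is unaffected and any speedup comes purely from the memory layout.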

I only focused on the prefix sum case (partly to make the comparison with the Fenwick tree fair). I've since added a separate section on implementing other operations, but I highly encourage the community to try to efficiently implement mass assignment, RMQ, lazy propagation, persistence, and other common segment tree operations using wide segment trees.
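As a rough illustration for the RMQ case (not the approach from the article's added section), a scalar B-ary range-minimum tree can reuse the bottom-up pattern: on each level, consume the unaligned slots at both ends of the range, then move up to the parent level. B = 8, the names, and the layout are assumptions; the per-node scans are where SIMD min instructions would be applied.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr int64_t kInf = std::numeric_limits<int64_t>::max();

// Scalar sketch of a B-ary range-minimum tree (B = 8 is an assumption).
template <std::size_t B = 8>
struct WideMinTree {
    std::vector<std::vector<int64_t>> t;  // t[0] = values, t[h + 1][m] = min of block m in t[h]

    explicit WideMinTree(int n) {
        std::size_t cap = 1;
        while (cap < static_cast<std::size_t>(n)) cap *= B;
        while (true) {  // level sizes B^H, ..., B, 1
            t.emplace_back(cap, kInf);
            if (cap == 1) break;
            cap /= B;
        }
    }

    // a[i] = v, then recompute one block minimum per level.
    void set(std::size_t i, int64_t v) {
        t[0][i] = v;
        for (std::size_t h = 0; h + 1 < t.size(); ++h) {
            i /= B;
            const int64_t* block = &t[h][i * B];
            t[h + 1][i] = *std::min_element(block, block + B);  // vectorizable
        }
    }

    // min of a[l..r); returns kInf for an empty range.
    int64_t query(std::size_t l, std::size_t r) const {
        int64_t res = kInf;
        for (std::size_t h = 0; l < r; ++h, l /= B, r /= B) {
            while (l < r && l % B != 0) res = std::min(res, t[h][l++]);  // left edge
            while (l < r && r % B != 0) res = std::min(res, t[h][--r]);  // right edge
        }
        return res;
    }
};
```

Since min is commutative, the left and right edges can be folded into one accumulator; a non-commutative operation would need two, one per side.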

Comments (7)

Kyou_mo_kawaii

3 years ago

In terms of generalizations, AtCoder's interface is very flexible and popular (https://atcoder.github.io/ac-library/master/document_en/segtree.html). It would be great if you could make your wide segment tree a drop-in replacement for it.

In particular, it doesn't assume the operation is commutative, so it must accumulate the front and back partial results in the right order (https://github.com/atcoder/ac-library/blob/master/atcoder/segtree.hpp#L45). I think this ruins some of the optimizations you made, but there should still be some benefit from the higher branching factor, right?
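A sketch of the order-preserving accumulation being described: a bottom-up binary segment tree keeps one accumulator for the left edge, appended to on the right, and one for the right edge, prepended to on the left, so only associativity is required. The class `OrderedSegTree` and its constructor taking an identity and a callable are assumptions of this sketch, not the AtCoder library's API; string concatenation serves as a non-commutative test operation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Bottom-up segment tree that never reorders operands: any associative op works.
template <class S, class Op>
struct OrderedSegTree {
    int size = 1;
    std::vector<S> d;
    S e;    // identity element
    Op op;  // associative, possibly non-commutative

    OrderedSegTree(const std::vector<S>& v, S identity, Op combine) : e(identity), op(combine) {
        while (size < static_cast<int>(v.size())) size *= 2;
        d.assign(2 * size, e);
        for (std::size_t i = 0; i < v.size(); ++i) d[size + i] = v[i];
        for (int i = size - 1; i >= 1; --i) d[i] = op(d[2 * i], d[2 * i + 1]);
    }

    void set(int p, S x) {
        p += size;
        d[p] = x;
        for (p /= 2; p >= 1; p /= 2) d[p] = op(d[2 * p], d[2 * p + 1]);
    }

    // op over [l, r). Left-edge nodes are appended to sml, right-edge nodes are
    // prepended to smr, and the two halves are joined once at the end.
    S prod(int l, int r) const {
        S sml = e, smr = e;
        for (l += size, r += size; l < r; l /= 2, r /= 2) {
            if (l & 1) sml = op(sml, d[l++]);
            if (r & 1) smr = op(d[--r], smr);
        }
        return op(sml, smr);
    }
};
```

The same two-accumulator pattern carries over to a B-ary tree: on each level, the unaligned slots on the left are folded into `sml` in increasing order and those on the right into `smr` in decreasing order.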

sslotin

3 years ago, in reply

I don't think you can make a fully drop-in replacement for AtCoder's segment tree, as it requires users to write their own (scalar) reducers, while most wide segment tree operations need to be implemented with SIMD intrinsics.

That said, it may be possible to make partial specializations for std::plus, std::bit_xor, std::multiplies, std::min, and other common binary operations, but library users would then have to actually pass these instead of writing their own code.

P.S. I've added a section with ideas on how to implement RMQ, general reductions, and lazy propagation.

nor

3 years ago, in reply

I agree. However, I just wanted to point out that such an approach might be simplified by doing what eve does (not the platform-agnostic parts, but the abstraction parts). For example, templating on the block size would be nice, since different integer types would need different block sizes.
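A small sketch of templating on block size: derive the branching factor from how many elements of type `T` fit in one vector register, so a node always spans exactly one register. The 32-byte register width (as with AVX2) and the names `kRegisterBytes`, `lanes`, `WideNode`, and `add_suffix` are assumptions of this sketch, not eve's API.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 256-bit (32-byte) vector registers.
constexpr std::size_t kRegisterBytes = 32;

// Number of T lanes that fit in one register: 16 for int16_t, 8 for int32_t, 4 for int64_t.
template <class T>
constexpr std::size_t lanes = kRegisterBytes / sizeof(T);

// Hypothetical wide-tree node whose branching factor follows from the element type.
template <class T, std::size_t B = lanes<T>>
struct WideNode {
    static constexpr std::size_t branching = B;
    alignas(kRegisterBytes) T child_sum[B] = {};

    // Add x to every slot after j, as a prefix-sum update does inside one node.
    void add_suffix(std::size_t j, T x) {
        for (std::size_t s = j + 1; s < B; ++s) child_sum[s] += x;
    }
};
```

With this, `WideNode<std::int32_t>` has 8 children and `WideNode<std::int64_t>` has 4, both filling one aligned 32-byte register, without the tree code hard-coding either number.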
