rhymehatch's blog

By rhymehatch, 4 months ago, In English

This is just a solution for this problem and a code share for this comment. Binary lifting is much simpler to implement, but at the time I was still exploring DFS and its variants.

Here, a 2-pass DFS strongly connected components algorithm (Kosaraju's) is used so that the queries can be processed in an order that gives $$$O(n + m)$$$ total time, where $$$n$$$ is the number of nodes and $$$m$$$ the number of queries. This is a bit better than the $$$O((n + m)\log k)$$$ of binary lifting, but the recursive implementation is computationally expensive (a Python solution would pass on 3.12 thanks to its more efficient frames, but CSES currently uses older versions).
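For reference, the binary-lifting alternative mentioned above can be sketched in a few lines of Python. The successor array `succ` below is a made-up toy instance (a 5-cycle), not the problem's input; `up[j][i]` is the $$$2^j$$$-th successor of $$$i$$$:

```python
LOG = 20  # enough for k < 2^20; the real problem needs ~30

# made-up functional graph: succ[i] is the node reached from i (here a 5-cycle)
succ = [1, 2, 3, 4, 0]
n = len(succ)

# up[j][i] = 2^j-th successor of i, built by doubling
up = [succ]
for j in range(1, LOG):
    prev = up[-1]
    up.append([prev[prev[i]] for i in range(n)])

def kth(x, k):
    # walk k steps by decomposing k into powers of two
    j = 0
    while k:
        if k & 1:
            x = up[j][x]
        k >>= 1
        j += 1
    return x

print(kth(0, 7))  # 7 steps around a 5-cycle from 0 lands on 2
```

Each query costs $$$O(\log k)$$$ table lookups instead of a walk along the graph.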

#include <bits/stdc++.h>
using namespace std;

#define N 200001

vector<bool> V;
vector<int> G[N], T[N], L;
vector<vector<int>> S;
vector<pair<int, int>> Q[N];
int A[N];

// DFS from x over adjacency lists g, pushing vertices onto r in postorder;
// used once on G and once on the transpose T (Kosaraju's two passes)
void t(int x, vector<int> g[], vector<int> &r) {
	V[x]= 0;
	for(auto u: g[x]) {
		if(V[u]) t(u, g, r);
	}
	r.push_back(x);
}

// Answer the queries anchored in SCC I. cs lists (SCC id, index) of the nodes
// on the successor path out of this SCC: cs.back() is the immediate successor,
// cs[0] lies on the terminal cycle.
void solve(int I, vector<pair<int, int>> &cs) {
	V[I]= 0;
	vector<int> s= S[I];
	// cycle component: size > 1, or a single node with a self-loop
	if(s.size() > 1 || G[s[0]][0] == s[0]) {
		for(int i= 0; i < s.size(); i++)
			for(auto [q, k]: Q[s[i]]) { A[q]= s[(i + k) % s.size()]; }
	} else {
		int x= s[0];
		for(auto [q, k]: Q[x]) {
			if(k == 0) A[q]= x;
			else {
				int p= min(k, (int)cs.size());
				auto [c, i]= cs[cs.size() - p];
				k-= p;
				A[q]= S[c][(i + k) % S[c].size()];
			}
		}
	}
	for(int j= 0; j < s.size(); j++) {
		int x= s[j];
		cs.emplace_back(I, j);
		for(auto u: T[x]) {
			if(V[L[u]]) solve(L[u], cs);
		}
		cs.pop_back();
	}
}

int main() {
	ios::sync_with_stdio(0), cin.tie(0);
	int n, q;
	cin >> n >> q;
	for(int i= 0; i < n; ++i) {
		int b;
		cin >> b;
		G[i].push_back(b - 1);
		T[b - 1].push_back(i);
	}
	vector<int> O;
	V.assign(n, 1);
	for(int i= 0; i < n; i++) {
		if(V[i]) t(i, G, O);
	}
	reverse(O.begin(), O.end());
	V.assign(n, 1);
	L.assign(n, 0);
	for(auto i: O) {
		if(V[i]) {
			t(i, T, S.emplace_back());
			for(auto x: S.back()) { L[x]= S.size() - 1; }
		}
	}
	for(int i= 0; i < q; i++) {
		int x, k;
		cin >> x >> k;
		Q[x - 1].emplace_back(i, k);
	}
	V.assign(n, 1);
	for(int i= S.size() - 1; i >= 0; i--) {
		if(V[i]) {
			vector<pair<int, int>> cs;
			solve(i, cs);
		}
	}
	for(int i= 0; i < q; i++) { cout << A[i] + 1 << '\n'; }
}


By rhymehatch, history, 8 months ago, In English

The editorial for this problem and the solutions in the comments rely on a lot of symbolic expansion of expressions until regularity appears in the formulas.

I'm used to looking at random walks as recurrences and directly solving them.

The tree below yields the following system of equations, where $$$p_i$$$ is the expected number of visits to node $$$i$$$:

$$$ \begin{align*} p_1 &= \frac{p_3}{2} \\ p_2 &= 1 + \frac{p_3}{2} + p_8 \\ p_3 &= p_1 + \frac{p_2}{3} \\ p_4 &= p_5 + p_7 \\ p_5 &= \frac{p_4}{3} \\ p_6 &= 1 \\ p_7 &= \frac{p_4}{3} \\ p_8 &= \frac{p_2}{3} \end{align*} $$$

Solving a general system of equations with $$$n$$$ variables is slow. I almost quit, but this is a tree, and the rabbit out of the hat is that this sparse system can be solved by tree traversal. We start by pushing coefficients from the leaves up to their parents until we reach the starting node (node 2 in the tree above), where the coefficient can be computed outright. A single BFS then pushes this information back to all nodes to compute their expectations.

This is a general method to solve any system of equations defined by a tree.

Division is handled through pow(x, -1, M).
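A minimal illustration of that trick, using the problem's prime modulus:

```python
M = 998244353  # prime modulus from the problem

# pow with exponent -1 (Python 3.8+) computes the modular inverse
inv3 = pow(3, -1, M)
print(3 * inv3 % M)  # 1

# same value as the Fermat's-little-theorem form pow(3, M - 2, M)
print(inv3 == pow(3, M - 2, M))  # True
```

So "a / b mod M" becomes `a * pow(b, -1, M) % M` everywhere in the code below.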

n, s, t, *e = map(int, open(0).read().split())
G = [[] for _ in -~n*' ']
for a, b in zip(*[iter(e)]*2):
	G[a] += b,
	G[b] += a,

M = 998244353

V = *C, = -~n*[0]

i = lambda x: pow(x, -1, M)  # handling division as multiplication
Q = [(s, 0)]  # only way to do stackless dfs in Python
while Q:
	x, p = Q[-1]
	if V[x]<1:
		V[x] = 1
		for u in G[x]:
			if u!=t and u!=p: Q += (u, x),
	else:
		c = 0
		for u in G[x]:
			if u!=p and u!=t:
				c = (c+C[u]*i(len(G[u])))%M
				# p_i = 1/degree(child) * p_child
		C[x] = i((M+1-c)*[len(G[p]), 1][p==0])  # (1 - c) * p_i = 1/degree(parent) * p_{parent}
		# final coefficient C[x] is 1/((1-c)*degree(parent)) so when C[p] is materialized, multiply
		Q.pop()

# C[s] is the only coefficient that is currently calculated
B = [(s, 0)]  # short BFS
for x, p in B:
	for u in G[x]:
		if u!=t and u!=p:
			C[u] = C[u]*C[x]%M
			B += (u, x),
C[t] = 1
print(*C[1:])

P.S. Python 3.11 comes with a much more efficient call stack (the cheaper-frames improvement). It should then be possible to write a much shorter implementation with recursive DFS.


By rhymehatch, history, 10 months ago, In English

I've been going through the CSES Problemset with the goal of writing the shortest (but not obfuscated) Python solutions.

Due to the smaller competition pool, I have managed to accomplish this on a variety of tasks. But cutting characters sometimes introduces undesirable slowdowns, mostly in IO.

Problem inputs with $$$2\cdot 10^5$$$ to $$$10^6$$$ integers are big enough that a third to a half of the execution time goes just to reading/printing, which very often results in TLE. In that case it is important to make sure you are reading as quickly as possible. In C++ the same algorithms pass comfortably even with inefficient IO.

I noticed io.BytesIO does not help at all and adds too many characters. Printing gets as fast as possible (<1% of execution time) with

print(' '.join(map(str, S)))

where S is a list of answers. Having multiple calls to print or print(*S) will be too slow.

Similarly, calling input or sys.stdin.* repeatedly also isn't optimal. What I usually do is:

n, m, q, *e = map(int, open(0).read().split())

# example of handling unpacking
E = [(a, b) for a, b in zip(*[iter(e)]*2)]

This is fast, but the snippet below is much faster:

# fast variant
e = [*map(int, open(0).read().split())]
n, m, q = e[:3]
E = [(a, b) for a, b in zip(*[iter(e[3:])]*2)]

The expansion *e, = map(int, open(0).read().split()) takes ~30% longer to execute (though it saves a single character).
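As a side note, the zip(*[iter(e)]*2) idiom used above pairs up consecutive elements of the flat input list; a tiny self-contained example:

```python
e = [1, 2, 3, 4, 5, 6]

# one shared iterator, consumed two elements at a time by zip
E = [(a, b) for a, b in zip(*[iter(e)] * 2)]
print(E)  # [(1, 2), (3, 4), (5, 6)]
```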

A good recent example is on Distance Queries where doing the fast variant gives a lot of room to not TLE: Optimized IO

Or Counting Paths, where I couldn't get it to pass without the fast variant. Adding code optimizations to already short code in most cases does not help: short code, if not heavily obfuscated, means that most of the useful invariants in the solution are already fully exploited. Examples where you might think you're slow when writing short programs:

  • doing binary lifting 18 times per query vs. only depth[a].bit_length() times will improve execution time, but compared to IO the gain is insignificant. To support depth[a].bit_length() you have to extend the DFS to maintain this data, and the gains in execution time soon disappear,
  • a list[tuple[int, int]] DFS queue is slower than a flat list[int]; using two Q.pop() operations to fetch all of the metadata is faster and takes more characters, but still nothing compared to IO,
  • Q.append(x) is faster but not shorter than Q += x, yet the speedup is negligible in the grand scheme of things,
  • A = L.copy() vs *A, = L
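To illustrate the flat-queue trick from the second bullet, here is an iterative DFS storing parent and node as two plain ints instead of a tuple (the adjacency list is a made-up toy tree):

```python
# toy tree: edges 0-1, 0-2, 2-3 (adjacency lists)
G = [[1, 2], [0], [0, 3], [2]]

order = []
Q = [-1, 0]        # flat queue: ..., parent, node — instead of [(-1, 0)]
while Q:
    x = Q.pop()    # node
    p = Q.pop()    # its parent
    order.append(x)
    for u in G[x]:
        if u != p:
            Q += x, u  # push parent, then node, as two plain ints
print(order)  # [0, 2, 3, 1]
```

Avoiding tuple construction per stack entry shaves a constant factor off the traversal.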

Of course, there might be a case where choosing the optimal language constructs is necessary, but I haven't stumbled upon such a case yet.



By rhymehatch, history, 2 years ago, In English

https://codeforces.net/problemset/problem/1358/C

I cannot think simply enough.

First I amused myself with the number of possible paths:

$$$\frac{(x_2-x_1+y_2-y_1)!}{(x_2-x_1)!(y_2-y_1)!}$$$

I realized it is useless since the inputs go up to $$$10^9$$$.

Then I focused on figuring out how to calculate value for given coordinate $$$(i,j)$$$.

It took me a while to figure that out:

$$$ \bigg(\sum\limits_{k=1}^{i+j-1} k \bigg) - j + 1 = \frac{(i+j-1)(i+j)}{2}-j+1 $$$

Then I did not know what to do, so I started calculating all of the possible sums, and realized there is a minimal sum and a maximal sum. The minimal one is obtained by going along row $$$x_1$$$ from $$$y_1$$$ to $$$y_2$$$, then down column $$$y_2$$$ from $$$x_1+1$$$ to $$$x_2$$$; the maximal one by first going down column $$$y_1$$$ and then right along row $$$x_2$$$.

Given the huge number of possible paths ($$$\approx {10}^{1000000000}$$$) I assumed that every sum between the minimal and the maximal one is attainable, so the answer is max-min+1.
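That assumption is easy to sanity-check by brute force on small grids, using the cell-value formula derived above:

```python
def val(i, j):
    # cell value from the closed form above (1-indexed coordinates)
    return (i + j - 1) * (i + j) // 2 - j + 1

def distinct_sums(x1, y1, x2, y2):
    # enumerate every monotone (down/right) path and collect its sum
    sums = set()
    def walk(i, j, s):
        s += val(i, j)
        if (i, j) == (x2, y2):
            sums.add(s)
            return
        if i < x2: walk(i + 1, j, s)
        if j < y2: walk(i, j + 1, s)
    walk(x1, y1, 0)
    return len(sums)

print(distinct_sums(1, 1, 3, 3))  # 5
print((3 - 1) * (3 - 1) + 1)      # 5, matches (x2-x1)(y2-y1)+1
```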

Then I just quit trying to work out the formula with a pencil and wrote the whole sum calculation in WolframAlpha:

Simplify[
  Sum[Sum[i,{i, 1, x2+j-1}] -j +1,  {j,y1,y2}] 
+ Sum[Sum[i,{i, 1, j+y1-1}] -y1+1, {j,x1,x2-1}]
- Sum[Sum[i,{i, 1, x1+j-1}] -j +1,  {j,y1,y2}] 
- Sum[Sum[i,{i, 1, j+y2-1}] -y2+1, {j,x1+1,x2}]
]

The output was $$$(y_1-y_2)(x_1-x_2)$$$. I added 1 and tada.

Insane waste of time. I still have no idea how to think simply enough to not go through all of these steps.
