Strange Time Difference

→ Обратите внимание

До соревнования
Codeforces Round 996 (Div. 2)
5 дней
Зарегистрироваться »

*есть доп. регистрация

→ Лидеры (рейтинг)

№	Пользователь	Рейтинг
1	jiangly	3977
2	tourist	3815
3	jqdai0815	3682
4	ksun48	3614
5	orzdevinwang	3526
6	ecnerwala	3514
7	Benq	3483
8	hos.lyric	3381
9	gamegame	3374
10	heuristica	3358

Страны | Города | Организации

Всё →

→ Лидеры (вклад)

№	Пользователь	Вклад
1	cry	170
2	-is-this-fft-	162
2	Um_nik	162
4	atcoder_official	160
5	djm03178	157
6	Dominater069	156
7	adamant	154
8	luogu_official	152
8	awoo	152
10	TheScrasse	147

Всё →

→ Найти пользователя

→ Прямой эфир

Детальнее →

Блог пользователя amirhoseinfar1385

Strange Time Difference

Автор amirhoseinfar1385, история, 3 дня назад, По-английски

Hello everyone! Today, I encountered a very strange problem with question e1. I have one submission which is this: [link] And another submission which is this: [link] The only difference between the two is in line 123 and it's a very small difference, but the first one is accepted in 1.8 seconds, and the second one in 6 seconds. There is a huge difference. I wanted to know if anyone knows the reason for this big difference; it doesn't make any sense at all. That line executes about 100,000,000 times. I know it's a lot, but it doesn't make sense to me that this change would add 4 seconds to my code. I want to know if there could be another reason or if that's really all there is to it.

amirhoseinfar1385
3 дня назад
5

Комментарии (5)

Написать комментарий?

amirhoseinfar1385

3 дня назад, # |

Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).

→ Ответить

Xellos

3 дня назад, # |

This looks like cache misses. Accessing dp[u][v][j] for sequential j is cheap. Accessing dp[u][alle[j].u][j], dp[alle[j].v][v][j], dp[u][alle[j].v][j], dp[alle[j].u][v][j] is much more expensive since they're at various places in memory and those places can only be known at runtime, so no optimisation is possible.

→ Ответить

amirhoseinfar1385

3 дня назад, # ^ |

← Rev. 2 →

An interesting thing happened that I thought I'd share. I had a submission, this link: [link] Instead of taking the minimum, I used || and saw no improvement in time. But then I used:

#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2")

and it Reduced the time by about 1.5 seconds. It's strange because it definitely couldn't vectorize, as you mentioned, it is determined by the inputs and can't optimize this.

→ Ответить

Xellos

3 дня назад, # ^ |

You can check assembly with -fverbose-asm. O3 can sometimes make very aggressive, bad optimisations, and it often results in worse runtime from what I found out when I tried to use it all the time.

→ Ответить

amirhoseinfar1385

3 дня назад, # |

Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).

→ Ответить

Соревнования по программированию 2.0

Время на сервере: 08.01.2025 05:51:46 (i2).

Десктопная версия, переключиться на мобильную.

При поддержке