I was practicing when I came across this problem. And I submitted a few codes which are almost the same, but their running time is quite different. Here's the thing:
The fastest one is here, which only runs 1590ms:
n, mod = map(int, input().split())
dp = [0] * (n * (n - 1) + 1)
diff = [0] * (n * (n - 1) + 1)
msk = n * (n - 1) // 2
v = 1
wing = 0
for i in range(n):
for j in range(msk - wing, msk + wing + 1):
if dp[j]:
diff[j] += dp[j]
diff[j + (n - i - 1) + 1] -= dp[j]
cur = 0
for j in range(msk - wing, msk + wing + n - i):
cur += diff[j]
cur %= mod
dp[j] = cur
diff[j] = 0
for j in range(msk - wing, msk + wing + n - i):
if dp[j]:
diff[j - (n - i - 1)] += dp[j]
diff[j + 1] -= dp[j]
cur = 0
for j in range(msk - wing - (n - i - 1), msk + wing + n - i):
cur += diff[j]
cur %= mod
dp[j] = cur
diff[j] = 0
for j in range(-(n - i - 1), 0):
dp[j + msk] += v * (n - i + j) % mod
dp[j + msk] %= mod
v = v * (n - i) % mod
wing += n - i - 1
print(sum(dp[msk+1:]) % mod)
Yet, another code which is basically the same as the former one needs 3743ms:
def main():
n, mod = map(int, input().split())
dp = [0] * (n * (n - 1) + 1)
diff = [0] * (n * (n - 1) + 1)
msk = n * (n - 1) // 2
v = 1
wing = 0
for i in range(n):
for j in range(msk - wing, msk + wing + 1):
if dp[j]:
diff[j] += dp[j]
diff[j + (n - i - 1) + 1] -= dp[j]
cur = 0
for j in range(msk - wing, msk + wing + n - i):
cur += diff[j]
cur %= mod
dp[j] = cur
diff[j] = 0
for j in range(msk - wing, msk + wing + n - i):
if dp[j]:
diff[j - (n - i - 1)] += dp[j]
diff[j + 1] -= dp[j]
cur = 0
for j in range(msk - wing - (n - i - 1), msk + wing + n - i):
cur += diff[j]
cur %= mod
dp[j] = cur
diff[j] = 0
for j in range(-(n - i - 1), 0):
dp[j + msk] += v * (n - i + j) % mod
dp[j + msk] %= mod
v = v * (n - i) % mod
wing += n - i - 1
print(sum(dp[msk+1:]) % mod)
return
t = 1
for _ in range(t):
main()
I'm quite puzzled as these two codes seems nothing too different. I really wonder how this could happen as it could cause some unexpected TLEs in other questions. It would really help a lot if you could offer some insights on it.
Thanks in advance!
When I try Python3 the speed seems reversed. There are articles explaining why for CPython codes within a function are faster than global, but pypy might be doing something different to produce the opposite results...