Python is a great programming language: simple, expressive, compact.
In Codeforces, Python is often the best choice for Div 2 A and B problems. For example, problem 600A - Extract Numbers is very easy to write in Python: first tokenize the string with the built-in split()
function, then try to parse the integers with the built int()
, then output the comma-separated strings of results with ",".join(lst)
.
For the more complex problems, writing fast-enough Python code is often a challenge. Here is a list of tips to improve performance.
Use PyPy instead of the standard Python intepreter. According to 20 offical benchmarks it is on average 7 times faster than Python 2. PyPy2 is in my opionion the best option at the moment for competitive programming.
Append to an existing string with "+=" instead of concatening more than two strings with "+" and storing the result with "=". (With two strings, both ways work fine.
s1 = s1 + s2
is fast because Python interpreter optimizes that tos1 += s2
.)string.join(iterable)
is as fast as it gets.list comprehension is faster than
for
loops.prefer
xrange()
instead ofrange()
, as the former returns an iterator, while the latter a new list.avoid
zip()
in Python 2 (constructs) a list, prefer afor
loop instead.zip()
is fine in Python 3 because there it returns an iterator.it's ok to use Python list as an array; it has O(1) element access time. It is NOT ok to use it as queue, even if the list is sorted. Use
collections.deque
instead if you need fast extraction of the max element.
Experimental results
For the following tests I used a Lenovo laptop with Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz. Each test was run 10 times, and the average results are reported.
I/O: Python vs. printf/scanf
vs. cin/cout
The problem: read a file with integers from 0 to 500000, store them in an array, then print out the array.
Implementations: scanf/printf, cin/cout, "fast" cin/cout, Python 2, Python 3.
The results:
scanf/printf: 0.1056 sec
cin/cout: 0.2274 sec
"fast" cin/cout: 0.0943 sec
Python 2: 0.3535 sec
PyPy 2: 0.1645 sec
Python 3: 0.2535 sec
PyPy 3: 0.3324 sec
Verdict: C++ is faster, but PyPy2 should be fast enough for most problems.
Iterator construction vs. list construction
The problem: sum all integers from 0 to 10e6 by using a for
loop.
The results:
Loop with range
:
python2: 0.502402067184 sec
pypy2: 0.0170 sec
Loop with xrange
:
python2: 0.334251880646 sec
pypy: 0.0166962146759 sec
Loop with range
in Python3 that is in fact the same as range
in Python2:
python3: 0.6606624750002084 sec
pypy3: 0.0168 sec
Verdict: the differences between PyPy and standard Python are actually much much more important than range vs. xrange!
List construction: iterative growth vs. pre-reserving all memory at start.
The problem: Create a list with 1e6 elements.
The code when using iterative growth:
l = []
for x in range(1000000):
l.append(x)
The code when pre-reserving memory:
l = [0] * 1000000
for x in range(1000000):
l[x] = x
The results: With append:
pypy2: 0.483808994293
python2: 0.698101997375
pypy3: 0.510159969329
python3: 0.844806871000
With []:
pypy2: 0.0570020675659
python2: 0.618577957153
pypy3: 0.06054091453552246
python3: 0.6261008259998562
Verdict: PyPy is able to take advantage of the faster algorithm (with one-time growth), while standard Python is not able to. The PyPy speedup is an order-of-magnitude (~10 times).