Hi all
Firstly, thanks for reading this. It's kind of hard to articulate what I'm doing, and I hope I've written it clearly.
I tried playing around with longs (numbers larger than 1<<31) and ints in PyPy3 to see how performance is affected. If I'm not mistaken, numbers larger than 1<<31, or numbers derived from such numbers (e.g. the result of dividing a large number), are stored as longs in the underlying C struct, while smaller numbers are stored as int32.
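One way to check where that cutoff actually falls would be to look at the storage strategy PyPy picks for a list of such numbers. PyPy exposes a __pypy__.strategy() helper for this (PyPy-only, and the exact strategy names may vary between versions); a rough sketch, not part of the original tests:
//Sketch: checking PyPy's list storage strategy for small vs. very large ints.
import __pypy__
small = [1 << 20] * 1000   # comfortably fits in a machine word
large = [1 << 70] * 1000   # too big for a machine word, stored as longs
print(__pypy__.strategy(small))   # e.g. "IntegerListStrategy" (unboxed ints)
print(__pypy__.strategy(large))   # e.g. "ObjectListStrategy" (boxed long objects)
If the list of large numbers falls back to a generic object strategy while the small-number list stays on an integer strategy, that would line up with the memory difference between the runs below.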
For a control experiment, I tried storing large numbers:
//Just storing longs as a control experiment. Source:
x=1<<59
y=[]
for _ in range(10**7):
    y.append(x)
Output:
===== Used: 873 ms, 113092 KB
I then tried converting the long to an int. Note that I did a division first (x>>1). This was intentional, as it slows the code down significantly; the code runs much faster (~300 ms) without this step. Also note how the memory used has dropped, as (I believe) the numbers are now stored as int32.
//Long to int directly, after dividing the long by 2. Source:
x=1<<59
x=x>>1
print(id(x))
y=[]
for _ in range(10**7):
    y.append(int(1.0*(x>>31)))
Output: 4611686018427387905
===== Used: 1669 ms, 76632 KB
Finally, I attempted the same thing, but I assigned x to a temporary variable (temp), deleted x, and then reassigned temp back to x. The code ran much faster (~300 ms), but I have no idea why. I printed the id of x before and after to verify that the object (and hence its properties) didn't change.
//Long to int, after dividing the long by 2, but deleting and reassigning the long. Source:
x=1<<59
x=x>>1
print(id(x))
temp=x
del x
x=temp
print(id(x)) # id does not change, i.e. same object, so its properties shouldn't change
y=[]
for _ in range(10**7):
    y.append(int(1.0*(x>>31)))
Output: 4611686018427387905 4611686018427387905
===== Used: 358 ms, 76796 KB
I have no idea what's happening here, but my suspicion is that the deletion and reassignment of x affected the logic of PyPy3's JIT compiler, causing it to take a different series of steps when running the code.
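One way to test that suspicion would be to rerun Codes 2 and 3 with the JIT switched off and see whether the gap disappears; PyPy's pypyjit module can disable the JIT at runtime. A rough, untested sketch:
//Sketch: same loop as Code 2, but with PyPy's tracing JIT disabled.
try:
    import pypyjit
    pypyjit.set_param("off")   # turn off the JIT (PyPy only)
except ImportError:
    pass   # on CPython there is nothing to disable
x = 1 << 59
x = x >> 1
y = []
for _ in range(10**7):
    y.append(int(1.0 * (x >> 31)))
Alternatively, running each script with the PYPYLOG=jit-log-opt:logfile environment variable set should dump the optimised traces, which could then be compared between the two versions.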
I don't quite understand the idea, but I ran the first two examples on my machine under pypy3 and they gave identical results (~400 ms for the loop).
It would probably be better if you provided the exact script you used to do the measurements.
I ran the code here and got similar results, so I think that was the case.
Same
Hi, thanks for trying out the code. I ran it under Codeforces' "Custom test" with PyPy 3.7 as the language. I'm not sure how consistently these run on Codeforces' servers, but I just tried running them again, with the following results:
Code 1:
Code 2:
Code 3:
So between Codes 2 and 3, the only difference was deleting and reassigning x, and this resulted in a big difference in computation time, at least going by Codeforces' Custom test. I'm not sure why this is the case, but there could be some anomalies, especially since your result for test 2 on your machine (~400 ms) differs quite a lot from my test on Codeforces' Custom test (~1600 ms). I do not have PyPy3 on my local machine, hence I didn't test it locally.
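If it helps anyone reproduce this outside Codeforces, a minimal local harness along the lines below should give comparable figures. This is only a sketch: resource.getrusage is Unix-only, ru_maxrss is reported in KB on Linux but bytes on macOS, and the snippet under test is kept at module level (as in the originals), since moving it into a function might change the JIT's behaviour.
//Sketch of a local measurement harness; swap in whichever snippet is being tested.
import resource
import time
_start = time.perf_counter()
x = 1 << 59
x = x >> 1
y = []
for _ in range(10**7):
    y.append(int(1.0 * (x >> 31)))
_elapsed_ms = (time.perf_counter() - _start) * 1000
_peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss   # KB on Linux
print("===== Used: %.0f ms, %d KB" % (_elapsed_ms, _peak_kb))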