I am a newbie . I was studying this particular algorithm and I have some doubts that have arisen. Would be glad if someone could clear them .
My code goes like :
int pollardRho(int n) {
if(n%2==0)
return 2 ;
srand (time(NULL)) ;
int x, y , g=1 , a;
x = rand() % n + 1 ;
y = x ;
a = rand() % n + 1 ;
while(g==1) {
x = ((x*x) + a)%n ;
y = ((y*y) + a)%n ;
y = ((y*y) + a)%n ;
g = gcd(abs(x - y), n) ;
}
return g ;
}
What I infer is that the use of (x*x) + a mod n
is to generate another pseudo random number . But why cant we write it as :
int pollardRho(int n) {
if(n%2==0)
return 2 ;
srand (time(NULL)) ;
int x, y , g=1 , a;
while(g==1) {
x = rand() % n + 1 ;
y = rand() % n + 1 ;
g = gcd(abs(x - y), n) ;
}
return g ;
}
where the rand()
itself generates a new number . Basically, I want to know what is the significance of the equation being used .
You do realise that in the second one you only generate one x and one y? The first one generates many pairs (x, y).
Yes, but then I can put statements involving random in the while loop .
because rand() generates integer in the interval [0..RAND_MAX], and RAND_MAX almost always equals 32767 (It depends on complator, and in popular compilators it equals 32767).
Actually, rand() is not recommended to use in C++ at all, because it generates small numbers, have no addtional parametres, and have bad distribution.
But C++11 library , which realises good random is much harder to use, so big amount of people still using rand() in CP codes.
Second code just calculates
gcd(k, n)
, where k is random integer (but not really uniform, because you spoil the distribution by subtracting numbers) until it is not 1. For n = pq, where p and q are primes, your chance of success on each iteration is . If p and q are close to 109, it is very small. Plus what vitux said.First code is more clever. It, basically, looks at sequence r modulo n, where r0 and a are chosen randomly and for 1 ≤ i. Over any prime modulo this sequence is cyclic, possibly not strictly. So, we can see that if n has a prime divisor , than we will after a few iterations (not more than p) find two elements in this sequence that are equal modulo p, so their difference is divisible by p, and so is n, so
gcd(n, abs(x - y))
is divisible by p, so ifx != y
thengcd(n, abs(x - y))
is non-trivial divisor of n (it is divisible by p, so it is more than 1, it is less than n, because 0 ≤ x < n and 0 ≤ y < n). Why we will find such x and y? On k-th iteration x = rk and y = r2k, so if k is bigger than start of periodic part of sequence modulo p and divides length of period of sequence modulo p, then x and y are equal modulo p.So, first algorithm finds non-trivial divisor of n in (amount of iterations multiplied by calculating gcd). But actually, it finds it faster because our sequence is almost random modulo p, so birthday paradox applies and total length of non-periodic part and first period is about , so complexity is closer to , kind of that.
Kaban-5 answered very comprehensively, I will just add that x2 + a is also a compressing function, because x2 maps p values to half of p values (mod p, for odd primes p). This fastens the algorithm, since working set size continuosly decreases.
WTF: why "pi divided by 2" is replaced with cur date in my comment?
PS: If you really want to try (but it's a bad idea!) to use rand() or random() instead of x2 + a, you have to seed random with current number, because the numbers must form a chain:
becomes