According to this paper, a binary GCD implementation is about 2 times faster than a normal GCD implementation.
Binary Iterative GCD Implementation (Wikipedia)
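A minimal sketch of a binary iterative GCD, assuming non-negative integers. The function name `binary_gcd` is illustrative and not taken from the Wikipedia listing; it uses only shifts, comparisons, and subtraction.

```python
def binary_gcd(a, b):
    """Binary (Stein's) GCD sketch for non-negative integers."""
    if a == 0:
        return b
    if b == 0:
        return a

    # Factor out the powers of two common to both arguments.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1

    # Make a odd; it stays odd for the rest of the loop.
    while (a & 1) == 0:
        a >>= 1

    while b != 0:
        # Remaining factors of two in b are not common, so drop them.
        while (b & 1) == 0:
            b >>= 1
        # Keep a <= b, then subtract: gcd(a, b) == gcd(a, b - a).
        if a > b:
            a, b = b, a
        b -= a

    # Restore the common power of two.
    return a << shift


print(binary_gcd(48, 18))  # 6
```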
Normal Iterative GCD Implementation
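A minimal sketch of the usual iterative Euclidean GCD, assuming that "normal" here means the remainder-based version and that inputs are non-negative integers. The name `euclid_gcd` is illustrative.

```python
def euclid_gcd(a, b):
    """Iterative Euclidean GCD sketch for non-negative integers."""
    while b != 0:
        # Replace (a, b) with (b, a mod b) until the remainder is zero.
        a, b = b, a % b
    return a


print(euclid_gcd(48, 18))  # 6
```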
Is there an efficient binary extended GCD implementation, and how fast can it be?
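To make the question concrete, a binary extended GCD could look roughly like the sketch below. It follows the textbook shift-and-subtract scheme and maintains the invariants `a*x + b*y == u` and `c*x + d*y == v`. The name `binary_ext_gcd` and the return order `(g, s, t)` are assumptions made for illustration, and the sketch makes no claim about speed.

```python
def binary_ext_gcd(x, y):
    """Binary extended GCD sketch for non-negative integers.

    Returns (g, s, t) with g == gcd(x, y) and s*x + t*y == g.
    """
    if x == 0:
        return (y, 0, 1)
    if y == 0:
        return (x, 1, 0)

    # Strip the common power of two; at least one of x, y is odd afterwards.
    shift = 0
    while ((x | y) & 1) == 0:
        x >>= 1
        y >>= 1
        shift += 1

    u, v = x, y
    a, b, c, d = 1, 0, 0, 1  # a*x + b*y == u and c*x + d*y == v

    while u != 0:
        while (u & 1) == 0:
            u >>= 1
            if ((a | b) & 1) == 0:
                a >>= 1
                b >>= 1
            else:
                # Adding y to a and subtracting x from b keeps a*x + b*y
                # unchanged and makes both coefficients even.
                a = (a + y) >> 1
                b = (b - x) >> 1
        while (v & 1) == 0:
            v >>= 1
            if ((c | d) & 1) == 0:
                c >>= 1
                d >>= 1
            else:
                c = (c + y) >> 1
                d = (d - x) >> 1
        if u >= v:
            u -= v
            a -= c
            b -= d
        else:
            v -= u
            c -= a
            d -= b

    # v now holds the gcd of the reduced inputs; the coefficients are
    # unaffected by scaling both inputs back up by 2**shift.
    return (v << shift, c, d)


g, s, t = binary_ext_gcd(240, 46)
print(g, s * 240 + t * 46)  # both values are the gcd
```

The coefficients returned by a routine like this are valid Bezout coefficients but are not necessarily the smallest ones.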