pigpigger's blog

By pigpigger, history, 20 months ago, In English

Computing Jordan canonical form

Definition 1 root subspace for $$$\lambda$$$ is the subspace satisfying $$$\lambda$$$ $$$\exist k, (\mathcal{A}-\lambda\mathcal{I})^k\alpha=0$$$

Lemma 1 The sum of different root subspace is direst sum

proof

$$$f(x)$$$ is the characteristic polynomial's factor of $$$\lambda_1$$$ , $$$g(x)$$$ is the remaining part

Now $$$f(x)$$$ is annihilator for $$$W_{\lambda_1}$$$
because the whole characteristic polynomial is annihilator, but $$$g(x)$$$ cannot annihilate $$$W_{\lambda _1}$$$ (consider the diagonal of the matrices)

we have $$$(f(x),g(x))=1$$$ now we claim that $$$g(\mathcal{A})$$$ is invertible. There exist $$$u(x)g(x)=1 \bmod f(x)$$$

so $$$u(\mathcal{A})g(\mathcal{A})=\mathcal{I}$$$

to prove $$$W_{\lambda_1}\oplus W_{\lambda_2}\dots\oplus W_{\lambda_t}$$$ we only need to prove vector $$$0$$$ have only one representation.

say $$$0=u_1+u_2+\dots+u_t$$$ . applying $$$g(\mathcal{A})$$$ to both side yield $$$g(\mathcal{A})u_1=0$$$ . Then applying $$$u(\mathcal{A})$$$ gives the result. $$$\square$$$

Definition 2 cyclic subspace $$$C(\alpha)$$$ is the subspace generated by $$$\alpha \ ,\mathcal{A}\alpha,\ \mathcal{A}^2\alpha \dots$$$ and denote the last none zero term of the sequence $$$last(\alpha)$$$

(it's easy to prove $$$\alpha \ ,\mathcal{A}\alpha,\ \mathcal{A}^2\alpha \dots$$$ are independent.

Lemma 2 $$$V$$$ is the direct sum of some cyclic subspace for nilpotent $$$\mathcal{A}$$$

proof

we give a constructive and algorithmic proof.

Let the degree of minimal polynomial be $$$d$$$.

First start with solving $$$AX=0$$$ ,then $$$A^2X=0$$$ ..... $$$A^dX=0$$$ The solution space is expanding. Denote $$$X_i$$$ the set of basis that is newly inserted after solving the i-th equation.

Denote the vectors in $$$X_d$$$ $$$\alpha_1,\alpha_2\dots\alpha_k$$$ , in $$$X_{d-1}$$$ $$$\beta_1,\beta_2\dots\beta_l$$$ in $$$X_{d-2}$$$ $$$\gamma_1,\gamma_2\dots\gamma_m$$$

First we claim that $$$C(\alpha_1)\oplus C(\alpha_2)\dots \oplus C(\alpha_k)$$$ .

$$$ \sum_{i,j} \mu_{ij}\mathcal{A}^j\alpha_i=0 $$$

applying $$$\mathcal{A}^{d-1}$$$ to both side yields

$$$ \sum_{i,j} \mu_{ij}\mathcal{A}^{j+d-1}\alpha_i=0 $$$
$$$ \sum_{i} \mu_{i0}\mathcal{A}^{d-1}\alpha_i=0 $$$

now

$$$ \sum_{i} \mu_{i0}\alpha_i $$$

is in $$$Ker(\mathcal{A}^{d-1})$$$ ,but $$$\alpha_i$$$ are newly inserted, if this is not $$$0$$$, some $$$\alpha$$$ can be represented by the basis of $$$Ker(\mathcal{A}^{d-1})$$$ ,a contradiction.

so $$$\forall i,\mu_{i0}=0$$$ because $$$\alpha_i$$$ are linearly independent.

$$$ \sum_{i\ge1,j} \mu_{ij}\mathcal{A}^j\alpha_i=0 $$$

now apply $$$\mathcal{A}^{d-2}$$$ and similarly $$$\forall i,\mu_{i1}=0$$$. the remaining proof is obvious.

At that time, we want to insert $$$C(\beta_i)$$$ but not all of them should be inserted.

Firstly I thought if $$$\beta$$$ is linearly independent with all $$$\mathcal{A}\alpha$$$ , $$$C(\beta)$$$ and $$$C(\alpha)$$$ is direct sum. but that was wrong.

consider the following example $$$A=$$$

0 0 1 2 3
0 0 0 1 -1
0 0 0 1 2
0 0 0 0 0
0 0 0 0 0

$$$X_1$$$ is

1 0
0 1
0 0 
0 0
0 0

$$$X_2$$$ is

0 0
0 0
1 0
0 -2
0 1

$$$X_3$$$ is

0
0
0
0
1

now $$$\alpha$$$ $$$\mathcal{A} \alpha$$$ $$$\mathcal{A}^2 \alpha$$$ are

0 3  2
0 -1 0
0 2  0
0 0  0
1 0  0

now choose in $$$X_2$$$ $$$\beta =$$$

0
0
1
0
0

which is linearly independent with $$$\mathcal{A}\alpha$$$ but $$$\mathcal{A}\beta=$$$

1
0
0
0
0

is linearly independent with $$$\mathcal{A}^2 \alpha$$$

so it occurred to me that I should maintain a linearly independent kernel.

$$$ \sum_{i,j} \mu_{ij}\mathcal{A}^j\alpha_i+\sum_{i,j}\lambda_{ij}\mathcal{A}^j\beta_i=0 $$$

apply $$$\mathcal{A}^{d-1}$$$ we have.

$$$ \sum_{i\ge1,j} \mu_{ij}\mathcal{A}^j\alpha_i+\sum_{i,j}\lambda_{ij}\mathcal{A}^j\beta_i=0 $$$

now apply $$$\mathcal{A}^{d-2}$$$

$$$ \sum_{i\ge1} \mu_{i,1}\mathcal{A}^{d-1}\alpha_i+\sum_{i}\lambda_{i,0}\mathcal{A}^{d-2}\beta_i=0 $$$

because the $$$\beta$$$ we choose ($$$\beta$$$ here is not all the $$$\beta$$$ )to add here satisfy $$$last(\beta)$$$ is linearly independent with $$$last(\alpha)$$$

so $$$\forall i, \mu_{i,1}=\lambda_{i,0}=0$$$ continue the process, we can prove we get some direct sum of cyclic subspace.

but is the sum the complete space $$$V$$$ ?

Note that originally $$$X_1, X_2 \dots$$$ are a basis.

when we come to the k-th layer(corresponding to inserting $$$X_k$$$) , we can consider this algorithm as substituting some vectors in $$$X_k$$$ with the previously inserted vectors. For example, use $$$\mathcal{A} \alpha$$$ to change out some $$$\beta$$$ . Because originally $$$last(X_{d-1})$$$ is linearly independent with size $$$|X_{d-1}|=|last(X_{d-1})|$$$ , finally the vectors in the answer set form a set with size $$$|last(X_{d-1})|$$$ too.(the basis of a space is a fixed number).

So finally we will get a set with size $$$|X_{d-1}|$$$ in place of $$$X_{d-1}$$$. For $$$X_k$$$ it is similar. $$$\square$$$

Theorem There is a basis under which the matrix is in Jordan form.

proof

For every eigenvalue, run algorithm of lemma 2. And the union of vectors is a Jordan basis.

Furthermore, it is easy to see that the number of different size of Jordan square is determined by the solution of $$$(A-\lambda^kI)X=0$$$ so it is determined by $$$rank(A-\lambda I)^k$$$ . Which is invariant for similar relation.

So Jordan form is canonical form of similar matrices. $$$\square$$$

Corollary Jordan form and its transfer matrix can be computed in $$$O(n^4)$$$.

proof

Computing the solution for $$$A^kX=0$$$ need $$$Cn^4$$$ operations if implemented using ordinary method.

Computing the chain of a vector $$$\alpha$$$ (the space $$$C(\alpha)$$$) need $$$C\sum d_i\times n^2=Cn^3$$$. Maintaining the kernel need $$$C\sum d_i^3=O(n^3)$$$

So the complexity is $$$O(n^4)$$$ $$$\square$$$

This method is a clear and straightforward geometry method. It is easy to implement compared to algebraic method of $$$\lambda$$$-matrix. And the proof of lemma 2 is different from classical textbook because I want to prove the correctness of the algorithm.

There are better algorithms for computing Jordan form. For example, An $$$O(n^3)$$$ Algorithm for the Frobenius Normal Form Storjohann 1998 is an easy faster way.

I did not find a fast algorithm that can compute transfer matrix of Jordan form online.

but use idea of some reduction. It may be done with better complexity.

The code below needs eigenvalues.

#include<bits/stdc++.h>
#define de fprintf(stderr,"on-%d\n",__LINE__)
#define pb push_back
#define fr(i,a,b) for(int i(a),end##i(b);i<=end##i;i++)
#define fd(i,a,b) for(int i(a),end##i(b);i>=end##i;i--)
#define reduce Reduce
#define apply Apply
using namespace std;
const int N=30;
int m;
struct frac{
	int q,p;
	frac(){
		q=0;p=1;
	}
	void out(){
		if(p==1)printf("%7d ",q);
		else printf("%3d/%3d ",q,p);
	}
};
frac zero;
frac make(int x,int y){
	frac A;
	A.q=x;A.p=y;
	return  A;	
}
frac r[N];
int d[N];
frac V(int x){
	return make(x,1); 
}
frac read(){
	int x;scanf("%d",&x);
	return V(x);	
}
int gcd(int x,int y){
	if(!x)return y;
	return gcd(y%x,x);
}
frac reduce(frac &A){
	int g=gcd(A.p,A.q);	
	A.p/=g;A.q/=g;
	if(A.p<0)A.p=-A.p,A.q=-A.q;
	return A;
}
frac reduce(frac A){
	int g=gcd(A.p,A.q);
	A.p/=g;A.q/=g;
	if(A.p<0)A.p=-A.p,A.q=-A.q;
	return A;
}
frac operator +(const frac &A,const frac &B){
	return reduce(make(A.q*B.p+B.q*A.p,A.p*B.p));	
}
frac operator -(const frac &A,const frac &B){
	return reduce(make(A.q*B.p-B.q*A.p,A.p*B.p));	
}
frac operator *(const frac &A,const frac &B){
	assert(A.p);
	return reduce(make(A.q*B.q,A.p*B.p));
}
frac operator /(const frac &A,const frac &B){
	return reduce(make(A.q*B.p,A.p*B.q));
}
struct matrix{
	frac A[N][N];
	frac* operator [](int x){return A[x];}
	void out(){
		fr(i,1,m){
			fr(j,1,m)A[i][j].out();
			printf("\n");
		}
	}
};
matrix I; 
void init(){
	fr(i,1,m)fr(j,1,m)I[i][j]=V(i==j);
}
matrix operator *(matrix A,matrix B){
	matrix C;
	fr(i,1,m)fr(j,1,m){
		C[i][j]=V(0);
		fr(k,1,m)C[i][j]=C[i][j]+A[i][k]*B[k][j];
	}
	return C;
}
matrix operator *(frac v,matrix A){
	fr(i,1,m)fr(j,1,m)A[i][j]=v*A[i][j];
	return A;
}
matrix operator +(matrix A,matrix B){
	fr(i,1,m)fr(j,1,m)A[i][j]=A[i][j]+B[i][j];
	return A;
}
matrix operator -(matrix A,matrix B){
	fr(i,1,m)fr(j,1,m)A[i][j]=A[i][j]-B[i][j];
	return A;
}
vector<frac> apply(matrix &A,const vector<frac>& B){
	vector<frac> C;C.resize(m+1);
	fr(i,1,m)fr(k,1,m){
		C[i]=C[i]+A[i][k]*B[k];	
	}
	return C;
}
bool p[N];
int to[N];
void rowreduce(matrix &A,int n,int m,int op=0){
	fr(i,1,m)to[i]=p[i]=0;
	fr(i,1,n){
		int s=0;
		fr(j,1,m)if(!p[j]&&A[i][j].q){
			s=j;break;	
		}
		if(!s)continue;
		to[i]=s;
		p[s]=1;
		fr(j,op*i+1,n)if(j!=i&&A[j][s].q){
			frac t=A[j][s]/A[i][s];	
			fr(k,1,m)A[j][k]=A[j][k]-t*A[i][k];		
		}
	}
}
void swap(frac* A,frac *B){
	fr(i,1,m)swap(A[i],B[i]);	
}
matrix inverse(matrix A){
	matrix B=I;
	fr(i,1,m){
		int s=0;
		fr(j,i,m)if(A[j][i].q){
			s=j;break;	
		}
		assert(s);
		swap(A[i],A[s]);swap(B[i],B[s]);
		const frac t=A[i][i];
		fr(j,1,m){
			A[i][j]=A[i][j]/t;
			B[i][j]=B[i][j]/t;
		}
		fr(j,1,m)if(j!=i&&A[j][i].q){
			const frac t=A[j][i];	
			fr(k,1,m){
				A[j][k]=A[j][k]-t*A[i][k];		
				B[j][k]=B[j][k]-t*B[i][k];		
			}
		}
	}
	return B;
}
vector<vector<frac> > solve(matrix A){
	vector<vector<frac> >W;
	rowreduce(A,m,m,0);
	fr(i,1,m)if(!p[i]){
		vector<frac> T;T.resize(m+1);
		T[i]=V(1);
		fr(j,1,m)if(to[j]&&A[j][i].q){
			T[to[j]]=zero-A[j][i]/A[j][to[j]];	
		}
		W.push_back(T);
	}
	return W;
}
bool noempty(frac* A){
	fr(i,1,m)if(A[i].q)return 1;
	return 0;
}
int main(){
#ifdef pig
	freopen("pig.in","r",stdin);
	freopen("pig.out","w",stdout);
#endif
	cin>>m;
	init();
	matrix A;
	fr(i,1,m)fr(j,1,m)A[i][j]=read();
	int s;cin>>s;
	fr(i,1,s){
		r[i]=read();
		scanf("%d",&d[i]);
	}
	matrix Q;int top=0;
	fr(i,1,s){
		matrix B=A-r[i]*I,C=B;
		vector<vector<vector<frac> > > V;
		vector<vector<frac> > P;
		fr(j,1,d[i]){
			vector<vector<frac> > W=solve(C);	
			matrix A;
			fr(k,0,P.size()-1){
				fr(l,1,m)A[k+1][l]=P[k][l];	
			}
			fr(k,0,W.size()-1){
				fr(l,1,m){
					A[k+P.size()+1][l]=W[k][l];
				}
			}
			rowreduce(A,P.size()+W.size(),m,1);
			vector<vector<frac> >T;
			fr(k,P.size()+1,W.size()+P.size())if(noempty(A[k])){
				vector<frac> TT;TT.resize(m+1);
				fr(l,1,m)TT[l]=A[k][l];
				T.pb(TT);	
			}
			V.pb(T);
			P=W;	
			C=C*B;
		}
		vector<vector<frac> >ans;
		matrix now;int tot=0;
		fd(j,V.size()-1,0){
			fr(k,0,V[j].size()-1){
				vector<frac> T; T.resize(m+1);
				fr(l,1,m){
					T[l]=V[j][k][l];
				}
				fr(l,0,j-1)T=apply(B,T);
				tot++;
				fr(l,1,m)now[tot][l]=T[l];	
			}

			rowreduce(now,tot,m,1);
			fr(k,tot-V[j].size()+1,tot)if(noempty(now[k])){
				vector<frac> T;T.resize(m+1);
				fr(l,1,m)T[l]=V[j][k-(tot-V[j].size()+1)][l];	
				fr(t,0,j){
					ans.pb(T);
					T=apply(B,T);
				}
			}
		}
		fr(j,0,ans.size()-1){
			top++;
			fr(k,1,m)Q[k][top]=ans[j][k];
		}
	}
	assert(top==m);
	Q.out();
	printf("\n");
	(inverse(Q)*A*Q).out();
	return 0;
}
  • Vote: I like it
  • +57
  • Vote: I do not like it

»
20 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by pigpigger (previous revision, new revision, compare).

»
20 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by pigpigger (previous revision, new revision, compare).

»
20 months ago, # |
  Vote: I like it +23 Vote: I do not like it

I don't (really) know linear algebra, and I don't know how useful this tutorial is for the Codeforces community. But the number of upvotes here is brutally low for a blog which seems to me quite high-effort.

  • »
    »
    20 months ago, # ^ |
      Vote: I like it +38 Vote: I do not like it

    usually these tutorials are followed by some motivation and some example problems. This post has none, seems like i am reading a maths textbook with some code.

    • »
      »
      »
      20 months ago, # ^ |
      Rev. 2   Vote: I like it +14 Vote: I do not like it

      I really doubt I will ever see a problem where you need to compute the Jordan canonical form, I have seen some problems where you needed to do some stuff with eigenvalues and eigenvectors but usually, you use linear algebra for solving the problem, not in the coding part. Imagine using iterative algorithms in CP.