twoslow's blog

By twoslow, history, 6 years ago, In English

Hello, codeforces community

  • I've a 2D sparse matrix of 0's and 1's of size NxN, where N = 16 Million i.e 16,000,000.
  • The number of non zeros(NNZ) in the matrix are ~450 Million. i.e 450,000,000.

What is the most efficient way to store the such matrix, given that I'll get the queries like:

  • What are the non-zero elements in row number x.
  • What are the non-zero elements in column number y.

Currently I'm storing the matrix in CSR format. Which takes an integer array of size NNZ and another array of size N.

Which is taking

  • (450000000 * 4 Bytes) / (1024 * 1024) ~ 1716MB of memory for row storage, and same amount of memory for column storage.

The overall memory to store such structure is about 3.5GB.

I want to compress it in a way such that the memory consumption is reduced and the queries can still be fast. I am looking for some kind of serialization technique or something which can reduce the memory consumption or time complexity.

Any help regarding this is most welcome.

  • Vote: I like it
  • +3
  • Vote: I do not like it

| Write comment?
»
6 years ago, # |
  Vote: I like it +1 Vote: I do not like it

You could get away with 3 bytes instead of 4 bytes because the column indices and row indices  < 16000000 < 16777216 = 224.

  • »
    »
    6 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This would have been great optimisation, but unfortunately the number of rows can increase, It increase at rate 0.1 Million / month. So having 3 bytes would not be good solution after ~10 months.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +1 Vote: I do not like it

      There are some ways to get around that. You could for example split your matrix into chunks being able to be indexed with 3 bytes, or maybe even 2 bytes.