Hi, fellow programmer :)
I want to describe ideas that I used during ICPC Challenge 2020: Marathon.
First, how to get more than 600k score?
For 600k score we can just implement what this article says. Lets start with each cluster having only one vertex. (1) Then do the following thing until converge: choose random vertex and move it to one of the adjacent vertices cluster if it makes result better. (2) Now compress each cluster to one vertex and make new graph with weighted edges, then do everything like in (1) but now you are making clusters of clusters. I think image in article above explains it quite good. So, do (1) once and (2) until converge. Basically, that's it.
Second, let's analyze recieved clusters
As we get relatively close to optimal result, we now can analyze the structure of our partition. Here is random example for A3 in form {size of clusters : number of clusters with such size}:
We mostly have 10-50 clusters with big size and 1000-3000 clusters with single vertex.
So we can improve result by making better partition of big clusters or choosing better singleton vertices.
Third, kind of genetic algorithm
Notice that in (1) and (2) we are using random, so every new run give us new partition. The main idea is the next: different partitions are better than other partitions in some local regions; So we can combine them to get better result.
Particularly, what i did:
- generate 50-200 partitions
- choose random 5-15 of them and make new partition according to the next principal: 2 vertices are in the same cluster only and only if they are in the same cluster in every of chosen 5-15 partitions. Take a look on image.
- run 5-10 times (2) on new partition (do 3. on the same new partition from 2. 8 times, parallelly on processor cores and than take the best one)
- replace the worst one of 5-15 partitions chosen on 2. with the best one of 8 partitions generated on 3.
- do 2.->3.->4. 500-5000 times (it takes 5-20 hours on my laptop depending on A1/A2/A3)
Also you can take 5-100 the best partitions, generate 195-100 new partitions and do 5. again. This way you will probably stuck at different local maximum.
One more small improvement
So far in (1) and (2) we were only moving vertices to adjacent clusters. We can get extra 100-200 points if we will also try to move vertices to new singleton cluster.
That's all. Hope my english is not too bad and you like this post <3
Thanks a lot for sharing!
It's always interesting to read marathon solutions that have proven to be the best after a long-time challenge. They are usually very close to the optimal and very diverse in terms of used techniques and ideas.
Thanks for sharing your cool insights! The genetic algorithm is very nice.
One question: do you run the Louvain algorithm with Modularity only (ignore Density for this step and only later create lots of singletons) or you somehow take Density into account at this step already?
I run Louvain algorithm with both Modularity and Regularization. Both parameters are similar in updating.
Can you also post how much the scores for each test case improved after each idea you used (Louvian, big vs. singleton clusters, genetic)?
Scores during the contest were a little bit different than I show now because of some small features that I don't remember now. But main trends remains.
I run next steps consecutively (10 times) for each given graph:
And here what I get
A1:
A2:
A3:
Sum = 220356.694 + 186346.688 + 199290.700 = 605994.082
Genetic algorithm (as tested during contest on 500+ iterations 50+ partitions) gives 220700+ for A1, 189680+ for A2 and 200480+ for A3
Also, I got about 609k-610k (depending on partitions) when I ran only 1 iteration of genetic algorithm