Statement
Given N (N <= 10^5) cats and M (M <= 10^5) dogs with their tag number (<= 10^9). A cat and a dog can be matched if they have the different tag number. Compute the greatest numbers of pairs that can be matched and print out the links.
Input
3 3 (N, M)
1 1 2 (tag number of N cats)
2 1 1 (tag number of M dogs)
Output
2
1 1
3 2
Explain
The first cat(tag number: 1) matches with the first dog(tag number: 2)
The third cat(tag number: 2) matches with the second dog(tag number: 1)
At first glance, I thought it was a simple maximum matching problem but then I realize that time complexity to build such bigraph is O(N * M). Are there other approaches to do it?
Yes.
What are there? Thanks in advance.
Yes
UPD: I misread the problem statement. Ignore the "solution" below
Can be solved with inclusion exclusion:
different tag pairs = all tag pairs — same tag pairs
so
ans = (n*m) — (same tag pairs)
You can count the number of same tag pairs by keeping two frequency arrays. You don't even need some tree, as you can compress the values to O(n+m) unique values. So you only need array cnt[n+m]
say cats = {1, 2, 3} and dogs = {1}
Then, where does your solution takes us?
Cats: cnt[1] = cnt[2] = cnt[3] = 1
Dogs: cnt[1] = 1 cnt[2] = cnt[3] = 0
The only matching is (1,1) which you can calculate from multiplying cnt[1] of cats with cnt[1] of dogs.
ans = 3-1 = 2
The 2 different tag pairs are (2,1) and (3,1)
That's not OPs query. He want a matching. :) No, number of distinct pairs. In your solution, 1 is reused.
oops, take back what I said.
But now I'm interested in the solution for this!
Sees a negative mark in the solution, calls it inclusion exclusion
Yeah I meant complementary counting, but brain was small today. We all have our smallbrain days
Meanwhile Svlad_Cjelli bigbraining 24/7. Also a smooth player. (This man gives great love advice!!!)
I don't have a complete solution. But some idea's if someone wants to work with them. This graph is special. Think we grouped all the cats and dogs by their tag number. Then build a bipartite graph(on mind, not paper). Also, think N = M
Lets apply marriage theorem on that. A perfect matching exists if for any subset on left side, we have enough nodes on the right side. This is fulfilled for any subset containing more than one
group of cats
. Why? Because at most one group of dogs are left out by one group of cats. So, any two groups have edge to all the dogs and we assumed N = M.The perfect matching fails when we consider subset of one group. Maybe we can use the deficiency theorem on this? I don't have time right now. Will catch it later.
My preliminary analysis shows that, with deficiency theorem, maximum matching = min(M, N — max(cats_with_tag_i + dogs_with_tags_i — M)).
Is this problem on some online judge? I want to check my idea before writing some solution here.
I think that your reasoning is correct, since Hall's Marriage Theorem is a special case of König / Max-flow = Min-cut. See my comment for an explanation in terms of min-cut.
Big spoilers ahead:
Dogs with the same tag are equivalent. Same for cats with the same tag. So, imagine a flow network where we "merge" all those into a single node. From s to each "cat-set", put an edge with capacity equal to the set's size. From each "dog-set" to t, put an edge with capacity equal to the set size. Put an edge from each "cat-set" to each "dog-set" having a different tag, with infinite capacity [this is important for the problem's model].
If you add the "sets of size 0" so that every set has a matching oposite set, the graph structure (except for the capacities) is completely fixed: it is almost a "complete bipartite", except for the edges going from vertex i on one side to vertex i on the other side.
Now, max-flow on that network would solve the problem. But let's compute min-cut instead: the network has so many edges that there are very few cuts to consider, and by max-flow = min-cut, that will be the solution.
Since the edges in the middle have infinite capacity, the min cut will never cut them, so we only need to consider cutting the other edges. The simplest possibility is cutting a whole side, which gives M or N as answers. If we leave the edge associated to vertex i on one side, without cutting it, then we are forced to cut the edges of every vertex on the other side, except for the corresponding vertex i. Applying the same reasoning to that other vertex, we must cut every single "outer" edge, except for the ones corresponding to both vertices "i". So, that gives only max(N, M) additional cuts to consider, apart from the previous two. The capacity of each one is M + N - ai - bi , where ai and bi are the amount of cats and dogs with tag i.
So the final answer is min(M,N, min(over all i, M + N - ai - bi))
About reconstructing the solution: If I am not mistaken, one can prove using the formula above that the following greedy algorithm works (I think that if we look at one of its steps, it never "ruins" the formula)
while (a match is still possible)
pick a tag i such that the current ai + bi is maximum
add to the answer any match involving that tag (and of course, remove its endpoints from future consideration)
Very well done. How did you find the greedy approach? I was thinking some way which will keep the equation invariant after the greedy pick. But I had to go to sleep.
The M and N terms in the formula always decrease by 1 (when adding a match), so no problem.
The M + N - ai - bi is very suggestive, since it decreases by 2 if ai + bi does not change, or else decreases by one. And the one with largest ai + bi is the one that is minimum, so that is the one that we should try to "stop" from becoming too small.