Aksenov239's blog

By Aksenov239, history, 4 years ago, In English

Hello everybody!

After a long break I would like to announce the fourth Bioinformatics contest organized by ITMO University, Bioinformatics Institute, Rosalind and Stepik. The contest is sponsored by JetBrains, Yandex, Serokell, and Genotek.

The contest is hosted on Stepik platform. To participate you have to sign the rules.

The contest consists of two rounds:

The contest is prepared by the following team: Alexey Sergushichev, Nikita Alexeev, budalnik, demon1999, tourist, cdkrot, GShark, doreshnikov (all ITMO University), German Demidov (Center for Genomic Regulation), Andrey Prjibelski (CAB SPbU).

Are there any prizes?

  • 1st place – Whole Genome Sequencing
  • 2nd and 3rd – Whole Exome Sequencing
  • 4th and 5th – 23andMe or Genotek DNA Service
  • 1st–30th – Honorable Bioinformatics T-Shirt

Do I need to know biology?

The knowledge of biology is not necessary to solve the problems!

We wish that you will like the problems!

Good luck!

  • Vote: I like it
  • +254
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

What is the requirement for being qualified to the final round? It's not stated how maby participants advance, and only 'score the right number of points' is mentioned on the website :(

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    The only requirement is 'scoring the right number of points', it is stated clearly :)

    Everybody who will get at least 952 points (including one point from the "Welcome!" step with the Honor Code) will advance to the Final Round.

»
4 years ago, # |
  Vote: I like it +1 Vote: I do not like it

Daniil Oreshnikov is doreshnikov, you could have made him a colorful link too. Now it looks like he isn't registered on Codeforces.

»
4 years ago, # |
Rev. 2   Vote: I like it +157 Vote: I do not like it

Is it ethical to sequence the genome of one of the greatest programmers of our time? What if a bad agent gets access to it, creating a million clones, thereby causing rating deflation?

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it -101 Vote: I do not like it

    Ok why the downvotes?

    What Monogon said is COMPLETELY dumb. Your DNA is "legally" flying around, everything you touch, your hair, saliva, typically anything. If a bad agent needs your DNA, they already have it.

    So why the downvotes? cz unrated? Elitist/racist ruskis again?

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +69 Vote: I do not like it

      Because his comment was a joke and you are treating as it was a serious inference, genius.

»
4 years ago, # |
  Vote: I like it +8 Vote: I do not like it

Welcome back! You were missed, even if my CPU is dreading what cursed implementations I come up with.

»
4 years ago, # |
  Vote: I like it +5 Vote: I do not like it

Hope Benq(or someone else) can share us something about his genome after the contest:-)

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

    By the way, does "Whole Genome Sequencing" really mean the service?

»
4 years ago, # |
  Vote: I like it +1 Vote: I do not like it

One of it's kind!

»
4 years ago, # |
  Vote: I like it +9 Vote: I do not like it

The link for time and date says "19 June", but leads to "19 July"

»
4 years ago, # |
  Vote: I like it -7 Vote: I do not like it

Contest prepared by tourist

»
4 years ago, # |
Rev. 2   Vote: I like it -42 Vote: I do not like it

is this contest rated? if so, how do you know which user is which on the other platform? edit 1: so, let me get things clear, in this website people downvote posts where there's a genuine question and mods do nothing to prevent this?

»
4 years ago, # |
  Vote: I like it +8 Vote: I do not like it

t is a substring of s if t is contained as a contiguous collection of symbols in contiguous collection of symbols in s

Is this a non-standard definition of a substring? It's very confusing.

»
4 years ago, # |
Rev. 3   Vote: I like it +36 Vote: I do not like it

Could anyone tell me how to solve Diagnosis without running 2.5 hours for all the tests?

I only have solultions with time complexity $$$O(m(n+q))$$$ or $$$O((\sum cq_i)(\sum cm_i))$$$ or $$$O(m(\sum cq_i)+q(\sum cm_i))$$$.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it +29 Vote: I do not like it

    I solved it by realizing that several test-cases are weak and having a certain pattern e.g.:

    • There is one input file with $$$|Q_p|=1$$$ and $$$|D_m|=1$$$. I solved it using a linear $$$O(n+m+q)$$$ solution to associate every vertex with a disease.

    • On another input file, the vertices for each patient are a subset of a certain disease, i.e. $$$Q_p \subseteq D_m$$$. Hence for each patient, I only need to find a disease that is a superset of patient vertices. With a decent filtering trick, my code ran for less than 1 minute for this test-case.

    • On another input file, I need to draw a portion of the tree to realize that the vertices that belong to the same disease or patient is not randomly placed within the tree, e.g. there is a certain pattern on it. The solution for this test-case becomes pretty trivial.

    The fact that the constraint for each test-case isn't explicitly stated made me think that the clue is hidden within the test-case itself.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      How did you solve tests 6 and 7? Are those the last two cases you described?

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +11 Vote: I do not like it

        test #7 is $$$Q_p \subseteq D_m$$$ the vertices for each patient are a subset of a certain disease

        test #6 is the one with the funny pattern (I need to draw a portion of the tree on a paper).

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it +11 Vote: I do not like it

          Another way to solve test 6 is to notice that depths are $$$\leq 5$$$(let's call this number $$$d$$$) and the sizes are $$$\leq 3$$$(let's call this number $$$s$$$). Now, for each (disease, patient) pair the solution is the maximum sum of nodes such that the $$$i$$$-th node is a parent of some disease's node and the $$$i$$$-th patient's node. For each disease, the number of possible parents is $$$s \cdot d$$$, so we can just precompute for each disease all possible ways its parents can contribute to the solution; there are up to $$$(s \cdot d)^s$$$ ways. Then, for each patient, try all $$$d^s$$$ ways to choose the parents and see if there is a disease which can also have that way of choosing parents.

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Thanks!

      Maybe I am not good at finding the patterns of tests...

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        I think that (is this task and competetion) finding patterns + thinking about + impelemnting multiple solutions per test takes not less time than running $$$O(m (n + q))$$$. I'd better spent this time on actualizing template from old GCJ to run output only tasks in multiple threads (but I had spent it on optimizing running time from 2h+ to 1h...)

»
3 years ago, # |
Rev. 3   Vote: I like it +13 Vote: I do not like it

Problem 2 was B(inary)S(earch)

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Thanks a lot for challenge, I really enjoyed it (especially binary search on Stepik on B).

1). What was the optimal strategy for problem A?

2). OK, I received some points on problem D. But I don't understand, how was counting the rate of minor haplotype? Was some read counted if it differed with minor haplotype by less than k polymorphisms?

»
3 years ago, # |
  Vote: I like it +19 Vote: I do not like it

When you got 299.97 of 300 points trying to reconstruct the minor haplotype in Problem 4: