mukel's blog

By mukel, 11 years ago, In English

UPD: Added Scala futures approach.

Google CodeJam 2014 is coming, and it will be nice if someone could share a way to run multiple tests concurrently, all programming languages are welcome. I've seen very handy codes in D, but I still use mainly C++, Scala has some nice ways to do it also. Running programs concurrently can be useful in CF also, eg. offline output generation. I'll try to populate this thread with some examples. Right now is up to you.

  • Vote: I like it
  • +26
  • Vote: I do not like it

»
11 years ago, # |
Rev. 3   Vote: I like it +59 Vote: I do not like it

C++11 makes this easy. If you have a function f(), you can run it asynchronously with async(f), and get the result from the object returned by std::async:

auto t = async(f);

int res = t.get();

Now suppose f(int a, int b) calculates a sum of numbers within an interval and takes two parameters. You can make two lambdas that will cover the first half and the second half, and pass them to std::async():

auto t1 = async([](){ return f(0, 50); });
auto t2 = async([](){ return f(51, 100); });

int sum = t1.get() + t2.get();

Full example: http://pastie.org/8954684

Note that std::async() decides by itself how many threads to start. If you want to force a new thread, use std::async(std::launch::async, f).

EDIT: It looks like you can write even shorter code:

auto t1 = async(f, 0, 50);
auto t2 = async(f, 51, 100);

int sum = t1.get() + t2.get();
  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I think in most common C++ implementation async is not pooled, so it's probably counterproductive to use it for functions that run only shortly or to use it to run more tasks in parallel than you have hardware threads.

    • »
      »
      »
      11 years ago, # ^ |
        Vote: I like it +1 Vote: I do not like it

      Yes, std::async() starts a new thread per each function if it decides to evaluate it asynchronously.

      I think std::async() should be able to limit the number of threads to the number of CPUs, because when running the function, it can also decide to run it later or use lazy evaluation in the same thread.

  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This is good. It doesn't require templates, which makes it applicable in a contest where you don't have access to pre-written code.

    As long as it supports C++11...

»
11 years ago, # |
  Vote: I like it +3 Vote: I do not like it

you can easily parallelize your C++ code by using OpenMP pragmas. Example:

int a1 = 0, a2 = 1;

#pragma omp parallel for reduction(+: a1) reduction(*: a2)
for (int i = 0; i < 10; ++i) {
	a1 += a[i];
	a2 *= a[i];
}

//g++ -fopenmp openmp.cpp
»
11 years ago, # |
Rev. 3   Vote: I like it +5 Vote: I do not like it

Scala:

object A extends App {

  def tokenizeLine = new java.util.StringTokenizer(readLine)
  def readInts(n: Int) = { val t = tokenizeLine; Array.fill(n)(t.nextToken.toInt) }
  
  type TInput = (Int, Int, Int)
  type TOutput = (Int, Long)
  
  val solve: PartialFunction[TInput, TOutput] = { case (testNo, n, m) => 

    var sum = 0L
    // do something stupid but heavy:
    for (i <- 0 until n) sum += m

    (testNo, sum)
  }

  val testCount = readInt
  val inputs = scala.collection.mutable.ArrayBuffer[TInput]()
  
  for (testNo <- 1 to testCount) {
    val Array(n, m) = readInts(2)
    inputs += ((testNo, n, m))
  }

  //val outputs = inputs.par.collect { solve } seq // multi-threaded
  val outputs = inputs.collect { solve } // single-threaded
    
  for ((testNo, sum) <- outputs) println(s"Case #SStestNo: SSsum")
}

P.S. Forum parser goes crazy about $ sign in strings so I had to replace it with SS. Also, I have removed debugging and profiling from my code to make it shorter. Full code is in my original message.

Sample input:

8
1000000000 8
100000000 7
10000000 6
100 5
1000000000 4
100000000 3
1000000000 2
1 1

Multi-threaded output:

(...edited...)
Finished after 1332 ms

Single-threaded output:

(...)
Finished after 2367 ms

.par is the only switch into a parallel execution so it is easy to run sequentially for debugging.

I see you are from EPFL (the birth place of Scala), so I hope you will come up with some improvements to this code:) But in general, I don't think it is worth bothering with multiple threads in Code Jam unless you have 8 or more cores.

  • »
    »
    11 years ago, # ^ |
    Rev. 3   Vote: I like it +3 Vote: I do not like it

    UPD: Sad story about when a multi-threaded solution is a good idea... In fact, I didn't qualified in the FB Hackercup once because my Scala program had a silly bug, which I fixed almost instantly; and then my program finished flawlesly 20 seconds after the deadline, too late. Since then I promise myself not to use Scala for contests until I becomes really comfortable with it, and if I use it this year for the GCJ or the HackerCup, my code will run in multiple threads for sure.

    The .par is really handy for executing things in parallel with a almost no changes on the code. Another approach is using futures, check the following example :


    import scala.concurrent.duration._ import scala.concurrent._ import ExecutionContext.Implicits.global object Main { val in = new java.util.Scanner(System.in) def nextInt = in.nextInt def time(block: => Unit) = { val ticks = System.currentTimeMillis block System.currentTimeMillis - ticks } def main(args: Array[String]) = { val t = time { val taskCount = nextInt val tasks = for (_ <- 1 to taskCount) yield { // read the input sequentially val begin, end = nextInt // yields a Future[Long], magic begins future { // slow computation var sum = 0L for (i <- begin to end) sum += i sum } } // transform a Seq[Future[Long]] in Future[Seq[Long]] val all = Future.sequence(tasks) // wait 10s before raising a timeout exception val result = Await.result(all, 10 seconds) // be happy println(result mkString " ") } println("Total time: " + t + " ms") } }

    Even if I have a quadcore laptop (i7 with 4 physical cores and 8 virtual threads), the above code doesn't scale by a factor of 4 as expected, but only 2. Maybe I should try with a tweaked ExecutionContext instead of the default one. Any hints???

    UPD: I tested the code in a different machine and is scaling fine.

    • »
      »
      »
      11 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      Well, maybe your loops are too short? There can be issues with hotspot compiler kicking in after some time, and besides you are including IO in your timing, which is single-threaded. IMO it should be more scalable. My code with .par scales by 1.77 on dual core, and Futures should have a similar benefit.

      As a side note, you should avoid Scanner and use StringTokenizer for input. As far as I remember, for some problems here on CF only StringTokenizer was able to read large inputs within TL.