fmt.Scan family in Golang
Introduction
Reading input from stdin in Golang and in any other language is not a tricky thing to do, actually is quite easy. In Go you could achieve this by simple using one of these functions:
- fmt.Scan
- fmt.Scanf
- fmt.Scanln
These functions are the standard for reading from stdin, but for large inputs they don't perform too good, let's see why and what could use instead.
Prerequisites
In order to follow this post, we will need to had Golang and Python on your machine
The program
On a recent post, I wrote about the issue that I was having related to Time Limit Exceeded(TLE) on my submissions written in Go. Even in a ridiculous way, given that an implementation of the same algorithm in Python were accepted, while my Go implementations were not. Almost all of my submissions written in Go faced TLE, so I decided to dig a little further on the issue.
After profiling a program that receive an input in this way
1 4 1 2 3 4
Where the first number is the number of test cases, the second one is the number of elements of an array, and the next elements are those elements of the array.
I won't do anything fancy with this input, I will just read them and see how fast Go can do this using fmt scans functions. This is the code I'm using, let's see it: package main
import (
"flag"
"fmt"
"log"
"os"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
}
var tc int
fmt.Scan(&tc)
for i := 0; i < tc; i++ {
var n int
fmt.Scan(&n)
arr := make([]int, n)
for j := 0; j < n; j++ {
fmt.Scan(&arr[j])
}
fmt.Println("DONE TC: ", i+1)
}
}
The first part of the program is just using "runtime/pprof" to make the profiling of the program. Now if you run this program with a small input, you won't have any issue related to performance, the problem start when we try this program with an input with large amount of elements in the array. Let's write a simple script in Python to generate an input file:
with open("in_large", "w") as f:
f.write("1\n")
f.write("2000000\n")
for i in range(2000000):
f.write("1 ")
Profiling
This will generate an input with one test case and an array of $$$2*10^6$$$ elements. Now let's profile this program, here are the steps to follow:
- Compile the code
go build -o main
- Generate the input file,
python app.py
, where app.py contains the code above - Run the compiled program against the input file generated,
./main -cpuprofile=out.prof < input_large
- Open profile tool,
go tool pprof main out.prof
Now let's see if we could identify the bottleneck on the program. After running top10
, you will see something similar to this output:
Output:
File: main
Type: cpu
Time: Aug 25, 2021 at 11:00am (CDT)
Duration: 7.12s, Total samples = 7.08s (99.38%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 4510ms, 63.70% of 7080ms total
Dropped 38 nodes (cum <= 35.40ms)
Showing top 10 nodes out of 69
flat flat% sum% cum cum%
3290ms 46.47% 46.47% 3790ms 53.53% syscall.Syscall
270ms 3.81% 50.28% 4900ms 69.21% fmt.(*ss).ReadRune
160ms 2.26% 52.54% 400ms 5.65% runtime.mallocgc
140ms 1.98% 54.52% 4630ms 65.40% fmt.(*readRune).ReadRune
110ms 1.55% 56.07% 2520ms 35.59% fmt.(*ss).consume
110ms 1.55% 57.63% 5010ms 70.76% fmt.(*ss).getRune
110ms 1.55% 59.18% 4460ms 62.99% io.ReadAtLeast
110ms 1.55% 60.73% 280ms 3.95% runtime.exitsyscall
110ms 1.55% 62.29% 190ms 2.68% runtime.reentersyscall
100ms 1.41% 63.70% 2760ms 38.98% fmt.(*ss).SkipSpace
Ups! That's a lot of syscalls, not good for such a simple task. All these syscalls are due to repetitive call to fmt.Scan to read the elements of the array, so what could we do to avoid all this. Let's make the same thing differently, using "bufio" package we could read these elements. The code would be like this:
package main
import (
"bufio"
"flag"
"fmt"
"log"
"os"
"runtime/pprof"
"strconv"
"strings"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
}
in := bufio.NewReader(os.Stdin)
tc := readInt(in)
for i := 0; i < tc; i++ {
n := readInt(in)
_ = readArrInt(in)
fmt.Println("N: ", n)
fmt.Println("DONE TC: ", i+1)
}
}
func readInt(in *bufio.Reader) int {
nStr, _ := in.ReadString('\n')
nStr = strings.ReplaceAll(nStr, "\r", "")
nStr = strings.ReplaceAll(nStr, "\n", "")
n, _ := strconv.Atoi(nStr)
return n
}
func readLineNumbs(in *bufio.Reader) []string {
line, _ := in.ReadString('\n')
line = strings.ReplaceAll(line, "\r", "")
line = strings.ReplaceAll(line, "\n", "")
numbs := strings.Split(line, " ")
return numbs
}
func readArrInt(in *bufio.Reader) []int {
numbs := readLineNumbs(in)
arr := make([]int, len(numbs))
for i, n := range numbs {
val, _ := strconv.Atoi(n)
arr[i] = val
}
return arr
}
func readArrInt64(in *bufio.Reader) []int64 {
numbs := readLineNumbs(in)
arr := make([]int64, len(numbs))
for i, n := range numbs {
val, _ := strconv.ParseInt(n, 10, 64)
arr[i] = val
}
return arr
}
This code is a little larger because I wrote four auxiliary functions, the profiling for this program is ridiculously faster than the previous one:
File: main
Type: cpu
Time: Aug 25, 2021 at 11:03am (CDT)
Duration: 302.30ms, Total samples = 230ms (76.08%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 220ms, 95.65% of 230ms total
Showing top 10 nodes out of 40
flat flat% sum% cum cum%
60ms 26.09% 26.09% 60ms 26.09% indexbytebody
40ms 17.39% 43.48% 130ms 56.52% strings.genSplit
30ms 13.04% 56.52% 40ms 17.39% runtime.scanobject
20ms 8.70% 65.22% 170ms 73.91% main.readArrInt
20ms 8.70% 73.91% 20ms 8.70% strconv.Atoi
10ms 4.35% 78.26% 10ms 4.35% internal/bytealg.IndexByteString
10ms 4.35% 82.61% 10ms 4.35% runtime.(*gcBits).bitp (inline)
10ms 4.35% 86.96% 10ms 4.35% runtime.memclrNoHeapPointers
10ms 4.35% 91.30% 10ms 4.35% runtime.osyield
10ms 4.35% 95.65% 10ms 4.35% runtime.spanOf (inline)
GOOD BEY syscalls! This way is more efficient that using fmt.Scan family functions.
Conclusion
If you are going to read a large amount of elements from stdin, use the functions in "bufio" package instead of "fmt". I hope this is useful for someone facing the same issue.
what about fmt's prints?