In my previous post, I discussed a fast Scanner for Scala.
To benchmark it, I wrote the fastest Scanner I could in Java:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/**
 * Hand built using a char buffer
 */
public class ArrayBufferScanner extends AbstractScanner {
  private char[] buffer = new char[1 << 4];
  private int pos = 0;         // length of the token currently held in buffer
  private boolean eof = false; // true once the reader has nothing else to read
  private BufferedReader reader;

  public ArrayBufferScanner(BufferedReader reader) {
    super(reader);
    this.reader = reader;
  }

  @Override
  public boolean hasNext() {
    return !eof;
  }

  // Skips leading whitespace, then copies the next token into buffer[0, pos).
  private void loadBuffer() throws IOException {
    pos = 0;
    while (true) {
      int i = reader.read();
      if (i == -1) {
        // Keep pos as is: setting it to -1 here would drop a final token
        // that has no trailing whitespace and break copyValueOf in next().
        eof = true;
        break;
      }
      char c = (char) i;
      if (c != ' ' && c != '\n' && c != '\t' && c != '\r' && c != '\f') {
        if (pos == buffer.length) {
          buffer = Arrays.copyOf(buffer, 2 * pos); // grow by doubling
        }
        buffer[pos++] = c;
      } else if (pos != 0) {
        break; // whitespace after a token ends the token
      }
    }
  }

  @Override
  public String next() {
    try {
      loadBuffer();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return String.copyValueOf(buffer, 0, pos);
  }

  @Override
  public String nextLine() {
    try {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public int nextInt() {
    return Integer.parseInt(next());
  }
}
Is this the best I can do?
It is barely faster than Java's StreamTokenizer (NOT StringTokenizer), in spite of being much simpler: http://docs.oracle.com/javase/8/docs/api/java/io/StreamTokenizer.html
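For reference, here is a minimal sketch of how StreamTokenizer can be set up to do the same whitespace-delimited splitting. It is not the benchmark code from the repository; the class name StreamTokenizerSketch and the method tokens are hypothetical names used only for illustration, and the character ranges are my assumption about what matches the scanner above most closely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class StreamTokenizerSketch {
    // Returns every whitespace-delimited token from the reader.
    static List<String> tokens(BufferedReader reader) {
        StreamTokenizer st = new StreamTokenizer(reader);
        st.resetSyntax();           // turn off the default number, quote and comment handling
        st.whitespaceChars(0, ' '); // control characters and space separate tokens
        st.wordChars('!', 255);     // every other character is part of a token

        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(st.sval);   // with this syntax table every token is a word
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("12 abc\n\t-7  x.y"));
        System.out.println(tokens(in));
    }
}
```

Calling resetSyntax() first matters for a fair comparison: by default StreamTokenizer parses numbers into doubles and treats '/' and quote characters specially, which is extra work the hand-built scanner never does.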
Java source: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/main/java/better/files/ArrayBufferScanner.java
Other Scanners: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/main/scala/better/files/Scanners.scala
Benchmark results: https://github.com/pathikrit/better-files/tree/master/benchmarks