Auto-translated Chinese national IOI training team report papers

These are the Chinese national training team report papers, translated into English using several software tools.

Original papers download:

Auto-translated papers download:

https://nd.nl.tab.digital/s/GqoiQ5b8tpJFrXD

I've only translated some topics, but I will upload more in the future.

I'll update all the files if I find some way to improve the program(s) used to generate the PDF.

I could not find any existing post that translates these papers (despite many blog posts asking for it: 1 2 3), and I find it really hard to copy and paste each line into a translator program, so the side-by-side comparison is helpful.

* Rasterize the PDF (so it works with ABBYY)
* ABBYY OCR program to extract the Chinese
* transpdf.iceni.com to extract the XML file
* a custom Python script to convert it to a plain text file, suitable for feeding into an automatic translation program
* Chrome "Google Translate" plugin used to translate a plain text file
* another Python script to put the content back into the XML
* then feed that into transpdf again and download the "comparison diff file"
* and use a Python script (with qpdf) to remove the encryption (allows text copy/paste) and the transpdf watermark
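
The two conversion scripts in the middle of this pipeline can be sketched roughly as follows. This is only an illustration, not the actual scripts: it assumes the exported XML has an XLIFF-like layout (`<trans-unit>` elements, each with a `<source>` and an optional `<target>`), and the sample document and the function names `extract_lines` and `insert_translations` are made up for the example. The idea is to write one line of text per unit, so that line N of the translated file can be matched back to unit N.

```python
import xml.etree.ElementTree as ET

# Assumption: an XLIFF-like layout. A real export may use XML namespaces and
# inline formatting tags, which this sketch does not handle.
SAMPLE_XML = """<xliff><file><body>
<trans-unit id="1"><source>你好</source></trans-unit>
<trans-unit id="2"><source>动态规划</source></trans-unit>
</body></file></xliff>"""


def extract_lines(root):
    """Return one line of source text per <trans-unit>, in document order."""
    lines = []
    for unit in root.iter("trans-unit"):
        source = unit.find("source")
        text = "".join(source.itertext()) if source is not None else ""
        # Collapse whitespace and newlines so that line N always maps to unit N.
        lines.append(" ".join(text.split()))
    return lines


def insert_translations(root, translated_lines):
    """Write translated line N into the <target> of the N-th <trans-unit>."""
    units = list(root.iter("trans-unit"))
    if len(units) != len(translated_lines):
        raise ValueError("line count changed during translation")
    for unit, line in zip(units, translated_lines):
        target = unit.find("target")
        if target is None:
            target = ET.SubElement(unit, "target")
        target.text = line


root = ET.fromstring(SAMPLE_XML)
plain_text = "\n".join(extract_lines(root))  # this is what goes to the translator

# Pretend the translator returned these two lines, in the same order.
insert_translations(root, ["Hello", "Dynamic programming"])
print(ET.tostring(root, encoding="unicode"))
```

The line-count check matters because a translator that merges or drops a line would otherwise shift every later translation onto the wrong unit.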

Possible improvements:

* I suppose that the original Chinese characters are still preserved inside the PDF; however, direct copy and paste results in corrupted data.
  * If anyone can figure out how to extract the Chinese characters without OCR, that would improve the translation quality.
  * (Some metadata in a PDF shows that it was made with Microsoft Word 2013 and/or Acrobat 11.0.0.)
* Currently the translation step requires a little manual work (open the text file in Chrome, run the Google Translate plugin), and automating this step is hard.
  * Online Google Translate limits the input size to ~5000 characters.
  * The "free unofficial" Google Translate API (the googletrans Python package) might stop working at any time.
* Some translated content is stretched.
  * This is a limitation of the transpdf website.
* The images, math formulas and pseudocode listings are not preserved.
  * This is a limitation of the ABBYY OCR tool. Although it can be fixed manually, I'm not going to do that.
* Obviously, Google Translate is not perfect, so you can contribute manual translations.
  * Some lines are split at incorrect positions, which causes inaccurate translation. I'm not sure whether or how this can be fixed automatically.
* You can also write blog posts explaining the techniques (usually in English; however, Chinese HTML is still easier to translate than a Chinese PDF).
* Or find existing content (in English) that describes those techniques.
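
For anyone who wants to try automating the translation step, one piece of it is working around the ~5000 character input limit mentioned above. Below is a minimal sketch that cuts a text file into chunks that stay under a limit and only break at line ends. The function name `chunk_lines` and its `limit` parameter are made up for this example, and it assumes the limit is counted in characters the same way Python counts `len()` of a string, which may not match how the service counts.

```python
def chunk_lines(text, limit=5000):
    """Split text into chunks of at most `limit` characters, breaking at line ends."""
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        # A single over-long line is hard-split so that no chunk exceeds the limit.
        pieces = [line[i:i + limit] for i in range(0, len(line), limit)] or [""]
        for piece in pieces:
            extra = len(piece) + (1 if current else 0)  # +1 for the joining newline
            if current and size + extra > limit:
                # The current chunk is full: close it and start a new one.
                chunks.append("\n".join(current))
                current, size = [], 0
                extra = len(piece)
            current.append(piece)
            size += extra
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(2000))
    parts = chunk_lines(sample)
    print(len(parts), max(len(p) for p in parts))
```

As long as no line is longer than the limit, joining the chunks with newlines gives back the original lines in order, so the translated chunks can simply be concatenated afterwards.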

Rev.	By	When	Δ	Comment
en10	z4120	2021-03-14 16:47:11	1235	Migrate to DeepL
en9	z4120	2021-03-11 08:50:57	969	2020/1 upload
en8	z4120	2020-11-30 19:07:41	61	First publish (published)
en7	z4120	2020-11-30 19:03:03	193	Tiny change: '/i7A1.png) (\n [low resol' -> '/i7A1.png)\n ([low resol'
en6	z4120	2020-11-30 18:54:56	426
en5	z4120	2020-11-30 18:43:17	1491
en4	z4120	2020-11-30 07:26:41	70
en3	z4120	2020-11-30 07:20:55	53
en2	z4120	2020-11-30 07:19:09	3222
en1	z4120	2020-11-29 19:59:45	8	Initial revision (saved to drafts)
