Optical Character Recognition (OCR)
OCR stands for Optical Character Recognition: the conversion of scanned images of text (e.g. handwritten, typewritten or printed) into machine-encoded (digital) text.
The clearer and larger the characters are, the better the system will recognise them. The problem with Chinese characters is that each character has to be matched against thousands of individual characters (compared to fewer than 100 Latin characters), which are rather complex in structure.
Chinese punctuation such as '。' (the full stop, a small circle) can be misread as 'o' (letter o) or '0' (zero).
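The misreading described above can sometimes be cleaned up after OCR. The following is only a minimal sketch in Python, not part of any OCR product: the function name `fix_misread_full_stops` and the chosen character range are assumptions made for illustration. It treats an 'o', 'O' or '0' as a misread '。' only when it directly follows a Chinese character and is followed by another Chinese character, whitespace, or the end of the text.

```python
import re

# Basic CJK Unified Ideographs block. Assumption: this range covers the
# characters in the text being cleaned; rarer characters lie outside it.
CJK = r"\u4e00-\u9fff"

# An 'o', 'O' or '0' directly after a Chinese character and directly before
# another Chinese character, whitespace, or the end of the text.
_MISREAD_STOP = re.compile(rf"(?<=[{CJK}])[oO0](?=[{CJK}\s]|$)")


def fix_misread_full_stops(text: str) -> str:
    """Replace a likely misread 'o'/'O'/'0' with the Chinese full stop."""
    return _MISREAD_STOP.sub("。", text)


if __name__ == "__main__":
    print(fix_misread_full_stops("你好o我很好0"))  # prints 你好。我很好。
```

This heuristic is deliberately narrow, but it can still be wrong: a genuine zero between two Chinese characters (for example in '第0章') would also be replaced, so the result should be proofread.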
Google Drive can OCR uploaded PDFs and image files in Simplified and Traditional Chinese.
To use it, go to Google Drive > Settings (icon) > Upload settings > Convert text from uploaded PDF and image files.
Once this setting is activated, the next time you upload a PDF or image you will be offered the option to run OCR and to choose the base language of the file.
Open Microsoft Office Document Imaging (Microsoft Office > Microsoft Office Tools > Microsoft Office Document Imaging).
Run the OCR function (Tools > Option > OCR > Chinese).
Export the text to Microsoft Word (Send Text to Word).
You can then use the Translate function (in the Review menu) to machine translate the text.