


In previous posts we have described how Dropbox's mobile document scanner works. The document scanner makes it possible to use your mobile phone to take photos and "scan" items like receipts and invoices. Our mobile document scanner only outputs an image - any text in the image is just a set of pixels as far as the computer is concerned, and can't be copy-pasted, searched for, or used in any of the other ways you can use text. Hence the need to apply Optical Character Recognition, or OCR, a process that extracts actual text from our doc-scanned image. Once OCR is run, we can enable the following features for our Dropbox Business users:

- Extract all the text in scanned documents and index it, so that it can be searched for later.
- Create a hidden overlay so text can be copied and pasted from the scans saved as PDFs.

When we built the first version of the mobile document scanner, we used a commercial off-the-shelf OCR library in order to do product validation before diving too deep into creating our own machine learning-based OCR system. This meant integrating the commercial system into our scanning pipeline and offering both features above to our business users to see if they found sufficient use for the OCR.

Once we confirmed that there was indeed strong user demand for the mobile document scanner and OCR, we decided to build our own in-house OCR system for several reasons.

First, there was a cost consideration: having our own OCR system would save us significant money, as the licensed commercial OCR SDK charged us based on the number of scans.

Second, the commercial system was tuned for the traditional OCR world of images from flatbed scanners, whereas our operating scenario was much tougher: mobile phone photos are far more unconstrained, with crinkled or curved documents, shadows and uneven lighting, blurriness, reflective highlights, and so on. Thus, there might be an opportunity for us to improve recognition accuracy.

In fact, a sea change has happened in the world of computer vision that gave us a unique opportunity. Traditionally, OCR systems were heavily pipelined, with hand-built and highly-tuned modules taking advantage of all kinds of conditions they could assume to be true for images captured using a flatbed scanner. For example, one module might find lines of text, the next might find words and segment letters, and another might apply different techniques to each piece of a character to figure out what the character is. Most methods also rely on binarization of the input image as an early stage, which can be brittle and discards important cues. The process of building these OCR systems was very specialized and labor-intensive, and the systems could generally work only with fairly constrained imagery from flatbed scanners.

The last few years have seen the successful application of deep learning to numerous problems in computer vision, giving us powerful new tools for tackling OCR without having to replicate the complex processing pipelines of the past, relying instead on large quantities of data to have the system automatically learn many of the previously manually-designed steps. Perhaps the most important reason for building our own system was that it would give us more control over our own destiny and allow us to work on more innovative features in the future.

In the rest of this blog post we will take you behind the scenes of how we built this pipeline at Dropbox scale. Most commercial machine learning projects follow three major steps:

- Research and prototyping to see if something is possible.
- Productionization of the model for actual end users.
- Refinement of the system in the real world.

We will take you through each of these steps in turn. Our initial task was to see if we could even build a state-of-the-art OCR system at all. We began by collecting a representative set of donated document images matching what users might upload, such as receipts, invoices, letters, etc.
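To make the text-indexing feature described above concrete, here is a minimal sketch of indexing OCR output so it can be searched for later: a toy inverted index in plain Python. The document ids, sample text, and the `build_index` helper are illustrative assumptions only, not Dropbox's actual search infrastructure.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of document ids containing it,
    so text extracted from scans can be searched later."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            # Crude normalization: strip surrounding punctuation.
            index[token.strip(".,:;!?")].add(doc_id)
    return index

# Hypothetical OCR output for two scanned documents.
ocr_output = {
    "receipt_001": "Total due: 42.10 USD paid by card",
    "invoice_007": "Invoice total 99.00 USD net 30 days",
}
index = build_index(ocr_output)
index["total"]  # {"receipt_001", "invoice_007"}
```

A production system would layer stemming, ranking, and incremental updates on top, but the core idea is the same: once OCR turns pixels into tokens, search reduces to dictionary lookups.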

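As a concrete illustration of the brittle binarization step mentioned earlier, here is a minimal sketch of Otsu's thresholding method in NumPy. This is a textbook technique shown for illustration under our own assumptions (the function names and the synthetic image are ours); it does not reflect Dropbox's actual pipeline. Note how every pixel is forced to pure black or white, discarding the gray-level cues a learned model could exploit.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold maximizing between-class variance
    over a uint8 grayscale image (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # P(pixel <= t)
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)           # empty classes -> 0
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Map every pixel to pure black (0) or white (255)."""
    return (gray > otsu_threshold(gray)).astype(np.uint8) * 255

# Synthetic "document": dark ink (40) on a light page (200).
page = np.full((32, 32), 200, dtype=np.uint8)
page[8:12, 4:28] = 40  # a "line of text"
binary = binarize(page)
```

On this clean bimodal image the method works well; on a phone photo with shadows or uneven lighting, a single global threshold can swallow entire regions of text, which is exactly the brittleness described above.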