System and Method for Text Detection in An Image
| Content Provider | The Lens |
|---|---|
| Abstract | The present disclosure relates to image processing and analysis and, in particular, automatic detection of text in an image through an application or an application program interface. In some embodiments, text is detected, but not recognized, through a process including: binarization or trinarization of an image; blob detection of the binarized or trinarized image; grouping blobs into horizontal boundaries; and using statistics to determine that some of the horizontally bounded blobs are not text. |
| Related Links | https://www.lens.org/lens/patent/014-886-567-943-936/frontpage |
| Language | English |
| Publisher Date | 2017-03-30 |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Patent |
| Jurisdiction | United States of America |
| Date Applied | 2015-09-30 |
| Applicant | Apple Inc |
| Application No. | 201514870677 |
| Claim | 1. A method comprising: receiving a first version of image information; categorizing each pixel of the first version of image information into one of a plurality of categories; for one or more category of pixels, identifying one or more collections of pixels in the first version of image information, wherein a collection of pixels indicates a plurality of pixels that are continuously neighboring each other and are all within the same category of pixels; identifying one or more sequences, each comprising a plurality of collections of pixels, wherein all the pixels in a sequence are within the same category of pixels, and wherein all the collections of pixels within a sequence are horizontally oriented with respect to one and other; eliminating one or more sequences or collections of pixels based upon statistics regarding the one or more sequences or collections of pixels; and determining, without performing glyph analysis for the first version of the image information, that sequences and collections of pixels that have not been eliminated comprise text.<br>2. The method of claim 1, wherein a collection of pixels may be a blob or a connected component.<br>3. The method of claim 1, wherein categorizing each pixel includes binarization.<br>4. The method of claim 1, wherein categorizing each pixel includes trinarization.<br>5. The method of claim 4, wherein trinarization comprises categorizing pixels as black, white, or gray; and wherein a pixel categorized as gray indicates that the pixel has been determined as not relating to text.<br>6. The method of claim 5, wherein categorizing pixels as black or white is based upon whether a pixel value is greater than or less than a black point or white point.<br>7. The method of claim 5, wherein categorizing a pixel as gray correlates with the contrast present in a region of the image in which the pixel is located.<br>8. The method of claim 1, wherein the received first version of image information is converted into a plurality of scaled versions, and wherein each scaled version is a representation of the received first version of image information applying a downscaling factor; and further comprising determining, for each scaled version, sequences that comprise text, and comparing the determined sequences of one scaled version to one or more determined sequences of another scaled version or one or more determined sequences of the first version.<br>9. The method of claim 1, wherein all collections of pixels are identified in one sweep through the pixels of the received first version of image information.<br>10. The method of claim 2, wherein one or more connected components are identified in a first row or first column of pixels of the received first version of image information; and wherein the one or more connected components are associated with another connected component identified in a second row or a second column.<br>11. The method of claim 1 wherein the received first version of image information is in the form of JPEG, GIF, or RAW.<br>12. A system comprising: one or more CPUs; one or more cameras for capturing images represented as image information; a memory for storing program instructions for the one or more CPUs, where the instructions, when executed, cause the one or more CPUs to: receive a first version of image information; categorize each pixel of the first version of image information into one of a plurality of categories; for one or more category of pixels, identify one or more collections of pixels in the first version of image information, wherein a collection of pixels indicates a plurality of pixels that are continuously neighboring each other and are all within the same category of pixels; identify one or more sequences, each comprising a plurality of collections of pixels, wherein all the pixels in a sequence are within the same category of pixels, and wherein all the collections of pixels within a sequence are horizontally oriented with respect to one and other; eliminate one or more sequences or collections of pixels, the elimination based upon statistics regarding the one or more sequences or collections of pixels; and determine that sequences and collections of pixels that have not been eliminated comprise text.<br>13. The system of claim 12, wherein the instructions that cause the one or more CPUs to categorize comprise instructions that cause the one or more CPUs to trinarize each pixel.<br>14. The system of claim 13, wherein the instructions that cause the one or more CPUs to trinarize each pixel comprises instructions that cause the one or more CPUs to categorize pixels as black, white, or gray; and wherein a pixel categorized as gray indicates that the pixel has been determined as not relating to text.<br>15. The system of claim 12, wherein the instructions, when executed, further cause the one or more CPUs to: convert the received first version of image information into a plurality of scaled versions, wherein each scaled version is a representation of the received first version of image information applying a downscaling factor; determine, for each scaled version, sequences that comprise text; and compare the determined sequences of one scaled version to one or more determined sequences of another scaled version or one or more determined sequences of the first version.<br>16. A non-transitory computer readable medium comprising one or more instructions that, when executed, configure a processor to: receive a first version of image information; categorize each pixel of the first version of image information into one of a plurality of categories; for one or more category of pixels, identify one or more collections of pixels in the first version of image information, wherein a collection of pixels indicates a plurality of pixels that are continuously neighboring each other and are all within the same category of pixels; identify one or more sequences, each comprising a plurality of collections of pixels, wherein all the pixels in a sequence are within the same category of pixels, and wherein all the collections of pixels within a sequence are horizontally oriented with respect to one and other; eliminate one or more sequences or collections of pixels, the elimination based upon statistics regarding the one or more sequences or collections of pixels; and determine that sequences and collections of pixels that have not been eliminated comprise text.<br>17. The non-transitory computer readable medium of claim 16, wherein the instructions that configure a processor to categorize comprise instructions that configure the processor to trinarize each pixel.<br>18. The non-transitory computer readable medium of claim 16, wherein the instructions, when executed, further configure the processor to: convert the received first version of image information into a plurality of scaled versions, wherein each scaled version is a representation of the received first version of image information applying a downscaling factor; determine, for each scaled version, sequences that comprise text; and compare the determined sequences of one scaled version to one or more determined sequences of another scaled version or one or more determined sequences of the first version.<br>19. The non-transitory computer readable medium of claim 16, wherein a collection of pixels may be a blob or a connected component.<br>20. The non-transitory computer readable medium of claim 19, wherein the instructions, when executed, further configure the processor to: identify one or more connected components in a first row or first column of pixels of the received first version of image information; and associate the one or more connected components with another connected component identified in a second row or a second column. |
| CPC Classification | IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING |
| Extended Family | 014-886-567-943-936 |
| Patent ID | 20170091572 |
| Inventor/Author | Lindberg Lars M; Barnes Leo; Holtsberg Anders M |
| IPC | G06K9/34 G06K9/46 |
| Status | Discontinued |
| Owner | Apple Inc |
| Simple Family | 014-886-567-943-936 |
| CPC (with Group) | G06V20/62 |
| Issuing Authority | United States Patent and Trademark Office (USPTO) |
| Kind | Patent Application Publication |
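
The steps named in the abstract and in claim 1 (categorize pixels, find blobs, group blobs into horizontal sequences, eliminate by statistics, with no glyph analysis) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the function names, the fixed `black_point` and `white_point` thresholds, 4-connectivity, the `max_gap` grouping rule, and the two elimination statistics (blob count and height consistency). The contrast-based gray assignment of claim 7 and the multi-scale comparison of claim 8 are not modeled.

```python
from collections import deque
from statistics import median

BLACK, GRAY, WHITE = 0, 1, 2

def trinarize(img, black_point=80, white_point=175):
    """Categorize each grayscale pixel; GRAY is treated as non-text.
    The fixed thresholds are illustrative assumptions."""
    return [[BLACK if v < black_point else WHITE if v > white_point else GRAY
             for v in row] for row in img]

def find_blobs(cats, category):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected pixel
    collections that all belong to the given category."""
    h, w = len(cats), len(cats[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or cats[y][x] != category:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one blob
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and cats[ny][nx] == category):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def group_sequences(boxes, max_gap=3):
    """Chain blobs left to right when their vertical extents overlap and
    the next left edge is at most max_gap columns past the previous right edge."""
    sequences = []
    for box in sorted(boxes):
        for seq in sequences:
            last = seq[-1]
            overlaps = box[1] <= last[3] and last[1] <= box[3]
            if overlaps and box[0] - last[2] <= max_gap:
                seq.append(box)
                break
        else:
            sequences.append([box])
    return sequences

def keep_text_like(sequences, min_blobs=3, max_height_ratio=2.0):
    """Eliminate sequences with too few blobs or inconsistent blob heights."""
    kept = []
    for seq in sequences:
        heights = [b[3] - b[1] + 1 for b in seq]
        if len(seq) >= min_blobs and max(heights) <= max_height_ratio * median(heights):
            kept.append(seq)
    return kept

def detect_text(img):
    """Run the whole sketch for both the black and the white category."""
    cats = trinarize(img)
    found = []
    for category in (BLACK, WHITE):
        found += keep_text_like(group_sequences(find_blobs(cats, category)))
    return found
```

Calling `detect_text` on a list of rows of 0 to 255 grayscale values returns the surviving sequences as lists of bounding boxes. A row of evenly sized dark marks on a light background would survive, while an isolated speck or the background itself forms a one-blob sequence and is eliminated.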