Powered by Guardian.co.uk. This article, titled “How can I convert my handwritten notes into Word documents?”, was written by Jack Schofield for theguardian.com on Thursday December 18 2014 16.19 UTC

I have many A4 pads of handwritten notes, which I would like to have converted into Microsoft Word documents. Typing them all up myself would take a very long time. I have noticed that Google’s ability to read text from photos has improved greatly in recent months. Are you aware of a tool from Google or anyone else that could do a good job of this, please? Michael

The idea of converting handwritten or printed text into digital text is generally known as OCR, for optical character recognition, and it has problems similar to those of speech recognition. That is, if the input is close to perfect, the output can also be close to perfect.

But in practice, it works best when the input is restricted and/or comes from a limited domain. For example, it is possible to recognize the English names for numbers and the names of major UK cities, especially if you can get people to write each letter in its own little box. The software would struggle to acquire the domain expertise to cope with a Russian-speaking researcher who liked to include some Sanskrit quotations in his handwritten notes.

Handwriting issues

OCR works best with high-quality printed material and worst with joined-up handwriting, so you are not starting from the best position. In my experience, the only way to get handwriting recognition to work well enough is by doing it in real time. This allows you to train the software to recognize your handwriting, while the software also trains you to write characters in ways that it can understand. I have had some success with this approach, starting more than a decade ago with Microsoft OneNote (which can also record your voice alongside your notes) running on Windows XP Tablet Edition, and more recently with a Livescribe Echo digital pen and MyScript software. However, all of this has more to do with keyboard-replacement strategies than with OCR.

It is generally agreed that the best OCR programs are Abbyy FineReader (£99) and Nuance’s OmniPage 18 (£79.99) and Ultimate (£169.99), though neither is suitable for recognizing joined-up handwriting. Both companies offer free trial versions so you can test them before you spend any money. There is also CharacTell’s SoftWriting ($49.95), which the company says is for students taking class notes and professionals taking notes in meetings. But it also says that it is designed “for recognition of non-connected handwriting and machine-printed text” (their emphasis), so I would not bet on it reading your handwritten notes.

As with almost all programs of this kind, SoftWriting has to be trained to recognize your handwriting. When processing a document, it will show you the words it does not recognize, so you can tell it what they are. If you have 250 words on a page and the program miraculously gets 90% of them right, you will still have to correct 25 words.
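The same sum is easy to rerun for other page lengths and accuracy rates. This is only a minimal sketch in Python: the helper name `words_to_correct` is hypothetical, and it assumes accuracy is a simple per-word rate, which real OCR errors may not follow.

```python
def words_to_correct(words_per_page: int, accuracy: float) -> int:
    """Estimate how many words on a page still need fixing by hand.

    Assumption: accuracy is the fraction of words recognized
    correctly (0.0 to 1.0), counted per word.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Round to the nearest whole word to absorb floating-point error.
    return round(words_per_page * (1.0 - accuracy))


# The example from the text: 250 words at 90% accuracy.
print(words_to_correct(250, 0.90))  # 25
```

Multiply the result by the number of pages in your pads to get a feel for the total clean-up job.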

If you want to try a few pages as an experiment, then you can download FreeOCR for Windows, though take care not to install any crapware that may be bundled with it. FreeOCR is based on the Tesseract OCR engine, which was originally developed by Hewlett-Packard in England in the 1980s. HP made it open source in 2005, and Google now hosts the source code.

You can also use FreeOCR online by uploading PDF files to free-ocr.com. Google Docs and various other services also use the same Tesseract OCR engine.

Wikipedia warns that “Tesseract’s output will be very poor quality if the input images are not preprocessed to suit it: Images (especially screenshots) must be scaled up such that the text x-height is at least 20 pixels, any rotation or skew must be corrected or no text will be recognized, low-frequency changes in brightness must be high-pass filtered, or Tesseract’s binarization stage will destroy much of the page, and dark borders must be manually removed, or they will be misinterpreted as characters.”
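The 20-pixel rule translates into a simple scaling calculation. The Python sketch below is illustrative only: `upscale_factor` is a hypothetical helper, you would have to measure the x-height of your own scans yourself, and rounding up to a whole-number factor is an assumption made for simplicity rather than anything Tesseract requires.

```python
import math

MIN_X_HEIGHT_PX = 20  # minimum x-height quoted in the Wikipedia passage


def upscale_factor(measured_x_height_px: float) -> int:
    """Return the smallest whole-number factor by which to enlarge an
    image so that its text x-height reaches MIN_X_HEIGHT_PX."""
    if measured_x_height_px <= 0:
        raise ValueError("x-height must be positive")
    # Never shrink: images that are already big enough get a factor of 1.
    return max(1, math.ceil(MIN_X_HEIGHT_PX / measured_x_height_px))


print(upscale_factor(8))   # 3, since 8 * 3 = 24 pixels
print(upscale_factor(25))  # 1, already tall enough
```

Scaling deals with only the first of the four conditions; skew, uneven brightness and dark borders still need to be handled separately.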

PDFs and scanners

Your handwritten notes would be more useful in Microsoft Word format because you could do lots of things with them. For example, you could change the typeface, size and spacing, correct and amend your notes, add illustrations, and so on. But unless you have extremely neat, clear and very consistent handwriting, that probably won’t be possible. Instead, think about converting them to high-quality, scanned PDF files that you can store on a hard drive or in the cloud.

You can feed these PDF files to OCR software and hope that it will recognize enough words to make your notes searchable. If not, you will probably have to tag them manually. However, if someone does come up with an OCR program that can read your handwriting – not impossible, though I’ve already waited 30 years for one – you will be ready with sharp PDF files, rather than curling originals where the paper has aged and the ink has faded.

Of course, if you are going to scan your notes then you must already have a scanner, or be prepared to buy one. A cheap Epson or Canon flat-bed scanner should give good results, though it is time-consuming to scan a lot of pages. If you intend to do a lot of scanning, consider a sheet-fed model like the Brother ADS-2100 (from £222). You can also get scanners that include OCR, such as the Fujitsu ScanSnap iX500 Duplex (from £352), which scans both sides of the paper at once. (The scanner’s OCR software usually runs on your PC.)

Scanning services

If you have to buy a decent scanner and perhaps good quality OCR software for a one-off project, add up the cost and divide it by the number of pages of notes to find the cost per page. It’s a boring job, so perhaps you should add the cost of your time. The result might prompt you to abandon the whole idea, or start looking for a company to do it for you.
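As a rough illustration of that sum, here is a minimal Python sketch. The function `cost_per_page` is hypothetical; the prices are the Brother ADS-2100 (£222) and Abbyy FineReader (£99) figures quoted earlier, and the 500-page count is simply an assumed example.

```python
def cost_per_page(equipment_cost: float, pages: int,
                  hours: float = 0.0, hourly_rate: float = 0.0) -> float:
    """Spread one-off equipment costs, plus optionally the value of
    your own time, across the number of pages scanned (in pounds)."""
    if pages <= 0:
        raise ValueError("pages must be a positive number")
    return (equipment_cost + hours * hourly_rate) / pages


# Assumed example: £222 scanner + £99 OCR software, 500 pages of notes.
print(round(cost_per_page(222 + 99, 500), 2))  # 0.64
```

Passing `hours` and `hourly_rate` adds the cost of your time to the total, which can change the picture considerably for a boring job.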

Most of the companies that provide scanning services cater for businesses that need to clear away large volumes of paper records. However, some cater for low-volume and home users. One example is Oxford-based Scanning Geeks, which charges 25p per page for documents up to A3 in size. (One page means one side of a page.) They can do OCR (“Textual Data Capture”) as well. Ideally, find a good local company where you can drop off your notes securely and collect them afterwards.

It is an expensive route if you have a lot of paper: it could cost £3,000 to scan the contents of a full four-drawer filing cabinet. But if you have 100 to 500 pages of notes to scan, it may be the best option.

guardian.co.uk © Guardian News & Media Limited 2010

Published via the Guardian News Feed plugin for WordPress.