Powered by Guardian.co.uk. This article titled “How can I convert my handwritten notes into Word documents?” was written by Jack Schofield, for theguardian.com on Thursday 18 December 2014 16.19 UTC

I have a lot of A4 pads of handwritten notes, which I would like to convert into Microsoft Word documents. Typing them all up would take a very long time. I've noticed that Google's ability to read text from photos has become much better in recent months. Are you aware of a tool from Google or anyone else that could do a good job of this, please? Michael

The idea of converting handwritten or printed text into digital text is generally called OCR, for optical character recognition, and it has similar problems to speech recognition. That is, if the input is close to perfect, the output may also be close to perfect.

But in practice, it works best when dealing with restricted inputs and/or limited domains. For example, it is possible to recognise the English names for numbers and the names of large UK cities, especially if you can get people to write each letter in its own little box. The same software will not have the domain expertise to cope with a Russian-speaking coroner who likes to include Sanskrit quotations in his handwritten autopsies.

Handwriting issues

OCR works best with high-quality printed material and worst of all with handwriting, so you are not starting from the best position. In my experience, you can only get handwriting recognition that works well enough by doing it in real time. That helps you train the software to recognise your input, while the software also trains you to write characters in ways it can understand. I've had some success with this approach, starting more than a decade ago with Microsoft OneNote (which can also record your voice in sync) running on Windows XP Tablet Edition, and more recently with the Livescribe Echo digital pen and MyScript software. However, all this has more to do with keyboard-replacement strategies than with OCR.

It is generally agreed that the best OCR programs are Abbyy FineReader (£99) and Nuance's OmniPage 18 (£79.99) and Ultimate (£169.99), though neither is suited to recognising cursive handwriting. Both companies offer free trial versions so you can test them before you pay. There is also CharacTell's SoftWriting ($49.95), which the company says is for students who take notes in class and professionals who take notes in meetings. But it also says it is designed “to recognise unconnected handwriting and machine-printed text” (their emphasis) so I wouldn't bet on it reading your handwritten notes.

Like most if not all programs in this field, SoftWriting has to be trained to recognise your handwriting. When it is processing a document, it will present you with the words it does not recognise, so you can tell it what they are. If you have 250 words on a page and the program miraculously gets 90% of them right, you will still have to correct 25 words.
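To see how quickly the corrections add up, here is a minimal Python sketch of that arithmetic. The function name and the 95% figure are illustrative assumptions; only the 250-word, 90% case comes from the example above.

```python
def words_to_correct(words_per_page: int, accuracy_percent: int) -> int:
    """Return how many words on a page still need manual correction."""
    wrong_percent = 100 - accuracy_percent
    # Integer ceiling division: a fraction of a wrong word still has to be fixed.
    return -(-words_per_page * wrong_percent // 100)


# The case from the text: 250 words per page, 90% recognised correctly.
print(words_to_correct(250, 90))  # 25

# Hypothetical better case, to show the workload does not vanish.
print(words_to_correct(250, 95))  # 13
```

Multiply the result by the number of pages in your pads to get a feel for the total clean-up job.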

If you would like to try a few pages as an experiment, then you can download FreeOCR for Windows, though be careful not to install any crapware that may be included. FreeOCR is based on the widely used Tesseract OCR engine, which was originally developed by Hewlett Packard in England in the 1980s. HP made it open source in 2005, and Google now maintains the source code.

You can also use FreeOCR online by uploading PDF files to free-ocr.com. Google Docs and various other services also use the same Tesseract OCR engine.

Wikipedia warns that “Tesseract's output will be of very poor quality if the input images are not preprocessed to suit it: images (especially screenshots) must be scaled up such that the text x-height is at least 20 pixels, any rotation or skew must be corrected or no text will be recognised, low-frequency changes in brightness must be high-pass filtered, or Tesseract's binarization stage will destroy much of the page, and dark borders must be manually removed, or they will be misinterpreted as characters.”
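The first of those steps, scaling, is simple enough to sketch in Python. This is only an illustration under stated assumptions: the function names are hypothetical, you have to measure the x-height of your own scan yourself, and the sketch works out the enlargement factor rather than resizing an image. It does not attempt deskewing, high-pass filtering or border removal.

```python
import math

MIN_X_HEIGHT_PX = 20  # minimum text x-height quoted in the Wikipedia warning


def upscale_factor(measured_x_height_px: float,
                   target_px: int = MIN_X_HEIGHT_PX) -> float:
    """Return the enlargement needed so the x-height reaches the target."""
    if measured_x_height_px <= 0:
        raise ValueError("x-height must be positive")
    # Never shrink an image that is already large enough.
    return max(1.0, target_px / measured_x_height_px)


def scaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """Return the image dimensions after scaling, rounded up to whole pixels."""
    return math.ceil(width * factor), math.ceil(height * factor)


# Hypothetical scan: 1000 x 1400 pixels with lower-case letters 8 pixels tall.
factor = upscale_factor(8)
print(factor)                           # 2.5
print(scaled_size(1000, 1400, factor))  # (2500, 3500)
```

In practice it is usually better to rescan at a higher resolution than to enlarge a poor image afterwards, since enlarging adds no detail.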

PDFs and scanners

Your handwritten notes would be more useful in Microsoft Word format because you could do lots of things with them. For example, you could change the typeface, size and spacing, correct and amend your notes, add illustrations, and so on. But unless you have extremely neat, clear and very consistent handwriting, that probably won’t be possible. Instead, think about converting them to high-quality, scanned PDF files that you can store on a hard drive or in the cloud.

You can feed these PDF files to OCR software and hope that it will recognise enough words to make your notes searchable. If not, you will probably have to tag them manually. Either way, if someone does come up with an OCR program that can read your handwriting – not impossible, though I’ve already waited 30 years for one – you will be ready with sharp PDF files, rather than curling originals where the paper has aged and the ink has faded.

Of course, if you are going to scan your notes then you must already have a scanner, or be prepared to buy one. A cheap Epson or Canon flat-bed scanner should give good results, though it is time-consuming to scan a lot of pages. If you intend to do a lot of scanning, consider a sheet-fed model like the Brother ADS-2100 (from £222). You can also get scanners that include OCR, such as the Fujitsu ScanSnap iX500 Duplex (from £352), which scans both sides of the paper at once. (The scanner’s OCR software usually runs on your PC.)

Scanning services

If you have to buy a decent scanner and perhaps good quality OCR software for a one-off project, add up the cost and divide it by the number of pages of notes to find the cost per page. It’s a boring job, so perhaps you should add the cost of your time. The result might prompt you to abandon the whole idea, or start looking for a company to do it for you.
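As a rough sketch of that sum in Python: the prices are the ones quoted earlier in this article, while the 500-page figure and the function name are assumptions for illustration. The cost of your time is left at zero here, so add your own estimate.

```python
def cost_per_page(total_cost_pounds: float, pages: int) -> float:
    """Return the cost per page in pounds for a one-off scanning project."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_cost_pounds / pages


scanner = 222.00       # Brother ADS-2100 "from" price quoted in the article
ocr_software = 99.00   # Abbyy FineReader price quoted in the article
time_cost = 0.00       # assumption: your own time is not costed here
pages = 500            # assumption: size of the pile of notes

total = scanner + ocr_software + time_cost
print(f"{cost_per_page(total, pages):.2f} pounds per page")  # 0.64 pounds per page
```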

Most of the companies that provide scanning services cater for businesses that need to clear away large volumes of paper records. However, some cater for low-volume and home users. One example is Oxford-based Scanning Geeks, which charges 25p per page for documents up to A3 in size. (One page means one side of a page.) They can do OCR (“Textual Data Capture”) as well. Ideally, find a good local company where you can drop off your notes securely and collect them afterwards.

It’s an expensive route if you have lots of paper: it could cost £3,000 to scan the contents of a four-drawer filing cabinet. But if you only have 100 to 500 pages of notes to scan, it could be the best option.
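For a quick check on that range, here is a minimal Python sketch using the 25p-per-page rate quoted above. The function name is hypothetical, and the sketch assumes the rate stays flat and that any OCR charge is extra.

```python
PENCE_PER_PAGE = 25  # Scanning Geeks rate quoted in the article


def service_cost_pounds(pages: int, pence_per_page: int = PENCE_PER_PAGE) -> float:
    """Return the cost in pounds of having a service scan the given pages."""
    return pages * pence_per_page / 100


# The two ends of the range mentioned in the text.
for pages in (100, 500):
    print(pages, service_cost_pounds(pages))
```

Even the top of the range comes to less than the £222 starting price of the sheet-fed scanner mentioned earlier.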

guardian.co.uk © Guardian News & Media Limited 2010
