Hari's Random Thoughts by Hariharan Ramamurthy

Monday, August 15, 2016

Is Google really making us stupid? Crooked handwriting and handwriting recognition.

I personally think NOT.

Yes, it does make you act more as if you have ADHD,
and I think it changes the way your brain works to a certain extent.
The debate is similar to asking: do calculators make us arithmetically challenged?
True, many gadgets make some of the skills you previously had (remembering phone numbers, doing simple mental math) redundant, so without practice these skills weaken and are ultimately lost.
But the question is whether we need those skills. When you can buy a solar-charged calculator that does a much better job, do we need to use those brain cells to remember multiplication tables?
I should ask my nephew, who was in Liberia to teach the kids mathematics.

On a different note:

I am glad that Google has decided to add Telugu to the many other Indic languages for development.

I am really amazed at their handwriting recognition (HWR).
It can recognize even my crooked (doctor's) handwriting.
Kudos to the Google HWR developers. Are there any Telugu guys in that group?

15 years ago I started on a quest to develop a handwriting recognizer for Telugu.
This was due to my inability to type Telugu fast enough to create a website on diabetes in Telugu.

There were hardly any Telugu language gadgets:
no HWR, no OCR, not even a good Unicode editor.
(Things are no better now; another blog post on this some time in the future.)

There used to be (it still exists, but is defunct, I think) a forum/Google group called racchabanda,
where in 2004
I had raised this topic.

See related posts.

Some time in 2004 or 2005 I had raised this topic with
Adluri Seshu Madhav. Then, as now, we still do not have a
good OCR for Indian languages, and for Telugu in particular.

Today I saw some work being done at

Tesseract open source OCR engine (http://code.google.com/p/tesseract-ocr/).
Someone seems to have adapted it to Hindi, Kannada and Malayalam.

If it works for Kannada script, it should easily work for
Telugu.

I am not a computer programmer,

but with so many "software" guys from Andhra Pradesh, is there
someone out there who can spend some effort to make this
free OCR for Telugu available to all some time soon in the
future?

Hariharan Ramamurthy

Racchabanda (Telugu-Unicode)

Re: [racchabanda] Digest Number 693

Srinivas Kommu

5/9/02

Other recipients: racch...@yahoogroups.com

I have a question. How do the telugu sites generate content? eenaaDu, sify
etc? I thought they use some kind of OCR. vaarta can manage just with a
scanner, I guess. Or do they type all text in RTS and use a conversion tool?
How does Andhrabharati (http://www.andhrabharati.com) do it? How do they get
it so right with all the arasunnaalu and stuff? If they typed in RTS, I'd
expect at least some typos :) I'm thinking about what would be the best way
if I were to generate my own content.

thanks
srinivas

> Message: 1
> Date: Mon, 6 May 2002 17:47:32 -0000
> From: "V. Chowdary Jampala"
> Subject: Computer Tools for Telugu Language
>
> Technology Resource Center for Telugu (TRCT)
>
> During my recent visit to India, I had the opportunity to visit the
> Technology Resource Center for Telugu (TRCT) located at the University of
> Hyderabad and spend some time with Dr. K. Narayana Murthy, an Associate
> Professor in the Department of Computer/Information Sciences. TRCT is one
> of several language technology resource centers funded by the Govt of India
> with the idea of developing and standardizing the technology tools for
> various Indian languages. These centers collaborate with each other in this
> endeavor. TRCT has several exciting tools in development.
>
> AKSHARA: Multi-lingual editor for Indian languages
>
> Most word processors in Telugu are commercial products. They use
> proprietary keying schemes, and what is typed cannot be stored as 'text',
> but only as code understood by that one program. A document typed in one
> word processor cannot be transported to another; fonts used by one program
> often cannot be used by another. Even the non-profit and non-commercial
> products have the same problems. The Rachana word processing program was
> the only program I used that allowed storing a document in ISCII text, but
> its evolution stopped with DOS. When I was editing TANA Patrika, this lack
> of transferability between programs was a major source of heartache for me.
> I now understand (I haven't seen this myself) that Microsoft Windows XP
> comes with Telugu fonts and a built-in keyboard driver. I suppose that this
> would go some distance towards standardization and transferability, but not
> all the way.
>
> In my conversations with Dr. Narayana Murthy, it is quite apparent that he
> has thought through these issues in detail. His Center has developed
> 'Akshara', an advanced multilingual editor, to deal with these very
> problems. Akshara is a platform- and font-independent editor and can deal
> with large and complex files without problems. It is based on ISCII and
> supports Unicode. It supports multiple keyboard layouts and offers an
> on-screen keyboard as well. The files can be transliterated from Telugu to
> other Indian languages and vice versa. It includes support for creating
> interactive web pages in Indian languages. (I was told that it uses an
> XML-style Extensible Document Definition Language (XDL), an open
> representation format that makes linkage with other tools easy. Some of you
> techies may understand what that means; I don't.)
>
> Akshara also comes with a variety of text processing tools (merging,
> sorting etc). Plans for Akshara include dictionaries, spell checkers,
> morphological analyzers etc.
>
> A beta version of Akshara should be available shortly. In the best news of
> all, TRCT plans to distribute Akshara free once it is ready. (In other
> related good news, I understand that the Telugu fonts and the I-leap
> software developed by C-DAC have now been purchased by the Govt of Andhra
> Pradesh and placed in the public domain; they can be downloaded from
> http://www.andhrapradesh.com/teluguwebsite/).
>
>
> DRISHTI - an OCR for Telugu
>
> Exciting as the news about Akshara is, even more exciting was Drishti,
> Optical Character Recognition (OCR) software for Telugu. Yes, folks; there
> is an OCR for Telugu in actual development that reads printed Telugu pages!
> Dr. Atul Negi, a Reader in the CS department, and Dr. Narayana Murthy
> demonstrated an earlier version of this software to me. The latest version
> is claimed to have about 95%+ accuracy, with work continuing on the
> algorithms to improve it further.
>
>
> VAANI - Text to Speech software
>
> Other projects at TRCT include the development of a Text to Speech
> software, Vaani. This is still in the early stages of development.
>
> TRCT plans to link Drishti and Vaani with Akshara. Theoretically, Akshara
> could then read out a printed page to a blind person.
>
> There are several other projects in development including e-mail software
> and grammatical software. I understand that Dr. G. Umamaheswara Rao from
> the department of linguistics and his team are actively working on the
> issue of computational aspects of language structure. I regret that I could
> not meet him.
>
> The future for Telugu computing does indeed appear to be very bright!
>
> Regards -- V. Chowdary Jampala

Courtesy: http://www.kanneganti.com/

--- In racch...@egroups.com, Nasy Sankagiri wrote:
> 1) What is OCR?

Nasy: The reasons for forwarding that message from the Indology list
were different. First of all, I wanted to point out that there are mss
in Telugu script lying elsewhere, and apparently there are some
attempts under way to develop OCR (Optical Character Recognition)
software for Telugu too.

Madan gaaru: At the 1st N. America Telugu lit. meet, Baparao gaaru
mentioned some ongoing attempts at "OCR for Telugu". Baparao
gaaru, are you there? Also, I remember hearing from someone that Dr.
Desikachari (of pOtana fonts fame) has been working on such a project.

Raama: Are you referring to Mann's (I mean the elder; Thomas):
Zauberberg (Magic Hill in Engli

Prasad A. Chodavarapu

8/26/01

Other recipients: sar...@yahoogroups.com, racch...@yahoogroups.com

I've finally bought a scanner and finished the experiments necessary to
figure out how to produce quality output that's electronically
distributable, without having to retype printed Telugu material. I
chose a ko.ku. story to experiment with. I don't have permission to
copy it, so you would have to approach me by private email to take a look
at the output. Believe me, the quality of the output is very good.

Here's what I did, in full technical detail.

1. Everything except the scanning process itself can be automated.
Hence, it is important for us to scan at a high quality so that this
work can be reused in future, e.g. someone someday may finally write an
Optical Character Recognition (OCR) tool that can recognize text within
the scanned images.
- I scanned at 300 dpi, which is optimal for OCR. Less would not be
enough and more is actually harmful, even if we are willing to store
larger files.
- I saved the scanned images as TIFFs, an image file format that
supports lossless compression.
- Of course, remember to always scan and save as monochrome images. Our
printed material rarely uses colour, and taking advantage of this fact
keeps our file sizes small.
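
To see why the monochrome advice matters, here is a rough Python sketch of the uncompressed size of one scanned page at different bit depths. The page size (5.5 x 8.5 inches) and the function name are assumptions made for illustration, not figures from the message above. A compressed TIFF will be considerably smaller than these raw numbers.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan, before any TIFF compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8


# Hypothetical page size of 5.5 x 8.5 inches, scanned at 300 dpi.
PAGE = (5.5, 8.5)
for label, bits in [("monochrome", 1), ("grayscale", 8), ("colour", 24)]:
    size_kb = raw_scan_bytes(*PAGE, dpi=300, bits_per_pixel=bits) / 1000
    print(f"{label:>10}: {size_kb:,.0f} KB uncompressed")
```

Whatever the page size, a 1-bit scan starts out 24 times smaller than a 24-bit colour scan of the same page, before compression is applied.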

2. What's the best mechanism for distribution? PDF seems to be the
answer. It is a quality solution for both on-screen reading and
printing. It lets us package multiple scanned pages in one file.
Acrobat Reader is freely available to everyone everywhere, and most
computers have it installed by now. I used free software from
http://www.imagemagick.org/ to convert multiple TIFFs into one PDF.
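
The message does not give the exact command used, so the following is only a sketch of how the TIFF-to-PDF step might be scripted from Python. It assumes ImageMagick's `convert` program is on the PATH (newer ImageMagick releases name the program `magick`); the function names and file names are hypothetical.

```python
import shutil
import subprocess


def tiffs_to_pdf_command(tiff_paths, pdf_path, program="convert"):
    """Build the ImageMagick command line: input pages first, output last."""
    if not tiff_paths:
        raise ValueError("need at least one TIFF page")
    return [program, *tiff_paths, pdf_path]


def tiffs_to_pdf(tiff_paths, pdf_path, program="convert"):
    """Run the conversion; raises if the program is missing or fails."""
    if shutil.which(program) is None:
        raise FileNotFoundError(f"{program!r} not found on PATH")
    subprocess.run(tiffs_to_pdf_command(tiff_paths, pdf_path, program), check=True)


# Hypothetical file names, for illustration only.
print(tiffs_to_pdf_command(["page01.tif", "page02.tif"], "story.pdf"))
```

The pages appear in the PDF in the order they are listed, so the scans should be named so that they sort correctly.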

3. There's one trick that I had to come up with. The original pages were
roughly the same size, except for the first and the last page, where
I omitted scanning white space. PDF viewers and printers scale each image
to fit standard-size printer paper. As the scanned images
were of different sizes, the scale factors were different too, and hence
the size of text on the first and the last page was markedly different
from that on the others.

Also, I had to scan without the margins to get rid of the
black marks at the center that result from the book's shape. So, how
do I add the margins back?

A simple solution was to first pad all images with white space so as to
bring them all up to one fixed size. This also provides us a way of
keeping margins consistent between pages.
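
The padding idea can be sketched as a small geometry calculation. This is a minimal Python illustration, not the procedure actually used: the function name, the pixel sizes, and the choice to centre each scan horizontally and anchor it at the top margin are all assumptions. The pixel work itself would be done by an imaging tool; the sketch only works out the common canvas size and where each scan lands on it.

```python
def pad_layout(sizes, margin=0):
    """Return the common canvas size and a paste offset for each scan.

    sizes  -- list of (width, height) in pixels, one per scanned page
    margin -- white border in pixels added on every side
    """
    canvas_w = max(w for w, _ in sizes) + 2 * margin
    canvas_h = max(h for _, h in sizes) + 2 * margin
    # Assumption: centre each scan horizontally, anchor it at the top margin.
    offsets = [((canvas_w - w) // 2, margin) for w, _ in sizes]
    return (canvas_w, canvas_h), offsets


# Hypothetical sizes: a short first page, a full page, a short last page.
scans = [(1500, 900), (1500, 2400), (1500, 1200)]
canvas, offsets = pad_layout(scans, margin=150)
print(canvas, offsets)
```

Because every padded page ends up with the same pixel dimensions, a viewer or printer applies the same scale factor to all of them, which keeps the text size uniform.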

4. Seems like a rule of thumb would be 50KB per page. With a 28.8 kbps
modem, the fastest you can download a 1 page doc is 14 seconds. I am
able to reduce the file size by half if I resample the image at 150 dpi.
The text is still readable, but there is a noticeable change in quality.
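
The 14-second figure can be checked with a few lines of Python. This sketch assumes 1 KB = 1000 bytes and 1 kbps = 1000 bits per second, and it ignores protocol overhead and latency, so it is a best-case estimate; the function name is made up for illustration.

```python
def download_seconds(size_kb, modem_kbps):
    """Best-case transfer time, ignoring protocol overhead and latency.

    Assumes 1 KB = 1000 bytes and 1 kbps = 1000 bits per second.
    """
    return size_kb * 1000 * 8 / (modem_kbps * 1000)


full = download_seconds(50, 28.8)  # one page at 300 dpi, about 50 KB
half = download_seconds(25, 28.8)  # same page resampled to 150 dpi, about 25 KB
print(f"300 dpi: {full:.1f} s, 150 dpi: {half:.1f} s")
```

At 50 KB per page the result is just under 14 seconds, and halving the file size halves the time.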

cheers
prasad

Courtesy: http://www.kanneganti.com/

"This verdict poses no obstacle to the Tamils' determination to achieve international standing. They carry on with their work. They strive continually to shape Tamil not only as an ancient, distinguished language but also as a thoroughly modern one. Against the backdrop of globalization, they are able to confront the dominance of English effectively while coexisting with it. There, Tamil stands proudly with its head held high, keeps its flag flying in every field, and still gives English as much room as it needs. In this way the two-language formula has become firmly established there."

Even though Telugu has been granted classical status, the claim that a lawsuit pending in the High Court is the reason no work has been done has lodged itself in some people's heads. The truth, however, is that the Centre made clear to the court long ago that it was granting this status subject to the court's verdict. The court issued no orders blocking central financial assistance. So some work, with a limited plan and limited funds, could have been done in the past five years. If we ourselves had not blocked the setting up of the research institute in Mysore... or rather, if we had set that institute up in the then Andhra Pradesh capital or somewhere else, a great deal of work would already have been done.
But all of this happened because of the indifference and incompetence of the then Andhra Pradesh government. For that institute, the state government only has to provide one large building and a few acres of land. Beyond that, the state government need not give a single paisa. Appointments and administration are entirely in the hands of CIIL in Mysore. No one else has any scope to interfere. Three chief ministers have come and gone. The state has become two states. We are ruled by governments incapable of giving some land and a building for this purpose. Who is responsible for this?
- Dr. Samala Ramesh Babu
President, Telugu Bhashodyama Samakhya (Telugu Language Movement Federation)

While the two Telugu states together have 9 crore (90 million) Telugu people, there are another 9 crore Telugu people outside them.

Re: Status of Telugu OCR.

Telugu OCR.

telugu conjuncts and OCR.

Telugu OCR out of the box! Out-of-the-box thinking or just a boxing problem?

[Sanganana] eudcedit application (hidden) in Windows OS (does it help in making OCR data?)

Re: OCR Meet tomorrow (Sunday 02 August 2009) in Hyd.

OCR accuracy and claims of various organisations.

Tesseract OCR for Telugu. Error in retrieving OCRed text.

Re: Telugu OCR.

OCR spell check.

[Sanganana] Re: Tesseract OCR.

Why a separate spell check for OCR.

Re: Any ideas.

[Sanganana] Tesseract OCR for Telugu.

Training Data needed for OCR.

Re: Telugu OCR, come along!

Re: [Sanganana] Re: After six years.

Re: For the sake of tesseract.

Changes wanted to the Pothana font.

Re: Online OCR - Telugu, Kannada, Hindi, Tamil, Malayalam ...

Re: Unicode dump wanted.

Re: Problems with Drishti.

your PEARL on BROWN dictionary.

Re: [Sanganana] Re: Please give a piece of advice.

A little more teasing.

Re: TIFF files for Telugu OCR training.

Re: Telugu word list wanted.

[Sanganana] Re: CVTEMeghna , CVTEHarsha Telugu fonts.

Re: [Sanganana] Re: DTP help wanted.

Qt Anyone?

Tesseract 3.0 Tessdata files.

Re: Telugu OCR in Telugu Sanganana / a Telugu robot.
