Monday, August 15, 2016

Is Google really making us stupid? Crooked handwriting and handwriting recognition.

I personally think NOT.

Yes, it does make you act more like you have ADHD.
I think it changes the way your brain works to a certain extent.
The debate is similar to asking: do calculators make us arithmetically challenged?
True, many gadgets make some of the skills you previously had (remembering phone numbers, doing simple mental math) redundant, so without practice these skills are weakened and ultimately lost.
But the question is: do we need those skills? When you can buy a solar-charged calculator that does a much better job, do we need to use those brain cells to remember multiplication tables?
I should ask my nephew, who was in Liberia to teach the kids mathematics.

On a different note:

I am glad that Google has decided to add Telugu to the many other Indic languages for development.

I am really amazed at their HWR (handwriting recognition).
It can recognize even my crooked (doctor's) handwriting.
KUDOS to the Google HWR developers. Any Telugu guys among that group?



15 years ago I started on a quest to develop a handwriting recognizer for Telugu.
This was due to my inability to type Telugu fast enough to create a website on diabetes in Telugu.

There were hardly any Telugu language gadgets:
no HWR, no OCR, not even a good Unicode editor.
(Things are no better now; another blog post on this some time in the future.)

There used to be (it still exists, but is defunct I think) a forum/Google group called racchabanda,
where in 2004
I had raised this topic.


see related posts

Some time in 2004 or 2005 I had raised this topic with
Adluri Seshu Madhav, and between then and now we still do not have a
good OCR for Indian languages, and Telugu in particular.
Today I saw some work being done at the
Tesseract open source OCR engine (http://code.google.com/p/tesseract-ocr/).
Someone seems to have adapted it to Hindi, Kannada and Malayalam.
If it works for Kannada script it should easily work for
Telugu.
I am not a computer programmer,
but with so many "software" guys from Andhra Pradesh, is there
someone out there who can spend some effort to make this
free OCR for Telugu available to all some time soon in the
future?
hariharan ramamurthy

Re: [racchabanda] Digest Number 693
1 post by 1 author




Srinivas Kommu 

5/9/02

Other recipients: racch...@yahoogroups.com

I have a question. How do the telugu sites generate content? eenaaDu, sify
etc? I thought they use some kind of OCR. vaarta can manage just with a
scanner, I guess. Or do they type all text in RTS and use a conversion tool?
How does Andhrabharati (http://www.andhrabharati.com) do it? How do they get
it so right with all the arasunnaalu and stuff? If they typed in RTS, I'd
expect at least some typos :) I'm thinking about what would be the best way
if I were to generate my own content.
thanks
srinivas
> Message: 1
> Date: Mon, 6 May 2002 17:47:32 -0000
> From: "V. Chowdary Jampala"
> Subject: Computer Tools for Telugu Language
>
> Technology Resource Center for Telugu (TRCT)
>
> During my recent visit to India, I had the opportunity to visit the
> Technology Resource Center for Telugu (TRCT) located at the University of
> Hyderabad and spend some time with Dr. K. Narayana Murthy, an Associate
> Professor in the Department of Computer/ Information Sciences. TRCT is one
> of several language technology resource centers funded by the Govt of India
> with the idea of developing and standardizing the technology tools for
> various Indian languages. These centers collaborate with each other in this
> endeavor. TRCT has several exciting tools in development.
>
> AKSHARA: Multi-lingual editor for Indian languages
>
> Most word processors in Telugu are commercial products. They use
> proprietary keying schemes, and what is typed cannot be stored as 'text',
> but as code understood only by that program. A document typed in one word
> processor cannot be transported to another; fonts used by one program
> often cannot be used by another. Even the non-profit and non-commercial
> products have the same problems. Rachana word processing program was the
> only program that I used which allowed storing a document in ISCII text,
> but its evolution stopped with DOS only. When I was editing TANA Patrika,
> this lack of transferability between programs was a major source of
> heartache for me. I now understand (I haven't seen this myself) that
> Microsoft Windows-XP comes with Telugu fonts and a built-in keyboard
> driver. I suppose that this would go some distance towards standardization
> and transferability, but not all the way.
>
> In my conversations with Dr. Narayana Murthy, it is quite apparent that
> he has thought through these issues in detail. His Center has developed
> 'Akshara', an advanced multilingual editor, to deal with these very
> problems. Akshara is a platform and font independent editor and can deal
> with large and complex files without problems. It is based on ISCII and
> supports UNICODE. It supports multiple keyboard layouts and offers an
> on-screen keyboard as well. The files can be transliterated from Telugu to
> other Indian languages and vice versa. It includes support for creating
> interactive web pages in Indian languages. (I was told that it uses an XML
> style Extensible Document Definition Language (XDL), an open
> representation format making linkage with other tools easy. Some of you
> techies may understand what that means; I don't).
>
> Akshara also comes with a variety of text processing tools (merging,
> sorting etc). Plans for Akshara include dictionaries, spell checkers,
> morphological analyzers etc.
>
> A beta version of Akshara should be available shortly. In the best news
> of all, TRCT plans to distribute Akshara free once it is ready. (In other
> related good news, I understand that the Telugu fonts and the I-leap
> software developed by C-DAC are now purchased by the Govt of Andhrapradesh
> and are now placed in public domain; they can be downloaded from
> http://www.andhrapradesh.com/teluguwebsite/).
>
>
> DRISHTI - an OCR for Telugu
>
> Exciting as the news about Akshara is, even more exciting was Drishti
> Optical Character Recognition (OCR) software for Telugu. Yes, folks; there
> is an OCR for Telugu in actual development that reads printed Telugu pages!
> Dr. Atul Negi, a Reader in the CS department, and Dr. Narayana Murthy
> demonstrated an earlier version of this software to me. The latest version
> is claimed to have about 95% + accuracy with work continuing on the
> algorithms to improve it further.
>
>
> VAANI - Text to Speech software
>
> Other projects at TRCT include the development of a Text to Speech
> software Vaani. This is still in the early stages of development.
>
> TRCT plans to link Drishti and Vaani with Akshara. Theoretically, Akshara
> can read out a printed page to a blind person.
>
> There are several other projects in development including e-mail software
> and grammatical software. I understand that Dr. G. Umamaheswara Rao from
> the department of linguistics and his team are actively working on the
> issue of Computational aspects of language structure. I regret that I
> could not meet him.
>
> The future for Telugu computing does indeed appear to be very bright!
>
> Regards -- V. Chowdary Jampala
>
>
>
>

--- In racch...@egroups.com, Nasy Sankagiri wrote:
> 1) What is OCR?
Nasy: The reasons for forwarding that message from the Indology list
were different. In 1st line I wanted to point out that there are mss
in Telugu script lying elsewhere, and apparently there are some
attempts going on developing OCR (Optical Character Recognition)
software for Telugu too.
Madan gaaru: At the 1st N.America Telugu lit. meet, Baparao gaaru
mentioned some ongoing attempts at "OCR for Telugu". Baparao
gaaru, are you there! Also, I remember hearing from someone that Dr.
Desikachari (of pOtana fonts fame) has been working on such a project.
Raama: Are you referring to Mann's (I mean the elder; Thomas):
Zauberberg (Magic Hill in Engli


Prasad A. Chodavarapu 

8/26/01

Other recipients: sar...@yahoogroups.com, racch...@yahoogroups.com

I've finally bought a scanner and finished the experiments necessary to
figure out how to produce quality output that's electronically
distributable, without having to retype printed Telugu material. I
chose a ko.ku. story to experiment with. I don't have permission to
copy, so you would have to approach me by private email to take a look
at the output. Believe me, the quality of the output is very good.
Here's what I did, in full technical detail.
1. Everything except the scanning process itself can be automated.
Hence, it is important for us to scan at a high quality so as to reuse
this work in future, e.g. someone someday may finally write an Optical
Character Recognition (OCR) tool that can recognize text within the
scanned images.
- I scanned at 300 dpi, which is optimal for OCR. Less would not be
enough and more is actually harmful, even if we are willing to store
larger files.
- I saved the scanned images as TIFFs, an image file format that
supports lossless compression.
- Of course, remember to always scan and save as monochrome images. Our
printed material rarely uses colour and taking advantage of this fact
keeps our file sizes small.
2. What's the best mechanism for distribution? PDF seems to be the
answer. It is a quality solution for both on-screen reading as well as
printing. It helps us package multiple scanned pages in one file.
Acrobat reader is something that's freely available for everyone
everywhere. Most computers have it installed by now. I used free
software from http://www.imagemagick.org/ to convert multiple TIFFs into
one pdf.
3. There's one trick that I had to come up with. The images were roughly
the same size in original except for the first and the last page, where
I omitted scanning white space. PDF viewers and printers scale the image
to make it fit into standard size printer paper. As the scanned images
were of different sizes, the scale factors were different too and hence,
the size of text on the first and the last page was markedly different
from that on the others.
Also, I had to leave out the margins while scanning to get rid of the
black marks at the center that are a result of the book's shape. So, how
do I add back margins?
A simple solution was to first pad all images with white space so as to
bring them all up to one fixed size. This also provides us a way of
keeping margins consistent between pages.
4. Seems like a rule of thumb would be 50KB per page. With a 28.8 kbps
modem, the fastest you can download a one-page doc is about 14 seconds. I am
able to reduce the file size by half if I resample the image at 150 dpi.
The text is still readable, but there is a noticeable change in quality.
cheers
prasad
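
The padding trick and the download estimate from Prasad's post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `margin_px` parameter, the choice to centre each page on the canvas, and the unit conventions (1 KB = 1024 bytes, 1 kbps = 1000 bits per second) are mine and are not stated in the post. No image files are touched; the sketch only works out the numbers.

```python
def padded_layout(page_sizes, margin_px):
    """Return a common canvas size and a paste offset for every page.

    page_sizes: list of (width, height) tuples in pixels, one per scan.
    margin_px:  white margin added on every side (hypothetical parameter).
    """
    # The canvas must hold the largest scan plus a margin on each side.
    canvas_w = max(w for w, _ in page_sizes) + 2 * margin_px
    canvas_h = max(h for _, h in page_sizes) + 2 * margin_px
    # Assumption: each page is centred on the canvas. Shorter pages
    # (such as the first and last) simply get more white space.
    offsets = [((canvas_w - w) // 2, (canvas_h - h) // 2)
               for w, h in page_sizes]
    return (canvas_w, canvas_h), offsets


def download_seconds(size_kb, modem_kbps):
    """Best-case transfer time, ignoring protocol overhead."""
    bits = size_kb * 1024 * 8          # assumes 1 KB = 1024 bytes
    return bits / (modem_kbps * 1000)  # assumes 1 kbps = 1000 bits/s


if __name__ == "__main__":
    # A short first page and a full page, with a 150-pixel margin.
    canvas, offsets = padded_layout([(1500, 1200), (1500, 2400)], margin_px=150)
    print(canvas, offsets)
    # 50 KB over a 28.8 kbps modem.
    print(round(download_seconds(50, 28.8), 1))
```

Because every page ends up on the same canvas size, a PDF viewer applies the same scale factor to all of them, which is the point of the trick. The second function reproduces the rule-of-thumb arithmetic: 50 KB at 28.8 kbps comes to roughly 14 seconds.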

"This verdict poses no obstacle to the Tamils' determination to achieve international standing. They keep on doing their work. They strive continuously to shape Tamil not only as an ancient classical language but also as a modern one. Against the backdrop of globalization, they confront the dominance of English effectively while managing to coexist with it. There, Tamil stands proudly with its head held high, keeps its flag flying in every field, and still gives English as much room as it needs. In that way the two-language formula has taken firm root there."
Even though Telugu was granted classical status, some people have been persuaded by the claim that the case pending in the High Court is the reason nothing has been done. The truth, however, is that the Centre made clear to the court long ago that it was granting this status subject to the court's verdict. The court issued no orders blocking central financial assistance. So, with a limited plan and limited funds, some work could have been done over the past five years. Had we ourselves not blocked the establishment of the research institute in Mysore, or rather, had we set that institute up in the then Andhra Pradesh capital or somewhere else, a great deal of work would have been done by now.
But all of this came about through the indifference and incompetence of the then Andhra Pradesh government. For that institute, the state government only has to provide one large building and a few acres of land. Beyond that, the state government need not give a single paisa. Appointments and administration all belong to CIIL in Mysore. No one else has any scope to interfere. Three chief ministers have come and gone. The state has become two states. We are ruled by governments incapable of giving some land and a building for this purpose. Who is responsible for this?
- Dr. Samala Ramesh Babu
President, Telugu Bhashodyama Samakhya (Telugu Language Movement Federation)

Re: Status of Telugu OCR.

So far I have found this online OCR website to be the most accurate (which is not accurate enough to work for us) ...
9/6/13 by me - 5 posts by 2 authors 57 views

Telugu OCR.

Namaskaram. I want to introduce to you the Telugu OCR framework using "banti". Those of you who are inclined, interested and patient can try it out.
10/9/15 by రాకేశ్వర రావు - 2 posts by 2 authors 9 views

Telugu conjuncts and OCR.

Once again coming back to the possible combinations of letters and the need to train all of them for a good Telugu OCR.
2/23/10 by me - 5 posts by 2 authors 7 views

Telugu OCR out of box! Out-of-the-box thinking, or just a boxing problem?

I am new to this forum, and also a non-computer, non-programming person who is somewhat conversant with how programs are written.
10/23/09 by me - 3 posts by 2 authors 4 views

[సంగణన] eudcedit application (hidden) in windows OS ( does it help in making OCR data)

I may be excused for using English to communicate here. A 'Private Character Editor' exists in Windows (I checked in XP and Vista). It is named ...
5/20/10 by ranjani - 2 posts by 2 authors 1 view

Re: OCR Meet tomorrow (Sunday 02 August 2009) in Hyd.

So what happened at the OCR meet ? interested in details On Aug 5, 1:48 am, రాకేశ్వర రావు wrote: ...
8/31/09 by me - 6 posts by 4 authors 4 views

OCR accuracy and claims of various organisations.

Once again, please forgive me, as my Telugu typing is still limping along.
10/23/09 by me - 1 post by 1 author 2 views

Tesseract OCR for Telugu. Error in retrieving OCRed text.

Hello sir. Firstly, thank you for your excellent contribution for indic-telugu language in tesseract. I do feel that the achievement of accuracy ...
1/11/13 by virinchy p - 1 post by 1 author 26 views

Re: Telugu ocr.

It was a mistake; it was one of the files I had gathered using the webcorpus. When I run a query it gives me anywhere between 10 to 200 website ...
9/2/09 by me - 1 post by 1 author 2 views

OCR spell check.

Once again, please forgive me, as my Telugu typing is still limping along ...
10/23/09 by me - 1 post by 1 author 0 views

[సంగణన] Re: Tesseract OCR.

Regards > > Praveen The long32.tiff that I was asked to fetch from http://groups.google.com/group/telugu-computing/msg/ b0bd9c74bb92974e ...
5/24/10 by రాకేశ్వర రావు - 2 posts by 1 author 7 views

why a separate spell check for OCR.

http://www.isri.unlv.edu/publications/isripub/Taghva00b.pdf In the above paper there are some interesting arguments on how a spell checker for ...
10/23/09 by me - 1 post by 1 author 2 views

Re: Any ideas.

Regarding making a list of Telugu words and creating a corpus/database to help with OCR and other computational linguistics for Telugu.
9/6/13 by me - 17 posts by 9 authors 7 views

[సంగణన] Tesseract OCR for Telugu.

Hi, I have written a detailed blog post on how to build and run a Telugu OCR.
5/19/10 by రాకేశ్వర రావు - 6 posts by 4 authors 13 views

Training Data needed for OCR.

Good real-world data is needed to train the Telugu OCR.
7/21/12 by రాకేశ్వర రావు - 9 posts by 3 authors 61 views

Re: Telugu OCR, do come!

1. Explanations of the what, the how, and so on relating to Tesseract, in the relevant wiki pages ...
6/22/09 by mv - 5 posts by 3 authors 6 views

Re: [సంగణన] Re: After six years.

Rakesh gAru,. Please let me know what kind of help you are looking at. I can help in making as distributable package for Windows and a building ...
5/7/15 by Dileep. M - 7 posts by 6 authors 31 views

Re: For the sake of tesseract.

I am a newbie to this forum. Let me introduce myself: I am a medical physician, born in Warangal, studied and worked in Hyderabad and Delhi and ...
8/30/09 by me - 6 posts by 4 authors 36 views

Changes wanted to the Potana font.

Greetings to the Telugu geek heroes. Happy Ugadi. For me, a small change in Potana ...
3/16/10 by రాకేశ్వర రావు - 2 posts by 1 author 5 views

Re: Online OCR - Telugu, Kannada, Hindi, Tamil, Malayalam ...

ర్టీఉరేఖదిరనాఅ నిథోఢరాడుపోసీఠాణ ముడిధ ల్రిఅ ల నౌ ( this is because of unicode table making rep in to a ...
6/9/12 by me - 7 posts by 2 authors 24 views

Re: Unicode dump wanted.

Makes sense. I have not finalized the details as of now. The easiest way is to do as you said. But I am thinking of consulting an expert on ...
2/7/13 by రాకేశ్వర రావు - 17 posts by 4 authors 31 views

Re: Problems with Drishti.

Just compiling it on Linux is next to impossible. Once or twice, compiling on Ubuntu ...
9/19/09 by తెలుగువీర - 4 posts by 3 authors 3 views

your PEARL on BROWN dictionary.

As you can see, the font is showing up in junk symbols. I don't think my Excel is at fault, as you can see the other words I cut and pasted are ...
9/10/09 by me - 2 posts by 2 authors 0 views

Re: [సంగణన] Re: Please give a piece of advice.

I'm not aware of how to get this data from HCU. But we can make ours. I made an attempt to build a corpus for a spell checker.
12/29/14 by Dileep. M - 6 posts by 4 authors 41 views

A little more temptation.

Today, for no particular reason, I typed up a few pages of the Molla Ramayanam ...
3/2/10 by రాకేశ్వర రావు - 1 post by 1 author 3 views

Re: tiff files for Telugu OCR training.

14231 Telugu books with 2846200 pages in DLI. Selecting Telugu in the search panel on the left side, this many ...
9/10/09 by me - 11 posts by 5 authors 12 views

Re: List of Telugu words wanted.

Google Groups will no longer be supporting the Pages and Files features. Starting January 13, you won't be able to upload new content, ...
3/5/11 by me - 8 posts by 4 authors 5 views

[సంగణన] Re: CVTEMeghna , CVTEHarsha Telugu fonts.

Well, I do not think that they are Unicode fonts. We have manually re-typed a set of Koumudi modati cinema articles for Navatarangam. Just trying ...
5/21/10 by ranjani - 5 posts by 2 authors 3 views

Re: [సంగణన] Re: DTP help wanted.

Rakesh garu, these people, someone from the Central University, have obtained an Eenadu corpus.
9/11/09 by Aasish Pappu - 16 posts by 4 authors 19 views

Qt Anyone?

Greetings to all, listen carefully. For Telugu OCR training, a piece of software called Owlboxer is very ...
3/15/10 by రాకేశ్వర రావు - 4 posts by 2 authors 1 view

Tesseract 3.0 Tessdata files.

Greetings, I recently switched to Ubuntu, and for Tesseract 3 the Telugu ...
12/21/10 by రాకేశ్వర రావు - 4 posts by 2 authors 14 views

Re: Telugu OCR in Telugu computing / a Telugu robot.

Dear Dr. Hariharan,. I would like to know if you have any knowledge of a professional Telugu OCR software available in market.
1/7/14 by SAI CHAITANYA - 2 posts by 2 authors 20 views

