Friday, November 6, 2009

Effect of removal of select punctuation marks on machine-translation

The full stop is an under-appreciated component of written language. A thought occurred to me some weeks back: would Google Translate be able to correctly translate a paragraph of English text if I removed all the commas and full-stops?

I strongly expected Google Translate to get confused in such a scenario and return bizarre results. Today I took a paragraph of text from a news story and ran a simple experiment on it, to determine how removing select punctuation marks affects the translation capabilities of Google Translate.
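The punctuation-stripping step of this experiment can be sketched in a few lines of Python. This is only an illustration of the idea, not the method actually used for this post: the function name `strip_marks` and its `marks` parameter are made up for the example, and the translation itself is still done by pasting the text into Google Translate by hand.

```python
def strip_marks(text: str, marks: str = ",.") -> str:
    """Return `text` with every character listed in `marks` deleted.

    `strip_marks` and `marks` are hypothetical names for this sketch.
    All other characters, including apostrophes and spaces, are kept.
    """
    # str.maketrans with a third argument builds a table that deletes
    # each of those characters when passed to str.translate.
    return text.translate(str.maketrans("", "", marks))


if __name__ == "__main__":
    sample = "It's impossible, these days. But how?"
    print(strip_marks(sample))          # commas and full-stops only
    print(strip_marks(sample, ",.?"))   # also drop question marks
```

With the default setting only commas and full-stops are removed. The stripped paragraph used in this post also lacks its closing question mark, so passing `",.?"` as `marks` is what reproduces it exactly.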

Original English text, with punctuation: It's impossible to run a company these days without an investment in technology, which can take your operations to another level. But how do you do it economically and without wasting extra cash on needless tech services or products? 

Translation into Hindi: यह प्रौद्योगिकी के क्षेत्र में निवेश के बिना, जो एक दूसरे स्तर पर अपने कार्य ले जा सकते हैं एक कंपनी इन दिनों चलाना नामुमकिन है. लेकिन यह कैसे आप इसे आर्थिक करना और अनावश्यक तकनीक सेवाओं या उत्पादों पर अतिरिक्त नकदी बर्बाद कर के बिना?

English text without commas and full-stops (the closing question mark is also dropped): It's impossible to run a company these days without an investment in technology which can take your operations to another level But how do you do it economically and without wasting extra cash on needless tech services or products

Translation into Hindi: यह प्रौद्योगिकी के क्षेत्र में एक निवेश है जो दूसरे स्तर तक अपने अभियान ले लेकिन कैसे आप इसे आर्थिक करना और अनावश्यक तकनीक सेवाओं या उत्पादों पर अतिरिक्त नकदी बर्बाद कर के बिना कर सकते हैं बिना किसी कंपनी इन दिनों चलाना नामुमकिन है

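Anyone who can read Hindi can see that Google Translate does in fact go haywire when there are no commas or full-stops. The two sentences are fused into one and the clauses are reordered: "कंपनी इन दिनों चलाना नामुमकिन है" (roughly, "running a company these days is impossible"), which belongs to the first sentence, ends up at the very end of the output.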