BharateeyaOO.o: Enabling OpenOffice.org for India
2007-03-06
The BharateeyaOO.o team. From left to right: Garla Krishna Rao, Srinivas NK, Laxminarayana A, Praveen Reddy, Sonali Patil, RKVS Raman, Nobby Varghese and Mamatha Achuthan
It's been over five years since a little-known group at NCST Bangalore, now CDAC, began work on translating OpenOffice.org into Indian languages. The BharateeyaOO.o Project, along with its team, has come a long way in these years. In this interview with Mr. RKVS Raman, Senior Staff Scientist at CDAC Bangalore and Project Coordinator of BharateeyaOO.o, I try to trace the evolution of the project. I am not surprised when Mr. Raman reiterates the cycle that many government-sponsored groups in several countries go through at one time or another. Breaking the monotony of traditional behind-closed-doors development, and understanding the benefits of community development, isn't easy. But as Mr. Raman illustrates, the benefits are handsome.
Mayank Sharma: Let's start off with some background information on the group. How did the BharateeyaOO.o project come to be? What hole did it plug?
RKVS Raman: The BharateeyaOO.o project was launched sometime in June 2001. BharateeyaOO.o began life as a sister project to the work being done at CDAC Mumbai (previously known as NCST) on INDIX. We started off with the mandate to enable editing and rendering of Indian scripts in OpenOffice.org. Things like Complex Text Layout support, typing, selection, caret movement, backspace/delete, text breaking and mouse events were all broken at that time. The team then consisted of three people: Ms. Shikha Pillai, Mr. Bhupesh Koli and Mr. N Velmani.
We were able to solve many of the issues, but unfortunately none of the changes that we submitted upstream are reflected in today's builds, because OpenOffice.org migrated to ICU as its layout engine in later versions. But yes, that was the way we started our contribution to OOo. Once we knew how to build OpenOffice.org, which itself was an achievement at that time, the next logical step was to attempt localization of the build into Indian languages. We took up two major Indian languages, Hindi and Tamil. The translations were done by our technical team and not by translators, and it took us nearly nine months to complete them. The localization modules were also not perfect in OpenOffice.org 1.0. Those were really challenging times because we were doing new things with no prior expertise in these areas anywhere in India. Mailing lists were the only source of information, and even there it took some effort to make people understand the specific problems that we faced with Indic scripts.
Amazing. You have come a long way. How many people contribute to BharateeyaOO.o today?
Today it is hard to estimate how many people make up the BharateeyaOO.o team. BharateeyaOO.o is now a community which is not limited to a group of people at CDAC Bangalore. Starting with Ms. Shikha, Mr. Vijay Kumar and Mr. Deepu Abraham, who are no longer associated with the organization, everybody who has been part of the team still remains part of it and helps us out whenever they find time. At CDAC Bangalore, the team consists of 8 members: Garla Krishna Rao, Laxminarayan A, Mamatha Achuthan, Nobby Varghese, Praveen Reddy, Sonali Patil, Srinivas NK, and RKVS Raman.
Currently only a couple of us are actually involved with the builds of OpenOffice.org while others have taken up allied areas of language enhancements to OpenOffice.org and open source tools at large. This includes working on accessibility modules, spell checkers, dictionaries, collaboration frameworks and rendering issues.
Today each member of the team works towards making OpenOffice.org a world-class product through her/his area of research interest.
Apart from us at CDAC, BharateeyaOO.o is supported by localization teams all over India, like the IndLinux group, Utkarsh, Punlinux and of course the GIST Group at CDAC Pune. BharateeyaOO.o is no longer just the name of our group. It has become more of a brand name for OpenOffice in India. If OpenOffice is in Indian Languages it is BharateeyaOO.o :-) . Today we also aim to support groups from Sri Lanka and Pakistan for localizing OOo to their locales and languages.
You've covered a lot of ground in these years. Could you briefly mention the high-points in this duration? Any major milestones that you set out to achieve and have accomplished?
The BharateeyaOO.o team has been blessed with perpetual high-points in its span of more than 5 years of activity now. The first high-point undoubtedly was the release of OpenOffice.org 1.0 in Hindi. That's when we launched our website with the BharateeyaOO.o name. The name caught on slowly in the open source community in India. Since then we haven't looked back.
We moved from Hindi to Tamil and from version 1.0 to 1.1. The second major victory for us came when a doctor from a primary health center in a rural area of Maharashtra, Dr. Swapnil Lale, came forward to provide translations for the Marathi language. That was when we really felt on top, because we had succeeded in getting our first community contribution.
BharateeyaOO.o was started with the aim to provide OpenOffice.org in all Indian languages. After Hindi, Tamil and Marathi, things became a bit quiet for a couple of months since we were not getting resources for new languages. That was the time when the team size also depleted and we were reduced to just a couple of full-time people, virtually all freshers, working on small modules. We invested our time in building the first speech plugin for OpenOffice.org 1.1 Writer and also built a Transliterator plugin.
The big break came in the form of an opportunity that I got to meet our IT Minister, Mr. Dayanidhi Maran, during one of his visits to CDAC. That was the time I got to show him the capabilities of OOo. He was quite impressed and asked us if we could include it in the Free Language CDs that he planned to release for Tamil. Instantly we recognized the opportunity we had and grabbed it with both hands.
The weeks that followed were quite interesting. That was the first time my 'research team' went into 'production work' like packaging, installers, testing, bundling and writing installation instructions. We had never done that before in our careers. What we also managed to do was package Linux versions of several applications into the CDs. This was perhaps one major move that succeeded in many ways. Those in power and in position got to realize, for the first time, that applications were available on par on both proprietary and free platforms. This has in a way helped influence some policy decisions towards open source and open standards.
The Tamil CDs launch and subsequently the Hindi CDs launch put the BharateeyaOO.o group in a different orbit. The somewhat lesser-known localization group called BharateeyaOO.o became a blue-eyed boy for the FOSS media. We received all sorts of adulation: TV shows, a meeting with the President of India, newspaper front pages. We've seen it all.
But more than this hoopla, the main objective of the team, to bring OpenOffice to all Indian languages, is today getting realized. We still have some way to go, but yes, we are on the path.
You mention that several projects contribute to BharateeyaOO.o. But has the situation been rosy all along? I mean, have people from the open source hacking community always come forward to help, despite the BharateeyaOO.o Project being run by a government institution?
BharateeyaOO.o group was born with the understanding that it will be working with open-source technologies. We at NCST (then), were in no way constrained to think about any commercial angle to the research activities that we undertook. But perhaps that was one of the first instances when some "Indian Government" research group was actively trying to make contributions to open-source code. It was a learning process for us as well. Submitting bugs and patches back on to OpenOffice.org community site was all new to us since we were trained to work in small isolated groups before that. We never looked at making a community presence as one of the objectives then. What we did was put all our source and work on our site and thought that our job was done.
But it was not so. And naturally we were looked down upon by the Indian open source localization community on more than one occasion. I still remember the flak that we drew because of our naïveté. Shikha understood the situation and took efforts to reach out to the community. That was the way we bridged some of the gap.
As far as the group was concerned, we never had any issues about talking publicly about our work. In fact, we were encouraged to do so. It was only a culture change that was necessary. We needed to get out of our cocoons (researchers are a pretty reserved lot) and talk to people, get involved in discussions over IRC and mailing lists, and make noise about the things that we were doing. It was this culture that was absent previously.
Once we developed that habit of speaking to people, we realized that the localization groups were more than eager to help us grow. I already told you about Dr. Swapnil Lale's contribution. Utkarsh was another group which chipped in with Gujarati translations. We also helped out the Punlinux group, and the Free Software Foundation's Andhra Pradesh chapter helped us with Telugu.
We have today assumed the role of a convenient meeting point for OOo activities in India. Mr. Vijay Kumar, who was part of our team, is the Indic Native Language Group Coordinator, which supports new localization groups that attempt to localize OpenOffice.org. There might still be some OOo activities going on in India without our knowledge, but by and large we have merged ourselves well into the FOSS community. Today we fashion ourselves as part of that large FOSS community when it comes to OOo rather than as part of CDAC.
You touch upon a few localization issues. For people who are not aware, could you discuss some of the issues involved with computing in localized languages? Has the situation improved since the time you started out?
The issues in computing in local languages, such as encoding, fonts, rendering, input methods and standardization of terminologies, are all well known now. I don't want to rehash the issues per se.
But what I would definitely like to give is a big torch of hope. Things are definitely looking up. We have adopted Unicode for our encoding purposes. There are minor glitches but things are moving forward nevertheless. OpenType fonts are being made available through the Free Language CDs, and it is only a matter of time before the contributors give up their final control over them and release them under the GPL. So fonts are available if needed. Rendering issues are also more or less getting solved with newer versions of International Components for Unicode (ICU) and Pango. We at BharateeyaOO.o also work towards making ICU perfect in its handling of Indic scripts. I am looking at a window of a maximum of about 6-9 months for solving all rendering issues in Linux and OOo. The FOSS community is working on the Smart Common Input Method (SCIM) platform and other input frameworks, and input methods are getting quite stable on Linux. They were always stable on Windows.
Terminologies are still an area of debate. And I personally believe it will take some time for people to come to terms with the way we address various things in software in our languages. Central government and some state language bodies are working towards standardization of terminologies. So at this stage the strategy that we have adopted is to give people something to start with. We don't guarantee that the translations are accurate and unambiguous but we are sure that these translations will definitely be talked about, sometimes appreciated and sometimes criticized. We are determined to pick up threads from there and they'll be reflected in our future versions.
In fact Mr. Mahesh Kulkarni, the head of GIST group at CDAC Pune is of the opinion that even the translations could be individual-specific and it should be easy enough for the end-user to change the terms used as per her/his preferences. So we do have some way to go in that area. All is not hunky-dory yet.
So what's the current status of the project? How many languages are supported? Have all changes been submitted upstream as well?
As it stands today, BharateeyaOO.o has completed localization of OOo in about 15 languages in its labs, of which about 5 are community supported. These languages are Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Marathi, Maithili, Oriya, Punjabi, Tamil, Telugu, and Urdu.
Of these about 10 languages have already been submitted upstream as patches. OOo 2.1 has support for about 10 languages in its source. Our aim is to complete OOo in all 22 scheduled Indian languages by December 2007 and we are bang on schedule.
That's great. From what I know, you are not limited to just localizing OpenOffice.org. Could you briefly list the other projects the group is involved in?
The BharateeyaOO.o team works on "Language Technologies", and OpenOffice.org happens to be one of the products which heavily influences our work and gives us direction. We are looking at Indic-language enabling of the operating system as a whole, and so we work in all the areas concerned.
One of the other main areas in which the group spends its time is speech technology. We have been able to come up with pretty usable ideas in that area and have already made the binary versions available for some time now. We are waiting for approval from the powers-that-be to release it under the GPL. Expect some important announcements soon :-) .
Apart from OOo, we've localized Firefox, Thunderbird and Gaim as well. In fact it was this combination of localized open-source tools that has impressed people. Localized versions of OOo would have definitely made an impact but to complete the picture, we needed to take up these tools too so that people do start believing that open source has some really serious tools for end users. All these tools have been localized to 10 Indian Languages and we expect to upstream them by this month end.
The other areas in which we actively research are Collaborative Frameworks, Gesture Based Input Methods, Handwriting Recognition, Semantic Web, Social Computing, and Rendering Engines.
Each person in my team works in a different research area and OpenOffice.org is perhaps one common interest for all of us, where we plow in our findings to help make OOo a world-class product.
Fantastic. I'd like to close this interview with your short-term as well as long-term goals.
Our short-term goals are simple, and our long-term goals are very ambitious.
Short-term goals include completing the localization of OOo, Firefox, Thunderbird and Gaim and releasing them for public feedback; solving all Indic script rendering issues on the Linux platform; creating user-friendly input methods for Indic scripts; providing basic accessibility support in all Indian languages; and helping the NRCFOSS centre with Indic-related issues in their BOSS Linux distribution.
Long-term goals involve studying the cultural aspects of Indic computing; creating sturdy speech recognition engines for Indian languages; continuously creating and exploring avenues for wide adoption of FOSS in various sectors through active promotion and support; and becoming an instrument in the proliferation of ICT in rural and underprivileged areas.
Thank you for your time, Mr. Raman.