These are some of the software projects I've worked on. For their application to biological problems, please see my research interests or my full curriculum vitae.
- Ongoing
- Since August 2011: Map of Life
-
Details available on the Map of Life website.
- Completed
- 2007 - 2011: OCR Terminal
-
OCR Terminal is an online optical character recognition (OCR) service: it reads text from uploaded images and returns the recognized text in an editable format such as Microsoft Word, Adobe PDF or plain text. Since mid-2008, when the service first went public, tens of thousands of user accounts have been created and over 100,000 documents have been processed on the website. Apart from the website itself, the service features a simple API which can be used to submit documents for processing programmatically.
I have been lead developer of OCR Terminal since the project's inception. I have written all of OCR Terminal's underlying code, first as a Perl/CGI application, and later as a Perl/Catalyst application. I am particularly proud of designing the public API, which is used by our own desktop client, several in-house tools, and several clients of ours who use it both for bulk processing and as a backend for their own software.
I am also OCR Terminal's main server administrator, responsible for maintaining all the servers and backend components. This taught me about server monitoring with tools such as Munin. Since early 2009, OCR Terminal has been hosted on the Amazon EC2 cloud, giving me experience with setting up, bundling and managing EC2 instances.
- 2010: NameCards Terminal
-
NameCards Terminal provides OCR for business cards. My main contribution to this project was as an integrator, merging changes from different developers to provide a final product. This was the first time that I worked on a project where I wasn't lead developer, so it gave me valuable experience in working with other people's code.
Note that this service is only available for customers of SingTel's MyBusiness SaaS Platform.
- 2006 - 2010: Sequence Matrix
-
Sequence Matrix arose from the need to assemble large datasets, to easily eliminate individual suspicious sequences from a complex dataset, and to generate custom taxonsets and codonsets to allow parts of an incomplete dataset to be analysed separately.
You can find out more about the problems we tried to solve with Sequence Matrix at my research page. On the technical side, Sequence Matrix is based on the Species Identifier codebase (together referred to as TaxonDNA). It uses a Model-View-Controller (MVC) architecture to separate file loading and saving from displaying the sequence information in the table. Using a common codebase allowed new features from Sequence Matrix (such as improved NEXUS and MEGA file format support) to be ported back into Species Identifier easily.
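As a rough illustration of that separation, here is a minimal Java sketch of an MVC split. It is not taken from the TaxonDNA source: the class names (MatrixModel, TableView, MatrixController) and the "taxon,gene,sequence" input format are assumptions made up for this example, and the view renders plain text rather than a real table widget.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical; none are taken from the TaxonDNA source.

/** Model: taxon -> (gene -> sequence). Knows nothing about files or display. */
class MatrixModel {
    private final Map<String, Map<String, String>> cells = new LinkedHashMap<>();
    private final List<String> genes = new ArrayList<>();

    void put(String taxon, String gene, String sequence) {
        if (!genes.contains(gene)) {
            genes.add(gene);
        }
        cells.computeIfAbsent(taxon, t -> new LinkedHashMap<>()).put(gene, sequence);
    }

    List<String> genes() {
        return genes;
    }

    Iterable<String> taxa() {
        return cells.keySet();
    }

    /** Returns the sequence for this taxon and gene, or null if it is missing. */
    String get(String taxon, String gene) {
        Map<String, String> row = cells.get(taxon);
        return row == null ? null : row.get(gene);
    }
}

/** View: renders the model as a tab-separated table of sequence lengths. */
class TableView {
    String render(MatrixModel model) {
        StringBuilder out = new StringBuilder("taxon");
        for (String gene : model.genes()) {
            out.append('\t').append(gene);
        }
        out.append('\n');
        for (String taxon : model.taxa()) {
            out.append(taxon);
            for (String gene : model.genes()) {
                String seq = model.get(taxon, gene);
                out.append('\t').append(seq == null ? "-" : seq.length() + " bp");
            }
            out.append('\n');
        }
        return out.toString();
    }
}

/** Controller: the only class that parses input; it updates the model, then asks the view to render. */
class MatrixController {
    private final MatrixModel model;
    private final TableView view;

    MatrixController(MatrixModel model, TableView view) {
        this.model = model;
        this.view = view;
    }

    /** Loads lines of the form "taxon,gene,sequence" (a made-up format, not NEXUS or MEGA). */
    String load(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length == 3) {
                model.put(parts[0].trim(), parts[1].trim(), parts[2].trim());
            }
        }
        return view.render(model);
    }
}
```

Because parsing lives only in the controller, supporting another file format in a design like this would mean adding a loader without touching the model or the table rendering.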
- Homepage
- Source code (on GitHub)
- Vaidya, G., Lohman, D. J., Meier, R. 2010 Sequencematrix: concatenation software for the fast assembly of multi-gene datasets with character set and codon information. Cladistics, accepted. Available online.
- My presentation describing Sequence Matrix's features.
- 2005 - 2006: Species Identifier
-
Species Identifier is a program I wrote to test DNA barcoding by using sequences downloaded from NCBI GenBank. It's a collection of individual tools united in a common user interface which allows them to process the same dataset in different ways. For more details on the research generated by Species Identifier, see my research page.
Technically, Species Identifier is a simple but well-structured Java application which interacts with users using the Java AWT. Each module is a separate Java class that implements a common interface, through which the main application registers it and through which it obtains access to the currently loaded dataset. Access to the dataset is regulated so that only a single module can access it at a time, avoiding the problems of simultaneous editing of the data by multiple modules. The modular structure of the program allowed me to add or remove modules easily, customizing the release for the task at hand and adding new "debugging" modules not intended for general release.
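The module design described above might look something like the following Java sketch. It is an illustration under assumptions rather than the actual Species Identifier code: the names ToolModule, ModuleHost, Dataset, LoaderModule and CounterModule are invented for this example, and a ReentrantLock is just one possible way to ensure that only one caller touches the dataset at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// All names here are hypothetical; none are taken from the Species Identifier source.

/** The currently loaded dataset, reduced to a list of sequence names. */
class Dataset {
    final List<String> sequenceNames = new ArrayList<>();
}

/** Common interface that every module implements so the application can register it. */
interface ToolModule {
    String getName();
    void run(ModuleHost host);
}

/** Stands in for the main application: holds the modules and guards the dataset. */
class ModuleHost {
    private final List<ToolModule> modules = new ArrayList<>();
    private final Dataset dataset = new Dataset();
    private final ReentrantLock datasetLock = new ReentrantLock();

    void register(ToolModule module) {
        modules.add(module);
    }

    /** Runs the action while holding the lock, so only one thread at a time touches the dataset. */
    void withDataset(Consumer<Dataset> action) {
        datasetLock.lock();
        try {
            action.accept(dataset);
        } finally {
            datasetLock.unlock();
        }
    }

    /** Runs every registered module in registration order. */
    void runAll() {
        for (ToolModule module : modules) {
            module.run(this);
        }
    }
}

/** Example module that adds two placeholder sequences to the dataset. */
class LoaderModule implements ToolModule {
    public String getName() {
        return "Loader";
    }

    public void run(ModuleHost host) {
        host.withDataset(d -> {
            d.sequenceNames.add("seq1");
            d.sequenceNames.add("seq2");
        });
    }
}

/** Example module that records how many sequences are currently loaded. */
class CounterModule implements ToolModule {
    int lastCount = -1;

    public String getName() {
        return "Counter";
    }

    public void run(ModuleHost host) {
        host.withDataset(d -> lastCount = d.sequenceNames.size());
    }
}
```

In a layout like this, building a release with a different set of tools only changes which modules are passed to `register`; a debugging module is just another `ToolModule` that is left out of the public build.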
- Homepage
- Source code (on GitHub)
- Meier, R., Shiyang, K., Vaidya, G., Ng, P. K. 2006 DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Systematic Biology 55 (5), 715-728. Available online.
- 2002 - 2006: SPS
-
The Special Programme in Science was a central part of my university life. While it was largely a programme to encourage research interests, I volunteered to help manage SPS's computer infrastructure for a year as one of the two SPS system administrators. My colleague and I were trained by our seniors in administering Debian Linux, our Linux distribution of choice. Apart from the usual system administration tasks, we also set up new workstations and redesigned the network layout to improve security. We also trained the next generation of system administrators.
While I worked as lab officer at NUS, SPS's web administrator succession broke down. With design help from my seniors, I recreated the entire website from scratch and established the design it currently has (as of July, 2010). I also trained the next generation of SPS web administrators.

