Title: Linux for Bioinformatics: Dedicated Distributions for Processing of Biological Data - Part 2: Package Repositories and Complete Systems
Abstract: The number of active, widely used and valuable bioinformatics projects hosted at open source software repositories such as bioinformatics.org and sourceforge.net is constantly increasing. Alongside historical tools that were used to analyse biological data before Linux became a viable option on the desktop, new projects are being started and new tools are being made available to improve the ability to collect, analyse and integrate large collections of data. The development of new or improved algorithms for the analysis of genomic data is fostering the development of new tools. At the same time, the availability of open source tools, with full access to algorithms and source code and the possibility to modify and improve them, encourages good scientific practice in providing new tools and promotes reproducible research [1]. Important efforts in recent years have been dedicated to making access to data easier and to facilitating their analysis, thereby improving our knowledge of biology. These efforts have targeted, for example, harmonization through the definition of ontologies [24], the integration of databases [23], and mechanisms to overcome the typical pattern of creating manual ad-hoc connections among software tools and databases, cutting and pasting queries, creating temporary files and taking notes. Such mechanisms include workflow systems and single-access-point portals, which link these actions smoothly and facilitate reproducible research [3] [5] [7] [8] [20]. The availability of these "enabling" tools allows biologists to focus on their research without being distracted by computer problems.
In parallel to these efforts, the availability of easy-to-use live distributions of the GNU/Linux operating system has facilitated the development of ready-made desktop (and server-based) solutions that collect on a single CD or DVD an entire operating system equipped with tools for bioinformatics analysis, as well as development environments that enable the further development of new tools. This paper is the first of a two-part survey that reviews the most popular solutions available in the Linux arena providing a desktop environment equipped with applications and development libraries for life scientists.
