It is my pleasure to announce that the user of the month for December 2012 is Kurt LaButti from the Department of Energy's Joint Genome Institute.
Kurt has been in the genomics industry since 2001. He started at what is now the Broad Institute of Harvard and MIT working on the human genome. He performed closure on chromosomes 17 and 11. He then transitioned into bioinformatics and assembly analysis, working primarily on fungi and virus genomes. At the Joint Genome Institute, he assembles and analyzes data from the DOE community, sequencing relevant organisms. His group handles microbes, metagenomes, fungi, and other interesting projects.
Kurt has moved more than 18 TB of data in less than a month among NERSC and Joint Genome Institute data transfer nodes, his Globus Connect endpoint, and the NERSC High Performance Storage System (HPSS). He has achieved data transfer rates as high as 3.7 Gbps. Notably, he reached this rate on transfers to HPSS, a hierarchical storage system with a tape backend. NERSC provides a GridFTP interface to HPSS through the GridFTP server's Data Storage Interface (DSI) module for HPSS.
The files in Kurt's dataset range from very small (as small as a few kilobytes) to huge (as large as a terabyte). Occasionally, his transfers ran into transient failures, and sometimes into failures that required manual intervention, such as permission denied and disk quota exceeded errors. In every case, Globus Online automatically resumed the transfers as soon as the problem was fixed.
"Technology has been rapidly evolving since I started. I've been around long enough to see the sequencing data evolve from Sanger and capillary sequencing, to 454 pyrosequencing, to what we typically use today, Illumina and single molecule real time PacBio! As the methods and output data evolved, so did the speed and amount of data produced from these fantastic machines," says Kurt.
"In the past you might have thought getting enough data was an issue," he adds. "These days just dealing with the amount of data produced is one of the biggest issues. These machines produce enormous amounts of data. In my graduate school days KB was the unit of measure we most used in lab. Today it's GB or TB! Moving, storing, analyzing, and dealing with all of this data can be a real issue. Some regularly mail DVDs of data around the country because it's faster than ftp and other in silico methods. I use Globus Online to transfer data, output, reports, etc., all over the place! It makes it unbelievably convenient and easy to do so and am very happy with it."
Congratulations, Kurt, and thanks for your continued use of Globus Online!