Genome Sequence Compression with Distributed Source Coding
Host Publication: Data Compression Conference (DCC), 2013
Authors: S. Wang, X. Jiang, L. Cui, W. Dai, N. Deligiannis, P. Li, H. Xiong, S. Cheng and L. Ohno-Machado
Publisher: IEEE Computer Society Press
Publication Year: 2013
Number of Pages: 1
Abstract: Genome data plays a significant role in modern medicine, e.g., in personalized medicine and earlier detection of diseases. With the increasing demand for genome sequence data, many advanced sequencing techniques have been developed, among which flexible miniaturized sequencing devices are becoming more popular and promising because of their portability and efficiency. These portable devices commonly have low processing capability, a constrained power budget, small storage and memory, and limited communication bandwidth. In contrast, genome data is usually large and highly redundant within and across sequences. Therefore, genome data should be compressed before being transmitted. Unfortunately, traditional genome compression techniques (both reference-free and reference-based architectures) are ill suited to such miniaturized devices because of their high computational complexity and the large storage and memory needed to hold references at the encoder side. In this paper, we develop a novel genome compression framework based on distributed source coding (DSC), tailored to the needs of miniaturized devices. At the encoder side, subsequences of adaptive code length are compressed flexibly according to decoder feedback: low-complexity DSC-based syndrome coding is used when variations between source and reference are present, and hash coding is used when they are absent. Moreover, to handle the variations between source and reference at the decoder, we carefully designed a factor-graph-based low-density parity-check (LDPC) decoder that can cope with the difficulty of detecting insertions, deletions and substitutions. Experimental results demonstrate that the proposed approach achieves compression performance competitive with the benchmark compressor GRS, but with much lower encoding complexity.
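The hash-or-syndrome choice described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 2-bit base encoding, the 5-base block, the small parity-check matrix `COLS`, and the function names `short_hash`, `syndrome`, `decode_syndrome` and `transmit` are all assumptions made for illustration. The decoder here is a brute-force search over single substitutions, standing in for the factor-graph LDPC decoder, and it does not handle insertions or deletions.

```python
import hashlib

# 2-bit encoding of nucleotides (an illustrative assumption).
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

# Columns of a toy 4x10 binary parity-check matrix H, stored as 4-bit integers.
# Each consecutive pair of columns belongs to one base. The pairs are chosen so
# that every single-base substitution in a 5-base block changes the syndrome by
# a distinct nonzero value.
COLS = [1, 2, 4, 8, 5, 10, 6, 11, 7, 9]
BLOCK = len(COLS) // 2  # bases per block; inputs are assumed to have this length


def to_bits(seq):
    """Flatten a base string into a list of bits."""
    return [bit for base in seq for bit in BITS[base]]


def short_hash(seq):
    """Truncated SHA-256 digest used as the hash code (toy size)."""
    return hashlib.sha256(seq.encode("ascii")).digest()[:8]


def syndrome(seq):
    """Compute H * x over GF(2): XOR the columns of H where x has a 1 bit."""
    s = 0
    for col, bit in zip(COLS, to_bits(seq)):
        if bit:
            s ^= col
    return s


def decode_syndrome(reference, synd):
    """Brute-force decoder: return the unique candidate within one
    substitution of the reference whose syndrome matches, else None."""
    candidates = [reference]
    for i, ref_base in enumerate(reference):
        for base in "ACGT":
            if base != ref_base:
                candidates.append(reference[:i] + base + reference[i + 1:])
    matches = [c for c in candidates if syndrome(c) == synd]
    return matches[0] if len(matches) == 1 else None


def transmit(source, reference):
    """Simulate one block: the hash is sent first, the syndrome only on mismatch."""
    # Round 1: the encoder sends only a hash; the decoder tests its reference.
    if short_hash(reference) == short_hash(source):
        return reference, "hash"
    # Decoder feedback reports a mismatch, so the encoder sends the syndrome.
    return decode_syndrome(reference, syndrome(source)), "syndrome"


print(transmit("ACGTA", "ACGTA"))  # reference already equals the source
print(transmit("ACGTA", "ACTTA"))  # one substitution, recovered via syndrome
```

In this sketch the encoder never stores or searches a reference: it only hashes a block or XORs a few matrix columns, which is the kind of low encoding complexity the abstract targets. The block and hash sizes are toy values; a practical system would use far longer subsequences so that the hash and syndrome are short relative to the data.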