
SRA (NCBI) stores each sequencing run as a single “sra” or “lite.sra” file. If you want to use data from paired-end sequencing, you will usually want the mates in separate files. When I run the SRA Toolkit’s “fastq-dump” utility on paired-end SRA files, I sometimes get only one file, with all the mate pairs stored together rather than split into two or three files. The solution is to always run fastq-dump with the “--split-3” option. If the experiment is single-end sequencing, only one FASTQ file will be generated. If it is paired-end sequencing, there may be two or three FASTQ files. The two files with the suffixes “_1” and “_2” contain matched mate-pair reads, whereas the third (without a suffix) contains all the reads that have no mate (or whose mates SRA could not resolve).
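To make the splitting rule concrete, here is a small Java sketch that models it, assuming only the behaviour described above: a spot whose two mates are both present goes to the “_1” and “_2” files, and a spot with a single usable read goes to the unsuffixed file. The class, method and field names (`Split3Model`, `route`, `file1`, `file2`, `unpaired`) are hypothetical; this illustrates the rule and does not read real SRA data or reproduce fastq-dump’s implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the --split-3 routing rule; it does not read real SRA files.
final class Split3Model {

    // A spot and its mate sequences; a null or empty entry stands for a
    // missing or unresolved mate (an assumption of this sketch).
    static final class Spot {
        final String name;
        final List<String> mates;

        Spot(String name, List<String> mates) {
            this.name = name;
            this.mates = mates;
        }
    }

    final List<String> file1 = new ArrayList<>();    // like ERR068552_1.fastq
    final List<String> file2 = new ArrayList<>();    // like ERR068552_2.fastq
    final List<String> unpaired = new ArrayList<>(); // the file without a suffix

    void route(Spot spot) {
        List<String> present = new ArrayList<>();
        for (String m : spot.mates) {
            if (m != null && !m.isEmpty()) {
                present.add(m);
            }
        }
        if (present.size() == 2) {
            // Both mates available: write them to _1 and _2 in the same order,
            // so entry i of one file matches entry i of the other.
            file1.add(spot.name + ":" + present.get(0));
            file2.add(spot.name + ":" + present.get(1));
        } else if (present.size() == 1) {
            // Only one usable read in the spot (its mate is missing or unresolved)
            unpaired.add(spot.name + ":" + present.get(0));
        }
        // Spots with zero or more than two usable reads are ignored in this sketch.
    }

    public static void main(String[] args) {
        Split3Model m = new Split3Model();
        m.route(new Spot("spot1", Arrays.asList("ACGT", "TTGA")));
        m.route(new Spot("spot2", Arrays.asList("GGCA", null)));
        m.route(new Spot("spot3", Arrays.asList(null, "CCTA")));
        System.out.println(m.file1);    // [spot1:ACGT]
        System.out.println(m.file2);    // [spot1:TTGA]
        System.out.println(m.unpaired); // [spot2:GGCA, spot3:CCTA]
    }
}
```

Keeping the paired files in lockstep is the point of the split: most paired-end aligners expect the i-th record of the “_1” file to be the mate of the i-th record of the “_2” file.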

I hope my experience with handling NCBI SRA data helps readers.

source:

SRA toolkit document:

An example command:

./fastq-dump.2 --split-3 ~/Desktop/ERR068552.sra -O ~/Desktop/temp.fastaq/

PS: If the sequencing was done by ligation (SOLiD), the results are 2-base-encoded color-space reads; add the -B option so that the sequences written to the FASTQ file are base-space reads.
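As a rough illustration of what converting color space to base space involves, the Java sketch below decodes a color-space read by hand. It assumes the standard 2-base encoding in which A, C, G and T have the 2-bit codes 0 to 3 and each color is the XOR of the codes of two adjacent bases, and that the read starts with a known primer base that is not part of the output. The class name `ColorSpaceDecoder` is hypothetical, this is not how fastq-dump implements -B, and missing color calls (such as '.') are not handled.

```java
// Hypothetical helper: naive decoding of a 2-base-encoded (SOLiD-style) color-space read.
final class ColorSpaceDecoder {
    // Index in this string is the 2-bit code of the base: A=0, C=1, G=2, T=3.
    private static final String BASES = "ACGT";

    // csRead looks like "T0123...": a known primer base, then one color digit (0-3)
    // per transition. The primer base itself is not emitted.
    static String decode(String csRead) {
        int prev = BASES.indexOf(csRead.charAt(0));
        if (prev < 0) {
            throw new IllegalArgumentException("first character must be A, C, G or T");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < csRead.length(); i++) {
            char c = csRead.charAt(i);
            if (c < '0' || c > '3') {
                // Missing calls such as '.' are not handled by this sketch.
                throw new IllegalArgumentException("unsupported color: " + c);
            }
            // color = code(previous base) XOR code(next base),
            // so code(next base) = code(previous base) XOR color.
            prev ^= c - '0';
            out.append(BASES.charAt(prev));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("T0123")); // TGAT
    }
}
```

With this naive approach a single wrong color changes every base after it, which is one reason color-space reads need careful handling.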

PCR duplicates are an everyday annoyance in sequencing. You spend hundreds or thousands of dollars to get sequencing done, and after you get the reads back, you find that several percent, sometimes even 30% or 70% of your reads are identical copies of each other. These are called PCR duplicates and most sequencing pipelines recommend removing them or at least marking them (Picard’s MarkDuplicates or samtools rmdup are two available tools).
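The core idea behind duplicate marking can be sketched in a few lines of Java. This is a simplified model, not the actual algorithm of Picard or samtools: it assumes single-end reads, treats two reads as duplicates when they map to the same chromosome, 5′ position and strand, and keeps the copy with the highest sum of base qualities (similar in spirit to the default scoring described for Picard’s MarkDuplicates). The `DupMarker` and `Read` classes and their fields are hypothetical.

```java
import java.util.*;

// Simplified single-end duplicate marking; not Picard MarkDuplicates itself.
final class DupMarker {

    // Hypothetical minimal read representation.
    static final class Read {
        final String name, chrom;
        final int fivePrimePos;  // 5' end coordinate of the alignment, computed by the caller
        final boolean reverse;   // true if mapped to the reverse strand
        final int qualSum;       // sum of base qualities
        boolean duplicate;       // set by mark()

        Read(String name, String chrom, int fivePrimePos, boolean reverse, int qualSum) {
            this.name = name;
            this.chrom = chrom;
            this.fivePrimePos = fivePrimePos;
            this.reverse = reverse;
            this.qualSum = qualSum;
        }
    }

    // Groups reads by (chromosome, 5' position, strand); within each group every read
    // except the one with the highest quality sum is flagged as a duplicate.
    static void mark(List<Read> reads) {
        Map<String, Read> best = new HashMap<>();
        for (Read r : reads) {
            String key = r.chrom + ":" + r.fivePrimePos + ":" + (r.reverse ? '-' : '+');
            Read current = best.get(key);
            if (current == null) {
                best.put(key, r);
            } else if (r.qualSum > current.qualSum) {
                current.duplicate = true; // the previous best loses
                best.put(key, r);
            } else {
                r.duplicate = true;       // ties keep the read seen first
            }
        }
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read("r1", "chr1", 1000, false, 900),
                new Read("r2", "chr1", 1000, false, 950), // same position/strand as r1, higher quality
                new Read("r3", "chr1", 1000, true, 800),  // same position, other strand
                new Read("r4", "chr2", 500, false, 700));
        mark(reads);
        for (Read r : reads) {
            System.out.println(r.name + (r.duplicate ? " duplicate" : " kept"));
        }
    }
}
```

Marking rather than deleting, as in this sketch, leaves the decision to downstream tools, which can then choose to skip flagged reads.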


In 1998, Andrew Fire and Craig Mello described a new technique: silencing specific genes with dsRNA, the so-called RNAi. In 2000, Amy Pasquinelli et al. classified lin-4 and let-7 as small temporal RNAs (stRNAs).

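In the laboratory, RNA interference (RNAi) is a powerful experimental tool: double-stranded RNA (dsRNA) homologous to a target gene induces sequence-specific silencing of that gene and rapidly blocks its activity. siRNAs play a central role in the RNA silencing pathway, acting as the guides that direct the degradation of specific messenger RNAs (mRNAs). siRNA is an intermediate of the RNAi pathway and a factor required for RNAi to take effect.

siRNA formation is regulated mainly by Dicer and Rde-1. When dsRNA appears in a cell, for example through RNA virus infection, transposon transcription, or transcription of inverted repeats in the genome, the protein encoded by Rde-1 (RNAi-defective gene 1) recognizes the foreign dsRNA. Once the dsRNA reaches a certain amount, Rde-1 guides it to bind Dicer, forming an enzyme-dsRNA complex. (Dicer is an endonuclease with RNase III activity and four domains: an Argonaute-family PAZ domain, an RNase III catalytic domain, a dsRNA-binding domain, and a DEAH/DEXH RNA helicase domain.) Through the action of Dicer, a single-stranded target mRNA in the cell (homologous to the dsRNA) exchanges with the sense strand of the dsRNA, so that the original sense strand is displaced by the mRNA and released from the enzyme-dsRNA complex. Then, with ATP, the RNA-induced silencing complex (RISC, composed of an endonuclease, an exonuclease, a helicase and other components, which recognizes and cleaves target mRNAs) uses its endonuclease activity to cut the target mRNA, now in the former sense-strand position, within the region complementary to the dsRNA's antisense strand, producing small dsRNA fragments of 21-23 nt. These small fragments are the siRNAs.

The key steps of RNAi are the assembly of RISC and the generation of the siRNAs that mediate the specific response. An siRNA is incorporated into RISC, pairs perfectly with the coding region or UTR of the target gene's mRNA, and causes that mRNA to be degraded; hence an siRNA degrades only mRNAs complementary to its sequence. Because it silences the target gene through complementary base pairing, it is a typical negative regulatory mechanism. Target recognition by siRNA is highly specific: cleavage occurs first at the position opposite the center of the siRNA, so the central bases are critical and a mismatch there severely inhibits RNAi, whereas the nucleotides at the 3′ end do not need to match the target mRNA perfectly.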
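The point that mismatches near the center of the siRNA matter far more than those at its 3′ end can be illustrated with a small Java check. This is only a sketch under stated assumptions: the “central” window is taken as guide positions 9 to 12 counted from the guide (antisense) strand's 5′ end (an assumed value; cleavage is usually described as occurring opposite guide positions 10 and 11), only Watson-Crick pairs count, and the class and method names are hypothetical. It reports mismatch positions; it does not predict silencing efficiency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check of siRNA guide vs. target-site complementarity.
final class SiRnaMatch {
    // Assumed "central" window, 1-based positions from the guide's 5' end.
    static final int CENTRAL_START = 9, CENTRAL_END = 12;

    // Watson-Crick pairs only; G:U wobble is counted as a mismatch (simplification).
    static boolean pairs(char g, char t) {
        return (g == 'A' && t == 'U') || (g == 'U' && t == 'A')
                || (g == 'G' && t == 'C') || (g == 'C' && t == 'G');
    }

    // Builds a perfectly complementary target site (reverse complement of the guide).
    static String perfectSite(String guide) {
        StringBuilder sb = new StringBuilder();
        for (int i = guide.length() - 1; i >= 0; i--) {
            char g = guide.charAt(i);
            sb.append(g == 'A' ? 'U' : g == 'U' ? 'A' : g == 'G' ? 'C' : 'G');
        }
        return sb.toString();
    }

    // guide: antisense strand 5'->3'; site: target mRNA 5'->3', same length.
    // Returns the 1-based guide positions that do not base-pair with the site.
    static List<Integer> mismatches(String guide, String site) {
        if (guide.length() != site.length()) {
            throw new IllegalArgumentException("guide and site lengths differ");
        }
        List<Integer> out = new ArrayList<>();
        int n = guide.length();
        for (int i = 0; i < n; i++) {
            // Antiparallel pairing: guide position i pairs with site position n-1-i.
            if (!pairs(guide.charAt(i), site.charAt(n - 1 - i))) {
                out.add(i + 1);
            }
        }
        return out;
    }

    static boolean centralMismatch(List<Integer> mm) {
        for (int p : mm) {
            if (p >= CENTRAL_START && p <= CENTRAL_END) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String guide = "UUCGAAGUACUCAGCGUAAGU"; // arbitrary 21-nt example guide
        String site = perfectSite(guide);
        System.out.println(mismatches(guide, site)); // []
        char[] s = site.toCharArray();
        s[guide.length() - 10] = guide.charAt(9);    // break the pair at guide position 10
        List<Integer> mm = mismatches(guide, new String(s));
        System.out.println(mm + " central=" + centralMismatch(mm)); // [10] central=true
    }
}
```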


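import java.util.*;

// Wrapper class (the name is arbitrary) so the snippet compiles on its own
public class SortMapDemo {
    public static void main(String[] args) {
        // Initialize the map at declaration time (double-brace initialization)
        HashMap<String, Integer> hashMap = new HashMap<String, Integer>() {
            {
                put("a", 1);
                put("b", 3);
                put("c", 2);
            }
        };

        // Sorted by value
        List<Map.Entry<String, Integer>> l = new ArrayList<>(hashMap.entrySet());
        Collections.sort(l, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o1.getValue().compareTo(o2.getValue());
            }
        });
        for (Map.Entry<String, Integer> e : l) {
            System.out.println(e.getKey() + "::::" + e.getValue());
        }

        // Sorted by key
        l = new ArrayList<>(hashMap.entrySet());
        Collections.sort(l, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o1.getKey().compareTo(o2.getKey());
            }
        });
        for (Map.Entry<String, Integer> e : l) {
            System.out.println(e.getKey() + "::::" + e.getValue());
        }

        // Sorted by key, relying on TreeMap keeping its keys in natural order
        TreeMap<String, Integer> treeMap = new TreeMap<>(hashMap);
        for (Iterator<String> iterator = treeMap.keySet().iterator(); iterator.hasNext();) {
            String temp = iterator.next();
            System.out.println(temp + "::::" + treeMap.get(temp));
        }
    }
}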
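On Java 8 or later, the same two orderings can be written more compactly with the standard `Map.Entry.comparingByValue()` and `Map.Entry.comparingByKey()` comparators and the Stream API. The sketch below assumes the same String-to-Integer map and output format; the class and method names (`SortMapStreams`, `byValue`, `byKey`) are only illustrative.

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative class: sorts map entries with the Java 8 Map.Entry comparators.
final class SortMapStreams {

    // Returns "key::::value" lines ordered by value, ascending.
    static List<String> byValue(Map<String, Integer> map) {
        return map.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .map(e -> e.getKey() + "::::" + e.getValue())
                .collect(Collectors.toList());
    }

    // Returns "key::::value" lines ordered by key, ascending.
    static List<String> byKey(Map<String, Integer> map) {
        return map.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> e.getKey() + "::::" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 3);
        m.put("c", 2);
        byValue(m).forEach(System.out::println); // a::::1, c::::2, b::::3
        byKey(m).forEach(System.out::println);   // a::::1, b::::3, c::::2
    }
}
```

For a descending order, `Map.Entry.comparingByValue(Comparator.reverseOrder())` can be passed to `sorted` instead.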