A local transcriptome assembler for SNPs, indels and AS events

Brief introduction to KisSplice

Introductory slides pdf

The data

The fastq files can be found here
These reads correspond to long polyA+ RNAs from SKNSH cell lines. This dataset was generated in the context of the ENCODE project (SRA: SRR315315, SRR315316, SRR534309, SRR534310). There are two experimental conditions (treated with retinoic acid, untreated), and two replicates per condition. Only 10M reads per replicate are analysed here. For an analysis of the full dataset, see here
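The tutorial does not say how the 10M-read subsets were produced. As a minimal sketch, assuming uncompressed fastq files with exactly four lines per read and that keeping the first reads of each file is acceptable, a subset could be taken as follows. The function name and the input file name in the usage comment are hypothetical.

```shell
# Print the first N reads of an uncompressed fastq file.
# Assumes each read occupies exactly 4 lines (header, sequence, '+', qualities).
take_first_reads() {
    n_reads=$1
    fastq=$2
    head -n "$((n_reads * 4))" "$fastq"
}

# Hypothetical usage (the full-size input file name is a placeholder):
# take_first_reads 10000000 SknshRACellRep1.fastq > SknshRACellRep1_10M.fastq
```

Taking the head of a file is fast but it is not a random sample; if the order of reads in the file matters for your analysis, a dedicated subsampling tool would be a better choice.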

Running KisSplice

Here is the command to run KisSplice on this dataset:

kissplice -r SknshRACellRep1_10M.fastq -r SknshRACellRep2_10M.fastq \
    -r SknshCellRep3_10M.fastq -r SknshCellRep4_10M.fastq

The job should take approximately 20 minutes. The file containing the bubbles corresponding to alternative splicing events can be found here: results_k41_coherents_type_1.fa
All output files of kissplice are here
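
To get a quick idea of how many events were assembled, the records in the fasta file can be counted. This is a minimal sketch, assuming each bubble is written as two fasta records (an upper path and a lower path), so that the number of events is half the number of header lines. The function name is hypothetical.

```shell
# Count the events in a KisSplice fasta output.
# Assumption: every bubble contributes exactly two records (upper and lower path).
count_events() {
    n_records=$(grep -c '^>' "$1" || true)   # grep exits non-zero when nothing matches
    echo "$((n_records / 2))"
}

# Example call on the type 1 file, run only if it is present in the current directory
if [ -f results_k41_coherents_type_1.fa ]; then
    count_events results_k41_coherents_type_1.fa
fi
```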

Aligning to the reference genome

STAR can be used to align the alternative splicing events found by kissplice back to the reference genome. Here is the reference genome in fasta format: Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa
And the associated gtf: Homo_sapiens.GRCh38.84.gtf
Building the STAR index of the genome:

STAR --runMode genomeGenerate --genomeDir STAR_index --genomeFastaFiles Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa --sjdbGTFfile Homo_sapiens.GRCh38.84.gtf

The built index can be found here

Aligning the KisSplice events to the reference genome:

STAR --genomeDir STAR_index --readFilesIn results_k41_coherents_type_1.fa

The alignment (in sam format) can be found here
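
A quick sanity check is to see how many of the sequences were placed on the genome. The sketch below assumes STAR's default behaviour, where unmapped sequences are not written to the SAM file, so every non-header line is an alignment. The function names are hypothetical.

```shell
# Number of alignment records (SAM header lines start with '@').
count_alignments() {
    grep -vc '^@' "$1" || true   # grep exits non-zero when the count is 0
}

# Number of distinct sequences with at least one alignment
# (column 1 of a SAM record is the query name).
count_aligned_sequences() {
    grep -v '^@' "$1" | cut -f 1 | sort -u | wc -l | tr -d ' '
}

# Run only if the STAR output is present in the current directory
if [ -f Aligned.out.sam ]; then
    echo "alignments: $(count_alignments Aligned.out.sam)"
    echo "aligned sequences: $(count_aligned_sequences Aligned.out.sam)"
fi
```

The two numbers can differ because a sequence that maps to several loci produces several alignment records.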

Running KisSplice2RefGenome

kissplice2refgenome -a Homo_sapiens.GRCh38.84.gtf Aligned.out.sam

The output file can be found here: k2rg-v1.0.1_sknsh.txt

Running kissDE

The latest version of kissDE can be downloaded here.

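library(kissDE)
# Read the counts from the KisSplice2RefGenome output
countsData <- kissplice2counts("k2rg-v1.0.1_sknsh.txt", k2rg = TRUE)
# One label per input fastq file, in the order given to kissplice
conditions <- c("SknshRA", "SknshRA", "Sknsh", "Sknsh")
results <- diffExpressedVariants(countsData, conditions)
writeOutputKissDE(results, adjPvalMax = 0.05, dPSImin = 0.1, output = "")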
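
The dPSImin = 0.1 threshold refers to the difference in PSI (percent spliced in) between the two conditions. As a rough illustration only, assuming PSI is the fraction of reads supporting the inclusion variant and ignoring the normalisation and statistical modelling that kissDE performs, the quantity can be computed on made-up counts. The function names and the numbers are hypothetical.

```shell
# PSI = inclusion / (inclusion + exclusion), printed with two decimals.
# Prints NA when there are no reads at all.
psi() {
    awk -v inc="$1" -v exc="$2" 'BEGIN {
        if (inc + exc == 0) { print "NA"; exit }
        printf "%.2f\n", inc / (inc + exc)
    }'
}

# dPSI between two conditions (second minus first, an arbitrary choice here).
dpsi() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b - a }'
}

psi_treated=$(psi 80 20)      # made-up counts: 80 inclusion reads, 20 exclusion reads
psi_untreated=$(psi 50 50)    # made-up counts: 50 inclusion reads, 50 exclusion reads
dpsi "$psi_treated" "$psi_untreated"
```

With these toy numbers the absolute dPSI is 0.30, so such an event would pass a 0.1 threshold.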

The output file can be found here:

Visualising AS events of interest (optional)

Reads mapped to the annotated reference genome are available here. You can download them to your machine using wget.


Then launch IGV, select the hg38 genome, import these tracks, zoom in on a gene (for instance MYL6), right-click on the left panel of one of the tracks and select the Sashimi Plot option. You should see something like this (you can filter out exon junctions supported by fewer than 5 reads by right-clicking and selecting "Set Junction Coverage Min"):