Basic usage (quick start)
Make a basic RNA transcript structure plot
[1]:
import RNApysoforms as RNApy
[2]:
## Path to your ENSEMBL GTF file
ensembl_gtf_path = "../../tests/test_data/Homo_sapiens_chr21_and_Y.GRCh38.110.gtf"
## Read the Ensembl GTF file
annotation = RNApy.read_ensembl_gtf(ensembl_gtf_path)
## Filter the annotation to keep only the target gene
sod1_annotation = RNApy.gene_filtering(annotation=annotation, target_gene="SOD1")
sod1_annotation.head()
[2]:
shape: (5, 11)
| gene_id | gene_name | transcript_id | transcript_name | transcript_biotype | seqnames | strand | type | start | end | exon_number |
|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | str | str | str | str | str | i64 | i64 | i64 |
| "ENSG00000142168" | "SOD1" | "ENST00000389995" | "SOD1-202" | "protein_coding" | "21" | "+" | "exon" | 31659666 | 31659784 | 1 |
| "ENSG00000142168" | "SOD1" | "ENST00000389995" | "SOD1-202" | "protein_coding" | "21" | "+" | "CDS" | 31659770 | 31659784 | 1 |
| "ENSG00000142168" | "SOD1" | "ENST00000389995" | "SOD1-202" | "protein_coding" | "21" | "+" | "exon" | 31663790 | 31663886 | 2 |
| "ENSG00000142168" | "SOD1" | "ENST00000389995" | "SOD1-202" | "protein_coding" | "21" | "+" | "CDS" | 31663790 | 31663886 | 2 |
| "ENSG00000142168" | "SOD1" | "ENST00000389995" | "SOD1-202" | "protein_coding" | "21" | "+" | "exon" | 31666449 | 31666518 | 3 |
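To make the origin of these columns concrete, here is a minimal sketch of how a single GTF line maps onto the fields shown in the table. It is not how `read_ensembl_gtf` is implemented; `parse_gtf_line`, `GTF_COLUMNS`, and `example_line` are hypothetical names, and the example line is constructed from the first table row with placeholder values for the source, score, and frame columns. The only thing taken as given is the standard GTF layout: nine tab-separated columns, the last holding `key "value";` attribute pairs.

```python
import re

# The nine tab-separated GTF columns; "seqnames" and "type" follow the
# column names used in the table above (GTF itself calls them seqname/feature).
GTF_COLUMNS = ["seqnames", "source", "type", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_gtf_line(line):
    """Parse one GTF feature line into a flat dict (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields))
    # GTF coordinates are 1-based and inclusive
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The attributes column holds pairs such as: gene_name "SOD1";
    attributes = dict(re.findall(r'(\S+) "([^"]*)"', record.pop("attributes")))
    record.update(attributes)
    return record


# Hypothetical line built from the first table row; "example_source" and the
# "." entries are placeholders, not values taken from the real file.
example_line = "\t".join([
    "21", "example_source", "exon", "31659666", "31659784", ".", "+", ".",
    'gene_id "ENSG00000142168"; transcript_id "ENST00000389995"; '
    'exon_number "1"; gene_name "SOD1"; transcript_name "SOD1-202"; '
    'transcript_biotype "protein_coding";',
])

record = parse_gtf_line(example_line)
print(record["gene_name"], record["type"], record["start"], record["end"])
```

In this sketch the attribute values stay as strings (so `exon_number` is `"1"`), whereas the table above shows `exon_number` as an integer column.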
[3]:
## Add intron rows between the exons of each transcript
sod1_annotation = RNApy.to_intron(sod1_annotation)
sod1_annotation.head()
[3]:
shape: (5, 11)
| gene_id | gene_name | transcript_id | transcript_name | transcript_biotype | seqnames | strand | type | start | end | exon_number |
|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | str | str | str | str | str | i64 | i64 | i64 |
| "ENSG00000142168" | "SOD1" | "ENST00000270142" | "SOD1-201" | "protein_coding" | "21" | "+" | "exon" | 31659693 | 31659841 | 1 |
| "ENSG00000142168" | "SOD1" | "ENST00000270142" | "SOD1-201" | "protein_coding" | "21" | "+" | "CDS" | 31659770 | 31659841 | 1 |
| "ENSG00000142168" | "SOD1" | "ENST00000270142" | "SOD1-201" | "protein_coding" | "21" | "+" | "intron" | 31659842 | 31663789 | 1 |
| "ENSG00000142168" | "SOD1" | "ENST00000270142" | "SOD1-201" | "protein_coding" | "21" | "+" | "CDS" | 31663790 | 31663886 | 2 |
| "ENSG00000142168" | "SOD1" | "ENST00000270142" | "SOD1-201" | "protein_coding" | "21" | "+" | "exon" | 31663790 | 31663886 | 2 |
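The intron row in the output follows directly from the neighbouring exon coordinates: it starts one base after the first exon ends and stops one base before the next exon begins. The sketch below reproduces that arithmetic in plain Python. It is an illustration of the idea only, not the code behind `to_intron`; `introns_from_exons` is a hypothetical helper.

```python
def introns_from_exons(exons):
    """Return (start, end) intron intervals lying between consecutive exons.

    `exons` is a list of (start, end) tuples with 1-based inclusive
    coordinates, as in a GTF file.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # An intron exists only if there is a gap between neighbouring exons
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Exons 1 and 2 of SOD1-201 (ENST00000270142), taken from the table above
sod1_201_exons = [(31659693, 31659841), (31663790, 31663886)]
print(introns_from_exons(sod1_201_exons))  # [(31659842, 31663789)]
```

The result matches the `intron` row shown for SOD1-201 (31659842 to 31663789).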
[4]:
## Create traces for plotting
traces = RNApy.make_traces(annotation=sod1_annotation, y="transcript_id", annotation_hue="transcript_biotype")
## Put traces into figure
fig = RNApy.make_plot(traces=traces, subplot_titles=["Transcript Structure"], width=1200, height=500)
## Show figure
fig.show()
Notes:
Click a legend item to show or hide the corresponding figure elements.
Clicking the first legend item also grays out the legend title. I could not find a workaround for this in the current plotly release (version 5).
Hover text for exons and CDS appears most reliably when the cursor is over a corner of the exon/CDS box.