RNApysoforms.make_traces

RNApysoforms.make_traces(annotation: DataFrame | None = None, expression_matrix: DataFrame | None = None, order_transcripts_by_expression_matrix: bool = True, y: str = 'transcript_id', x_start: str = 'start', x_end: str = 'end', annotation_hue: str | None = None, expression_hue: str | None = None, cds: str = 'CDS', exon: str = 'exon', intron: str = 'intron', expression_columns: str | List[str] = ['counts'], sample_id_column: str = 'sample_id', annotation_fill_color: str = 'grey', expression_fill_color: str = 'grey', annotation_color_palette: List[str] = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A', '#19D3F3', '#FF6692', '#B6E880', '#FF97FF', '#FECB52'], expression_color_palette: List[str] = ['#FECB52', '#FF97FF', '#B6E880', '#FF6692', '#19D3F3', '#FFA15A', '#AB63FA', '#00CC96', '#EF553B', '#636EFA'], annotation_color_map: Dict[str, str] | None = None, expression_color_map: Dict[str, str] | None = None, intron_line_width: float = 0.5, exon_line_width: float = 0.25, expression_line_width: float = 0.5, line_color: str = 'black', expression_plot_style: str = 'boxplot', spanmode: str = 'hard', marker_color: str = 'black', marker_opacity: float = 1, marker_size: int = 5, marker_jitter: float = 0.3, expression_plot_opacity: float = 1, transcript_plot_opacity: float = 1, exon_height: float = 0.3, cds_height: float = 0.5, arrow_size: float = 10, hover_start: str = 'start', hover_end: str = 'end', show_box_mean: bool = True, box_points: str | bool = 'all', expression_plot_legend_title: str = '<b><u>Expression Plot Hue<u><b>', transcript_plot_legend_title: str = '<b><u>Transcript Structure Hue<u><b>') → List[Box | Violin | dict | Dict[str, int]]

Generates Plotly traces for visualizing transcript structures and expression data.

This function processes genomic annotation and expression data to create Plotly traces suitable for plotting transcript structures alongside expression data. It supports customization of plot aesthetics, including colors, line widths, plot styles, and annotations. The function returns a list of traces that can be directly used in Plotly figures, including traces for exons, introns, CDS regions, and expression data.

Required columns in the `annotation` DataFrame:

  • y (default “transcript_id”): Identifier for each transcript.

  • x_start (default “start”): Start position of the feature.

  • x_end (default “end”): End position of the feature.

  • “strand”: Strand information (“+” or “-”).

  • “seqnames”: Chromosome or sequence name.

  • hover_start (default “start”): Start position for hover information.

  • hover_end (default “end”): End position for hover information.

  • If annotation_hue is specified, it must also be a column in annotation.

Required columns in the `expression_matrix` DataFrame:

  • y (default “transcript_id”): Identifier for each transcript.

  • sample_id_column (default “sample_id”): Identifier for each sample.

  • expression_columns (default [“counts”]): Column name or list of column names containing the expression values to plot, in the order they should be plotted.

  • If expression_hue is specified, it must also be a column in expression_matrix.
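The column requirements for annotation can be checked before calling make_traces. The following is a minimal sketch, not the library's own validation: missing_annotation_columns is a hypothetical helper and "biotype" is an illustrative hue column. It covers only the columns listed above, so the type column that the cds, exon, and intron parameters refer to is not checked here.

```python
from typing import List, Optional


def missing_annotation_columns(
    columns: List[str],
    y: str = "transcript_id",
    x_start: str = "start",
    x_end: str = "end",
    hover_start: str = "start",
    hover_end: str = "end",
    annotation_hue: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return the required column names absent from `columns`."""
    required = [y, x_start, x_end, "strand", "seqnames", hover_start, hover_end]
    if annotation_hue is not None:
        required.append(annotation_hue)
    # hover_start/hover_end default to the same names as x_start/x_end,
    # so drop duplicates while preserving order.
    required = list(dict.fromkeys(required))
    return [name for name in required if name not in columns]


# Column names as they might come from a Polars DataFrame's `.columns` attribute.
cols = ["transcript_id", "start", "end", "type", "strand"]
print(missing_annotation_columns(cols))                            # ['seqnames']
print(missing_annotation_columns(cols, annotation_hue="biotype"))  # ['seqnames', 'biotype']
```

The same pattern applies to expression_matrix with y, sample_id_column, each entry of expression_columns, and expression_hue.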

Parameters:
  • annotation (pl.DataFrame, optional) – A Polars DataFrame containing genomic annotation data for transcripts. Includes exons, introns, and CDS features. If provided, the function will generate traces for transcript structures.

  • expression_matrix (pl.DataFrame, optional) – A Polars DataFrame containing expression data. If provided, the function will generate traces for expression plots.

  • order_transcripts_by_expression_matrix (bool, optional) – If True, orders transcripts by their order in the expression matrix. If False, orders them by their order in the annotation DataFrame. Default is True.

  • y (str, optional) – Column name in both annotation and expression_matrix representing transcript identifiers. Default is “transcript_id”.

  • x_start (str, optional) – Column name in annotation representing the start position of features. Default is “start”.

  • x_end (str, optional) – Column name in annotation representing the end position of features. Default is “end”.

  • annotation_hue (str, optional) – Column name in annotation used to color-code transcript features based on categories. Default is None.

  • expression_hue (str, optional) – Column name in expression_matrix used to color-code expression data based on categories. Default is None.

  • cds (str, optional) – Value in the type column of annotation representing CDS features. Default is “CDS”.

  • exon (str, optional) – Value in the type column of annotation representing exon features. Default is “exon”.

  • intron (str, optional) – Value in the type column of annotation representing intron features. Default is “intron”.

  • expression_columns (Union[str, List[str]], optional) – Column name or list of column names in expression_matrix containing expression values. If a string is provided, it is converted to a list containing that string. Default is [“counts”].

  • sample_id_column (str, optional) – Column name in expression_matrix representing sample identifiers. Default is “sample_id”.

  • annotation_fill_color (str, optional) – Default fill color for transcript features if annotation_hue is not specified. Default is “grey”.

  • expression_fill_color (str, optional) – Default fill color for expression plots if expression_hue is not specified. Default is “grey”.

  • annotation_color_palette (List[str], optional) – List of colors to use for different categories in annotation_hue. Default uses Plotly qualitative palette.

  • expression_color_palette (List[str], optional) – List of colors to use for different categories in expression_hue. Default uses reversed Plotly qualitative palette.

  • annotation_color_map (dict, optional) – Mapping from categories in annotation_hue to colors. If None, colors are assigned from annotation_color_palette.

  • expression_color_map (dict, optional) – Mapping from categories in expression_hue to colors. If None, colors are assigned from expression_color_palette.

  • intron_line_width (float, optional) – Line width for intron traces. Default is 0.5.

  • exon_line_width (float, optional) – Line width for exon traces. Default is 0.25.

  • expression_line_width (float, optional) – Line width for expression plot traces. Default is 0.5.

  • line_color (str, optional) – Color for the lines outlining transcript features. Default is “black”.

  • expression_plot_style (str, optional) – Style of the expression plot. Options are “boxplot” or “violin”. Default is “boxplot”.

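  • spanmode (str, optional) – For violin plots, controls how far the density curve extends beyond the data. With “hard”, the violin ends at the sample minimum and maximum; with “soft”, it extends past them (by two bandwidths in Plotly). Default is “hard”.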

  • marker_color (str, optional) – Color of the markers in expression plots. Default is “black”.

  • marker_opacity (float, optional) – Opacity of the markers in expression plots. Default is 1.

  • marker_size (int, optional) – Size of the markers in expression plots. Default is 5.

  • marker_jitter (float, optional) – Amount of jitter (spread) applied to the markers in expression plots. Default is 0.3.

  • expression_plot_opacity (float, optional) – Opacity of the expression plot traces. Default is 1.

  • transcript_plot_opacity (float, optional) – Opacity of the transcript structure traces. Default is 1.

  • exon_height (float, optional) – Height of exon rectangles in the plot. Default is 0.3.

  • cds_height (float, optional) – Height of CDS rectangles in the plot. Default is 0.5.

  • arrow_size (float, optional) – Size of the arrow markers for introns. Default is 10.

  • hover_start (str, optional) – Column name in annotation for the start position displayed in hover information. Default is “start”.

  • hover_end (str, optional) – Column name in annotation for the end position displayed in hover information. Default is “end”.

  • show_box_mean (bool, optional) – If True, shows the mean in box and violin plots. Default is True.

  • box_points (Union[str, bool], optional) – Controls the display of points in box and violin plots. Options include “all”, “outliers”, “suspectedoutliers”, or False. Default is “all”.

  • expression_plot_legend_title (str, optional) – Title for the legend of the expression plot. Default is “<b><u>Expression Plot Hue<u><b>”.

  • transcript_plot_legend_title (str, optional) – Title for the legend of the transcript structure plot. Default is “<b><u>Transcript Structure Hue<u><b>”.
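To make the relationship between the color map and color palette parameters concrete, here is a minimal sketch of one way a palette could be turned into a category-to-color mapping when no explicit map is given. The helper resolve_color_map is hypothetical, and two details are assumptions rather than documented behavior: categories are taken in order of first appearance, and the palette wraps around when there are more categories than colors.

```python
from itertools import cycle
from typing import Dict, Iterable, List, Optional


def resolve_color_map(
    categories: Iterable[str],
    palette: List[str],
    color_map: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: an explicit map wins; otherwise assign palette colors."""
    if color_map is not None:
        return dict(color_map)
    # Assumption: unique categories in order of first appearance.
    unique = list(dict.fromkeys(categories))
    # Assumption: reuse palette colors from the start if categories outnumber colors.
    return dict(zip(unique, cycle(palette)))


palette = ["#636EFA", "#EF553B", "#00CC96"]
groups = ["AD", "CT", "AD", "CT"]  # illustrative values of a hue column

print(resolve_color_map(groups, palette))
# {'AD': '#636EFA', 'CT': '#EF553B'}
print(resolve_color_map(groups, palette, color_map={"AD": "red", "CT": "blue"}))
# {'AD': 'red', 'CT': 'blue'}
```

Passing an explicit dictionary as annotation_color_map or expression_color_map is the way to keep category colors fixed across figures, regardless of how the palette would otherwise be assigned.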

Returns:

A list containing the generated Plotly traces and a mapping of transcript identifiers to y-axis positions. The list includes:

  • Transcript feature traces (exons, CDS, introns) as dictionaries.

  • Expression plot traces as go.Box or go.Violin objects.

  • The y_dict mapping, which maps transcript identifiers to their corresponding y-axis positions.

Return type:

List[Union[go.Box, go.Violin, dict, Dict[str, int]]]

Raises:
  • ValueError – If neither annotation nor expression_matrix is provided.

  • TypeError – If annotation or expression_matrix is not a Polars DataFrame.

  • ValueError – If required columns are missing in annotation or expression_matrix.

  • ValueError – If there are no matching transcripts between annotation and expression_matrix.

  • ValueError – If an invalid expression_plot_style is provided.
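Two of these conditions can be checked without any DataFrame at all. The sketch below is illustrative only: check_arguments is a hypothetical helper, and the error messages and the order of the checks are assumptions. The Polars type check and the column checks are left out to keep the example dependency-free.

```python
VALID_PLOT_STYLES = ("boxplot", "violin")


def check_arguments(annotation, expression_matrix, expression_plot_style="boxplot"):
    """Hypothetical pre-flight check mirroring two of the documented ValueErrors."""
    if annotation is None and expression_matrix is None:
        raise ValueError("Provide at least one of annotation or expression_matrix.")
    if expression_plot_style not in VALID_PLOT_STYLES:
        raise ValueError(
            f"expression_plot_style must be one of {VALID_PLOT_STYLES}, "
            f"got {expression_plot_style!r}."
        )


# A placeholder object stands in for a DataFrame here.
check_arguments(annotation=object(), expression_matrix=None)  # passes silently

try:
    check_arguments(annotation=None, expression_matrix=None)
except ValueError as err:
    print(f"ValueError: {err}")
```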

Examples

Generate traces for plotting transcript structures and expression data:

>>> import polars as pl
>>> from RNApysoforms import make_traces
>>> # Create a sample annotation DataFrame
>>> annotation_df = pl.DataFrame({
...     "transcript_id": ["tx1", "tx1", "tx2", "tx2"],
...     "start": [100, 200, 150, 250],
...     "end": [150, 250, 200, 300],
...     "type": ["exon", "CDS", "exon", "CDS"],
...     "strand": ["+", "+", "-", "-"],
...     "seqnames": ["chr1", "chr1", "chr2", "chr2"]
... })
>>> # Create a sample expression matrix (one row per transcript and sample)
>>> expression_df = pl.DataFrame({
...     "transcript_id": ["tx1", "tx1", "tx2", "tx2"],
...     "sample_id": ["sample1", "sample2", "sample1", "sample2"],
...     "counts": [100, 200, 150, 250]
... })
>>> # Generate traces
>>> traces = make_traces(annotation=annotation_df, expression_matrix=expression_df)
>>> # Use traces to create a Plotly figure (not shown here)

Notes

  • The function ensures that both annotation and expression_matrix contain common transcripts. It filters out any transcripts not present in both DataFrames.

  • Warnings are issued if transcripts are present in one DataFrame but missing in the other.

  • Traces are generated for exons, CDS, and introns with customizable aesthetics.

  • Expression data can be visualized using box plots or violin plots, with options for coloring by categories.

  • The y_dict mapping is used to align transcripts across different plots by assigning consistent y-axis positions.

  • The function handles strand direction when plotting intron arrows.

  • Custom legends and hover information can be configured via parameters.
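The transcript matching and y-axis alignment described in these notes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: align_transcripts is a hypothetical helper, the warning text is invented, and y positions are assumed to be consecutive integers starting at 0.

```python
import warnings
from typing import Dict, List


def align_transcripts(
    annotation_ids: List[str],
    expression_ids: List[str],
    order_by_expression: bool = True,
) -> Dict[str, int]:
    """Hypothetical helper: keep shared transcripts and assign y positions."""
    # Unique IDs in order of first appearance in each input.
    ann = list(dict.fromkeys(annotation_ids))
    expr = list(dict.fromkeys(expression_ids))

    shared = set(ann) & set(expr)
    if not shared:
        raise ValueError("No matching transcripts between annotation and expression_matrix.")

    dropped = sorted((set(ann) | set(expr)) - shared)
    if dropped:
        warnings.warn(f"Transcripts missing from one input were dropped: {dropped}")

    # Follow the ordering of whichever input was requested.
    ordered = [tx for tx in (expr if order_by_expression else ann) if tx in shared]
    # Assumption: consecutive integer y positions starting at 0.
    return {tx: position for position, tx in enumerate(ordered)}


annotation_ids = ["tx1", "tx1", "tx2", "tx3"]  # tx3 has no expression data
expression_ids = ["tx2", "tx2", "tx1"]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the tx3 warning for this demo
    print(align_transcripts(annotation_ids, expression_ids))
    # {'tx2': 0, 'tx1': 1}
    print(align_transcripts(annotation_ids, expression_ids, order_by_expression=False))
    # {'tx1': 0, 'tx2': 1}
```

Because both plots look up positions in the same mapping, a transcript's structure and its expression distribution land on the same row.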