---
title: 'Reconstruct intermediate sequences'
author: "Kenneth B. Hoehn"
date: '`r Sys.Date()`'
output:
  pdf_document:
    dev: pdf
    fig_height: 4
    fig_width: 7.5
    highlight: pygments
    toc: yes
  html_document:
    fig_height: 4
    fig_width: 7.5
    highlight: pygments
    theme: readable
    toc: yes
  md_document:
    fig_height: 4
    fig_width: 7.5
    preserve_yaml: no
    toc: yes
geometry: margin=1in
fontsize: 11pt
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Reconstruct Intermediate Sequences}
  %\VignetteEncoding{UTF-8}
  %\usepackage[utf8]{inputenc}
---

Dowser automatically reconstructs intermediate sequences as part of the `getTrees` function. These are stored in the `nodes` list contained in each `phylo` object.

First, collapse internal nodes with identical sequences using the `collapseNodes` function. This will significantly clean up the visualization. Alternatively, run `getTrees` with `collapse=TRUE`.

Then, visualize the trees using `plotTrees` with the `node_nums` parameter set to `TRUE`. This displays the ID number of each internal node.

To obtain the IMGT-gapped sequence for a reconstructed node, specify the clone ID and node number in the `getNodeSeq` function. To obtain all observed and reconstructed sequences for all clones, use the `getAllSeqs` function. You can save the output of `getAllSeqs` as a fasta file using the `dfToFasta` function.

```{r, eval=TRUE, warning=FALSE, message=FALSE}
library(dowser)

data(ExampleClones)

# Collapse nodes with identical sequences to clean up the plot
trees = collapseNodes(ExampleClones[1:2,])

# Plot trees with node ID numbers
plots = plotTrees(trees, tips="c_call", tipsize=2, node_nums=TRUE, labelsize=7)

plots[[1]]

# Get the IMGT-gapped sequence at node 50 of clone 3128
sequence = getNodeSeq(trees, node=50, clone=3128)

print(sequence)

# Get all sequences as a data frame
all_sequences = getAllSeqs(trees)

head(all_sequences)
```

## Saving sequences to a file

The `dfToFasta` function can be used to save a data frame of sequences as a fasta file:

```{r, eval=FALSE, warning=FALSE, message=FALSE}
# Save all sequences as a fasta file
dfToFasta(all_sequences, file="all_sequences.fasta", id="node_id", columns=c("clone_id","locus"))
```
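
To check a saved file, it can be read back into a data frame. The following is a minimal sketch in base R. `readFastaDf` is a hypothetical helper written for illustration, not a dowser function, and the demo file is a toy stand-in rather than real `dfToFasta` output, so it makes no assumption about how dowser lays out the header line.

```{r, eval=TRUE, warning=FALSE, message=FALSE}
# Hypothetical helper (not part of dowser): read a fasta file into a data frame
readFastaDf = function(file) {
    lines = readLines(file)
    lines = lines[nzchar(lines)]          # drop blank lines
    is_header = startsWith(lines, ">")
    # Each header starts a new record; cumsum gives every line its record index
    record = cumsum(is_header)
    headers = sub("^>", "", lines[is_header])
    # Join multi-line sequences belonging to the same record
    groups = split(lines[!is_header],
        factor(record[!is_header], levels=seq_along(headers)))
    seqs = vapply(groups, paste, character(1), collapse="")
    data.frame(header=headers, sequence=unname(seqs), stringsAsFactors=FALSE)
}

# Toy file for demonstration only
tmp = tempfile(fileext=".fasta")
writeLines(c(">seq1", "ACGT", "NNAC", ">seq2", "GGCC"), tmp)

fasta_df = readFastaDf(tmp)
print(fasta_df)
```

To inspect a file written earlier, pass its path instead of `tmp`, for example `readFastaDf("all_sequences.fasta")`.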