General:
Updated readFastqDb to support both GenomicAlignments
(< 1.45.5) and the new cigarillo package (>= 1.45.5), which replaced
explodeCigarOps()/explodeCigarOpLengths() with equivalent functions.
cigarillo is now listed as a suggested dependency.
Gene:
Fixed a bug in countGenes where including "locus" in the groups argument
caused a "duplicated locus" error, as "locus" is added internally by default.
General:
Fixed a bug in collapseDuplicates that triggered an error when all
sequences in a group were classified as ambiguous, resulting in a
dimension-dropping issue during data.frame subsetting.
Gene:
In groupGenes, the logic for detecting single cell data has been updated.
Instead of throwing an error when an empty cell_id column is found,
a warning will now be issued.
Gene and Diversity:
Fixed a bug in countClones and countGenes where parentheses were
misplaced, causing comparisons to be evaluated incorrectly.
General:
alakazam has moved to GitHub: https://github.com/immcantation/alakazam.
Gene Usage:
groupGenes has deprecated the only_heavy and split_light
arguments and now exclusively clusters sequences based on heavy chains. Users
who need to split clones further by light chain information can use the
dowser::resolveLightChains function.
Updated countGenes to count sequences by locus for bulk data when both copy=NULL
and clone=NULL. The first and collapse arguments (utilized by getGene,
getAllele, and getFamily) are now exposed to provide better control over how
sequences are counted when multiple gene calls are present.
Documentation:
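As a rough illustration of the first and collapse arguments mentioned for countGenes, the sketch below calls getGene directly on a single annotation that carries two gene calls. It assumes the alakazam package is installed; the annotation string is a made-up example, not data from the package.

```r
# Sketch only: assumes the alakazam package is installed.
library(alakazam)

# Hypothetical annotation with two gene calls for one sequence
calls <- "IGHV1-69*01,IGHV1-18*01"

# first=TRUE keeps only the first gene call
gene_first <- getGene(calls, first=TRUE)

# first=FALSE with collapse=TRUE keeps all unique gene calls
gene_all <- getGene(calls, first=FALSE, collapse=TRUE)

print(gene_first)
print(gene_all)
```

Whether to count only the first call or all calls depends on how ambiguous assignments should contribute to the usage totals.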
Backwards Incompatible Changes:
locus: makeChangeoClone. In
groupGenes, locus was previously required only for single cell data; now
it is also required for bulk data.
General:
Updated ExampleTrees to use the igraph 1.5.0 format. See
https://r.igraph.org/news/index.html#igraph-150 for details.
collapseDuplicates.
Diversity:
Fixed a bug in plotDiversityCurve and plotAbundanceCurve where limits were
not being applied correctly to zoom in the plots.
Gene:
Fixed a bug in groupGenes where TCR chains were not being considered when
detecting heavy chain sequences prior to subsetting.
General:
ape::read.fastq.
General:
Added junctionAlignment, which counts the number of nucleotides in the
reference germline not present in the alignment, and the number of V and J
nucleotides in the CDR3.
Gene Usage:
Fixed a bug in getFamily where temporary designation gene names were not
being correctly subset to the cluster (family) level.
Lineage:
Fixed a bug in runPhylip which was causing buildPhylipLineage to fail
when run on Windows.
General:
Added readFastqDb, which reads a repertoire's .fastq file and imports the
sequencing quality scores for sequence_alignment. Added maskPositionsByQuality,
which masks positions that have a sequencing quality score lower than the
specified threshold. The convenience function getPositionQuality will create a
data.frame with quality scores per position.
Updated the dplyr dependency to v1.0.
In padSeqEnds, the argument mod3=TRUE has been added so that sequences are
padded to a length that is a multiple of 3.
Fixed a bug in translateDNA where NA values weren't being translated properly.
Amino Acid Analysis:
aminoAcidProperties,
which will now default to nt=TRUE.
Diversity:
Added an argument to countClones (remove_na) that will remove all rows with NA
values in the clone column if TRUE (default) and issue a warning stating how
many were removed. If FALSE, those rows will be kept instead.
Gene Usage:
Added getLocus to extract the locus information from the
segment call.
Added getChain to define the chain from the segment or
locus call.
Updated countGenes to give a warning instead of
an error so as not to disrupt running workflows.
Fixed a bug in getSegment where filtering of non-localized genes was not
being applied when called from getFamily, because the "NL" part of the name
was removed before the filtering step.
Updated getAllele, getGene, getFamily and
getLocus to parse constant region gene names correctly.
Updated getSegment to be able to parse
constant region gene names correctly and not remove the "D" from
"IGHD" when strip_d=TRUE.
Lineage:
Added branch_length argument to buildPhylipLineage, and augmented
graphToPhylo and phyloToGraph to track intermediate sequences in nodes
for phylo objects.
Added an argument to countGenes (remove_na) that will remove all rows with NA
values in the gene column if TRUE (default) and issue a warning stating how
many were removed. If FALSE, those rows will be kept instead.
Diversity:
Fixed a bug in plotDiversityTest that caused all values of q to appear on
the plot rather than just the specified one.
Gene Usage:
groupGenes where the v_callj_call column for J gene grouping.
groupGenes.
Renamed the only_igh argument of groupGenes to only_heavy.
Backwards Incompatible Changes:
Where functions previously used V_CALL (Change-O) as the default to identify
the field that stored the V gene calls, they now use v_call (AIRR). Scripts that
relied on the old default (previously, v_call="V_CALL") will now fail unless
the function calls are updated to name the correct column for the
data. If data are in the Change-O format, the current default value
v_call="v_call" will fail to identify the column with the V gene calls,
because the column v_call doesn't exist. In this case, v_call="V_CALL" needs
to be specified in the function call.
ExampleDb converted to the AIRR Rearrangement standard and examples updated
accordingly. The legacy Change-O version is available as ExampleDbChangeo.
GRAVY to gravy);
countGenes, countClones (e.g., SEQ_COUNT to seq_count)
estimateAbundance (e.g., RANK to rank)
groupGenes (e.g., VJ_GROUP to vj_group)
collapseDuplicates and makeChangeoClone (e.g., SEQUENCE_ID to
sequence_id, COLLAPSE_COUNT to collapse_count)
summarizeTrees, getPathLengths, getMRCA,
tableEdges, testEdges) also return columns in lower case (e.g.,
parent, child, outdegree, steps, annotation, pvalue)
IG_COLOR names converted to official C region identifiers
(IGHA, IGHD, IGHE, IGHG, IGHM, IGHK, IGHL).
General:
The look of baseTheme is now consistent across sizing options.
cpuCount will now return 1 if the core count cannot be determined.
Fixed a bug in padSeqEnds wherein the pad_char argument was being
ignored.
Diversity:
estimateAbundance slot clone_by now contains the name of the column
with the clonal group identifier, as specified in the function call. For
example, if the function was called with clone="clone_id",
then the clone_by slot will be clone_id.
Lineage:
Renamed the buildPhylipLineage arguments vcall, jcall and
dnapars_exec to v_call, j_call and phylip_exec, respectively.
Deprecated:
rarefyDiversity is deprecated in favor of alphaDiversity, which includes
the same functionality.
testDiversity is deprecated. The test calculations have been added to the
normal output of alphaDiversity.
General:
ape and tibble dependencies.
Lineage:
Added readIgphyml to read in IgPhyML output and combineIgphyml to
combine parameter estimates across samples.
Added graphToPhylo and phyloToGraph to allow conversion between
graph and phylo formats.
Diversity:
Fixed a bug in estimateAbundance where setting the clone column to a
non-default value produced an error.
estimateAbundance through the min_n,
max_n, and uniform arguments.
estimateAbundance. alphaDiversity will call estimateAbundance for
bootstrapping if not provided an existing AbundanceCurve object.
DiversityCurve and AbundanceCurve objects to accommodate
the new diversity methods.
Gene Usage:
groupGenes now supports grouping by V gene, J gene, and junction length
(junc_len), in addition to grouping by V gene and J gene without
junction length. Also added support for single-cell input data with the addition
of new arguments cell_id, locus, and only_igh.
General:
Added nonsquareDist function to calculate the non-square distance matrix of
sequences.
progressBar, baseTheme, checkColumns and cpuCount.
Diversity:
estimateAbundance and plotAbundanceCurve will now allow group=NULL
to be specified to perform abundance calculations on ungrouped data.
Gene Usage:
Added fill argument to countGenes. When set to TRUE, this adds zeroes
to the group pairs that do not exist in the data.
Added groupGenes to group sequences sharing the same V and J gene.
Topology Analysis:
indirect=TRUE.
makeChangeoClone will now issue an error and terminate, instead of
continuing with a warning, when sequences are not all the same length.
General:
Fixed a bug in IUPAC_AA wherein X was not properly matching against Q.
Updated getAAMatrix to treat * (stop codon) as a mismatch.
General:
readChangeoDb.
Added padSeqEnds function which pads sequences with Ns to make
them equal in length.
collapseDuplicates.
Diversity:
Added uniform argument to rarefyDiversity allowing users to toggle
uniform vs non-uniform sampling.
Renamed plotAbundance to plotAbundanceCurve.
Changed the estimateAbundance return object from a data.frame to a new
AbundanceCurve custom class.
plot call for AbundanceCurve to plotAbundanceCurve.
annotate argument from plotDiversityCurve to
plotAbundanceCurve.
Added score argument to plotDiversityCurve to toggle between
plotting diversity or evenness.
Added plotDiversityTest to generate a simple plot of
DiversityTest object summaries.
Gene Usage:
Added omit_nl argument to getAllele, getGene and getFamily to
allow optional filtering of non-localized (NL) genes.
Lineage:
Fixed a bug in makeChangeoClone preventing it from interpreting the id
argument correctly.
Added pad_end argument to makeChangeoClone to allow automatic
padding of ends to make sequences the same length.
General:
Added dry argument to collapseDuplicates which will annotate duplicate
sequences but not remove them when set to TRUE.
Fixed a bug where collapseDuplicates was returning one sequence if all
sequences were considered ambiguous.
Lineage:
makeChangeoClone and buildPhylipLineage for purposes of (optionally)
treating indels as mismatches.
Fixed a bug in buildPhylipLineage when PHYLIP doesn't generate inferred
sequences and has only one block.
General:
Fixed a bug in readChangeoDb causing the select argument to do nothing.
Gene Usage:
countGenes when the clone argument
is specified to CLONE_COUNT/CLONE_FREQ.
General:
readChangeoDb and writeChangeoDb.
General:
Fixed a bug in seqDist() wherein distance was not properly calculated in
some sequences containing gap characters.
getAAMatrix() return matrix.
General:
Updated readChangeoDb() to wrap data.table::fread() instead of
utils::read.table() if the input file is not compressed.
Ported testSeqEqual(), getSeqDistance() and getSeqMatrix() to C++ to
improve performance of collapseDuplicates() and other dependent functions.
Renamed testSeqEqual(), getSeqDistance() and getSeqMatrix() to
seqEqual(), seqDist() and pairwiseDist(), respectively.
Added pairwiseEqual() which creates a logical sequence distance matrix;
TRUE if sequences are identical, FALSE if not, excluding Ns and gaps.
X in
translateDNA().
Fixed a bug in collapseDuplicates() wherein the input data type sanity check
would cause the vignette to fail to build under R 3.3.
Replaced the ExampleDb.gz file with a larger, more clonal, ExampleDb
data object.
Replaced ExampleTrees with a larger set of trees.
Renamed multiggplot() to gridPlot().
Amino Acid Analysis:
normalize=FALSE for charge calculations to be more consistent
with previously published repertoire sequencing results.
Diversity Analysis:
Added a progress argument to rarefyDiversity() and testDiversity() to
enable the (previously default) progress bar.
Fixed a bug in estimateAbundance() where the function would fail if there
was only a single input sequence per group.
data and summary slots of DiversityTest to
uppercase for consistency with other tools.
plot to plotDiversityCurve for DiversityCurve
objects.
Gene Usage:
Added sortGenes() function to sort V(D)J genes by name or locus position.
Added clone argument to countGenes() to allow restriction of gene
abundance to one gene per clone.
Topology Analysis:
General:
base::nchar().
General:
Amino Acid Analysis:
Fixed a bug wherein arguments for the aliphatic() function were not being
passed through the ellipsis argument of aminoAcidProperties().
aminoAcidProperties().
Renamed AA_TRANS to ABBREV_AA.
Diversity:
rarefyDiversity() output.
Lineage:
Added ExampleTrees data with example output from buildPhylipLineage().
General:
Renamed getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and
getAAMatrix(), respectively.
Added getSeqMatrix() which calculates a pairwise distance matrix for a set
of sequences.
Added multiggplot() function for performing multiple panel plots.
Amino Acid Analysis:
gravy(), bulk(), aliphatic(), polar(),
charge(), countPatterns() and aminoAcidProperties().
Annotation:
getSegment(), getAllele(), getGene() and getFamily(). May be
disabled by providing the argument strip_d=FALSE.
Added countGenes() to tabulate V(D)J allele, gene and family usage.
Diversity:
countClones(), estimateAbundance() and plotAbundance().
Renamed resampleDiversity() to rarefyDiversity() and changed many of
the internals. Bootstrapping is now performed on an inferred complete
relative abundance distribution.
rarefyDiversity() and testDiversity().
rarefyDiversity()
and testDiversity() are now calculated using the mean and standard
deviation of the bootstrap realizations, rather than the median and
upper/lower quantiles.
plotDiversityCurve().
Initial public release.
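The rarefyDiversity() and testDiversity() note above contrasts two ways of summarizing bootstrap realizations. The base R sketch below shows the difference on simulated values. It is not the package's implementation: the simulated data, the variable names, and the normal-approximation interval (mean plus or minus z times the standard deviation) are all assumptions made for illustration.

```r
# Stand-in bootstrap realizations of a diversity estimate (simulated values)
set.seed(42)
boot_d <- rnorm(200, mean=50, sd=5)

ci <- 0.95
alpha <- 1 - ci

# Summary based on the median and the lower/upper quantiles
quantile_summary <- c(estimate=median(boot_d),
                      lower=unname(quantile(boot_d, alpha / 2)),
                      upper=unname(quantile(boot_d, 1 - alpha / 2)))

# Summary based on the mean and standard deviation
# (assumption: interval built from a normal approximation)
z <- qnorm(1 - alpha / 2)
normal_summary <- c(estimate=mean(boot_d),
                    lower=mean(boot_d) - z * sd(boot_d),
                    upper=mean(boot_d) + z * sd(boot_d))

print(rbind(quantile=quantile_summary, normal=normal_summary))
```

With roughly symmetric realizations the two summaries tend to be close; they can diverge when the bootstrap distribution is skewed.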
General:
citation("alakazam") command.
Lineage:
buildPhylipLineage().
Lineage:
Fixed a bug where buildPhylipLineage() would hang on R 3.2 due to R change
request PR#15508.
Prerelease for review.