Distance Profiling:
Fixed a bug in findThreshold() where the max_itr limit introduced in
v1.3.0 caused rocSpace() to stop() when a single outer fitting iteration
exhausted 1000 attempts, aborting analyses that would previously succeed. The
inner loop now emits a warning() and continues to the next outer iteration
instead of stopping. If all 15 outer iterations fail to produce a valid fit,
an error message is returned.
Mutation Profiling:
Fixed a bug in collapseClones() where column names were set as attributes
when processing single clones, causing calcBaseline() to fail due to
attribute mismatches in subsequent observedMutations() checks. Now using
unname() to remove name attributes.
General:
shazam has moved to GitHub: https://github.com/immcantation/shazam
Mutation simulation:
shmulateSeq now samples mutations from a binomial distribution with provided
probability instead of taking the floor from the number of sequences.
Distance Profiling:
Updated distToNearest code, documentation and tests to match changes introduced
in alakazam::groupGenes in alakazam 1.4.1 to handle mixed data (single-cell
and non single-cell, with heavy and/or light chain sequences).
Fixed a bug where, when model="mk_rs5nf" was specified in distToNearest,
the function would incorrectly use character validation from the non-existent
MK_RS1NF@targeting model instead of the correct MK_RS5NF@targeting.
Documentation:
Documentation:
General:
Since alakazam 1.3.0, alakazam::makeChangeoClone requires the parameter
locus with default value locus. This function is used in some examples and
tests in shazam. We added a locus column to the package's example data.
Distance Profiling:
Added to distToNearest the parameter locusValues=c("IGH") to specify loci
values to focus the analysis on.
Fixed a bug in distToNearest where grouping by fields was applied
after grouping by genes, so the different subsets of data were not treated
independently when identifying groups of genes. In practice, this means that
if fields was set to treat samples independently (fields='sample_id'),
single linkage was applied to all data, and two genes could be placed in the
same group of genes if they were connected by an ambiguous gene call in any
of the samples. Now, data is separated by fields (sample_id in this example)
before creating the groups of genes, and ambiguities in other samples are not
considered.
Mutation simulation:
Mutation Profiling:
Bug fix in parallelization set up for functions slideWindowTune
and slideWindowDb.
plotSlideWindowTune (slideWindowTunePlot). Updated the possible
values of the parameter plotFiltered, for easier usage. The new values
(and their equivalent values in slideWindowTunePlot) are filtered (TRUE),
remaining (FALSE), and per_mutation (NULL).
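The plotFiltered value change above amounts to a three-entry lookup. The following is a minimal sketch in Python, used only as a neutral illustration since shazam itself is an R package. Python's True, False and None stand in for R's TRUE, FALSE and NULL, and the helper name translate_plot_filtered is hypothetical, not part of shazam.

```python
# Hypothetical migration helper (not part of shazam). Python's True, False
# and None stand in for R's TRUE, FALSE and NULL.
LEGACY_TO_NEW = {True: "filtered", False: "remaining", None: "per_mutation"}


def translate_plot_filtered(legacy_value):
    """Map a legacy slideWindowTunePlot value to its plotSlideWindowTune name."""
    # Reject anything other than a real boolean or None, so that e.g. 1 is
    # not silently treated as True by the dictionary lookup.
    if legacy_value is not None and not isinstance(legacy_value, bool):
        raise ValueError("expected True, False or None")
    return LEGACY_TO_NEW[legacy_value]


if __name__ == "__main__":
    for old in (True, False, None):
        print(old, "->", translate_plot_filtered(old))
```

A script that previously passed TRUE, FALSE or NULL to slideWindowTunePlot would pass the corresponding string to plotSlideWindowTune instead.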
Deprecated:
slideWindowTunePlot in favor of plotSlideWindowTune, for naming
consistency.
General:
New feature:
convertNumbering to convert between numbering systems (IMGT, Kabat).
Mutation Profiling:
shmulateTree has new argument nproc to specify the number of cores. Default
values mutThresh and windowSize have been set to mutThresh=6 and
windowSize=10.
Added the option plotFiltered=NULL to slideWindowTunePlot.
Fixed a bug in listObservedMutations not returning a list when db had
one sequence with one mutation.
Fixed bars shifted in plotMutability.
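As a rough illustration of what mutThresh and windowSize control, the sketch below assumes the sliding-window criterion described elsewhere in this changelog for slideWindowSeq(): a sequence is flagged when some run of windowSize consecutive nucleotides contains mutThresh or more mutations. It is written in Python as a neutral illustration and is not shazam's R implementation. The function name, the 0-based positions, and the handling of sequences shorter than the window are assumptions made for the example.

```python
def exceeds_window_threshold(mutated_positions, seq_length,
                             mut_thresh=6, window_size=10):
    """Return True if any window of `window_size` consecutive positions
    contains at least `mut_thresh` mutated positions (0-based indices)."""
    is_mutated = [False] * seq_length
    for pos in mutated_positions:
        is_mutated[pos] = True

    # Assumption: a sequence shorter than the window is treated as one window.
    window = min(window_size, seq_length)
    if window == 0:
        return False

    # Count mutations in the first window, then slide one position at a time,
    # adding the position that enters and dropping the one that leaves.
    count = sum(is_mutated[:window])
    if count >= mut_thresh:
        return True
    for start in range(1, seq_length - window + 1):
        count += int(is_mutated[start + window - 1]) - int(is_mutated[start - 1])
        if count >= mut_thresh:
            return True
    return False


if __name__ == "__main__":
    # Six mutations packed into positions 0-5 fall within one 10-nt window.
    print(exceeds_window_threshold([0, 1, 2, 3, 4, 5], seq_length=20))
    # Six mutations spread two apart never reach six within any 10-nt window.
    print(exceeds_window_threshold([0, 2, 4, 6, 8, 10], seq_length=20))
```

With the defaults mutThresh=6 and windowSize=10, a sequence is flagged only when at least six of some ten consecutive positions are mutated.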
General:
Selection Analysis:
observedMutations, expectedMutations, and calcBaseline can analyze
mutations in all regions (CDR1, CDR2, CDR3, FWR1, FWR2, FWR3 and FWR4) by
specifying regionDefinition=IMGT_VDJ or
regionDefinition=IMGT_VDJ_BY_REGIONS.
Added setRegionBoundaries to build sequence-specific
RegionDefinition objects extending to CDR3 and FWR4.
Added makeGraphDf to facilitate mutational analysis on
lineage trees.
Distance Profiling:
Fixed a bug in distToNearest where TRB and TRD sequences were ignored in
distance calculation.
Fixed a bug in distToNearest causing a fatal error when cross was set.
Fixed a bug in nearestDist causing a fatal error when using model="aa"
and crossGroups.
Targeting Models:
plotMutability.
Mutation Profiling:
Fixed a bug in observedMutations and calcObservedMutations causing
mutation counting to fail when there are gap (-) characters in the
germline sequence.
Targeting Models:
Fixed a bug in createTargetingModel causing empty counts in the
numMutS and numMutR slots.
Distance Profiling:
distToNearest.
Renamed the groupUsingOnlyIGH argument of distToNearest to onlyHeavy.
Backwards Incompatible Changes:
Where functions previously used
V_CALL (Change-O) as the default to identify the field that stored
the V gene calls, they now use v_call (AIRR). This means that scripts that
relied on default values (previously, v_call="V_CALL") will now fail if
calls to the functions are not updated to reflect the correct value for the
data. If data are in the Change-O format, the current default value
v_call="v_call" will fail to identify the column with the V gene calls
as the column v_call doesn't exist. In this case, v_call="V_CALL" needs
to be specified in the function call.
ExampleDb converted to the AIRR Rearrangement standard and examples
updated accordingly.
The labels slot of IMGT_V has
changed from CDR_R, CDR_S, FWR_R and FWR_S to cdr_r, cdr_s,
fwr_r and fwr_s, respectively.
CODON_TABLE and the different MUTATION_SCHEMES change
from R, S and Stop to r, s and stop, respectively.
MU_COUNT_SEQ to mu_count_seq.
calcBaseline and related function output columns and S4 object slots.
For example, from PVALUE, REGION and BASELINE_CI_PVALUE to
pvalue, region and baseline_ci_pvalue, respectively.
createSubstitutionMatrix, createMutabilityMatrix and
createTargetingModel, changed from model=c("S","RS") to
model=c("s","rs").
General:
Targeting Models:
createMutabilityMatrix, extendMutabilityMatrix, createTargetingMatrix,
and createTargetingModel now also return the numbers of silent and
replacement mutations used for estimating the 5-mer mutabilities. These
numbers are recorded in the numMutS and numMutR slots in the newly
defined MutabilityModel, MutabilityModelWithSource, and TargetingMatrix
classes.
Mutation Profiling:
shmulateSeq now also supports specifying the frequency of mutations to be
introduced. (Previously, only the number of mutations was supported.)
General:
General:
Distance Calculation:
Fixed a bug in distToNearest that could potentially cause sequences from
different partitions to be used for distance calculation.
General:
Distance Calculation:
plotDensityThreshold for negative densities.
distToNearest for performing subsampling while calculating
cross-group nearest neighbor distances.
distToNearest now supports, via a new argument
VJthenLen, either a 2-stage partitioning (first by V gene and J gene, then
by junction length), or a 1-stage partitioning (simultaneously by V gene, J
gene, and junction length). For 1-stage partitioning, distToNearest supports
export of the partitioning information as a new column via keepVJLgroup.
distToNearest now supports single-cell input data with the addition of new
arguments cellIdColumn, locusColumn, and groupUsingOnlyIGH.
Mutation Profiling:
shmulateTree has new arguments, start and end, to specify the region
in the sequence where mutations can be introduced.
Selection Analysis:
consensusSequence which can be used to build a
consensus sequence using a variety of methods.
General:
TargetingModel and
RegionDefinition S4 classes.
General:
Added subsample argument to distToNearest function.
alakazam. Specifically, progressBar, getBaseTheme and checkColumns.
clearConsole, getnproc, and getPlatform functions.
Distance Calculation:
findThreshold method to density.
density method by returning the
bandwidth detection process. The density method should now also yield more
consistent thresholds, on average.
The subsample argument to findThreshold now applies to both the
density and gmm methods. Subsampling of distance is not performed by
default.
Fixed a bug in plotDensityThreshold and plotGmmThreshold wherein the
breaks argument was ignored when specifying xmax and/or xmin.
Selection Analysis:
Fixed a bug in plotBaselineDensity arising when the groupColumn
and idColumn arguments were set to the same column.
Added sizeElement argument to plotBaselineDensity to control
line size.
Renamed field_name argument to field in editBaseline.
Selection Analysis:
Fixed a bug in plotBaselineDensity which caused an empty plot to be
generated if there was only a single value in the idColumn.
Fixed a bug in calcBaseline which caused a crash in summarizeBaseline
and groupBaseline when input baseline is based on only 1 sequence
(i.e. when nrow(baseline@db) is 1).
plot call on a Baseline object to plotBaselineDensity.
getBaselineStats function.
Added summary method for Baseline objects that calls
summarizeBaseline and returns a data.frame.
Mutation Profiling:
Fixed a bug in shmulateSeq which caused a crash when the input
sequence contains gaps (.).
Renamed mutations in shmulateSeq to numMutations.
shmulateSeq and shmulateTree.
calcExpectedMutations will now treat non-ACTG characters as Ns rather
than produce an error.
Added RegionDefinition objects for the full V segment as a
single region (IMGT_V_BY_SEGMENTS) and the V segment with each
codon as a separate region (IMGT_V_BY_CODONS).
Targeting Models:
Added calculateMutability function which computes the aggregate
mutability for sequences.
Fixed a bug that caused createSubstitutionMatrix to fail for data
containing only a single V family.
model="S") in
createSubstitutionMatrix, createMutabilityMatrix and
createTargetingModel.
plot call on a TargetingModel object to plotMutability.
General:
Distance Calculation:
"gmm" method of findThreshold()
that allows users to choose a mixture of two univariate density distribution
functions among four available combinations: "norm-norm", "norm-gamma",
"gamma-norm", or "gamma-gamma".
"gmm"
method of findThreshold() from the best average sensitivity and specificity,
the curve intersection or user defined sensitivity or specificity.
Renamed the cutEdge argument of findThreshold() to edge.
Mutation Profiling:
collapseClones(), adding various deterministic and stochastic
methods to obtain effective clonal sequences, support for including ambiguous
IUPAC characters in output, as well as extensive documentation. Removed
calcClonalConsensus() from exported functions.
observedMutations() and calcObservedMutations().
calcObservedMutations() for sequences with non-triplet overhang at the tail.
Renamed the columns of observed mutations (previously OBSERVED) and
expected mutations (previously EXPECTED) returned by observedMutations()
and expectedMutations() to MU_COUNT and MU_EXPECTED respectively.
Selection Analysis:
calcBaseline() no longer calls collapseClones() automatically if a CLONE
column is present. As indicated by the documentation for calcBaseline(),
users are advised to obtain effective clonal sequences (for example, by calling
collapseClones()) before running calcBaseline().
calcBaseline().
Mutation Profiling:
Fixed a bug in collapseClones() that prevented it from running when nproc
is greater than 1.
General:
Mutation Profiling:
Fixed a bug in collapseClones() that resulted in erroneous CLONAL_SEQUENCE
and CLONAL_GERMLINE being returned.
observedMutations was running.
General:
Selection Analysis:
summarizeBaseline(). The returned
p-value can now be either positive or negative. Its magnitude (without the
sign) should be interpreted as per normal. Its sign indicates the direction
of the selection detected. A positive p-value indicates positive selection,
whereas a negative p-value indicates negative selection.
Added editBaseline() to exported functions, and a corresponding section
in the vignette.
calcBaseline().
Targeting Models:
Added numMutationsOnly argument to createSubstitutionMatrix(), enabling
parameter tuning for minNumMutations.
Added functions minNumMutationsTune() and minNumSeqMutationsTune() to
tune for parameters minNumMutations and minNumSeqMutations in functions
createSubstitutionMatrix() and createMutabilityMatrix() respectively.
Also added function plotTune() which helps visualize parameter tuning using
the two new functions mentioned above.
HKL_S5F).
Renamed HS5FModel as HH_S5F, MRS5NFModel as MK_RS5NF, and U5NModel
as U5N.
HH_S1F),
human kappa and lambda light chain, silent, 1-mer, functional substitution model
(HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer,
non-functional substitution model (MK_RS1NF).
Added makeDegenerate5merSub and makeDegenerate5merMut which make degenerate
5-mer substitution and mutability models respectively based on the 1-mer models.
Also added makeAverage1merSub and makeAverage1merMut which make 1-mer
substitution and mutability models respectively by averaging over the 5-mer models.
Mutation Profiling:
Added returnRaw argument to calcObservedMutations(), which if true returns
the positions of point mutations and their corresponding mutation types, as
opposed to counts of mutations (hence "raw").
Added slideWindowSeq() and slideWindowDb() which implement
a sliding window approach towards filtering a single sequence or sequences in
a data.frame which contain(s) equal to or more than a given number of mutations
in a given number of consecutive nucleotides.
Added slideWindowTune() which allows for parameter tuning for
using slideWindowSeq() and slideWindowDb().
Added slideWindowTunePlot() which visualizes parameter tuning
by slideWindowTune().
Distance Calculation:
Fixed a bug in distToNearest wherein normalize="length" for 5-mer models
was resulting in distances normalized by junction length squared instead of
raw junction length.
Fixed a bug in distToNearest wherein symmetry="min" was calculating the
minimum of the total distance between two sequences instead of the minimum
distance at each mutated position.
Added findThreshold function to infer clonal distance threshold from
nearest neighbor distances returned by distToNearest.
Renamed the length option for the normalize argument of distToNearest
to len so it matches Change-O.
Deprecated the HS1FDistance and M1NDistance distance models, which have
been renamed to hs1f_compat and m1n_compat in the model argument of
distToNearest. These deprecated models should be used for compatibility
with DefineClones in Change-O v0.3.3. These models have been
replaced by hh_s1f and mk_rs1nf, which are supported by Change-O v0.3.4.
Renamed the hs5f model in distToNearest to hh_s5f.
Added MK_RS5NF models to distToNearest.
calcTargetingDistance() to enable calculation of a symmetric distance
matrix given a 1-mer substitution matrix normalized by row, such as HH_S1F.
findThreshold. The previous smoothed density method is available via the
method="density" argument and the new GMM method is available via
method="gmm".
Added plotGmmThreshold and plotDensityThreshold to plot
the threshold detection results from findThreshold for the "gmm" and
"density" methods, respectively.
Region Definition:
IMGT_V_NO_CDR3 and IMGT_V_BY_REGIONS_NO_CDR3. Updated IMGT_V
and IMGT_V_BY_REGIONS so that neither includes CDR3 now.
Selection Analysis:
Targeting Models:
Added numSeqMutationsOnly argument to createMutabilityMatrix(), enabling
parameter tuning for minNumSeqMutations.
General:
InfluenzaDb data object, in favor of the updated ExampleDb
provided in alakazam 0.2.4.
Distance Calculation:
Added cross argument to distToNearest() which allows restriction of
distances to only distances across samples (i.e., excludes within-sample
distances).
Added mst flag to distToNearest(), which will return all distances to
neighboring nodes in a minimum spanning tree.
aa model
of distToNearest().
aa model of distToNearest().
Mutation Profiling:
MutationDefinition VOLUME_MUTATIONS.
Added shmulateSeq() and shmulateTree() to simulate
mutations on sequences and lineage trees, respectively, using a 5-mer
targeting model.
Renamed collapseByClone, calcDbExpectedMutations and
calcDbObservedMutations to collapseClones, expectedMutations,
and observedMutations, respectively.
Selection Analysis:
Fixed a bug wherein passing a Baseline object through groupBaseline()
multiple times resulted in incorrect normalization.
Added title options to plotBaselineSummary() and plotBaselineDensity().
plotBaselineSummary()
and plotBaselineDensity().
Added testBaseline() function to test the significance of
differences between two selection distributions.
General:
InfluenzaDb.
dplyr::tbl_df object instead of a data.frame.
Distance Calculation:
Fixed a bug wherein distToNearest() did not return the nearest neighbor
with a non-zero distance.
Targeting Models:
createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
plotMutability().
Mutation Profiling:
Added MutationDefinition objects MUTATIONS_CHARGE,
MUTATIONS_HYDROPATHY, MUTATIONS_POLARITY providing alternate approaches
to defining replacement and silent annotations to mutations when calling
calcDBObservedMutations() and calcDBExpectedMutations().
Made the behavior of regionDefinition=NULL consistent for all mutation
profiling functions. Now the entire sequence is used as the region and
calculations are made accordingly.
calcDBObservedMutations() returns R and S mutations also
when regionDefinition=NULL. Older versions reported the sum of R and S
mutations. The function will add the columns OBSERVED_SEQ_R and
OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and
MU_FREQ_SEQ_S when frequency=TRUE.
General:
Distance Calculation:
Added symmetry parameter to distToNearest to change how
asymmetric distances (A->B != B->A) are combined to get the distance
between A and B.
Mutation Profiling:
Selection Analysis:
Targeting Models:
Added minNumMutations parameter to createSubstitutionMatrix. This is the
minimum number of observed 5-mers required for the substitution model.
The substitution rate of 5-mers with fewer observed mutations
will be inferred from other 5-mers.
Added minNumSeqMutations parameter to createMutabilityMatrix. This is the
minimum number of mutations required in sequences containing the 5-mers of
interest. The mutability of 5-mers with fewer observed mutations
in the sequences will be inferred.
Added returnModel parameter to createSubstitutionMatrix. This gives the user
the option to return a 1-mer or 5-mer model.
Added returnSource parameter to createMutabilityMatrix. If TRUE, the
code will return a data frame indicating whether each 5-mer mutability is
observed or inferred.
Initial public release.
General:
Fixed a bug where the Influenza.tab file did not load on Mac OS X.
citation("shazam") command.
Distance Calculation:
HS1FDistance, based on the
Yaari et al, 2013 data.
hs1f as the default distance model for distToNearest().
distToNearest().
Mutation Profiling:
Fixed calcDBClonalConsensus() so that the function now works
correctly when called with the argument collapseByClone=FALSE.
Added frequency argument to calcObservedMutations() and
calcDBObservedMutations(), which enables return of mutation frequencies
rather than the default of mutation counts.
Targeting Models:
M3NModel and all options for using said model.
Fixed a bug in createSubstitutionMatrix() and createMutabilityMatrix()
where IMGT gaps were not being handled.
General:
Targeting Models:
Targeting Models:
Added U5NModel, which is a uniform 5-mer model.
plotMutability() output.
Prerelease for review.