Title: | R-Values for Ranking in High-Dimensional Settings |
---|---|
Description: | A collection of functions for computing "r-values" from various kinds of user input such as MCMC output or a list of effect size estimates and associated standard errors. Given a large collection of measurement units, the r-value, r, of a particular unit is a reported percentile that may be interpreted as the smallest percentile at which the unit should be placed in the top r-fraction of units. |
Authors: | Nicholas Henderson [cre], Michael Newton [aut] |
Maintainer: | Nicholas Henderson <[email protected]> |
License: | GPL-2 |
Version: | 0.7.1 |
Built: | 2024-11-19 05:10:40 UTC |
Source: | https://github.com/cran/rvalues |
Data set containing number of at-bats and number of hits for Major League baseball players over the 2005 season.
data(batavgs)
data(batavgs)
A data frame with 929 observations on the following 7 variables.
First.Name
factor; player's last name
Last.Name
factor; player's first name
Pitcher
numeric vector; an indicator of whether or not the player is a pitcher
midseasonAB
numeric vector; number of at-bats during the first half of the season
midseasonH
numeric vector; number of hits during the first half of the season
TotalAB
numeric vector; total number of at-bats over the season
TotalH
numeric vector; total number of hits over the season
The 2005 Major League Baseball season was roughly six months starting from the beginning of April and ending at the beginning of October. Data from postseason play is not included. The midseason data were obtained by only considering the first three months of the season.
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoas/1206367815
Brown, L. D. (2008), In-Season prediction of batting averages: a field test of empirical Bayes and Bayes Methodologies, The Annals of Applied Statistics, 2, 1, 113–152.
data(batavgs) head(batavgs)
data(batavgs) head(batavgs)
Effect size estimates and standard errors obtained from gene expression measurements on 7129 genes across 49 samples.
data(bcwest)
data(bcwest)
A data frame with 7129 observations on the following 2 variables.
estimates
a vector of effect size estimates
std.err
standard errors associated with effect size estimates
A description of the original data may be found in West et al. (2001). For each gene, the effect size estimate was computed from the difference in the mean expression levels of the two groups (i.e., mean(bc-positive group) - mean(bc-negative group)).
T. Hothorn, P. Buehlmann, T. Kneib, M. Schmid, and B. Hofner (2013). mboost: Model-Based Boosting, R package version 2.2-3, http://CRAN.R-project.org/package=mboost.
Mike West, Carrie Blanchette, Holly Dressman, Erich Huang, Seiichi Ishida, Rainer Spang, Harry Zuzan, John A. Olson Jr., Jeffrey R. Marks and Joseph R. Nevins (2001), Predicting the clinical status of human breast cancer by using gene expression profiles, Proceedings of the National Academy of Sciences, 98, 11462-11467.
Estimates the expected proportion of misclassified units when using a given r-value threshold. If plot=TRUE, the curve is plotted before the estimated function is returned.
FDRCurve(object, q, threshold = 1, plot = TRUE, xlim, ylim, xlab, ylab, main, ...)
FDRCurve(object, q, threshold = 1, plot = TRUE, xlim, ylim, xlab, ylab, main, ...)
object |
An object of class "rvals" |
q |
A value in between 0 and 1; the desired level of FDR control. |
threshold |
The r-value threshold. |
plot |
logical; if TRUE, the estimated FDR curve is plotted. |
xlim , ylim
|
x and y - axis limits for the plot |
xlab , ylab
|
x and y - axis labels |
main |
the title of the plot |
... |
additional arguments to |
Consider parameters of interest with an effect of size of interest
. That is, a unit is taken to be "null" if
and
taken to be "non-null" if
.
For r-values and a procedure which "rejects" units
satisfying
, the FDR is defined to be
FDRCurve
estimates
for values of
across (0,1) and plots (if
plot=TRUE
)
the resulting curve.
A list with the following two components
fdrcurve |
A function which returns the estimated FDR for each r-value threshold. |
Rval.cut |
The largest r-value cutoff which still gives an estimated FDR less than q. |
Nicholas Henderson and Michael Newton
n <- 500 theta <- rnorm(n) ses <- sqrt(rgamma(n,shape=1,scale=1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rvs <- rvalues(dd, family = gaussian) FDRCurve(rvs, q = .1, threshold = .3, cex.main = 1.5)
n <- 500 theta <- rnorm(n) ses <- sqrt(rgamma(n,shape=1,scale=1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rvs <- rvalues(dd, family = gaussian) FDRCurve(rvs, q = .1, threshold = .3, cex.main = 1.5)
Gene-set enrichment for genes that have been identified as having an effect on influenza-virus replication.
data(fluEnrich)
data(fluEnrich)
A data frame with 5719 observations on the following 3 variables.
nflugenes
number of genes both annotated to the given GO term and in the collection of flu genes
setsize
number of genes annotated to the given GO term
GO_terms
the GO (gene ontology) term label
These data were produced by associating the 984 genes (the collection of flu genes) identified in the Hao et al. (2013) meta-analysis with gene ontology (GO) gene sets (GO terms). In total, 17959 human genes were annotated to at least one GO term and 16572 GO terms were available, though this data set only contains the 5719 terms which annotated between 10 and 1000 human genes.
Hao, L. Q. He, Z. Wang, M. Craven, M. A. Newton, and P. Ahlquist (2013). Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negatives than false-positive factors. PLoS computational biology, 9, 9, e1003235.
These data contain effect size estimates and standard errors obtained from gene expression measurements on 7680 genes across 8 samples.
data(hiv)
data(hiv)
A data frame with 7680 observations on the following 2 variables.
estimates
a vector of effect size estimates
std.err
standard errors associated with effect size estimates
van't Wout, et. al. (2003), Cellular gene expression upon human immuno-deficiency virus type 1 infection of CD4+-T-Cell lines, Journal ofVirology, 77, 1392–1402.
Computes a grid of points on the interval (0,1). This function is useful for constructing the "alpha-grid" used in various r-value computations.
MakeGrid(nunits, type = "log", ngrid = NULL, lower = 1/nunits, upper = 1 - lower)
MakeGrid(nunits, type = "log", ngrid = NULL, lower = 1/nunits, upper = 1 - lower)
nunits |
The number of units in the data for which r-values are to be calculated. |
type |
The type of grid; type can be set to |
ngrid |
a number specifying the number of grid points |
lower |
the smallest grid point; must be greater than zero |
upper |
the largest grid point; must be less than one |
If nunits
, the default number of grid points is
equal to
nunits
. When nunits
, the default
number of grid points is determined by
.
A vector of grid points in (0,1).
Nicholas Henderson and Michael Newton
alpha.grid <- MakeGrid(1000,type="uniform",ngrid=200) log.grid <- MakeGrid(40,type="log") log.grid hist(log.grid)
alpha.grid <- MakeGrid(1000,type="uniform",ngrid=200) log.grid <- MakeGrid(40,type="log") log.grid hist(log.grid)
a matrix of test MCMC output
data(MCMCtest)
data(MCMCtest)
A 2000 x 400 numeric matrix. Data in the ith row should be thought of as a sample from the posterior for the ith case of interest.
For a given multi-dimensional function with both a vector of lower bounds
and upper bounds, mroot
finds a vector such that each
component of the function is zero.
mroot(f, lower, upper, ..., f.lower = f(lower, ...), f.upper = f(upper, ...), tol = .Machine$double.eps^0.25, maxiter = 5000)
mroot(f, lower, upper, ..., f.lower = f(lower, ...), f.upper = f(upper, ...), tol = .Machine$double.eps^0.25, maxiter = 5000)
f |
the function for which the root is sought |
lower |
a vector of lower end points |
upper |
a vector of upper end points |
... |
additional arguments to be passed to |
f.lower , f.upper
|
the same as |
tol |
the convergence tolerance |
maxiter |
the maximum number of iterations |
The function f
is from to
with
.
A root of
satisfies
for each component
.
lower
and
upper
are both n-dimensional vectors
such that, for each
,
changes sign over the
interval
.
a vector giving the estimated root of the function
Nicholas Henderson
ff <- function(x,a) { ans <- qnorm(x) - a return(ans) } n <- 10000 a <- rnorm(n) low <- rep(0,n) up <- rep(1,n) ## Find the roots of ff, first using mroot and ## then by using uniroot inside a loop. system.time(mr <- mroot(ff, lower = low, upper = up, a = a)) ur <- rep(0,n) system.time({ for(i in 1:n) { ur[i] <- uniroot(ff, lower = 0, upper = 1, a = a[i])$root } })
ff <- function(x,a) { ans <- qnorm(x) - a return(ans) } n <- 10000 a <- rnorm(n) low <- rep(0,n) up <- rep(1,n) ## Find the roots of ff, first using mroot and ## then by using uniroot inside a loop. system.time(mr <- mroot(ff, lower = low, upper = up, a = a)) ur <- rep(0,n) system.time({ for(i in 1:n) { ur[i] <- uniroot(ff, lower = 0, upper = 1, a = a[i])$root } })
Free throw statistics on 482 active players, 2013-2014 season
data(NBA1314)
data(NBA1314)
A data frame with 482 players (rows) variables including.
RK
rank of player by proportion of free throws made
PLAYER
name of player
TEAM
player's team
GP
games played
PPG
points per game
FTM0
FTM/GP
FTA0
FTA/GP
FTA
free throws attempted
FTM
free throws made
FTprop
FTA/FTM
Data obtained from ESPN.
See data analyzed in Henderson and Newton, 2015
data(NBA1314) nba.dat <- cbind(NBA1314$FTM, NBA1314$FTA) rownames(nba.dat) <- NBA1314$PLAYER rvals.nba <- rvalues(nba.dat, family=binomial)
data(NBA1314) nba.dat <- cbind(NBA1314$FTM, NBA1314$FTA) rownames(nba.dat) <- NBA1314$PLAYER rvals.nba <- rvalues(nba.dat, family=binomial)
Using a nonparametric estimate of the mixing distribution, computes a posterior quantity of interest for each unit.
npmixapply(object, FUN, ...)
npmixapply(object, FUN, ...)
object |
an object of class "npmix" |
FUN |
a user provided function |
... |
optional arguments to FUN |
object
is an object of class "npmix" containing
a nonparametric estimate of the mixing distribution
in the following two-level sampling model
~
and
~
for
.
Using npmixapply(object, f)
, then returns the
posterior expectation of :
, for
.
a vector with length equal to
Nicholas Henderson
## Not run: data(hiv) npobj <- npmle(hiv, family = gaussian, maxiter = 4) ### Compute unit-specific posterior means pmean <- npmixapply(npobj, function(x) { x }) ### Compute post. prob that \theta_i < .1 pp <- npmixapply(npobj, function(x) { x < .1}) ## End(Not run)
## Not run: data(hiv) npobj <- npmle(hiv, family = gaussian, maxiter = 4) ### Compute unit-specific posterior means pmean <- npmixapply(npobj, function(x) { x }) ### Compute post. prob that \theta_i < .1 pp <- npmixapply(npobj, function(x) { x < .1}) ## End(Not run)
Estimates the mixture distribution nonparametrically using an EM algorithm. The estimate is discrete with the results being returned as a vector of support points and a vector of associated mixture probabilities. The available choices for the sampling distribution include: Normal, Poisson, Binomial and t-distributions.
npmle(data, family = gaussian, maxiter = 500, tol = 1e-4, smooth = TRUE, bass = 0, nmix = NULL)
npmle(data, family = gaussian, maxiter = 500, tol = 1e-4, smooth = TRUE, bass = 0, nmix = NULL)
data |
A data frame or a matrix with the number of rows equal to the number of sampling units. The first column should contain the main estimates, and the second column should contain the nuisance terms. |
family |
family determining the sampling distribution (see family) |
maxiter |
the maximum number of EM iterations |
tol |
the convergence tolerance |
smooth |
logical; whether or not to smooth the estimated cdf |
bass |
controls the smoothness level; only relevant if |
nmix |
optional; the number of mixture components |
Assuming the following two-level sampling model
~
and
~
for
.
The function
npmle
seeks to find an estimate of the mixing distribution
which maximizes the marginal log-likelihood
The distribution function maximizing is
known to be discrete; and thus, the estimated mixture distribution is
returned as a set of support points and associated mixture
probabilities.
An object of class npmix which is a list containing at least the following components
support |
a vector of estimated support points |
mix.prop |
a vector of estimated mixture proportions |
Fhat |
a function; obtained through interpolation of the estimated discrete cdf |
fhat |
a function; estimate of the mixture density |
loglik |
value of the log-likelihood at each iteration |
convergence |
0 indicates convergence; 1 indicates that convergence was not achieved |
numiter |
the number of EM iterations required |
Nicholas Henderson and Michael Newton
Laird, N.M. (1978), Nonparametric maximum likelihood estimation of a mixing distribution, Journal of the American Statistical Association, 73, 805–811.
Lindsay, B.G. (1983), The geometry of mixture likelihoods: a general theory. The Annals of Statistics, 11, 86–94
## Not run: data(hiv) npobj <- npmle(hiv, family = tdist(df=6), maxiter = 25) ### Generate Binomial data with Beta mixing distribution n <- 3000 theta <- rbeta(n, shape1 = 2, shape2 = 10) ntrials <- rpois(n, lambda = 10) x <- rbinom(n, size = ntrials, prob = theta) ### Estimate mixing distribution dd <- cbind(x,ntrials) npest <- npmle(dd, family = binomial, maxiter = 25) ### compare with true mixture cdf tt <- seq(1e-4,1 - 1e-4, by = .001) plot(npest, lwd = 2) lines(tt, pbeta(tt, shape1 = 2, shape2 = 10), lwd = 2, lty = 2) ## End(Not run)
## Not run: data(hiv) npobj <- npmle(hiv, family = tdist(df=6), maxiter = 25) ### Generate Binomial data with Beta mixing distribution n <- 3000 theta <- rbeta(n, shape1 = 2, shape2 = 10) ntrials <- rpois(n, lambda = 10) x <- rbinom(n, size = ntrials, prob = theta) ### Estimate mixing distribution dd <- cbind(x,ntrials) npest <- npmle(dd, family = binomial, maxiter = 25) ### compare with true mixture cdf tt <- seq(1e-4,1 - 1e-4, by = .001) plot(npest, lwd = 2) lines(tt, pbeta(tt, shape1 = 2, shape2 = 10), lwd = 2, lty = 2) ## End(Not run)
Estimates the expected proportion of units in the top fraction and those deemed to be in the top fraction by the r-value procedure. If plot=TRUE, the curve is plotted before the estimated function is returned.
OverlapCurve(object, plot = TRUE, xlim, ylim, xlab, ylab, main, ...)
OverlapCurve(object, plot = TRUE, xlim, ylim, xlab, ylab, main, ...)
object |
An object of class "rvals" |
plot |
logical. If TRUE, the estimated overlap curve is plotted. |
xlim , ylim
|
x and y - axis limits for the plot |
xlab , ylab
|
x and y - axis labels |
main |
the title of the plot |
... |
additional arguments to |
For parameters of interest and corresponding
r-values
, the overlap at a particular value of
is defined to be
where the threshold is the upper-
th quantile of
the distribution of the
(i.e.,
).
OverlapCurve
estimates this overlap
for values of alpha across (0,1) and plots (if plot=TRUE
)
the resulting curve.
A function returning estimated overlap values.
Nicholas Henderson and Michael Newton
Henderson, N.C. and Newton, M.A. (2016). Making the cut: improved ranking and selection for large-scale inference. J. Royal Statist. Soc. B., 78(4), 781-804. doi:10.1111/rssb.12131 https://arxiv.org/abs/1312.5776
n <- 500 theta <- rnorm(n) ses <- sqrt(rgamma(n,shape=1,scale=1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rvs <- rvalues(dd, family = gaussian) OverlapCurve(rvs, cex.main = 1.5)
n <- 500 theta <- rnorm(n) ses <- sqrt(rgamma(n,shape=1,scale=1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rvs <- rvalues(dd, family = gaussian) OverlapCurve(rvs, cex.main = 1.5)
Computes posterior expected percentiles for both parametric and nonparametric models.
PostPercentile(object)
PostPercentile(object)
object |
An object of class "rvals" |
With parameters of interest the rank of
the ith parameter (when we set the ranking so that the largest
gets rank 1) is defined as
and the associated percentile is
The posterior expected percentile
for the ith unit (see e.g., Lin et. al. (2006)) is simply
the expected value of
given the data.
The function PostPercentile
computes an asymptotic version of the
posterior expected percentile, which is defined as
where has the same distribution as
and is
independent of both
and the data.
See Henderson and Newton (2014) for additional details.
A vector of estimated posterior expected percentiles.
Nicholas Henderson and Michael Newton
Henderson, N.C. and Newton, M.A. (2016). Making the cut: improved ranking and selection for large-scale inference. J. Royal Statist. Soc. B., 78(4), 781-804. doi:10.1111/rssb.12131 https://arxiv.org/abs/1312.5776
Lin, R., Louis, T.A., Paddock, S.M., and Ridgeway, G. (2006). Loss function based ranking in two-stage, hierarchical models. Bayesian Analysis, 1, 915–946.
n <- 3000 theta <- rnorm(n, sd = 3) ses <- sqrt(rgamma(n, shape = 1, scale = 1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rv <- rvalues(dd, family = gaussian) perc <- PostPercentile(rv) plot(rv$rvalues, perc)
n <- 3000 theta <- rnorm(n, sd = 3) ses <- sqrt(rgamma(n, shape = 1, scale = 1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rv <- rvalues(dd, family = gaussian) perc <- PostPercentile(rv) plot(rv$rvalues, perc)
Computes r-values assuming that, for each parameter of interest, the user supplies a value for the posterior mean and the posterior standard deviation. The assumption here is that the posterior distributions are Normal.
PostSummaries(post.means, post.sds, hypers = NULL, qtheta = NULL, alpha.grid = NULL, ngrid = NULL, smooth = 0)
PostSummaries(post.means, post.sds, hypers = NULL, qtheta = NULL, alpha.grid = NULL, ngrid = NULL, smooth = 0)
post.means |
a vector of posterior means |
post.sds |
a vector of posterior standard deviations |
hypers |
a list with two elements: mean and sd. These represent the parameters in the (Normal) prior which was used to generate the posterior means and sds. If hypers is not supplied then one must supply the quantile function qtheta. |
qtheta |
a function which returns the quantiles (for upper tail probs.) of theta. If this is not supplied, the hyperparameter must be supplied. |
alpha.grid |
grid of values in (0,1); used for the discrete approximation approach for computing r-values. |
ngrid |
number of grid points for alpha.grid; only relevant when |
smooth |
either |
An object of class "rvals"
Nicholas Henderson and Michael Newton
n <- 500 theta <- rnorm(n) sig_sq <- rgamma(n,shape=1,scale=1) X <- theta + sqrt(sig_sq)*rnorm(n) pm <- X/(sig_sq + 1) psd <- sqrt(sig_sq/(sig_sq + 1)) rvs <- PostSummaries(post.means=pm,post.sds=psd,hypers=list(mean=0,sd=1)) hist(rvs$rvalues)
n <- 500 theta <- rnorm(n) sig_sq <- rgamma(n,shape=1,scale=1) X <- theta + sqrt(sig_sq)*rnorm(n) pm <- X/(sig_sq + 1) psd <- sqrt(sig_sq/(sig_sq + 1)) rvs <- PostSummaries(post.means=pm,post.sds=psd,hypers=list(mean=0,sd=1)) hist(rvs$rvalues)
Estimates a new prior for each bootstrap replications ... (need to add)
rvalueBoot(object, statistic = median, R, type = "nonparametric")
rvalueBoot(object, statistic = median, R, type = "nonparametric")
object |
An object of class "rvals" |
statistic |
The statistic used to summarize the bootstrap replicates. |
R |
Number of bootstrap replicates |
type |
Either |
When type="nonparametric"
, the prior is re-estimated (using the resampled data)
in each bootstrap replication, and r-values are re-computed with respect to this new model.
When type="parametric"
,
A list with the following two components
rval.repmat |
A matrix where each column corresponds to a separate bootstrap replication. |
rval.boot |
A vector of r-values obtained by applying the statistic to each row
of |
Nicholas Henderson and Michael Newton
Henderson, N.C. and Newton, M.A. (2016). Making the cut: improved ranking and selection for large-scale inference. J. Royal Statist. Soc. B., 78(4), 781-804. doi:10.1111/rssb.12131 https://arxiv.org/abs/1312.5776
## Not run: n <- 3000 theta <- rnorm(n, sd = 3) ses <- sqrt(rgamma(n, shape = 10, rate = 1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rv <- rvalues(dd, family = gaussian, prior = "conjugate") rvb <- rvalueBoot(rv, R = 10) summary(rvb$rval.repmat[512,]) ## End(Not run)
## Not run: n <- 3000 theta <- rnorm(n, sd = 3) ses <- sqrt(rgamma(n, shape = 10, rate = 1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rv <- rvalues(dd, family = gaussian, prior = "conjugate") rvb <- rvalueBoot(rv, R = 10) summary(rvb$rval.repmat[512,]) ## End(Not run)
Given data on a collection of units, this function computes r-values which are percentiles constructed to maximize the agreement between the reported percentiles and the percentiles of the effect of interest. Additional details about r-values are provided below and can also be found in the listed references.
rvalues(data, family = gaussian, hypers = "estimate", prior = "conjugate", alpha.grid = NULL, ngrid = NULL, smooth = "none", control = list())
rvalues(data, family = gaussian, hypers = "estimate", prior = "conjugate", alpha.grid = NULL, ngrid = NULL, smooth = "none", control = list())
data |
A data frame or a matrix with the number of rows equal to the number of sampling units. The first column should contain the main estimates, and the second column should contain the nuisance terms. |
family |
An argument which determines the sampling distribution; this could be either
|
hypers |
values of the hyperparameters; only meaningful when the conjugate prior is used; if set to "estimate", the hyperparameters are found through maximum likelihood; if not set to "estimate" the user should supply a vector of length two. |
prior |
the form of the prior; either |
alpha.grid |
a numeric vector of points in (0,1); this grid is used in the discrete approximation of r-values |
ngrid |
number of grid points for alpha.grid; only relevant when |
smooth |
either |
control |
a list of control parameters for estimation of the prior; only used when the prior is nonparametric |
The r-value computation assumes the following two-level sampling model
~
and
~
, for
,
with parameters of interest
, effect size estimates
,
and nuisance terms
. The form of
is determined
by the
family
argument. When family = gaussian
, it is assumed that
~ N(
.
When
family = binomial
, the represent the number of successes
and number of trials respectively, and it is assumed that
~
Binomial
. When
family = poisson
, the should be
counts, and it is assumed that
~ Poisson(
.
The distribution of the effect sizes may be a parametric distribution
that is conjugate to the corresponding
family
argument,
or it may be estimated nonparametrically. When it is desired that be
parametric (i.e.,
prior = "conjugate"
), the default is to estimate the
hyperparameters (i.e., hypers = "estimate"
), but these may be supplied by the
user as a vector of length two. To estimate nonparametrically, one
should use
prior = "nonparametric"
(see npmle
for
further details about nonparametric estimation of ).
The r-value, , assigned to the ith case of interest is determined by
inf[
]
where
is the posterior probability that
exceeds the threshold
,
and
is the upper-
th quantile associated
with the marginal distribution of
(i.e.,
Similarly,
the threshold
is the upper-
th quantile of
(i.e.,
).
An object of class "rvals" which is a list containing at least the following components:
main |
a data frame containing the r-values, the r-value rankings along with the rankings from several other common procedures |
aux |
a list containing other extraneous information |
rvalues |
a vector of r-values |
Nicholas C. Henderson and Michael A. Newton
Henderson, N.C. and Newton, M.A. (2016). Making the cut: improved ranking and selection for large-scale inference. J. Royal Statist. Soc. B., 78(4), 781-804. doi:10.1111/rssb.12131 https://arxiv.org/abs/1312.5776
rvaluesMCMC
, PostSummaries
, Valpha
## Not run: ### Binomial example with Beta prior: data(fluEnrich) flu.rvals <- rvalues(fluEnrich, family = binomial) hist(flu.rvals$rvalues) ### look at the r-values for indices 10 and 2484 fig_indices <- c(10,2484) fluEnrich[fig_indices,] flu.rvals$rvalues[fig_indices] ### Gaussian sampling distribution with nonparametric prior ### Use a maximum of 5 iterations for the nonparam. estimate data(hiv) hiv.rvals <- rvalues(hiv, prior = "nonparametric") ## End(Not run)
## Not run: ### Binomial example with Beta prior: data(fluEnrich) flu.rvals <- rvalues(fluEnrich, family = binomial) hist(flu.rvals$rvalues) ### look at the r-values for indices 10 and 2484 fig_indices <- c(10,2484) fluEnrich[fig_indices,] flu.rvals$rvalues[fig_indices] ### Gaussian sampling distribution with nonparametric prior ### Use a maximum of 5 iterations for the nonparam. estimate data(hiv) hiv.rvals <- rvalues(hiv, prior = "nonparametric") ## End(Not run)
Returns r-values from an array of MCMC output.
rvaluesMCMC(output, qtheta, alpha.grid = NULL, ngrid = NULL, smooth = "none")
rvaluesMCMC(output, qtheta, alpha.grid = NULL, ngrid = NULL, smooth = "none")
output |
a matrix contatining mcmc ouput. The ith row should represent a sample from the posterior of the ith parameter of interest. |
qtheta |
either a function which returns the quantiles (for upper tail probs.) of theta or a vector of theta-quantiles. |
alpha.grid |
grid of values in (0,1); used for the discrete approximation approach for computing r-values. |
ngrid |
number of grid points for alpha.grid; only relevant when |
smooth |
either |
An object of class "rvals" which is a list containing at least the following components:
main |
a data frame containing the r-values, the r-value rankings along with the rankings from several other common procedures |
aux |
a list containing other extraneous information |
rvalues |
a vector of r-values |
Nicholas Henderson and Michael Newton
Henderson, N.C. and Newton, M.A. (2016). Making the cut: improved ranking and selection for large-scale inference. J. Royal Statist. Soc. B., 78(4), 781-804. doi:10.1111/rssb.12131 https://arxiv.org/abs/1312.5776
data(MCMCtest) ### For the MCMC output in MCMC_test, the prior assumed for the effect sizes of ### interest was a mixture of two t-distributions. The function qthetaTMix ### computes the quantiles for this prior. qthetaTMix <- function(p) { ### function to compute quantiles (for upper tail probabilities) for a ### mixture of two t-distributions mu <- c(.35,-.12) sig <- c(.2,.08) mix.prop <- c(.25,.75) ff <- function(x,pp) { prob_less <- 0 for(k in 1:2) { prob_less <- prob_less + pt((x - mu[k])/sig[k],df=4,lower.tail=FALSE)*mix.prop[k] } return(prob_less - pp) } nn <- length(p) ans <- numeric(nn) for(i in 1:nn) { ans[i] <- uniroot(ff,interval=c(-5,5),tol=1e-6,pp=p[i])$root } return(ans) } rvs <- rvaluesMCMC(MCMCtest, qtheta = qthetaTMix)
data(MCMCtest) ### For the MCMC output in MCMC_test, the prior assumed for the effect sizes of ### interest was a mixture of two t-distributions. The function qthetaTMix ### computes the quantiles for this prior. qthetaTMix <- function(p) { ### function to compute quantiles (for upper tail probabilities) for a ### mixture of two t-distributions mu <- c(.35,-.12) sig <- c(.2,.08) mix.prop <- c(.25,.75) ff <- function(x,pp) { prob_less <- 0 for(k in 1:2) { prob_less <- prob_less + pt((x - mu[k])/sig[k],df=4,lower.tail=FALSE)*mix.prop[k] } return(prob_less - pp) } nn <- length(p) ans <- numeric(nn) for(i in 1:nn) { ans[i] <- uniroot(ff,interval=c(-5,5),tol=1e-6,pp=p[i])$root } return(ans) } rvs <- rvaluesMCMC(MCMCtest, qtheta = qthetaTMix)
A t-distribution family object which allows one to specify
a t-density for the sampling distribution. Modeled after family
objects
often used in the glm function.
tdist(df)
tdist(df)
df |
vector containing the degrees of freedom |
An object of class "newfam"
, which is a list
containing the following components
family |
The family name |
df |
The degrees of freedom |
Nicholas Henderson and Michael Newton
a <- tdist(df=5)
a <- tdist(df=5)
Returns a list of the top units ranked according to "r-value" or another specified statistic.
TopList(object, topnum = 10, sorted.by = c("RValue","PostMean","MLE","PVal"))
TopList(object, topnum = 10, sorted.by = c("RValue","PostMean","MLE","PVal"))
object |
An object of class "rvals" |
topnum |
The length of the top list. |
sorted.by |
The statistic by which to sort; this could be |
a data frame with topnum
rows and columns containing the
r-value, mle, posterior mean, and p-value rankings.
Nicholas Henderson and Michael Newton
n <- 500 theta <- rnorm(n) ses <- sqrt(rgamma(n,shape=1,scale=1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rvs <- rvalues(dd, family = gaussian) TopList(rvs, topnum = 12) TopList(rvs, topnum = 15, sorted.by = "MLE")
n <- 500 theta <- rnorm(n) ses <- sqrt(rgamma(n,shape=1,scale=1)) XX <- theta + ses*rnorm(n) dd <- cbind(XX,ses) rvs <- rvalues(dd, family = gaussian) TopList(rvs, topnum = 12) TopList(rvs, topnum = 15, sorted.by = "MLE")
Computes r-values directly from a "Valpha" matrix V where each column of Valpha contains posterior tail probabilities relative to a threshold indexed by alpha.
Valpha(V, alpha.grid, smooth = "none")
Valpha(V, alpha.grid, smooth = "none")
V |
a numeric vector with (i,j) entry: V[i,j] = P(theta_i >= theta[alpha_j]|data) |
alpha.grid |
grid of values in (0,1); used for the discrete approximation approach for computing r-values. |
smooth |
either |
A list with the following components
rvalues |
a vector of computed r-values |
Vmarginals |
The estimated V-marginals along the alpha grid points |
Vmarginals.smooth |
a function obtained through interpolation
and smoothing (if desired) the Vmarginals; i.e., an estimate of
|
Nicholas Henderson and Michael Newton
Henderson, N.C. and Newton, M.A. (2016). Making the cut: improved ranking and selection for large-scale inference. J. Royal Statist. Soc. B., 78(4), 781-804. doi:10.1111/rssb.12131 https://arxiv.org/abs/1312.5776
## Not run: data(fluEnrich) rvobj <- rvalues(fluEnrich, family = binomial) Vrvals <- Valpha(rvobj$aux$V, rvobj$aux$alpha.grid) ## End(Not run)
## Not run: data(fluEnrich) rvobj <- rvalues(fluEnrich, family = binomial) Vrvals <- Valpha(rvobj$aux$V, rvobj$aux$alpha.grid) ## End(Not run)