Title: | Joint Modelling of Repeated Measurements and Time-to-Event Data |
---|---|
Description: | Analysis of repeated measurements and time-to-event data via random effects joint models. Fits the joint models proposed by Henderson and colleagues <doi:10.1093/biostatistics/1.4.465> (single event time) and by Williamson and colleagues (2008) <doi:10.1002/sim.3451> (competing risks events time) to a single continuous repeated measure. The time-to-event data is modelled using a (cause-specific) Cox proportional hazards regression model with time-varying covariates. The longitudinal outcome is modelled using a linear mixed effects model. The association is captured by a latent Gaussian process. The model is estimated using am Expectation Maximization algorithm. Some plotting functions and the variogram are also included. This project is funded by the Medical Research Council (Grant numbers G0400615 and MR/M013227/1). |
Authors: | Pete Philipson [aut] , Ines Sousa [aut] , Peter J. Diggle [aut] , Paula Williamson [aut] , Ruwanthi Kolamunnage-Dona [aut] , Robin Henderson [aut], Graeme L. Hickey [aut, cre] , Maria Sudell [ctb], Medical Research Council [fnd] (Grant numbers: G0400615 and MR/M013227/1) |
Maintainer: | Graeme L. Hickey <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 1.2.8 |
Built: | 2024-11-02 03:58:34 UTC |
Source: | https://github.com/graemeleehickey/joiner |
This dataset describes a randomized clinical trial (Goldman et al., 1996) in which both survival and longitudinal data were collected to compare the efficacy and safety of two antiretroviral drugs, namely ddI (didanosine) and ddC (zalcitabine), in treating HIV-infected patients intolerant or failing zidovudine (AZT) therapy.
data(aids)
data(aids)
A data.frame
in the unbalanced format with 1405 longitudinal
observations from 467 subjects. The columns are:
id
integer: number for patient identification.
time
numeric: time to death (or censoring).
death
integer: event indicator. Coded as 0 =
right-censoring, and 1 =
death.
obstime
numeric: measurement times for the repeated CD4 count measurements.
CD4
numeric: CD4 cell counts measured at obstime
.
drug
factor: drug indicator. Coded as ddI =
didanosine
and ddC =
zalcitabine.
gender
factor: gender indicator. Coded as male
and
female
.
prevOI
factor: opportunistic infection indicator. Coded as
AIDS =
AIDS diagnosis at study entry, and noAIDS =
no
previous infection.
AZT
factor: AZT intolerance/failure indicator. Coded as
intolerance
or failure
.
Guo X, Carlin B. Separate and joint modeling of longitudinal and event time data using standard computer packages. The American Statistician. 2004; 58: 16-24
Goldman A, Carlin B, Crane L, Launer C, Korvick J, Deyton L, Abrams D. Response of CD4 and clinical consequences to treatment using ddI or ddC in patients with advanced HIV infection. Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology. 1996; 11: 161-169 URL: http://www.biostat.umn.edu/~brad/data.html.
heart.valve
, epileptic
,
mental
, liver
.
The SANAD (Standard and New Anti-epileptic Drugs) study (Marson et al., 2007) is a randomized control trial of standard and new anti-epileptic drugs, comparing effects on longer term clinical outcomes. The data consists of longitudinal measurements of calibrated dose for the groups randomized to a standard drug (CBZ) and a new drug (LTG). The objective of the analysis is to investigate the effect of drug titration on the relative effects of LTG and CBZ on treatment failure (withdrawal of the randomized drug). There are several baseline covariates available, and also data on the time to withdrawal from randomized drug.
data(epileptic)
data(epileptic)
This is a data frame in the unbalanced format, that is, with one row per observation. The data consists of columns for patient identifier, time of measurement, calibrated dose, baseline covariates, and survival data. The column names are identified as follows:
id
integer: patient identifier.
dose
numeric: calibrated dose.
time
integer: timing of clinic visit at which dose recorded (days).
with.time
integer: time of drug withdrawal/maximum follow up time (days).
with.status
censoring indicator (1 =
withdrawal of
randomized drug and 0 =
not withdrawn from randomized drug/lost to
follow up).
with.status2
censoring indicator (1 =
withdrawal of
randomized drug due to inadequate seizure control, (2 =
withdrawal
of randomized drug due to unacceptable adverse effects, and 0 =
not
withdrawn from randomized drug/lost to follow up).
with.status.uae
1 =
withdrawal due to unacceptable
adverse effects, 0 =
otherwise.
with.status.isc
1 =
withdrawal due to inadequate
seizure control, 0 =
otherwise.
treat
factor: randomized treatment (CBZ or LTG).
age
numeric: age of patient at randomization (years).
gender
factor: gender of patient. F =
female, M
=
male.
learn.dis
factor: learning disability.
SANAD Trial Group, University of Liverpool
Marson AG, Appleton R, Baker GA, et al. A randomised controlled trial examining longer-term outcomes of standard versus new antiepileptic drugs. The SANAD Trial. Health Tech Assess. 2007; 11(37).
Marson AG, Al-Kharusi AM, Alwaidh M, et al. The SANAD study of effectiveness of carbamazepine, gabapentin, lamotrigine, oxcarbazepine, or topiramate for treatment of partial epilepsy: an unblinded randomised controlled trial. Lancet. 2007; 365: 2007-2013.
Williamson PR, Kolamunnage-Dona R, Philipson P, Marson AG. Joint modelling of longitudinal and competing risks data. Stats Med. 2008; 27(30): 6426-6438.
heart.valve
, liver
,
mental
, aids
.
This is longitudinal data on an observational study on detecting effects of different heart valves, differing on type of tissue, implanted in the aortic position. The data consists of longitudinal measurements on three different heart function outcomes, after surgery occurred. There are several baseline covariates available, and also survival data.
data(heart.valve)
data(heart.valve)
This is a data frame in the unbalanced format, that is, with one row per observation. The data consists in columns for patient identification, time of measurements, longitudinal multiple longitudinal measurements, baseline covariates, and survival data. The column names are identified as follows:
num
number for patient identification.
sex
gender of patient (0 =
Male and 1 =
Female).
age
age of patient at day of surgery (years).
time
observed time point, with surgery date as the time origin (years).
fuyrs
maximum follow up time, with surgery date as the time origin (years).
status
censoring indicator (1 =
died and 0 =
lost at follow up).
grad
valve gradient at follow-up visit.
log.grad
natural log transformation of grad
.
lvmi
left ventricular mass index (standardised) at follow-up visit.
log.lvmi
natural log transformation of lvmi
.
ef
ejection fraction at follow-up visit.
bsa
preoperative body surface area.
lvh
preoperative left ventricular hypertrophy.
prenyha
preoperative New York Heart Association (NYHA)
classification (1 =
I/II and 3 =
III/IV).
redo
previous cardiac surgery.
size
size of the valve (millimeters).
con.cabg
concomitant coronary artery bypass graft.
creat
preoperative serum creatinine (mol/mL).
dm
preoperative diabetes.
acei
preoperative use of ace inhibitor.
lv
preoperative left ventricular ejection fraction (LVEF)
(1 =
good, 2 =
moderate, and 3 =
poor).
emergenc
operative urgency (0 =
elective, 1 =
urgent, and 3 =
emergency).
hc
preoperative high cholesterol (0 =
absent, 1
=
present treated, and 2 =
present untreated).
sten.reg.mix
aortic valve haemodynamics (1 =
stenosis,
2 =
regurgitation, 3 =
mixed).
hs
implanted aortic prosthesis type (1 =
homograft
and 0 =
stentless porcine tissue).
Lim E, Ali A, Theodorou P, Sousa I, Ashrafian H, Chamageorgakis T, Duncan M, Diggle P, Pepper J. A longitudinal study of the profile and predictors of left ventricular mass regression after stentless aortic valve replacement. Ann Thorac Surg. 2008; 85(6): 2026-2029.
mental
, liver
, epileptic
,
aids
.
The joineR package implements methods for analyzing data from longitudinal studies in which the response from each subject consists of a time-sequence of repeated measurements and a possibly censored time-to-event outcome. The modelling framework for the repeated measurements is the linear model with random effects and/or correlated error structure (Laird and Ware, 1982). The model for the time-to-event outcome is a: Cox proportional hazards model with log-Gaussian frailty (Cox, 1972). A cause-specific hazards model is used when competing risks are present. Stochastic dependence is captured by allowing the Gaussian random effects of the linear model to be correlated with the frailty term of the Cox proportional hazards model. The methodology used to fit the model is described in Henderson et al. (2002) in the case of a single event time, and by Williamson et al. (2008) in the case of competing risks data. Both models exploit the general methodology proposed by Wulfsohn and Tsiatis (1997).
The package offers several types of functions for the analysis of joint data.
There are several functions, including jointdata
,
sample.jointdata
, subset.jointdata
, to.balanced
,
to.unbalanced
, and UniqueVariables
, which offer the ability
to construct a joint model dataset and manipulate it, e.g. take a sample
according to a baseline covariate or outcome.
The plot function can be applied to jointdata
and vargm
(variogram) objects. In addition, points
and lines
can also
be used with jointplot
objects.
The primary function for fitting a joint model is joint
. Standard
errors can be estimated using jointSE
.
Further details on the package are given in the vignette. To access
this, run vignette("joineR")
.
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1): 330-339.
Henderson R, Diggle PJ, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1(4): 465-480.
Cox DR. Regression models and life-tables. J R Stat Soc Ser B Stat Methodol. 1972; 34(2): 187-220.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982; 38(4): 963-974.
Williamson PR, Kolamunnage-Dona R, Philipson P, Marson AG. Joint modelling of longitudinal and competing risks data. Stat Med. 2008; 27: 6426-6438.
This generic function fits a joint model with random latent association, building on the formulation described in Wulfsohn and Tsiatis (1997) while allowing for the presence of longitudinal and survival covariates, and three choices for the latent process. The link between the longitudinal and survival processes can be proportional or separate. When failure is attributable to 2 separate causes, a competing risks joint model is fitted as per Williamson et al. (2008).
joint( data, long.formula, surv.formula, model = c("intslope", "int", "quad"), sepassoc = FALSE, longsep = FALSE, survsep = FALSE, gpt, lgpt, max.it, tol, verbose = FALSE )
joint( data, long.formula, surv.formula, model = c("intslope", "int", "quad"), sepassoc = FALSE, longsep = FALSE, survsep = FALSE, gpt, lgpt, max.it, tol, verbose = FALSE )
data |
an object of class |
long.formula |
a formula object with the response variable, and the covariates to include in the longitudinal sub-model. |
surv.formula |
a formula object with the survival time, censoring
indicator and the covariates to include in the survival sub-model. The
response must be a survival object as returned by the
|
model |
a character string specifying the type of latent association.
This defaults to the intercept and slope version as seen in Wulfsohn and
Tsiatis (1997). For association via the random intercept only, choose
|
sepassoc |
logical value: if |
longsep |
logical value: if |
survsep |
if |
gpt |
the number of quadrature points across which the integration with
respect to the random effects will be performed. Defaults to |
lgpt |
the number of quadrature points which the log-likelihood is
evaluated over following a model fit. This defaults to |
max.it |
the maximum number of iterations of the EM algorithm that the
function will perform. Defaults to |
tol |
the tolerance level before convergence of the algorithm is deemed
to have occurred. Default value is |
verbose |
if |
The joint
function fits a joint model to survival and
longitudinal data. The formulation is similar to Wulfsohn and Tsiatis
(1997). A linear mixed effects model is assumed for the longitudinal data,
namely
where is a vector of random effects,
whose length depends on the model chosen, i.e.
for the
random intercept model.
is the random effects covariate matrix,
which will be time-dependent for all but the random intercept model.
is the longitudinal design matrix for unit
, and
is the vector of measurement times for subject
.
Measurement error is represented by
.
The Cox proportional hazards model is adopted for the survival data, namely
The parameter determines the level of association between the
two processes. For the intercept and slope model with separate association
we have
whereas under proportional association
is the vector of survival covariates for unit
. The
baseline hazard function is
.
The function uses an EM algorithm to estimate parameters in the joint
model. Starting values are provided by calls to standard R functions
lme
and coxph
for the
longitudinal and survival components, respectively.
A list containing the parameter estimates from the joint model and, if required, from either or both of the separate analyses. The combined log-likelihood from a separate analysis and the log-likelihood from the joint model are also produced as part of the fit.
If failure can be attributed to 2 causes, i.e. so-called competing risks events data, then a cause-specific hazards model is adopted, namely
where denotes the failure type,
(
) are cause-specific hazard parameters corresponding to the same
covariates, and
are cause-specific baseline hazard
functions. For this data, a proportional association structure is assumed
(i.e.
sepassoc = FALSE
) and a random-intercepts and random-slopes
model must be used (i.e. model = "intslope"
). Note that the function
only permits 2 failure types. The model is specified in full by Williamson
et al. (2008). The function joint()
automatically detects whether
competing risks are present by counting the number of unique components in
the event column on the event time data.
Both longsep
and survsep
ignore any
latent association (i.e. ) between the longitudinal and
survival processes but their output can be used to compare with the results
from the joint model. If interest is solely in the individual processes
then the user should instead make use of the functions
lme
and coxph
mentioned above.
Furthermore, if interest is in the separate effect of each random effect
(this is for intercept and slope or quadratic models only) upon the
survival data, the user should set sepassoc = TRUE
.
Since numerical integration is required, it is advisable to check the
stability of the maximum likelihood estimates with an increasing number of
Gauss-Hermite quadrature points. joint()
uses gpt = 3
by
default, as this has been adequate for many datasets. However, for certain
datasets and models, this might be too small.
Pete Philipson
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1): 330-339.
Henderson R, Diggle PJ, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1(4): 465-480.
Williamson PR, Kolamunnage-Dona R, Philipson P, Marson AG. Joint modelling of longitudinal and competing risks data. Stat Med. 2008; 27: 6426-6438.
lme
, coxph
,
jointdata
, jointplot
.
## Standard joint model data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "hs", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") fit <- joint(data = heart.valve.jd, long.formula = log.lvmi ~ 1 + time + hs, surv.formula = Surv(fuyrs, status) ~ hs, model = "intslope") ## Competing risks joint model (same data as Williamson et al. 2008) ## Not run: data(epileptic) epileptic$interaction <- with(epileptic, time * (treat == "LTG")) longitudinal <- epileptic[, c(1:3, 13)] survival <- UniqueVariables(epileptic, c(4, 6), "id") baseline <- UniqueVariables(epileptic, "treat", "id") data <- jointdata(longitudinal = longitudinal, survival = survival, baseline = baseline, id.col = "id", time.col = "time") fit2 <- joint(data = data, long.formula = dose ~ time + treat + interaction, surv.formula = Surv(with.time, with.status2) ~ treat, longsep = FALSE, survsep = FALSE, gpt = 3) summary(fit2) ## End(Not run)
## Standard joint model data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "hs", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") fit <- joint(data = heart.valve.jd, long.formula = log.lvmi ~ 1 + time + hs, surv.formula = Surv(fuyrs, status) ~ hs, model = "intslope") ## Competing risks joint model (same data as Williamson et al. 2008) ## Not run: data(epileptic) epileptic$interaction <- with(epileptic, time * (treat == "LTG")) longitudinal <- epileptic[, c(1:3, 13)] survival <- UniqueVariables(epileptic, c(4, 6), "id") baseline <- UniqueVariables(epileptic, "treat", "id") data <- jointdata(longitudinal = longitudinal, survival = survival, baseline = baseline, id.col = "id", time.col = "time") fit2 <- joint(data = data, long.formula = dose ~ time + treat + interaction, surv.formula = Surv(with.time, with.status2) ~ treat, longsep = FALSE, survsep = FALSE, gpt = 3) summary(fit2) ## End(Not run)
joint
objectAn object returned by the joint
function, inheriting from
class joint
and representing a fitted joint model for longitudinal
and time-to-event (or competing risks) data.
joint.object
joint.object
An object of class NULL
of length 0.
A list with the following components.
coefficients
a list with the estimated coefficients. The components of this list are:
fixed
longitudinal and survival sub-model fixed effects.
random
the BLUPs of the random effects.
latent
the latent association parameter(s) from the time-to-event sub-model.
sigama.z
a numeric double for the residual standard error.
sigma.u
the variance-covariance matrix of the random effects.
hazard
a vector of the (centered) baseline hazards at each unique failure time.
log.lik
the log-likelihood from the joint model fit and sub-model contributions.
numIter
the number of EM algorithm iterations.
convergence
a logical value of whether convergence was achieved or not.
model
see joint
for details.
sepassoc
see joint
for details.
sepests
see joint
for details.
compRisk
a logical value indicating whether competing risks were detected or not.
sep.loglike
the log-likelihood from the joint model fit (with association set to zero) and separately fitted sub-model contributions.
formulae
a list of model formulae. See joint
for details.
data
call
the model call. Can be used by
update
.
Graeme L. Hickey
jointdata
This function creates an object of class jointdata
. This
is an object with information on at least one of, longitudinal data or
survival data. Moreover, it can also have data on baseline covariates.
jointdata( longitudinal = NA, survival = NA, baseline = NA, id.col = "ID", time.col = NA )
jointdata( longitudinal = NA, survival = NA, baseline = NA, id.col = "ID", time.col = NA )
longitudinal |
a data frame or matrix in the unbalanced format (one row
per observation), with subject identification, time of measurements, and
longitudinal measurements and/or time dependent covariates. This must be
given if no |
survival |
a data frame or matrix with survival data for all the
subjects. This must be given if no |
baseline |
a data frame or matrix with baseline covariates, or non-time
dependent covariates, for the same subjects as in |
id.col |
an element of class |
time.col |
an element of class |
This function creates an object of class jointdata
. This is a
list with elements used in joint modelling, mainly longitudinal and/or
survival data. The output has to have at least one of the data sets,
longitudinal or survival. However, for joint modelling is necessary to have
both data sets. Moreover, a third data frame is possible to be given as
input, for the baseline (non-time dependent) covariates. The subject
identification and time measurement column names are necessary.
A list of length six. The first element is the vector of subjects identification. The second is, if exists a data frame of the longitudinal data. The third element of the list is, if exists a data frame of the survival data. The fourth element of the list is, if exists a data frame on the baseline covariates. The fifth is, if longitudinal data is given, the column name identification of longitudinal times. And the sixth and last element of the list is the column name identification of subjects.
Ines Sousa
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.valve.jd <- jointdata(survival = heart.surv, id.col = "num", time.col = "time")
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.valve.jd <- jointdata(survival = heart.surv, id.col = "num", time.col = "time")
This function views the longitudinal profile of each unit with the last longitudinal measurement prior to event-time (censored or not) taken as the end-point, referred to as time zero. In doing so, the shape of the profile prior to event-time can be inspected. This can be done over a user-specified number of time units.
jointplot( object, Y.col, Cens.col, lag, split = TRUE, col1, col2, xlab, ylab, gp1lab, gp2lab, smooth = 2/3, mean.profile = FALSE, mcol1, mcol2 )
jointplot( object, Y.col, Cens.col, lag, split = TRUE, col1, col2, xlab, ylab, gp1lab, gp2lab, smooth = 2/3, mean.profile = FALSE, mcol1, mcol2 )
object |
an object of class |
Y.col |
an element of class |
Cens.col |
an element of class |
lag |
argument which specifies how many units in time we look back through. Defaults to the maximum observation time across all units. |
split |
logical argument which allows the profiles of units which
fail and those which are censored to be viewed in separate
panels of the same graph. This is the default option. Using |
col1 |
argument to choose the colour for the profiles of the censored units. |
col2 |
argument to choose the colour for the profiles of the failed units. |
xlab |
an element of class |
ylab |
an element of class |
gp1lab |
an element of class |
gp2lab |
an element of class |
smooth |
the smoother span. This gives the proportion of points in the
plot which influence the smooth at each value. Defaults to a value of 2/3.
Larger values give more smoothness. See |
mean.profile |
draw mean profiles if TRUE. Only applies to the
|
mcol1 |
argument to choose the colour for the mean profile of the units with a censoring indicator of zero. |
mcol2 |
argument to choose the colour for the mean profile of the units with a censoring indicator of one. |
The function tailors the xyplot
function to
produce a representation of joint data with longitudinal and survival
components.
A lattice plot.
If more than one cause of failure is present (i.e. competing risks data), then all failures are pooled together into a single failure type.
Pete Philipson
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1): 330-339.
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") jointplot(heart.valve.jd, Y.col = "log.lvmi", Cens.col = "status", lag = 5)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") jointplot(heart.valve.jd, Y.col = "log.lvmi", Cens.col = "status", lag = 5)
This function takes a model fit from a joint model and calculates standard errors, with optional confidence intervals, for the main longitudinal and survival covariates.
jointSE(fitted, n.boot, gpt, lgpt, max.it, tol, print.detail = FALSE)
jointSE(fitted, n.boot, gpt, lgpt, max.it, tol, print.detail = FALSE)
fitted |
a list containing as components the parameter estimates
obtained by fitting a joint model along with the respective formulae for
the longitudinal and survival sub-models and the model chosen, see
|
n.boot |
argument specifying the number of bootstrap samples to use in
order to obtain the standard error estimates and confidence intervals. Note
that at least |
gpt |
the number of quadrature points across which the integration with
respect to the random effects will be performed. Defaults to |
lgpt |
the number of quadrature points which the log-likelihood is
evaluated over following a model fit. This defaults to |
max.it |
the maximum number of iterations of the EM algorithm that the
function will perform. Defaults to |
tol |
the tolerance level before convergence of the algorithm is deemed
to have occurred. Default value is |
print.detail |
This argument determines the level of printing that is
done during the bootstrapping. If |
Standard errors and confidence intervals are obtained by repeated fitting of the requisite joint model to bootstrap samples of the original longitudinal and survival data. It is rare that more than 200 bootstrap samples are needed for estimating a standard error. The number of bootstrap samples needed for accurate confidence intervals can be as large as 1000.
An object of class data.frame
.
Ruwanthi Kolamunnage-Dona and Pete Philipson
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1): 330-339.
Efron B, Tibshirani R. An Introduction to the Bootstrap. 2000; Boca Raton, FL: Chapman & Hall/CRC.
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "hs", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") fit <- joint(heart.valve.jd, long.formula = log.lvmi ~ 1 + time + hs, surv.formula = Surv(fuyrs, status) ~ hs, model = "int") jointSE(fitted = fit, n.boot = 1)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "hs", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") fit <- joint(heart.valve.jd, long.formula = log.lvmi ~ 1 + time + hs, surv.formula = Surv(fuyrs, status) ~ hs, model = "int") jointSE(fitted = fit, n.boot = 1)
jointdata
plotAdd lines to an existing plot of an object of class
jointdata
, for a longitudinal variable. It is possible to plot all
the subjects in the data set, or just a selected subset
. See
subset.jointdata
.
## S3 method for class 'jointdata' lines(x, Y.col, ...)
## S3 method for class 'jointdata' lines(x, Y.col, ...)
x |
object of class |
Y.col |
column number, or column name, of longitudinal variable to be
plotted. Defaults to |
... |
other graphical arguments; see |
A graphical device with a plot for longitudinal data.
Ines Sousa
Other functions are useful to be used with this such as
plot
and points
.
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") # Randomly select a pair of subjects to plot profiles of take <- sample(1:max(heart.jd$survival$num), 2) heart.jd.1 <- subset(heart.jd, take[1]) heart.jd.2 <- subset(heart.jd, take[2]) plot(heart.jd.1, Y.col = 4) lines(heart.jd.2, Y.col = 4, lty = 2)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") # Randomly select a pair of subjects to plot profiles of take <- sample(1:max(heart.jd$survival$num), 2) heart.jd.1 <- subset(heart.jd, take[1]) heart.jd.2 <- subset(heart.jd, take[2]) plot(heart.jd.1, Y.col = 4) lines(heart.jd.2, Y.col = 4, lty = 2)
This dataset gives the longitudinal observations of prothrombin index, a measure of liver function, for patients from a controlled trial into prednisone treatment of liver cirrhosis. Time-to-event information in the form of the event time and associated censoring indicator are also recorded along with a solitary baseline covariate - the allocated treatment arm in this instance. The data are taken from Andersen et al. (1993, p. 19) and were analyzed in Henderson et al. (2002). This is a subset of the full data where a number of variables were recorded both at entry and during the course of the trial.
data(liver)
data(liver)
A data.frame
in the unbalanced format with longitudinal observations
from 488 subjects. The columns are:
id
integer: number for patient identification.
prothrombin
integer: prothrombin index measurement (%).
time
numeric: time of prothrombin index measurement (years).
treatment
integer: patient treatment indicator. Coded as
0 =
placebo; 1 =
prednisone.
survival
numeric: patient survival time (years).
cens
integer: censoring indicator. Coded as 1 =
died;
0 =
censored.
Skrondal A, Rabe-Hesketh S. Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall/CRC. 2004. URL: http://www.gllamm.org/books/readme.html#14.6.
Andersen PK, Borgan O, Gill RD, Kieding N. Statistical Models Based on Counting Processes. New York: Springer. 2003.
Henderson R, Diggle PJ, Dobson A. Identification and efficacy of longitudinal markers for survival. Biostatistics 2002; 3: 33-50.
heart.valve
, epileptic
,
mental
, aids
.
The data is obtained from a trial in which chronically ill mental health patients were randomized across two treatments: placebo and an active drug. A questionnaire instrument was used to assess each patient's mental state at weeks 0, 1, 2, 4, 6 and 8 post-randomisation, a high recorded score implying a severe condition. Some of the 100 patients dropped out of the study for reasons that were thought to be related to their mental state, and therefore potentially informative; others dropped out for reasons unrelated to their mental state.
data(mental)
data(mental)
A balanced data set with respect to the times at which observations recorded. The data consists of the following variables on each patient:
id
integer: patient identifier.
Y.t0
integer: mental state assessment in week 0. Coded
NA
if missing.
Y.t1
integer: mental state assessment in week 1. Coded
NA
if missing.
Y.t2
integer: mental state assessment in week 2. Coded
NA
if missing.
Y.t4
integer: mental state assessment in week 4. Coded
NA
if missing.
Y.t6
integer: mental state assessment in week 6. Coded
NA
if missing.
Y.t8
integer: mental state assessment in week 8. Coded
NA
if missing.
treat
integer: treatment allocation. Coded as 0 =
placebo; 1 =
active drug.
n.obs
integer: number of non-missing mental state assessments.
surv.time
numeric: imputed dropout time in weeks. Coded as
surv.time = 8.002
for completers.
cens.ind
integer: censoring indicator. Coded as 0 =
completer or non-informative dropout; 1 =
potentially informative
dropout.
Peter J. Diggle ([email protected])
Henderson R, Diggle PJ, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1(4): 465-480.
Diggle PJ, Farewell D, Henderson R. Longitudinal data with dropout: objectives, assumptions and a proposal (with Discussion). Applied Statistics. 2007; 56: 499-550.
heart.valve
, liver
,
epileptic
.
Plot longitudinal data of an object of class jointdata
,
for a longitudinal variable. It is possible to plot all the subjects in the
data set, or just a selected subset
. See
subset.jointdata
.
## S3 method for class 'jointdata' plot(x, Y.col, type, xlab, xlim = NULL, ylim = NULL, main = NA, pty, ...)
## S3 method for class 'jointdata' plot(x, Y.col, type, xlab, xlim = NULL, ylim = NULL, main = NA, pty, ...)
x |
object of class |
Y.col |
column number, or column name, of longitudinal variable to be
plotted. Defaults to |
type |
the type of line to be plotted, see |
xlab |
a title for the x-axis, see |
xlim , ylim
|
numeric vectors of length 2, giving the x and y coordinates
ranges, see |
main |
an overall title for the plot; see |
pty |
a character specifying the type of plot region to be used, see
|
... |
other graphical arguments; see |
A graphical device with a plot for longitudinal data.
Ines Sousa
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") plot(heart.jd, Y.col = "grad", col = "grey")
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") plot(heart.jd, Y.col = "grad", col = "grey")
Plots the empirical variogram for observed measurements, of an
object of class vargm
, obtained by using function
variogram
.
## S3 method for class 'vargm' plot(x, smooth = FALSE, bdw = NULL, follow.time = NULL, points = TRUE, ...)
## S3 method for class 'vargm' plot(x, smooth = FALSE, bdw = NULL, follow.time = NULL, points = TRUE, ...)
x |
object of class |
smooth |
logical value to use a non-parametric estimator to calculate
the variogram of all |
bdw |
bandwidth to use in the time averages. The default is |
follow.time |
the interval of time we want to construct the variogram
for. When |
points |
logical value if the points |
... |
other graphical options as in |
A graphical device with the plot of empirical variogram.
Ines Sousa
data(mental) mental.unbalanced <- to.unbalanced(mental, id.col = 1, times = c(0, 1, 2, 4, 6, 8), Y.col = 2:7, other.col = c(8, 10, 11)) names(mental.unbalanced)[3] <- "Y" vgm <- variogram(indv = tail(mental.unbalanced[, 1], 30), time = tail(mental.unbalanced[, 2], 30), Y = tail(mental.unbalanced[, 3], 30)) plot(vgm)
data(mental) mental.unbalanced <- to.unbalanced(mental, id.col = 1, times = c(0, 1, 2, 4, 6, 8), Y.col = 2:7, other.col = c(8, 10, 11)) names(mental.unbalanced)[3] <- "Y" vgm <- variogram(indv = tail(mental.unbalanced[, 1], 30), time = tail(mental.unbalanced[, 2], 30), Y = tail(mental.unbalanced[, 3], 30)) plot(vgm)
jointdata
plotAdd points to an existing plot of an object of class
jointdata
, for a longitudinal variable. It is possible plot all the
subjects in the data set, or just a selected subset
. See
subset.jointdata
.
## S3 method for class 'jointdata' points(x, Y.col, ...)
## S3 method for class 'jointdata' points(x, Y.col, ...)
x |
object of class |
Y.col |
column number, or column name, of longitudinal variable to be
plotted. Defaults to |
... |
other graphical arguments; see |
A graphical device with a plot for longitudinal data. Other functions
are useful to be used with this as plot
and
lines
.
Ines Sousa
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") # Randomly select a pair of subjects to plot profiles of take <- sample(1 : max(heart.jd$survival$num), 2) heart.jd.1 <- subset(heart.jd, take[1]) heart.jd.2 <- subset(heart.jd, take[2]) plot(heart.jd.1, Y.col = "grad", type = "p") points(heart.jd.2, Y.col = "grad", col = "blue", pch = 20)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") # Randomly select a pair of subjects to plot profiles of take <- sample(1 : max(heart.jd$survival$num), 2) heart.jd.1 <- subset(heart.jd, take[1]) heart.jd.2 <- subset(heart.jd, take[2]) plot(heart.jd.1, Y.col = "grad", type = "p") points(heart.jd.2, Y.col = "grad", col = "blue", pch = 20)
jointdata
xGeneric function used to sampling a subset of data from an x of
class jointdata
, with a specific size of number of subjects.
sample.jointdata(x, size, replace = FALSE)
sample.jointdata(x, size, replace = FALSE)
x |
an object of class |
size |
number of subjects to include in the sampled subset. |
replace |
should sampling be with replacement? Default is |
An object of class jointdata
, with data only on the subjects
sampled.
Ines Sousa
sample
, jointdata
,
UniqueVariables
.
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.valve.jd <- jointdata(survival = heart.surv, id.col = "num", time.col = "time") sample.jointdata(heart.valve.jd, size = 10)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.valve.jd <- jointdata(survival = heart.surv, id.col = "num", time.col = "time") sample.jointdata(heart.valve.jd, size = 10)
This function simulates longitudinal and time-to-event data from a joint model.
simjoint( n = 500, model = c("intslope", "int", "quad"), sepassoc = FALSE, ntms = 5, b1 = c(1, 1, 1, 1), b2 = c(1, 1), gamma = c(1, 0.1), sigu, vare = 0.01, theta0 = -3, theta1 = 1, censoring = TRUE, censlam = exp(-3), truncation = FALSE, trunctime = max(ntms), gridstep = 0.01 )
simjoint( n = 500, model = c("intslope", "int", "quad"), sepassoc = FALSE, ntms = 5, b1 = c(1, 1, 1, 1), b2 = c(1, 1), gamma = c(1, 0.1), sigu, vare = 0.01, theta0 = -3, theta1 = 1, censoring = TRUE, censlam = exp(-3), truncation = FALSE, trunctime = max(ntms), gridstep = 0.01 )
n |
the number of subjects to simulate data for. |
model |
a character string specifying the type of latent association.
This defaults to the intercept and slope version as seen in Wulfsohn and
Tsiatis (1997). For association via the random intercept only, choose
|
sepassoc |
logical value: if |
ntms |
the maximum number of (discrete) time points to simulate repeated longitudinal measurements at. |
b1 |
a vector specifying the coefficients of the fixed effects in the longitudinal sub-model. The order in each row is intercept, a continuous covariate, covariate, a binary covariate, the measurement time. |
b2 |
a vector of |
gamma |
a vector of specifying the latent association parameter(s) for
the longitudinal outcome. It must be of length 1 if |
sigu |
a positive-definite matrix specifying the variance-covariance
matrix. If |
vare |
a numeric value specifying the residual standard error. |
theta0 , theta1
|
parameters controlling the failure rate. See Details. |
censoring |
logical: if |
censlam |
a scale ( |
truncation |
logical: if |
trunctime |
a truncation time for use when |
gridstep |
the step-size for the grid used to simulate event times when
|
The function simjoint
simulates data from a joint model,
similar to that performed in Henderson et al. (2000). It works by first
simulating longitudinal data for all possible follow-up times using random
draws for the multivariate Gaussian random effects and residual error
terms. Data can be simulated assuming either random-intercepts only
(model = "int"
) in each of the longitudinal sub-models;
random-intercepts and random-slopes (model = "intslope"
); or
quadratic random effects structures (model = "quad"
). The failure
times are simulated from proportional hazards time-to-event models, using
the following methodologies:
model = "int"
The baseline hazard function is specified to be an exponential distribution with
Simulation is conditional on known time-independent effects, and the methodology of Bender et al. (2005) is used to simulate the failure time.
model = "intslope"
The baseline hazard function is specified to be a Gompertz distribution with
In the usual representation of the Gompertz distribution, is
the shape parameter, and the scale parameter is equivalent to
. Simulation is conditional on on a predictable
(linear) time-varying process, and the methodology of Austin (2012) is used
to simulate the failure time.
model="quad"
The baseline hazard function is specified as per
model="intslope"
. The integration technique used for the above two
cases is complex in quadratic (and higher order) models, therefore we use a
different approach. We note that hazard function can be written as
In the simulation routine the parameter gridstep
acts as in
that we choose a nominally small value, which multiplies the hazard and
this scaled hazard is equivalent to the probability of having an event in
the interval
, or equivalently
gridstep
. A vector of possible times is set up for each
individual, ranging from 0 to
trunctime
in increments of (or
gridstep
). The failure probability at each time is compared to an
independent draw, and if the probability does not exceed the
random draw then the survival time is set as
trunctime
, otherwise it
is the generated time from the vector of candidate times. The minimum of
these candidate times (i.e. when we deem the event to have first happened)
is taken as the survival time.
A list of 2 data.frame
s: one recording the requisite
longitudinal outcomes data, and one recording the time-to-event data.
Pete Philipson
Austin PC. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat Med. 2012; 31(29): 3946-3958.
Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005; 24: 1713-1723.
Henderson R, Diggle PJ, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1(4): 465-480.
simjoint(10, sepassoc = TRUE)
simjoint(10, sepassoc = TRUE)
jointdata
Returns an object of class jointdata
which is a subset of
an original object of class jointdata
.
## S3 method for class 'jointdata' subset(x, subj.subset, ...)
## S3 method for class 'jointdata' subset(x, subj.subset, ...)
x |
an object of class |
subj.subset |
vector of subject identifiers, to include in the data subset. This must be a unique vector of patient identifiers. |
... |
further arguments to be passed to or from other methods. |
An object of class jointdata
, with data only on a subset of
subjects.
Ines Sousa
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") take <- heart.jd$survival$num[heart.jd$survival$status == 0] heart.jd.cens <- subset(heart.jd, take)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.long <- heart.valve[, c(1, 4, 5, 7, 8, 9, 10, 11)] heart.jd <- jointdata(longitudinal = heart.long, survival = heart.surv, id.col = "num", time.col = "time") take <- heart.jd$survival$num[heart.jd$survival$status == 0] heart.jd.cens <- subset(heart.jd, take)
Generic function used to produce summary information from a
fitted random effects joint model as represented by object
of class
joint
.
## S3 method for class 'joint' summary(object, variance = TRUE, ...)
## S3 method for class 'joint' summary(object, variance = TRUE, ...)
object |
an object of class |
variance |
should the variance components be output as variances or
standard deviations? Defaults to |
... |
further arguments for the summary. |
An object inheriting from class summary.joint
with all
components included in object
(see joint
for a full
description of the components) plus the following components:
nobs |
the total number of (typically longitudinal) observations (i.e. rows in an unbalanced data set). |
ngrps |
the number of groups in the analyzed dataset, often individual subjects. |
Pete Philipson
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs","status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "hs", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") fit <- joint(data = heart.valve.jd, long.formula = log.lvmi ~ 1 + time + hs, surv.formula = Surv(fuyrs,status) ~ hs, model = "intslope") summary(fit)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs","status"), id.col = "num") heart.long <- heart.valve[, c("num", "time", "log.lvmi")] heart.cov <- UniqueVariables(heart.valve, c("age", "hs", "sex"), id.col = "num") heart.valve.jd <- jointdata(longitudinal = heart.long, baseline = heart.cov, survival = heart.surv, id.col = "num", time.col = "time") fit <- joint(data = heart.valve.jd, long.formula = log.lvmi ~ 1 + time + hs, surv.formula = Surv(fuyrs,status) ~ hs, model = "intslope") summary(fit)
jointdata
objectGeneric function used to produce summaries of objects of class
jointdata
.
## S3 method for class 'jointdata' summary(object, ...)
## S3 method for class 'jointdata' summary(object, ...)
object |
an object of class |
... |
further arguments for the summary. |
A list with five elements. Each summarises an element of the
jointdata
object:
subjects |
Gives the number of subjects in the data set. |
longitudinal |
If longitudinal data is available, it gives the names and class, of the longitudinal variables. |
survival |
If survival data is available, it gives the number of subjects with failure and censored survival times. |
baseline |
If baseline covariates is available, it gives the names and class, of the baseline covariates. |
times |
If longitudinal data is available, it gives the unique longitudinal time measurements, if it is a balanced study. In case of unbalanced study, it will only state it is an unbalanced study. |
Ines Sousa
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.valve.jd <- jointdata(survival = heart.surv, id.col = "num", time.col = "time") summary(heart.valve.jd)
data(heart.valve) heart.surv <- UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num") heart.valve.jd <- jointdata(survival = heart.surv, id.col = "num", time.col = "time") summary(heart.valve.jd)
For a balanced longitudinal data set a vector of the mean response and variances at defined time points is returned along with the correlation matrix of the responses across the time points.
summarybal(object, Y.col, times, use = "all.obs", na.rm, ...)
summarybal(object, Y.col, times, use = "all.obs", na.rm, ...)
object |
a longitudinal data set in the balanced format. |
Y.col |
the column numbers of the longitudinal measurements at each
design time point in the |
times |
a vector of unique time points of the longitudinal measurements.
This does not have to be all of the study time points and may be a subset
instead, but should match the columns defined in |
use |
an optional character string giving a method for computing
covariances in the presence of missing values. This must be (an
abbreviation of) one of the strings |
na.rm |
logical. Should missing values be removed? By default,
|
... |
further arguments for the summary. |
A list
with three elements:
mean.vect |
a matrix with the time points in the first column and the mean response vector as the second column. |
variance |
The vector of variances for the response at the time points. |
cor.mtx |
Containing the correlation matrix of the responses between each pair of time points. |
Ines Sousa
data(mental) summarybal(mental, Y.col = 2:7, times = c(0, 1, 2, 4, 6, 8), na.rm = TRUE)
data(mental) summarybal(mental, Y.col = 2:7, times = c(0, 1, 2, 4, 6, 8), na.rm = TRUE)
Transforms a longitudinal data set in the unbalanced format to the balanced format.
to.balanced(data, id.col, time.col, Y.col, other.col = NA)
to.balanced(data, id.col, time.col, Y.col, other.col = NA)
data |
a data frame with longitudinal data in the unbalanced format. That is, in the format of 'one row per observation'. |
id.col |
a column number, or column name, in the data frame |
time.col |
a column number, or column name, in the data frame
|
Y.col |
a vector of column numbers, or column names, of longitudinal
variables, and/or time dependent covariates in the data frame |
other.col |
a vector of column numbers, or column names, of baseline
covariates, and/or other subject level data, as for example, survival data.
Default does not include |
A data frame with longitudinal data in the balanced format. The balanced format is considered in this context as the format where each row has data on each subject. Notice that in this format we will have multiple columns for the same longitudinal variable, each corresponding to the variable observed at each time point.
Ines Sousa
simul <- data.frame(num = 1:10, Y1.1 = rnorm(10), Y1.2 = rnorm(10), Y2.1 = rnorm(10), Y2.2 = rnorm(10), age = rnorm(10)) simul <- to.unbalanced(simul, id.col = 1, times = c(1, 2), Y.col = 2:5, other.col = 6) simul <- to.balanced(simul, id.col = "num", time.col = "time", Y.col = c("Y1.1", "Y2.1"), other.col = "age")
simul <- data.frame(num = 1:10, Y1.1 = rnorm(10), Y1.2 = rnorm(10), Y2.1 = rnorm(10), Y2.2 = rnorm(10), age = rnorm(10)) simul <- to.unbalanced(simul, id.col = 1, times = c(1, 2), Y.col = 2:5, other.col = 6) simul <- to.balanced(simul, id.col = "num", time.col = "time", Y.col = c("Y1.1", "Y2.1"), other.col = "age")
Transforms a longitudinal data set in the balanced format to the unbalanced format.
to.unbalanced(data, id.col, times, Y.col, other.col = NA)
to.unbalanced(data, id.col, times, Y.col, other.col = NA)
data |
a data frame with longitudinal data in the balanced format. That
is, in the format of 'one row per subject'. |
id.col |
a column number, or column name, in the data frame |
times |
a vector with the unique time points where the patients are observed. This is the study design time points in a balanced data set. |
Y.col |
a vector of column numbers, or column names, of longitudinal
variables, and/or time dependent covariates in the data frame |
other.col |
a vector of column numbers, or column names, of baseline
covariates, and/or other subject level data, as for example, survival data.
Default does not include |
A data frame with longitudinal data in the unbalanced format. The unbalanced format is considered in this context as the format where each row has data on each subject observation.
Ines Sousa
simul <- data.frame(num = 1:10, Y1.1 = rnorm(10), Y1.2 = rnorm(10), Y2.1 = rnorm(10), Y2.2 = rnorm(10), age = rnorm(10)) to.unbalanced(simul, id.col = 1, times = c(1, 2), Y.col = 2:5, other.col = 6)
simul <- data.frame(num = 1:10, Y1.1 = rnorm(10), Y1.2 = rnorm(10), Y2.1 = rnorm(10), Y2.2 = rnorm(10), age = rnorm(10)) to.unbalanced(simul, id.col = 1, times = c(1, 2), Y.col = 2:5, other.col = 6)
This function extracts a set of unique variables within a patient, returning a data frame with columns, patient identification and variables selected. Each row corresponds to the data for each individual.
UniqueVariables(data, var.col, id.col = "ID")
UniqueVariables(data, var.col, id.col = "ID")
data |
data frame, or matrix, with at least a column of patient identification and a covariate column. |
var.col |
vector of column names or column numbers, of the variables (non-time dependent). Cannot have mix of numbers and column names. |
id.col |
column name or column number of the patient identification. |
This function can be used, when longitudinal data is in the unbalanced format, and it is necessary, for example, to extract the set of unique baseline covariates, or any non-time dependent variables, that in the unbalanced format, are repeated for each observation row. Also, if the original data frame has survival data, this can also be used to extract the survival information from the original data set.
A data frame with patient identification and covariates selected. Each row corresponds to the data for each individual. Note that, this can be only used for non-time dependent covariates. If extracting unique time dependent covariates, the function gives an error, because it can't select what is the unique covariate.
Ines Sousa
data(heart.valve) heart.cov <- UniqueVariables(heart.valve, c(2, 3, 5, 6, 12:25), id.col = "num")
data(heart.valve) heart.cov <- UniqueVariables(heart.valve, c(2, 3, 5, 6, 12:25), id.col = "num")
Calculates the variogram for observed measurements, with two components, the total variability in the data, and the variogram for all time lags in all individuals.
variogram(indv, time, Y)
variogram(indv, time, Y)
indv |
vector of individual identification, as in the longitudinal data, repeated for each time point. |
time |
vector of observation time, as in the longitudinal data. |
Y |
vector of observed measurements. This can be a vector of longitudinal data, or residuals after fitting a model for the mean response. |
The empirical variogram in this function is calculated from observed
half-squared-differences between pairs of measurements, and the corresponding time differences
. The variogram is plotted for averages of each time
lag for the
for all
.
An object of class vargm
and list
with two elements.
The first svar
is a matrix with columns for all values
, and the second
sigma2
is the total variability
in the data.
There is a function plot.vargm
which should be used to
plot the empirical variogram.
Ines Sousa
data(mental) mental.unbalanced <- to.unbalanced(mental, id.col = 1, times = c(0, 1, 2, 4, 6, 8), Y.col = 2:7, other.col = c(8, 10, 11)) names(mental.unbalanced)[3] <- "Y" vgm <- variogram(indv = tail(mental.unbalanced[, 1], 30), time = tail(mental.unbalanced[, 2], 30), Y = tail(mental.unbalanced[, 3], 30))
data(mental) mental.unbalanced <- to.unbalanced(mental, id.col = 1, times = c(0, 1, 2, 4, 6, 8), Y.col = 2:7, other.col = c(8, 10, 11)) names(mental.unbalanced)[3] <- "Y" vgm <- variogram(indv = tail(mental.unbalanced[, 1], 30), time = tail(mental.unbalanced[, 2], 30), Y = tail(mental.unbalanced[, 3], 30))