Estimates the parameters of general Gaussian mixture copula models (GMCM). The function finds the maximum likelihood estimate of a general GMCM using various optimization procedures. Note that all methods except PEM provide the maximum likelihood estimate.
fit.full.GMCM(u, m, theta = choose.theta(u, m),
              method = c("NM", "SANN", "L-BFGS", "L-BFGS-B", "PEM"),
              max.ite = 1000, verbose = TRUE, ...)

fit.general.GMCM(u, m, theta = choose.theta(u, m),
                 method = c("NM", "SANN", "L-BFGS", "L-BFGS-B", "PEM"),
                 max.ite = 1000, verbose = TRUE, ...)
Argument | Description
---|---
u | An n-by-d matrix of ranked and scaled observations, e.g. as obtained with Uhat.
m | The number of components to be fitted.
theta | A list of parameters as defined in rtheta. If not supplied, a random starting estimate is chosen with choose.theta(u, m).
method | A character vector of length 1. The optimization method used. Should be either "NM", "SANN", "L-BFGS", "L-BFGS-B", or "PEM".
max.ite | The maximum number of iterations. If the method is "SANN", this is the total number of iterations, as "SANN" has no other stopping criterion.
verbose | Logical. If TRUE, progress of the fitting procedure is printed.
... | Arguments passed to optim.
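As a minimal, self-contained sketch of how these arguments fit together (the simulated toy data and the specific settings below are illustrative assumptions, not recommendations):

library(GMCM)

# Toy data for illustration; any numeric matrix of observations works
x <- SimulateGMCMData(n = 500, m = 2, d = 2)$u
u <- Uhat(x)                      # rank and scale the observations
theta0 <- choose.theta(u, m = 2)  # random starting estimate

# Nelder-Mead fit; further arguments in ... are passed on to optim()
fit <- fit.full.GMCM(u = u, theta = theta0,
                     method = "NM", max.ite = 500, verbose = FALSE)
str(fit)  # a list of parameters in rtheta format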
The returned value is a list of parameters formatted as described in rtheta.
When method equals "PEM", a list of extra information (the log-likelihood trace, the matrix of group probabilities, and the theta trace) is added as an attribute called "extra".
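For example, this extra information can be retrieved from the attribute after a "PEM" fit (a sketch assuming the uhat and start.theta objects constructed in the examples below; the exact element names inside "extra" are not listed here, so str() is used to inspect them):

# Sketch: fit with the pseudo EM algorithm and inspect the "extra" attribute
res.pem <- fit.full.GMCM(u = uhat, theta = start.theta,
                         method = "PEM", max.ite = 1000, verbose = FALSE)
extra <- attr(res.pem, "extra")  # extract the "extra" attribute
str(extra)                       # log-likelihood trace, group probabilities, theta trace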
The "L-BFGS-B"
method does not perform a transformation of
the parameters and uses box constraints as implemented in optim
.
Note that many parameter configurations are poorly estimable or directly unidentifiable.
fit.general.GMCM is simply an alias of fit.full.GMCM.
All the optimization procedures depend strongly on the initial values and on other tuning parameters (such as the cooling scheme for method "SANN"). It is therefore advisable to try several different initial parameters (and optimization routines) and select the best fit.
The choose.theta function itself chooses a random initialization. Hence, the output can vary between runs when theta is not supplied directly.
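A small restart strategy might look like the following sketch (assuming the uhat object from the examples below; the use of dgmcm.loglik() to score each fit is an assumption about the available log-likelihood function, not something documented on this page):

# Sketch: a few random restarts, keeping the fit with the highest log-likelihood.
# NOTE: dgmcm.loglik() is assumed to evaluate the GMCM log-likelihood of a fitted
# theta; if it is not available, substitute another model-comparison criterion.
fits <- lapply(1:5, function(i) {
  start <- choose.theta(uhat, m = 3)  # new random initialization each time
  fit.full.GMCM(u = uhat, theta = start,
                method = "NM", max.ite = 1000, verbose = FALSE)
})
logliks <- sapply(fits, function(th) dgmcm.loglik(theta = th, u = uhat))
best.fit <- fits[[which.max(logliks)]]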
See optim for further details.
Li, Q., Brown, J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466
Tewari, A., Giering, M. J., & Raghunathan, A. (2011). Parametric Characterization of Multimodal Distributions with Non-Gaussian Modes. 2011 IEEE 11th International Conference on Data Mining Workshops, 286-292. doi:10.1109/ICDMW.2011.135
Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>
set.seed(17)
sim <- SimulateGMCMData(n = 1000, m = 3, d = 2)

# Plotting simulated data
par(mfrow = c(1, 2))
plot(sim$z, col = rainbow(3)[sim$K], main = "Latent process")
plot(sim$u, col = rainbow(3)[sim$K], main = "GMCM process")

# Observed data
uhat <- Uhat(sim$u)

# The model should be fitted multiple times using different starting estimates
start.theta <- choose.theta(uhat, m = 3)  # Random starting estimate
res <- fit.full.GMCM(u = uhat, theta = start.theta,
                     method = "NM", max.ite = 3000,
                     reltol = 1e-2, trace = TRUE)  # Note, reltol = 1e-2 is too big
#> Nelder-Mead direct search function minimizer
#> function value for initial parameters = -545.989357
#>   Scaled convergence tolerance is 5.45999
#>   Stepsize computed as 0.622874
#>   [... Nelder-Mead iteration trace omitted ...]
#> Exiting from Nelder Mead minimizer
#> 287 function evaluations used

# Confusion matrix
Khat <- apply(get.prob(uhat, theta = res), 1, which.max)
table("Khat" = Khat, "K" = sim$K)  # Note, some components have been swapped
#>     K
#> Khat   1   2   3
#>    1  97 121   0
#>    2   0 489   0
#>    3   0   0 293

# Simulation from GMCM with the fitted parameters
simfit <- SimulateGMCMData(n = 1000, theta = res)

# As seen, the underlying latent process is hard to estimate.
# The clustering, however, is very good.
par(mfrow = c(2, 2))
plot(simfit$z, col = simfit$K, main = "Model check 1\nSimulated GMM")
plot(simfit$u, col = simfit$K, main = "Model check 2\nSimulated GMCM")
plot(sim$u, col = Khat, main = "MAP clustering")