Get random parameters for the Gaussian mixture (copula) model

Generate a random set of parameters for the Gaussian mixture model (GMM) and Gaussian mixture copula model (GMCM). Primarily, it provides an easy prototype of the theta format used in GMCM.

rtheta(m = 3, d = 2, method = c("old", "EqualSpherical",
  "UnequalSpherical", "EqualEllipsoidal", "UnequalEllipsoidal"))

Arguments

m	The number of components in the mixture.
d	The dimension of the mixture distribution.
method	The method by which the theta should be generated. See details. Defaults to `"old"`, which reproduces the original behavior of the function.

Value

A named list of parameters with the following elements:

m

An integer giving the number of components in the mixture. Default is 3.

d

An integer giving the dimension of the mixture distribution. Default is 2.

pie

A numeric vector of length m of mixture proportions, each between 0 and 1, that sum to one.

mu

A list of length m of numeric vectors of length d for each component.

sigma

A list of length m of variance-covariance matrices (of size d times d) for each component.
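
As a rough illustration of the format described above, the following sketch builds such a list by hand in base R. The numeric values and the names pie1, comp1, and so on are illustrative choices modeled on the printed examples; whether the package expects anything further (such as a class attribute) is not asserted here, and is.theta remains the authoritative check.

```r
# Hand-built parameter list mirroring the elements described above
# (m = 2 components in d = 2 dimensions). All values are arbitrary.
theta <- list(
  m     = 2L,
  d     = 2L,
  pie   = c(pie1 = 0.3, pie2 = 0.7),            # proportions summing to one
  mu    = list(comp1 = c(0, 0),                 # one mean vector per component
               comp2 = c(5, -5)),
  sigma = list(comp1 = diag(2),                 # one d x d covariance per component
               comp2 = matrix(c(2, 0.5, 0.5, 1), nrow = 2))
)

str(theta)
```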

Details

Depending on the method argument, the parameters are generated as follows. The newer methods (all except "old") are inspired by the simulation scenarios in Friedman (1989) but are not exactly the same.

pie is generated from \(m\) draws from a chi-squared distribution with \(3m\) degrees of freedom, divided by their sum. If method = "old", the uniform distribution is used instead.
mu is generated as \(m\) i.i.d. \(d\)-dimensional zero-mean normal vectors with covariance matrix \(100I\). This is unchanged from the old behavior.
sigma is dependent on method. The covariance matrices for each component are generated as follows. If the method is
- "EqualSpherical", then the covariance matrices are the identity matrix and thus are all equal and spherical.
- "UnequalSpherical", then the covariance matrices are scaled identity matrices. In component \(h\), the covariance matrix is \(hI\).
- "EqualEllipsoidal", then highly elliptical covariance matrices which are equal for all components are used. The square roots of the \(d\) eigenvalues are chosen equidistantly on the interval from \(10\) to \(1\), and a randomly (uniformly) oriented orthonormal basis is chosen and used for all components.
- "UnequalEllipsoidal", then highly elliptical covariance matrices which differ between components are used. The eigenvalues of the covariance matrices are the same in all components, as in "EqualEllipsoidal". However, each component is randomly (uniformly) oriented on its own (unlike as described in Friedman (1989)).
- "old", then the old behavior is used. The old behavior differs from "EqualEllipsoidal" by using the absolute value of \(d\) zero-mean i.i.d. normal eigenvalues with a standard deviation of 8.
In all cases, the orientation is selected uniformly.
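
The generation steps above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the helpers random_basis and ellipsoidal_sigma are hypothetical names, and drawing the orientation through a sign-corrected QR decomposition of a Gaussian matrix is one common way to obtain a uniformly distributed orthonormal basis, assumed here for illustration.

```r
set.seed(1)
m <- 3  # number of components
d <- 2  # dimension

# pie: m chi-squared draws with 3m degrees of freedom, divided by their sum.
x   <- rchisq(m, df = 3 * m)
pie <- x / sum(x)

# mu: m i.i.d. d-dimensional zero-mean normal vectors with covariance 100 * I,
# i.e. independent coordinates with standard deviation 10.
mu <- replicate(m, rnorm(d, mean = 0, sd = 10), simplify = FALSE)

# Hypothetical helper: random orthonormal basis from the QR decomposition of a
# Gaussian matrix; the sign fix makes the orientation uniformly distributed.
random_basis <- function(d) {
  qrd <- qr(matrix(rnorm(d * d), d, d))
  qr.Q(qrd) %*% diag(sign(diag(qr.R(qrd))), nrow = d)
}

# Hypothetical helper: covariance whose eigenvalue square roots are
# equidistant from 10 down to 1, expressed in the given basis.
ellipsoidal_sigma <- function(d, basis = random_basis(d)) {
  lambda <- seq(10, 1, length.out = d)^2
  basis %*% diag(lambda, nrow = d) %*% t(basis)
}

# One list of m covariance matrices per method.
sigma_equal_spherical   <- replicate(m, diag(d), simplify = FALSE)
sigma_unequal_spherical <- lapply(seq_len(m), function(h) h * diag(d))
shared                    <- ellipsoidal_sigma(d)       # one basis for all
sigma_equal_ellipsoidal   <- replicate(m, shared, simplify = FALSE)
sigma_unequal_ellipsoidal <- replicate(m, ellipsoidal_sigma(d), simplify = FALSE)

# "old": absolute values of d zero-mean normal draws (sd 8) as eigenvalues.
old_eigenvalues <- abs(rnorm(d, mean = 0, sd = 8))
```

For d = 3 the eigenvalues under this reading are 100, 30.25 and 1, which sum to 131.25; this agrees with the trace of the "EqualEllipsoidal" covariance matrix printed in the examples.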

Note

The function is.theta checks whether or not theta is in the correct format.
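
To give a feel for what a format check of this kind involves, the sketch below tests the properties listed under Value using base R only. The function check_theta_sketch is a hypothetical stand-in written for illustration; it is not the implementation of is.theta and may be stricter or looser than the package's own check.

```r
# Hypothetical validator: returns TRUE if `theta` has the documented elements
# with consistent sizes, proportions summing to one, and symmetric
# positive definite covariance matrices.
check_theta_sketch <- function(theta) {
  is.list(theta) &&
    all(c("m", "d", "pie", "mu", "sigma") %in% names(theta)) &&
    length(theta$pie) == theta$m &&
    all(theta$pie >= 0 & theta$pie <= 1) &&
    isTRUE(all.equal(sum(theta$pie), 1)) &&
    length(theta$mu) == theta$m &&
    all(vapply(theta$mu, length, integer(1)) == theta$d) &&
    length(theta$sigma) == theta$m &&
    all(vapply(theta$sigma, function(s) {
      is.matrix(s) && all(dim(s) == theta$d) && isSymmetric(s) &&
        all(eigen(s, symmetric = TRUE, only.values = TRUE)$values > 0)
    }, logical(1)))
}

good <- list(m = 2L, d = 2L,
             pie   = c(0.3, 0.7),
             mu    = list(c(0, 0), c(5, -5)),
             sigma = list(diag(2), 2 * diag(2)))
bad <- good
bad$pie <- c(0.5, 0.7)  # proportions no longer sum to one

check_theta_sketch(good)
check_theta_sketch(bad)
```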

References

Friedman, Jerome H. "Regularized discriminant analysis." Journal of the American statistical association 84.405 (1989): 165-175.

Author

Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>

Examples

rtheta()
#> theta object with d = 2 dimensions and m = 3 components:
#> 
#> $pie
#>       pie1       pie2       pie3 
#> 0.58534163 0.02938762 0.38527075 
#> 
#> $mu
#> $mu$comp1
#> [1] -7.535104 12.801516
#> 
#> $mu$comp2
#> [1] -9.52905 16.22379
#> 
#> $mu$comp3
#> [1] 26.001420  1.396485
#> 
#> 
#> $sigma
#> $sigma$comp1
#>            [,1]       [,2]
#> [1,] 10.8007448 -0.1486676
#> [2,] -0.1486676  6.3964607
#> 
#> $sigma$comp2
#>           [,1]      [,2]
#> [1,]  8.468842 -1.117277
#> [2,] -1.117277  7.976768
#> 
#> $sigma$comp3
#>          [,1]     [,2]
#> [1,] 6.528148 3.371957
#> [2,] 3.371957 6.616887
#> 
#> 

rtheta(d = 5, m = 2)
#> theta object with d = 5 dimensions and m = 2 components:
#> 
#> $pie
#>      pie1      pie2 
#> 0.4563865 0.5436135 
#> 
#> $mu
#> $mu$comp1
#> [1]  -1.040391   7.329730   4.556796   2.880795 -10.736909
#> 
#> $mu$comp2
#> [1]  6.487425  2.991623 -7.959950 -0.293534 21.802357
#> 
#> 
#> $sigma
#> $sigma$comp1
#>             [,1]       [,2]        [,3]       [,4]       [,5]
#> [1,]  3.58303151  2.2724500 -0.06694815  1.1431379 -1.8603162
#> [2,]  2.27245000  4.2616259 -0.81793510  0.9069594 -1.0912561
#> [3,] -0.06694815 -0.8179351  2.82135557  0.3351446  0.5866486
#> [4,]  1.14313788  0.9069594  0.33514464  2.7752424 -0.1468975
#> [5,] -1.86031619 -1.0912561  0.58664865 -0.1468975  2.6438174
#> 
#> $sigma$comp2
#>            [,1]       [,2]      [,3]       [,4]      [,5]
#> [1,]  6.6378803 -5.1945442  1.661071 -0.3862394  2.086088
#> [2,] -5.1945442  8.6296959 -1.810793  0.3705506  1.488922
#> [3,]  1.6610707 -1.8107928  3.504766 -1.3885595 -2.540656
#> [4,] -0.3862394  0.3705506 -1.388560 12.2751912 -1.928562
#> [5,]  2.0860880  1.4889221 -2.540656 -1.9285615  7.046797
#> 
#> 

rtheta(d = 3, m = 2, method = "EqualEllipsoidal")
#> theta object with d = 3 dimensions and m = 2 components:
#> 
#> $pie
#>      pie1      pie2 
#> 0.3917664 0.6082336 
#> 
#> $mu
#> $mu$comp1
#> [1]  6.111832  9.365707 -3.675417
#> 
#> $mu$comp2
#> [1]  7.403768 12.185331  6.291344
#> 
#> 
#> $sigma
#> $sigma$comp1
#>           [,1]      [,2]       [,3]
#> [1,] 48.116580  43.35063  -5.569435
#> [2,] 43.350634  52.31908 -23.382208
#> [3,] -5.569435 -23.38221  30.814336
#> 
#> $sigma$comp2
#>           [,1]      [,2]       [,3]
#> [1,] 48.116580  43.35063  -5.569435
#> [2,] 43.350634  52.31908 -23.382208
#> [3,] -5.569435 -23.38221  30.814336
#> 
#> 

test <- rtheta()
is.theta(test)
#> [1] TRUE

summary(test)
#> A theta object with d = 2 dimensions and m = 3 components.
print(test)
#> theta object with d = 2 dimensions and m = 3 components:
#> 
#> $pie
#>      pie1      pie2      pie3 
#> 0.2808003 0.1546334 0.5645663 
#> 
#> $mu
#> $mu$comp1
#> [1] -15.44657 -24.38766
#> 
#> $mu$comp2
#> [1] -17.09782  15.75464
#> 
#> $mu$comp3
#> [1] -2.162109 -1.151102
#> 
#> 
#> $sigma
#> $sigma$comp1
#>          [,1]     [,2]
#> [1,] 8.449090 3.995014
#> [2,] 3.995014 6.973794
#> 
#> $sigma$comp2
#>            [,1]       [,2]
#> [1,] 14.1526073  0.9315618
#> [2,]  0.9315618 16.9025868
#> 
#> $sigma$comp3
#>           [,1]      [,2]
#> [1,] 15.089421 -6.048641
#> [2,] -6.048641 16.302684
#> 
#> 
plot(test)

if (FALSE) {
A <- SimulateGMMData(n = 100, rtheta(d = 2, method = "EqualSpherical"))
plot(A$z, col = A$K, pch = A$K, asp = 1)
B <- SimulateGMMData(n = 100, rtheta(d = 2, method = "UnequalSpherical"))
plot(B$z, col = B$K, pch = B$K, asp = 1)
C <- SimulateGMMData(n = 100, rtheta(d = 2, method = "EqualEllipsoidal"))
plot(C$z, col = C$K, pch = C$K, asp = 1)
D <- SimulateGMMData(n = 100, rtheta(d = 2, method = "UnequalEllipsoidal"))
plot(D$z, col = D$K, pch = D$K, asp = 1)
}