|
Sparse Partitioning
"Sparse Partitioning: A method to detect interactions in high dimensional regression problems with binary or tertiary predictors". The pages on this website provide the source files for the method, accompanied by user instructions and two simple examples. The code used to generate the simulated datasets featured in the methodology paper is also provided, along with some of the simulated datasets.

Brief Outline of Sparse Partitioning

Consider a regression problem with n individuals and N predictors. Let the matrix Y (size n × 1) contain the responses and the matrix X (size n × N) contain the predictors. For the i-th individual, Y_i denotes its response while X_1, X_2, ..., X_N denote its predictor values. The aim is to identify which predictors influence the response.

We suppose the response and the predictors are related by

E(Y) = l^{-1}(f(X)),

where l is a link function (the identity function when the responses are continuous, the logit function when the responses are binary). We wish to identify details of the function f. In particular, we are interested in which predictors affect the value f(X).

Sparse Partitioning considers the partitioning of the predictors defined by the function f. For example, consider a five-predictor problem with underlying relationship f(X) = X_1 × X_2 + X_3. The predictor set {1, 2, 3, 4, 5} can be partitioned as {G_0, G_1, G_2} = {{4, 5}, {1, 2}, {3}}, where G_0 contains the predictors that make no contribution to f, while G_1 and G_2 contain groups of predictors that do contribute.

Formally, the partitioning takes the form {1, 2, ..., N} = {G_0, G_1, G_2, ..., G_K}. G_0 contains the predictors not associated with the response, while G_1, G_2, ..., G_K are groups of associated predictors. Interactions are permitted between predictors within the same group (in the example above, predictors 1 and 2 interact) but not between predictors in different groups.
Therefore, the underlying relationship can be written as

f(X) = f_1(X_{G_1}) + f_2(X_{G_2}) + ... + f_K(X_{G_K}).

Sparse Partitioning searches for partitionings that have high posterior probability given the data.

First-time visitors are advised to follow the links below from left to right.
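To make the outline above concrete, the following minimal Python sketch encodes the five-predictor example. It is an illustration only and is not the Sparse Partitioning source code: the names `partition`, `group_functions`, `expected_response`, and `has_effect` are hypothetical, and the exhaustive check assumes f is known and the predictors are binary, whereas the method itself infers the partitioning from data.

```python
from itertools import product
from math import exp

N = 5
# Partition from the five-predictor example, using 1-based predictor labels.
partition = {"G0": {4, 5}, "G1": {1, 2}, "G2": {3}}

def is_valid_partition(groups, n_predictors):
    """True if the groups are disjoint and together cover 1..n_predictors."""
    members = [p for group in groups.values() for p in group]
    return sorted(members) == list(range(1, n_predictors + 1))

# One function per associated group; x is a tuple of N predictor values.
group_functions = {
    "G1": lambda x: x[0] * x[1],  # predictors 1 and 2 interact
    "G2": lambda x: x[2],         # predictor 3 acts on its own
}

def f(x):
    """Additive form: f(X) = f_1(X_G1) + f_2(X_G2) = X1 * X2 + X3."""
    return sum(f_k(x) for f_k in group_functions.values())

def expected_response(x, binary=False):
    """E(Y) = l^{-1}(f(X)): identity link, or logit link when binary=True."""
    eta = f(x)
    return 1.0 / (1.0 + exp(-eta)) if binary else eta

def flip(x, predictor):
    """Return a copy of x with one binary predictor (1-based label) flipped."""
    values = list(x)
    values[predictor - 1] = 1 - values[predictor - 1]
    return tuple(values)

# Every possible vector of five binary predictors.
all_inputs = list(product([0, 1], repeat=N))

def has_effect(predictor):
    """True if flipping this predictor changes f for at least one input."""
    return any(f(x) != f(flip(x, predictor)) for x in all_inputs)

associated = {p for p in range(1, N + 1) if has_effect(p)}
```

Here `associated` evaluates to {1, 2, 3}, and the remaining predictors {4, 5} are exactly the members of G_0, matching the partition given in the text.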
|