TY - GEN
T1 - Regression Expression Variation Analysis (REVA)
T2 - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
AU - Afsari, Bahman
AU - Favorov, Alexander V.
AU - Fertig, Elana J.
AU - Cope, Leslie
N1 - Funding Information:
This work is dedicated to the memory of the late Mahmood Afsari and his uncompromising passion for knowledge and science. This work was also supported by the National Institutes of Health (grants U01CA212007 and U01CA253403) and the Russian Academic Fundamental Research Project (0092-2022-0001).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Sometimes a simple question arises: how does the distance between two samples in multivariate space relate to a scalar value associated with each sample? Here, inspired by the Kendall rank correlation coefficient, we propose the theory for a non-parametric test of this association, based on the neighbors principle implicit in any machine learning algorithm: samples with similar labels should also be close to one another in feature space. Our test, REVA, is independent of the scale of the scalar data, and thus generalizes to any set of samples that carry both high-dimensional data and a scalar value. We use U-statistic theory to derive the asymptotic distribution of the new correlation coefficient, developing additional large- and finite-sample properties along the way. To establish the admissibility of the REVA statistic, and to explore the utility and limitations of our model, we compared it to the most widely used distance-based correlation coefficient across a range of simulated conditions, demonstrating that REVA does not depend on an assumption of linearity and is robust to high levels of noise, high dimensions, and the presence of outliers. We apply the resulting statistic to problems in cancer biology, motivated by the model that cancer cells with more similar gene expression profiles can be expected to respond more similarly to therapy.
KW - Distance
KW - Kernel-Methods
KW - Multidimensional Correlation
KW - Rank
UR - http://www.scopus.com/inward/record.url?scp=85152213820&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152213820&partnerID=8YFLogxK
U2 - 10.1109/ICMLA55696.2022.00176
DO - 10.1109/ICMLA55696.2022.00176
M3 - Conference contribution
AN - SCOPUS:85152213820
T3 - Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
SP - 1063
EP - 1070
BT - Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
A2 - Wani, M. Arif
A2 - Kantardzic, Mehmed
A2 - Palade, Vasile
A2 - Neagu, Daniel
A2 - Yang, Longzhi
A2 - Chan, Kit-Yan
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 December 2022 through 14 December 2022
ER -