Regression Expression Variation Analysis (REVA): A rank-based multi-dimensional measure of correlation

Bahman Afsari, Alexander V. Favorov, Elana J. Fertig, Leslie Cope

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A simple question sometimes arises: how does the distance between two samples in multivariate space relate to the scalar values associated with those samples? Here, inspired by the Kendall rank correlation coefficient, we develop the theory for a non-parametric test of this association. The test is based on the neighbor principle implicit in any machine learning algorithm: samples with similar labels should also be close to one another in feature space. Our test, REVA, is independent of the scale of the scalar data and is therefore applicable to any comparison of samples that carry both high-dimensional data and a scalar. We use U-statistic theory to derive the asymptotic distribution of the new correlation coefficient, establishing additional large- and finite-sample properties along the way. To establish the admissibility of the REVA statistic and to explore the utility and limitations of our model, we compared it with the most widely used distance-based correlation coefficient under a range of simulated conditions. These simulations show that REVA does not depend on an assumption of linearity and is robust to high levels of noise, high dimensionality, and outliers. We apply the resulting statistic to problems in cancer biology, motivated by the model that cancer cells with more similar gene expression profiles can be expected to respond more similarly to therapy.
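The abstract describes the statistic only at a high level, and this page does not give the exact REVA formula. As a rough illustration of the Kendall-style neighbor idea, the sketch below assumes a triplet formulation: for each anchor sample i and each pair j, k of other samples, it checks whether the ordering of feature-space distances d(x_i, x_j) vs d(x_i, x_k) agrees with the ordering of |y_i - y_j| vs |y_i - y_k|, and averages the +1/-1 agreements. The Euclidean distance, the triplet form, the helper names (`triplet_concordance`, `permutation_pvalue`), and the use of a permutation test in place of the paper's U-statistic asymptotics are all hypothetical choices made for illustration, not the authors' method.

```python
import itertools
import math
import random
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sign(v: float) -> int:
    """Return -1, 0, or +1; ties contribute 0, as in Kendall's tau-a."""
    return (v > 0) - (v < 0)


def triplet_concordance(dist: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Hypothetical Kendall-style neighbor statistic (not the published REVA).

    Averages sign agreement over all triplets (i; j, k) with j < k and
    j, k != i. Returns a value in [-1, 1]; positive values mean that samples
    closer to an anchor in feature space also tend to be closer to it in y.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 samples to form a triplet")
    total = 0
    count = 0
    for i in range(n):
        others = [m for m in range(n) if m != i]
        for j, k in itertools.combinations(others, 2):
            # Does j or k lie closer to anchor i in feature space ...
            s_d = sign(dist[i][j] - dist[i][k])
            # ... and is the same one closer to i in the scalar value?
            s_y = sign(abs(y[i] - y[j]) - abs(y[i] - y[k]))
            total += s_d * s_y
            count += 1
    return total / count


def permutation_pvalue(X, y, n_perm: int = 200, seed: int = 0):
    """One-sided permutation p-value for a positive association.

    A swapped-in resampling test; the paper instead derives an asymptotic
    distribution via U-statistic theory.
    """
    dist = [[euclidean(a, b) for b in X] for a in X]  # computed once
    observed = triplet_concordance(dist, y)
    rng = random.Random(seed)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between X and y
        if triplet_concordance(dist, y_perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    t = [rng.random() for _ in range(15)]
    X = [[ti, 2 * ti, -ti] for ti in t]  # points on a line in 3-D
    stat, p = permutation_pvalue(X, t)
    print(f"statistic={stat:.3f}, permutation p={p:.4f}")
```

Because |a*y_i - a*y_j| = a*|y_i - y_j| for any a > 0, and adding a constant to every y cancels in the differences, the statistic in this sketch is unchanged by positive rescaling or shifting of the scalar values, which mirrors the scale independence the abstract claims for REVA. The direct triplet loop costs O(n^3) per evaluation, so it is only practical for small samples.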

Original language: English (US)
Title of host publication: Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
Editors: M. Arif Wani, Mehmed Kantardzic, Vasile Palade, Daniel Neagu, Longzhi Yang, Kit-Yan Chan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1063-1070
Number of pages: 8
ISBN (Electronic): 9781665462839
State: Published - 2022
Event: 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022 - Nassau, Bahamas
Duration: Dec 12 2022 → Dec 14 2022

Publication series

Name: Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022

Conference

Conference: 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
Country/Territory: Bahamas
City: Nassau
Period: 12/12/22 → 12/14/22

Keywords

  • Distance
  • Kernel-Methods
  • Multidimensional Correlation
  • Rank

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Artificial Intelligence
  • Hardware and Architecture
