TY - GEN
T1 - The Problem of Semantic Shift in Longitudinal Monitoring of Social Media
T2 - 14th ACM Web Science Conference, WebSci 2022
AU - Harrigian, Keith
AU - Dredze, Mark
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/6/26
Y1 - 2022/6/26
AB - Social media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have shown the absence of appropriate tuning, specifically in the presence of semantic shift, can hinder robustness of the underlying methods. However, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and, in turn, improve predictive generalization.
KW - longitudinal monitoring
KW - mental health
KW - semantic shift
UR - http://www.scopus.com/inward/record.url?scp=85133714915&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133714915&partnerID=8YFLogxK
U2 - 10.1145/3501247.3531566
DO - 10.1145/3501247.3531566
M3 - Conference contribution
AN - SCOPUS:85133714915
T3 - ACM International Conference Proceeding Series
SP - 208
EP - 218
BT - WebSci 2022 - Proceedings of the 14th ACM Web Science Conference
PB - Association for Computing Machinery
Y2 - 26 June 2022 through 29 June 2022
ER -