Kaitlyn Ernst
Summer Program for Undergraduates in Data Science
Pitfalls of Anthropomorphism: Misunderstanding AI’s Potential
Abstract profile. Full document pending author claim.
Authors:
Kaitlyn Ernst, Raphael Raux, David Yang
Date Created:
2025-01-01
Course Title:
Professor:
Not specified
About Paper:
Large Language Models (LLMs) often exhibit performance patterns deviating from human intuition: tasks trivial for humans are surprisingly hard for LLMs, and tasks hard for humans are easily performed by the models. Yet, users tend to anthropomorphize AI systems, attributing human-like capabilities and reasoning patterns to them. Such attribution can result in systematic overestimation of AI performance on human-easy tasks and underestimation on human-hard tasks. Our project examines how users develop beliefs about AI capabilities and the implications of such projections for adoption behavior. We administered a dataset of 414 standardized math problems from the TIMSS assessment to both human participants and a range of LLMs (gpt-4o, gpt-3.5-turbo, o3-mini, claude-3.7-sonnet, mistral-large-latest, and llama3.1-70b). We collected human difficulty ratings from a large incentivized sample and compared these with zero-shot LLM accuracy. Despite wide variation in human success across questions, LLM performance remained relatively stable and uncorrelated with human difficulty (OLS coefficient: -0.001), confirming a fundamental divergence between human and AI cognitive patterns. Human reliance on anthropomorphic mental models thus not only impairs users’ ability to accurately judge when AI is likely to succeed or fail, but, ultimately, also drives inefficient patterns of AI adoption and delegation: over-utilizing AI on tasks where it underperforms and under-utilizing it where it excels.

Cross-fitting on Dependent Data
Hasan Laith, Salvador Balkus, Nima Hejazi
Harvard College | Winthrop House | Statistics | 2027
Cross-fitting, after a five-decade absence, has resurfaced as a cornerstone of modern semiparametric inference because it enables machine-learning-powered estimation of causal parameters while retaining the consistency of estimators and validity of confidence intervals. Existing, or at least predominating, theory is, however, still largely restricted to i.i.d. (independent and identically distributed) observations. We extend the cross-fitting framework to dependent data commonly encountered in practice: clustered data, time series, and network samples. Our key contribution is an overarching theorem that characterizes the extent of dependence allowed in a dataset such that the asymptotic bias term of a cross-fitting-based estimator remains strictly upper bounded in probability by the inverse square root of the sample size. The theorem shows that if the number of correlated pairs grows strictly slower than the square root of the effective sample rate that governs the central-limit term, a simple "as-IID" split (randomly holding out 1/K of the units for some K) still suffices. Specializing this result, we derive concrete corollaries for (i) two-way clustered data with bounded rows and columns, (ii) fixed m-dependent time series, and (iii) single-network settings with bounded degree. Together, these results provide a unified justification for using ordinary K-fold cross-fitting, without specialized fold schemes, across a wide spectrum of dependent data regimes. Simulations confirm the predicted root-n convergence and nominal convergence of estimators under the aforementioned dependence structures. Our theory bridges the gap between powerful cross-fitting methods and the realities of correlated data, broadening the applicability of the general toolbox for causal and predictive modeling in economics, epidemiology, and environmental and social sciences.

Accounting for Interference in Causal Inference Using Linear Mixed-Effect Models
Eric Tong, Salvador Balkus, Rachel Nethery
Harvard College | Leverett House | Statistics | 2028
In causal inference, interference occurs when the outcome of one unit can be affected by the treatment or exposure of other units in the data. Especially in spatial or network data, failure to account for interference can yield imprecise or biased effect estimates. In this work, we show mathematically that, under the assumption of a linear model, interference simplifies to a random effect depending on the number of potential neighbors that may interfere with a given unit. Our article demonstrates how to use mixed-effect models to account for interference in settings with arbitrary network dependence between units. Through numerical simulations, we demonstrate that linear models using mixed effects eliminate interference bias, and 95% confidence intervals achieve better finite-sample coverage than other techniques for variance adjustment in the interference literature.
Source:
Harvard / Summer Program / 2025
Topics:
human, interference, model, cros, task, unit, pattern, sample, causal, network, large, llms