Gavin Ye

Automated Chemical Reasoning Agent for Drug Design

Authors:

Gavin Ye, Nada Amin

Date Created:

2025-01-01

Course Title:
Professor:

Not specified

About Paper:

Recent developments in frontier AI language models have observed the success of language model agents in various domains, such as code generation. While a few recent studies did demonstrate the possibility of automated research with agentic LLMs, their research scope is mainly limited to computer science and math. Here, we introduce Ethereal: an agentic reasoning model for automated molecular and drug designing tasks. By treating molecules as an additional modality, important chemical information that was previously inaccessible to traditional LLMs, such as their 3D structure, can be provided through external graph and stereochemistry encoders. With the addition of downstream molecular validation tools such as docking simulations and external evaluation models, Ethereal can be applied to downstream drug designing tasks against a specific drug target, after several rounds of supervised and reinforcement learning training. As the first chemical reasoning model for drug design, Ethereal marks a crucial step towards fully automated drug development and downstream natural science discovery.

Evaluating and Enhancing Large Language Models (LLMs) for Fair and Accurate Medical Image Diagnosis
Todd Zhou, Mengyu Wang
Harvard College | Winthrop House | Computer Science | 2027

Large language models (LLMs) have begun to offer diagnostic opinions on medical images to the public without clinical oversight, leading to disastrous ramifications. This project evaluates whether these general tools can match the accuracy and equity of specialized medical artificial intelligence and proposes a parameter-efficient strategy, low-rank adaptation (LoRA), to close any gap. The primary objective is to benchmark ChatGPT, Llama, and Grok against established radiological, pathological, and retinal classifiers and to introduce Fair LoRA, an equity-promoting adaptation that aligns performance across demographic groups. Three open-source datasets support a comprehensive evaluation: a Chest X-Ray repository, a public pathology slide archive, and a retinal photograph collection. Each dataset is partitioned into stratified training, validation, and test sets that preserve balanced subgroup counts. Baseline performance is recorded for dedicated models such as CheXNet. The same images are then converted to visual embeddings and passed, with concise textual prompts, to each LLM. Standard metrics, including area under the receiver operating characteristic curve (AUC), sensitivity, specificity, precision, and recall, are computed overall and within race, age, and gender strata. Equity-scaled AUC quantifies disparities. Fair LoRA fine-tunes each LLM by sharing low-rank matrices while learning subgroup-specific scaling factors, thereby preserving parameter efficiency yet permitting tailored calibration. Preliminary experiments show that specialized networks outperform unadapted LLMs by five to eight percentage points in AUC, and error rates differ by up to 30% between demographic extremes. After Fair LoRA training, the best LLM closes the average AUC gap to one point and reduces the maximum false negative disparity from 25% to 9%. These improvements suggest that parameter-sparse, fairness-oriented tuning can achieve competitive accuracy while materially narrowing equity gaps. Future work may extend the assessment to additional imaging modalities and prospectively validate Fair LoRA in clinical triage simulations.
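
The subgroup evaluation described in the second abstract above (AUC computed overall and within demographic strata, then summarized by an equity-scaled AUC) can be illustrated with a minimal Python sketch. The abstract does not give a formula for equity-scaled AUC; the version below assumes one common definition, the overall AUC divided by one plus the sum of absolute differences between the overall AUC and each subgroup AUC. The function names and the toy labels, scores, and groups are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auc(labels, scores, groups):
    """Compute AUC separately within each subgroup (stratum)."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


def equity_scaled_auc(labels, scores, groups):
    """ASSUMED definition: overall AUC / (1 + sum of |overall - subgroup AUC|)."""
    overall = auc(labels, scores)
    per_group = stratified_auc(labels, scores, groups)
    gap = sum(abs(overall - a) for a in per_group.values())
    return overall / (1.0 + gap)


# Hypothetical toy data: two subgroups, "a" and "b", four cases each.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.4, 0.2]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

if __name__ == "__main__":
    print("overall AUC:", auc(labels, scores))
    print("per-group AUC:", stratified_auc(labels, scores, groups))
    print("equity-scaled AUC:", equity_scaled_auc(labels, scores, groups))
```

Under this assumed definition, a model whose subgroup AUCs all equal its overall AUC keeps its score unchanged, while any spread between subgroups pulls the equity-scaled value below the overall AUC.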

Source:

Harvard / Harvard College | Mather House | Computer Science | 2028 / 2025

Topics:

model, drug, llms, science, fair, lora, automated, computer, auc, equity, language, downstream

Co-authors:

@gavinye208, @nadaamin209
