Yahya Handulle

Building Knowledge Graphs from Research Data Using Open-Source LLMs

Authors:

Yahya Handulle, Stefano Iacus

Date Created:

2025-01-01

About Paper:

This summer, I’ve been working on building knowledge graphs from metadata in the Harvard Dataverse, home to over 185,000 research datasets. The goal is to use open-source large language models (LLMs), like LLaMA, to automatically extract key entities and relationships from dataset descriptions and map them to structured ontologies (like schema.org or Dublin Core). These triples can then be linked to formal ontologies to improve dataset discoverability, searchability, and reuse across disciplines. The long-term vision is to make research data more discoverable and interconnected.

I’ve built a working local pipeline that loads quantized LLMs and extracts (subject, predicate, object) triples from raw text. I’ve also begun testing different prompt formats to improve triple quality and documenting all experiments along the way. While there are ongoing challenges with model accuracy and formatting, the goal is to iteratively refine this system and apply it to real metadata from the Dataverse, paving the way for more discoverable, connected, and machine-readable research data.

Calculating Stellar Velocity Dispersion for Gravitational Lenses Using Penalized Pixel-Fitting Method
Akmal Hashad, Kim-Vy Tran
Harvard College | Quincy House | Astrophysics | 2027

Gravitational lensing is a powerful tool for studying distant astronomical objects and total (luminous plus dark) matter distribution in galaxies and clusters. Strong gravitational lensing occurs when a foreground mass distribution, such as a galaxy or a galaxy cluster, deflects light coming from a background source, forming multiple magnified images or an arc, such as an Einstein ring. Gravitational lensing allows astronomers to study distant galaxies at high redshifts, which reveals information about galaxy formation and evolution in the early universe and the total mass profile of the deflector (galaxies or galaxy groups), including the mysterious dark matter. Spectroscopy plays an important role in determining galaxy properties like redshift, mass, and velocity dispersion. Independent measurements of velocity dispersions are used to confirm the robustness of lens mass models. Currently, most gravitational lenses don’t have measured stellar velocity dispersions. The goal of this study is to fill the gap by calculating stellar velocity dispersions of 5 deflectors using the penalized pixel-fitting (PPXF) method, which extracts stellar kinematics through full spectrum fitting. PPXF calculates velocity dispersion by convolving template spectra (from the E-MILES library in this study) with the line-of-sight velocity distribution (LOSVD) of the observed spectrum. Building on this process on a small sample, this work will enable expansion to velocity dispersion measurements on a large scale for all confirmed gravitational lenses, which will be useful for constraining the mass of deflector galaxies in a lens independently, confirming redshifts, and verifying mass obtained via lens modelling.

Program for Research in Science and Engineering

RNAi-Mediated Manipulation of Elk Reveals Tissue-Specific Roles in Synaptic Development and Plasticity at the Drosophila Neuromuscular Junction
Mateo Hernández-Hernández, Kumar Aavula, David van Vactor
Harvard College | Lowell House | Chemistry | 2028

Synaptic plasticity enables neurons to dynamically remodel their connections, shaping development, learning, and memory. In Drosophila melanogaster, the larval neuromuscular junction (NMJ) offers a genetically tractable system for studying the structural basis of plasticity, where synaptic boutons serve as the primary sites of neurotransmission. Recent evidence implicates the potassium channel gene Elk as a candidate effector in synaptic remodeling, particularly under conditions of heightened neuronal activity. In parallel, Elk has been identified as a target of microRNAs known to govern synaptic plasticity, suggesting that its expression is regulated by post-transcriptional regulatory mechanisms. To investigate the functional role of Elk, we employ targeted RNA interference (RNAi) and null mutants to assess NMJ architecture. To secure tissue-specific knockdowns, we use the GAL4-UAS system, driven by pan-neuronal (WIII8-GAL4), glutamatergic (OK371-GAL4), or muscle-specific (dmef2-GAL4) promoters. To model activity-dependent synaptic remodeling, we apply an acute depolarization protocol. Furthermore, third instar larvae are dissected and immunostained with anti-HRP and anti-DLG to visualize presynaptic terminals and postsynaptic densities, respectively. Confocal Z-stack epifluorescent imaging of muscles VI and VII in abdominal segments A3 and A4 enables the quantification and modeling of boutons and ghost boutons. This experimental design permits a systematic assessment of Elk function across synaptic development and plasticity. If defects arise only after depolarization, Elk may constrain activity-induced synaptic overgrowth. If they appear at baseline, it may instead stabilize architecture during development. Phenotypic differences between neuronal and muscle-specific knockdowns will clarify whether Elk functions presynaptically, postsynaptically, or across compartments. This study provides the first targeted analysis of Elk in synaptic development and activity-induced remodeling at the Drosophila NMJ, defining its spatial and functional contributions and clarifying how activity-responsive potassium channel expression shapes structural synaptic plasticity.
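
For the knowledge-graph abstract above, the step of turning raw model output into (subject, predicate, object) triples and linking predicates to an ontology might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the function names (`build_prompt`, `parse_triples`, `link_to_ontology`), the prompt wording, and the small `PREDICATE_MAP` table are all hypothetical, and the call to a locally loaded quantized model is left out because the abstract does not name a loading library. Only the schema.org property names (`creator`, `about`, `publisher`, `funder`) are real vocabulary terms.

```python
import re

# Hypothetical lookup from free-text predicates to schema.org properties.
# A real system would need a far larger, curated mapping.
PREDICATE_MAP = {
    "created by": "schema:creator",
    "about": "schema:about",
    "published by": "schema:publisher",
    "funded by": "schema:funder",
}

# Matches "(subject, predicate, object)"; the object may itself contain commas.
TRIPLE_RE = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^()]+?)\s*\)")


def build_prompt(description):
    """Build one possible prompt format (an assumption, not the authors')."""
    return (
        "Extract (subject, predicate, object) triples from the dataset "
        "description below. Output one triple per line.\n\n"
        f"Description: {description}\nTriples:\n"
    )


def parse_triples(model_output):
    """Pull well-formed triples out of noisy model text, dropping duplicates."""
    triples, seen = [], set()
    for match in TRIPLE_RE.finditer(model_output):
        triple = tuple(part.strip().strip("\"'") for part in match.groups())
        if all(triple) and triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


def link_to_ontology(triples):
    """Swap in an ontology term where one is known; flag whether it was mapped."""
    linked = []
    for subject, predicate, obj in triples:
        term = PREDICATE_MAP.get(predicate.lower())
        linked.append((subject, term or predicate, obj, term is not None))
    return linked


if __name__ == "__main__":
    # Stand-in for text returned by a local model (made-up example data).
    fake_output = (
        "Here are the triples:\n"
        "(Survey Data, created by, Jane Doe)\n"
        "(Survey Data, located near, Boston)\n"
    )
    for row in link_to_ontology(parse_triples(fake_output)):
        print(row)
```

Keeping parsing separate from prompting means different prompt formats can be compared against the same parser, and the boolean flag makes unmapped predicates easy to count when judging triple quality.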

Source:

Harvard / Harvard College | Currier House | Integrative Biology | 2027 / 2025

Topics:

galaxy, synaptic, velocity, elk, dispersion, gravitational, mass, plasticity, activity, stellar, development, using
