CV

Summary

My name is Siddhartha Reddy Jonnalagadda (Sid J Reddy, in short). My primary interests are LLMs, Conversational AI, NLP, and Deep Learning. I have developed NLP, conversational AI, and LLM products at technology startups, research labs, and big tech companies. I am currently one of the leads for Gemini at Google. My research appears in more than 110 publications and submitted patents in AI (machine learning, deep learning, information retrieval, reinforcement learning, dialog systems, information extraction, summarization, and question answering).


Professional Experience

Google             Mountain View, CA

Senior Staff Scientist/Manager (2022 - present)


Amazon             Seattle, WA

Principal Scientist (2020 - 2022)           


Conversica             Seattle, WA

Chief Scientist and SVP (3+ years)           


Microsoft         Bellevue, WA

Principal Applied Scientist (1+ years)           


Northwestern University           Chicago, IL 

Assistant Professor (3 years); Adjunct Professor (2 years)


Mayo Clinic       Rochester, MN 

Principal Investigator (2 years)   


University of California, Berkeley             Remote

Guest Faculty (3 years)           


Lnx Research             Orange, CA

Lead Researcher (3 years)   


Jade Falcon IT           Glendale, AZ

Lead Researcher (1 year)   

Microsoft Research           Bangalore, India

Research Collaborator (1 year)   


Education

Arizona State University         Phoenix, AZ

Doctor of Philosophy, Biomedical Informatics (with emphasis in biomedical natural language processing)

Dissertation: An Effective Approach to Biomedical Information Extraction with Limited Training Data


Indian Institute of Technology   Kharagpur, India 

Bachelor of Technology with Honours, Computer Science and Engineering 

CGPA: 9.43/10.00


Skills


Honors and Awards


(Appendix follows)


Publications 


Selected Presentations 

Invited talks



Service

Research Support


Using speech and language to identify patients at risk for hospitalizations and emergency department visits in homecare

Columbia Center of Artificial Intelligence Technology, in collaboration with Amazon

This study is the first step in exploring an emerging and previously understudied data stream: verbal communication between healthcare providers and patients. Specifically, we a) record patient-nurse communications and extract information that provides clues for patient risk identification, using conventional feature extraction or end-to-end machine learning models; b) extract terms and expressions indicating clinical risk factors, lifestyle risk factors, and clinical interventions associated with the risk of hospitalization and emergency department visits; and c) develop machine learning models that combine information extracted from audio-recorded communications and patient medical records to estimate the risk of emergency department visits and hospitalization.

Role: Co-Investigator


Deep Reasoning for Dialog (Natural Understanding Spark Initiative)

Amazon

Four of 139 org-wide proposals were selected for the inaugural cycle of NU Spark. In this work, we propose to build on recent advances in research on logical reasoning and deep networks to bring reasoning capabilities to our dialog systems. With the vision of “enabling world-class AI-driven experiences” and “inventing the future of conversational AI”, we want to build conversational systems that can combine known information with existing knowledge and make inferences about unknown or uncertain information in order to address a customer’s needs.

Role: Co-Investigator


Meeting Clinician’s Information Needs with Highly Tailored Knowledge Summaries (4 years) 

NIH R01

The goal of this research is to automatically extract knowledge summaries that help physicians make the best patient-care decisions. Specifically, this involves information retrieval, information extraction, and multi-document summarization, areas in which I have significant expertise.

Role: Consortium PI


Utilizing Electronic Health Records to Measure and Improve Prostate Cancer Care (2 years)

NIH R01

We propose to assemble a robust data-mining workflow to efficiently and accurately capture treatment and outcome quality metrics from structured data and free text in EHRs. We will put this evidence in the hands of both clinicians and patients through a web-based risk assessment tool.

Role: Consortium PI



Genomic Medicine at Northwestern: Discovery and Implementation (4 years)

NIH U01

The Electronic Medical Records and Genomics (eMERGE) Network is a National Institutes of Health (NIH)-organized and funded consortium of U.S. medical research institutions. The Network brings together researchers with a wide range of expertise in genomics, statistics, ethics, informatics, and clinical medicine from leading medical research institutions across the country to conduct research in genomics, including discovery, clinical implementation, and public resources.

Role: Collaborator 


Mining Unstructured Cardiovascular Data in Electronic Medical Records (2 years)

Bluhm Cardiovascular Institute

In this project, we are building NLP algorithms to better identify patients for clinical trial enrollment and to improve quality of care. We will extract cardiovascular information from narratives such as attending notes and discharge summaries.

Role: PI


Improving the Efficiency and Efficacy in Authoring Essential Clinical FAQs (4 years)

NIH R00

Point-of-care access to relevant clinical knowledge has been observed to support decision-making, decrease medical errors, improve patient safety, and reduce healthcare costs. This project aims to help physicians specializing in a given area (specialists) quickly gather evidence from the literature or find citations supporting or qualifying their expert opinions. The system will also generate answers and suggest updates to existing answers for their review.

Role: PI


Natural Language Processing to Enhance Screening for PARAGON Clinical Trial (1 year)

Novartis Pharmaceuticals Corporation

This research project focuses on the development, programming, testing, and validation of natural language processing (NLP)-based software that may result in a three-fold increase in the number of enrolled patients at sites where it is used. We are studying whether an NLP-based algorithm allows clinicians across multiple sites to more efficiently identify patients matching the inclusion and exclusion criteria for the Novartis PARAGON-HF clinical trial.

Role: Co-PI


Machine Learning Algorithms to Reduce Errors in Clinical Workflows (1 year)

Baxter Healthcare


Molecular Determinants of Hypertensive HFpEF: Genomics, Transcriptomics, and Proteomics (1 year)

American Heart Association

Role: Co-Investigator


Pharmacovigilance Pilot (1 year)

AbbVie

Role: Co-Investigator


NLP for AskMayoExpert (Mayo Clinic’s Clinical Knowledge System) (2 years)

Mayo Clinic 

Built a system that automatically answers clinical questions by analyzing the question, extracting information from trusted sources, and summarizing the answer as text.

Role: PI


Pilot work for use of NLP in Clinical Decision Support (1 year)

University of Utah

Using Continuity of Care Documents together with text semantics (such as the predication database from Semantic MEDLINE), we created search strategies to retrieve and summarize content from a set of popular knowledge resources. This work focused on patients with multiple chronic conditions.

Role: PI


Natural language processing for biological knowledge management (2 years)

National Science Foundation

Role: NLP Collaborator


Text mining clinical notes: porting semi-supervised techniques from biomedical literature mining (2 years)

NIH Contract

The goal of this research was to investigate novel information extraction approaches using distributional semantics and sentence simplification. Specifically, we extracted mentions of problems and treatments and relations between them as found in clinical narratives.

Role: Consortium PI