CV

Summary

My name is Siddhartha Reddy Jonnalagadda, or Sid J Reddy for short. My primary interests are LLMs, Conversational AI, NLP, and Deep Learning. I have developed NLP, conversational AI, and LLM products at technology startups, research labs, and big tech. I am currently one of the leads for Gemini at Google. My research is featured in over 110 publications and submitted patents in AI (machine learning, deep learning, information retrieval, reinforcement learning, dialog systems, information extraction, summarization, and question answering).

Professional Experience

Google Mountain View, CA

Senior Staff Scientist/Manager (2022 - )

Contributing to the safety of the core Gemini model through multiple innovative and leapfrog approaches (e.g., instruction following, safety heads, reinforcement learning)
Led the team that reduced overblocking of Gemini safety filters by over 50%
Worked on next generation LLM applications for scientists, researchers, lawyers and knowledge workers in general
Powered the retrieval of NotebookLM and Gemini API's Semantic Retriever with a SOTA embeddings model
Led the PaLM Embeddings Model that powers Vertex AI and PaLM API (and is being considered by several 1p and 3p customers)
Collaborated with AstroHelper and Gmail to integrate LLMs into email and scheduling
Collaborating with OCTO on talk to manuals for Ford and GM
Built the first voice order-taking demo that was applauded by Wendy’s executives, featured in WSJ and resulted in a $XXXM deal.

Amazon Seattle, WA

Principal Scientist (2020 - 2022)

Led the science and technology design for Natural Turn-Taking (Conversation mode) and authored two submitted patents.
Mentored 60+ scientists and engineers working on Alexa Conversations.
Advised Stanford University, Universidad Politécnica de Madrid, etc. for Alexa Prize SGC4.
Initiated funded collaborations with faculty from Columbia, UC Davis, UT Austin, Emory, etc.
First author of a three-year innovation roadmap presented by Rohit Prasad across Alexa and by me across Alexa AI Product.
Designed and shipped a meta conversational assistant that generates Alexa skills conversationally.
Coauthored multiple PRFAQs and contributed to OP1s of Conversational Understanding org.

Conversica Seattle, WA

Chief Scientist and SVP (3+ years)

Led Conversica's AI efforts towards becoming a leader in Conversational AI for Intelligent Virtual Assistants.
Built state-of-the-art cognitive, empathetic, transactional, and advisory capabilities that are used by 1500+ live, activated, diverse subscriber companies such as CenturyLink, Sacramento Kings, Mercedes Benz of Plano, IBM, and Assure Funding.
Streamlined the platform's human intelligence (HitL) to handle exceptions within an SLA of 10 seconds; designed pipelines to extend responses (XR) using self-learning components; oversaw the design of annotation microservices.
Architected deep learning networks with pretrained transformer models finetuned on over 30 million responses and enriched with linguistic features for our natural language understanding (NLU), designed our inference engine that makes policy decisions, and experimented with augmenting our Natural Language Generation framework with reinforcement learning algorithms.
Partnered with sales, channels, marketing, customer success, product, CEO, and board.

Microsoft Bellevue, WA

Principal Applied Scientist (1+ years)

Mentored over 15 junior developers, scientists, and interns in Bellevue, Redmond, and China, and helped them operationalize breakthrough algorithms for conversation understanding.
Headed the collaboration with multiple product teams and research teams across Cortana, Office 365, LUIS, and external products to advance conversational AI capabilities for greater automation and user satisfaction.
Delivered results central to winning key presentations and demos to the top-level leadership team, including Bill Gates, Harry Shum, and David Ku.
Authored the material to bootstrap software engineers and domain experts in customizing algorithms and ontologies on top of our group’s conversational understanding platform.
Led the design and development of AI efforts to improve precision, recall, and automation rate for intent-level and entity-level conversation understanding. Implemented semi-automatic processes to maintain ontologies for task completion.

Northwestern University Chicago, IL

Assistant Professor (3 years), Adjunct Professor (2 years)

Headed a group of over ten students and postdocs (over 25 overall affiliate members) with funding from over 15 grants, contracts, and projects from the university, hospitals, NIH, and industry (e.g., Novartis, Baxter, AbbVie, Mayo Clinic, Stanford).
Designed automated approaches to find relevant citations for clinical guideline development.
Demonstrated the efficacy of multi-pass sieve methods and machine learning approaches for extracting data elements and sentences relevant to evidence synthesis from biomedical literature in the form of PDF documents.
Performed a pilot for more comprehensive knowledge extraction by extracting the characteristics of the patient population through a novel two-stage extraction framework.
Assessed the feasibility of automatically generating knowledge summaries for a given clinical topic by composing relevant information extracted from Medline citations.
Placed second in the TREC Clinical Decision Support track.
Developed a platform to analyze website trustworthiness to assist healthcare providers, informaticians, and online health information entrepreneurs and developers in helping patients and caregivers make informed choices.
Implemented algorithms to automatically answer health-related questions based on past questions and answers (QA).
Developed a system that extracts information specific to inclusion and exclusion criteria from multiple kinds of clinical notes and fields to identify patients for clinical trial enrolment (used by Novartis and Northwestern Medicine).
Contributed NLP and machine learning to a retrospective clinical study about the proportion of patients with non-ischemic cardiomyopathy who experience myocardial recovery after initial HHF and examined outcomes associated with recovery status.
Developed Echo Infer for extracting information from unstructured echocardiography reports.
Reviewed grant applications for government agencies; served as program chair member for 15+ AI conferences and invited reviewer for 30+ AI journals and conferences.

Mayo Clinic Rochester, MN

Principal Investigator (2 years)

Supervised and funded the research of five junior PhDs and interns as the principal investigator of three grants.
Devised a system that only uses the input and feedback of human reviewers during the course of evidence synthesis from biomedical literature using distributional semantics and relevance feedback.
Developed a machine learning model to automatically rate journals for a given clinical topic to enable filtering journals or ranking the articles based on source journal and demonstrated that bibliometrics such as impact factor, h-index, and number of articles per year provide better results when used in combination with topic-specific metrics such as number of abstracts indexed with the corresponding MeSH terms.
Augmented Mayo Clinic’s AskMayoExpert system to retrieve related sentences from PubMed abstracts as evidence candidates using machine learning approaches to transform expert-based content to evidence-based content.
Implemented a multi-pass sieve algorithm combined with vicinity filters and factorial HMM-based pronoun resolution sieve to resolve coreferences in clinical narratives.
Operationalized cTAKES in Mayo Clinic’s Enterprise Datawarehouse and built the foundation for several clinical informatics projects at Mayo Clinic such as temporal information detection from clinical text, analysis of cross-institutional medical information, and automated chart reviews.

University of California at Berkeley Remote

Guest Faculty (3 years)

Taught the "Natural Language Processing with Deep Learning" course as part of the Master's in Data Science program.

Lnx Research Orange, CA

Lead Researcher (3 years)

Designed patented algorithms to extract organization names from PubMed abstracts at close to 100% accuracy. Also normalized organization names through clustering based on local sequence alignment metrics and local learning based on finding connected components.
Used network analysis metrics to find key centres of excellence.
Combined named entity recognition from unstructured news articles with social network analysis to discover opinion leaders for a given medical topic.

Jade Falcon IT Glendale, AZ

Lead Researcher (1 year)

Created WitnessTree that provides text analysis for e-discovery. WitnessTree clusters documents and concepts using corpus semantic approaches and allows more intuitive search through clusters organized automatically into hierarchies and concepts grouped with contextually similar words. Also developed a sentiment-analysis component using machine learning based document classification.

Microsoft Research Bangalore, India

Research Collaborator (1 year)

Designed a generic firewall analyzer for the network administrators of IIT Kharagpur to solve misconfigurations in the complex firewalls by studying different firewall rules, reducing them to Boolean format, providing a framework for high-level security queries, and returning a high-level answer.
Conducted a detailed analysis of the five-layered Microsoft OS firewall, where each layer was designed by a different Microsoft product team. Collected the data, integrated it, and used a SAT solver and advanced analysis tools to identify associations.

Education

Arizona State University Phoenix, AZ

Doctor of Philosophy, Biomedical Informatics (with emphasis in biomedical natural language processing)

Dissertation: An Effective Approach to Biomedical Information Extraction with Limited Training Data

Applied a permutation-based variant of the random indexing model to create a scalable and efficient system to simultaneously recognize multiple entity classes mentioned in natural language.
Introduced novel natural language processing approaches to transcend limitations in relation to automatically extracting concepts and relationships from biomedical text using distributional statistics and sentence simplification.
Implemented a competitive system that was placed in the top five among 41 international teams for a clinical information extraction challenge.
Discovered the usefulness of automatically generated resources based on distributional semantics for named entity recognition. Features proposed included n-nearest words [quasi-thesaurus of distributionally similar words using nearest neighbors], support vector machine (SVM)-regions [quasi-lexicons of concept classes using SVM], and term clustering [clusters of distributionally similar words over K-means].
Proposed a “shotgun” approach for information extraction that uses grammatical information in elemental chunks that can then be combined and recombined to generate many sentences from one (different perspectives) in order to maximize the likelihood that an automatic extraction engine can find in one (or several) of them the information contained in the original sentence.

Indian Institute of Technology Kharagpur, India

Bachelor of Technology with Honours, Computer Science and Engineering

CGPA: 9.43/10.00

Investigated the use of similarity measures of time-series data extracted from sports news to predict outcomes of live sports events.

Skills

Visionary leadership, product development, team building, organizational restructuring, key partnership development, public speaking, creative writing, internal fundraising & external VC due diligence
Advanced algorithms for Large Language Models, Conversational AI, Natural Language Understanding, Natural Language Generation, Named Entity Recognition, Normalization, Event Extraction, Sentence Compression, Deep Learning, Machine Learning, Clustering, Reinforcement Learning, Distributional Semantics, and Information Retrieval
Building Object-Oriented Systems; applying Software Engineering Principles

Honors and Awards

Google Cloud Tech Impact Awards (CTIA) winner for Embeddings API
Google Cloud's 2H'23 Customer Empathy Award for Talk to Menus project
Amazon Alexa AI NU Spark Award for most customer-obsessed science
Selected for Top 5 Most Promising AI Technologies, VentureBeat Summit
Invited Keynote at AI with the Best
Product of the Year Award for Sales and Marketing Technology by Business Intelligence Group (Conversica)
Digiday Technology Award for Best Sales Automation Platform
CB Insights AI 100 Award (Conversica)
Rele Award Winner for Data Science, AI & Intelligence Platform (Conversica)
Finalist at AIconics Awards - “Best Innovation in Intelligent Automation” by AIconics (Conversica)
Best Application of AI for Sales and Marketing by AIconics (Conversica)
AI Excellence Award From Business Intelligence Group (Conversica)
Awards.ai for Best Use of AI for Natural Language Processing (NLP) or Natural Language Generation (NLG) (Conversica)
Digital Innovation Award for Sales by Ventana Research (Conversica)
Top 7 percentile on an NIH R01 titled “A Generalized Full-text Information Extraction Framework for Efficient Systematic Review Management”
Outstanding Reviewer, Journal of Biomedical Informatics
Primary author of the Editor’s choice paper in the JAMIA special issue on Natural Language Processing
Best paper award, IEEE Healthcare Informatics and Systems Biology conference (HISB)
National Institutes of Health’s Pathway to Independence Award
ICIBM Travel Award for Best PhD/Postdoctoral Papers
Mayo Clinic Quality Academy Bronze Fellow
Postdoctoral fellowship offer, National Library of Medicine
Finalist, Yahoo! Key Scientific Challenges Program
Co-investigator, NLM 1 year contract
Honorary Poster Award, BMI Annual Symposium, Phoenix
NAACL Travel Award for Best Student Papers
5th Rank in University, Bachelor of Technology, Indian Institute of Technology, Kharagpur
Recipient, Inlaks Awards of Excellence at IITs (among 6 others in India)
Finalist, Lucent Global Scholars Program (among 11 others in India)
Gold medal, Indian National Physics Olympiad (among 24 others)
10th Rank, All India Engineering Entrance Examination
Recipient, NTSE scholarship
Silver medal, Regional Mathematical Olympiad (Andhra Pradesh, India)
Top two in AS Rao State Level Talent Search and Sir CV Raman Talent Search Competitions for three consecutive years

(Appendix follows)

Publications

Arxiv: Reid M, ..., Jonnalagadda SR, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. 2024. arXiv preprint arXiv:2403.05530
Arxiv: Lee J, ..., Jonnalagadda SR, et al. Gecko: Versatile text embeddings distilled from large language models. 2024. arXiv preprint arXiv:2403.20327
Accepted Patent: Biswas A, ..., Jonnalagadda SR, et al. Multi-session context. U.S. Patent No. 11,908,463. 2024.
Accepted Patent: Krishnan P, Mandal A, Jonnalagadda SR, et al. Dialog Management for Multiple Users. US Patent 11,908,468. 2024
Accepted Patent: Jonnalagadda SR, et al. Systems and Methods for Improved Automated Conversations with Attendant Actions. U.S. Patent No. 11,551,188. 2023.
Accepted Patent: Terry GA, Koepf W, Jonnalagadda SR, et al. Systems and Methods for Training Machine Learning Models using Active Learning. U.S. Patent No. 11,663,409. 2023.
Conference: Xu J, Jonnalagadda SR, Durrett G. Massive-scale Decoding for Text Generation using Lattices. 2022. NAACL
Workshop: Beygi S, … Jonnalagadda SR. Logical Reasoning for Task Oriented Dialogue Systems. 2022. ECNLP workshop hosted at ACL
Accepted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Natural Language Processing and Classification. U.S. Application No. 16/019,382 filed June 26, 2018
Journal: Zolnoori M, …, Jonnalagadda SR, …, Topaz M. Audio recording patient-nurse verbal communications in home health care settings: Pilot feasibility and usability study. 2022. JMIR Human Factors.
Accepted Patent: Terry A, …, Jonnalagadda SR. Systems and methods for configurable messaging response-action engine. U.S. Patent No. 11,106,871. 2021.
Accepted Patent: Terry A, …, Jonnalagadda SR. Systems and methods for configurable messaging with feature extraction. U.S. Patent No. 11,100,285. 2021.
Accepted Patent: Terry A, …, Jonnalagadda SR. Systems and Methods for Automated Question Response. U.S. Patent No. 11,010,555. 2021.
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Automated Buying Assistant with Matching and Confidence Generation. CVSC-19N3-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Classification Model Deployment in Dynamic Messaging Systems. CVSC-19N2-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Generating Custom Client Intents in AI Conversation Systems. CVSC-19N1-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Feature Deployment for a Transactional Assistant. Proposed Spinoff CVSC-19M2-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Response Processing using AI Models for a Transactional Assistant. CVSC-19M1-US. 2020.
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Exchange State Transitions for a Transactional Assistant. CVSC-19L-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Model Tuning using Classification Prediction in AI Conversation Systems. CVSC-18K4-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Question Response Integration in AI Conversation Systems. CVSC-18K3-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Trend Visualization in AI Conversation Systems. CVSC-18K2-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Multiple Language Processing in AI Conversation Systems. CVSC-18J3-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Response Annotation for AI Conversation Systems. CVSC-18J2-US. 2020
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Phantom Scraping for Populating AI Conversation Systems. CVSC-18H4-US. 2020
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Response Time Acceleration for AI Conversation Systems. CVSC-18H3-US. 2020
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Ensuring Data Fidelity in AI Conversation Systems. CVSC-18H2-US. 2020
Submitted Patent: Brigham B, Jonnalagadda SR, et al. Systems and Methods for Message Processing for Mapping Intents of Language to Meaning. CVSC-18G3-US. 2020
Submitted Patent: Brigham B, Jonnalagadda SR, et al. Systems and Methods for Statement and Question Recommendations with Intent Association for Conversation Builder. CVSC-18G2-US. 2020
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Response Generation in Conversations Between Targets and AI Messaging Systems. CVSC-18F3-US. 2019
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Action Execution used in Conversations Between Targets and AI Messaging Systems. CVSC-18F2-US. 2019
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Automated Conversations with Feedback Systems, Tuning, and Context-driven Training. U.S. Application No. 16/728,991 filed December 27, 2019
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Improved Automated Conversations with Intent and Action Response Generation. U.S. Application No. 16/726,879 filed December 25, 2019
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Improved Automated Conversations with ROI Metrics and Threshold Analysis. U.S. Application No. 16/718,038 filed December 17, 2019
Submitted Patent: Brigham B, Jonnalagadda SR, et al. Systems and Methods for Improved Automated Conversations. U.S. Application No. 16/723,735 filed December 20, 2019
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Enhanced Natural Language Processing for Machine Learning Conversations. U.S. Application No. 16/365,668 filed March 26, 2019
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Phrase Selection for Machine Learning Conversations. U.S. Application No. 16/365,665 filed March 26, 2019
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Message Building for Machine Learning Conversations. U.S. Application No. 16/365,663 filed March 26, 2019
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Improving User Engagement in Machine Learning Conversation Management using Gamification. U.S. Application No. 16/228,723 filed December 20, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Configuring Message Exchanges in Machine Learning Conversations. U.S. Application No. 16/228,721 filed December 20, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for using Natural Language Instructions with an AI Assistant Associated with Machine Learning Conversations. U.S. Application No. 16/228,717 filed December 20, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Training and Auditing AI Systems in Machine Learning Conversations. U.S. Application No. 16/228,712 filed December 20, 2018
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Multi-language Automated Action Response. U.S. Application No. 16/208,488 filed December 3, 2018
Submitted Patent: Jonnalagadda SR, et al. Systems and Methods for Generating and Updating Machine Hybrid Deep Learning Models. U.S. Application No. 16/208,478 filed December 3, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Message Cadence Optimization. U.S. Application No. 16/168,779 filed October 23, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Configurable Messaging with Feature Extraction. U.S. Application No. 16/168,763 filed October 23, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for Configurable Messaging Response-Action Engine. U.S. Application No. 16/168,737 filed October 23, 2018
Submitted Patent: Terry A, Jonnalagadda SR, et al. Systems and Methods for a Communication Editor Dashboard. U.S. Application No. 16/129,722 filed September 12, 2018
Journal: Stanley Swat, David Cohen, Sanjiv Shah, Donald Lloyd-Jones, Abigail Baldridge, Benjamin Freed, Esther Vorovich, Clyde Yancy, Siddhartha Jonnalagadda, Stuart Prenner, Daniel Kim, and Jane Wilcox. Baseline Longitudinal Strain Predicts Recovery of Left Ventricular Ejection Fraction in Hospitalized Patients with Non-ischemic Cardiomyopathy. Journal of the American Heart Association. 2018; 7(20): e09841
Journal: Bian J, Morid MA, Jonnalagadda SR, Luo G, Del Fiol G. Automatic identification of high impact articles in PubMed to support clinical decision making. Journal of Biomedical Informatics. 2017; DOI: https://doi.org/10.1016/j.jbi.2017.07.015
Journal: Luo Y, Thompson WK, Herr TM, Zeng Z, Berendsen MA, Jonnalagadda SR, Carson MB, Starren J. Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review. Drug Safety. 2017; 1-15
Journal: Jonnalagadda SR, Adupa A, Garg R, Corona-Cox J, Shah SJ. Text Mining of the Electronic Health Record: An Information Extraction Approach for the Automated Identification and Subphenotyping of HFpEF Patients for Clinical Trials. Journal of Cardiovascular Translational Research. 2017; 10.1007/s12265-017-9752-2
Journal: Bui DDA, Del Fiol G, Hurdle JF, Jonnalagadda SR. Extractive text summarization system to aid data extraction from full text in systematic review development. Journal of Biomedical Informatics. 2016;64:265-272
Journal: Wongchaisuwat P, Klabjan D, Jonnalagadda SR. A Semi-Supervised Learning Approach to Enhance Health Care Community–Based Question Answering: A Case Study in Alcoholism. JMIR Medical Informatics. 2016; 4(3): e24.
Journal: Nath C, Albaghdadi M, Jonnalagadda SR. A Natural Language Processing Tool for Large-scale Data Extraction from Echocardiography Reports. PLOS ONE. 2016; 11(4), p.e0153749.
Journal: Bui DDA, Del Fiol G, Jonnalagadda S. PDF text classification to leverage information extraction from publication reports. Journal of Biomedical Informatics. 2016; 61, pp.141-148.
Journal: Nath C, Huh J, Adupa A, Jonnalagadda SR. Website Sharing in Online Health Communities: A Descriptive Analysis. Journal of Medical Internet Research. 2016; 18(1):e11. DOI: 10.2196/jmir.5237
Journal: Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G. Classification of Clinically Useful Sentences in Clinical Evidence Resources. Journal of Biomedical Informatics. 2016; DOI: http://dx.doi.org/10.1016/j.jbi.2016.01.003 (selected for IMIA Yearbook)
Conference: Raja K, Dasot N, Goyal P, Jonnalagadda SR. Towards Evidence-Based Precision Medicine: Extracting Population Information from Biomedical Text using Binary Classifiers and Syntactic Patterns. AMIA Clinical Research Informatics Summit. 2016; Accepted.
Journal: Mutharasan RK, Kansal P, …, Jonnalagadda S, …, Yancy CW. Hospitalized Heart Failure Incidence is Significantly Underestimated by Diagnosis-Related Group Codes. Journal of Cardiac Failure 22 (8), S133. 2016.
Arxiv: Basu T, Kumar S, Kalyan A, Jayaswal P, Goyal P, Pettifer S, Jonnalagadda SR. A novel framework to expedite systematic reviews by automatically building information extraction training corpora. arXiv preprint arXiv:1606.06424. 2016
Arxiv: Kalyan A, Garg R, Corona-Cox J, Shah S, Jonnalagadda SR. An Information Extraction Approach to Prescreen Heart Failure Patients for Clinical Trials. arXiv preprint arXiv:1609.01594. 2016
Arxiv: Garg R, Raja K, Jonnalagadda SR. CRTS: A type system for representing clinical recommendations. arXiv preprint arXiv:1609.01592. 2016
Arxiv: Putta P, Dzak III J, Jonnalagadda SR. Automatically extracting, ranking and visually summarizing the treatments for a disease. arXiv preprint arXiv:1609.01574. 2016
Arxiv: Raja K, Sauer A, Garg R, Klerer M, Jonnalagadda SR. A Hybrid Citation Retrieval Algorithm for Evidence-based Clinical Knowledge Summarization: Combining Concept Extraction, Vector Similarity and Query Expansion for High Precision. arXiv preprint arXiv:1609.01597. 2016
Arxiv: Garg R, Dong S, Shah S, Jonnalagadda SR. A bootstrap machine learning approach to identify rare disease patients from electronic health records. arXiv preprint arXiv:1609.01586. 2016
Arxiv: Dong S, Mutharasan RK, Jonnalagadda SR. Using Natural Language Processing to Screen Patients with Active Heart Failure: An Exploration for Hospital-wide Surveillance. arXiv preprint arXiv:1609.01580. 2016
Journal: Jonnalagadda SR, Goyal P, Huffman MD. Automating Data Extraction in Systematic Reviews: A Systematic Review. Systematic Reviews. 2015;4:78 doi:10.1186/s13643-015-0066-7. PMCID: PMC4514954
Journal: Bui DDA, Jonnalagadda S, Del Fiol G. Automatically finding relevant citations for clinical guideline development. Journal of Biomedical Informatics. 2015;57:436-445
Journal: Del Fiol G, Mostafa J, Pu D, Medlin R, Slager S, Jonnalagadda S, Weir CR. Formative evaluation of a patient-specific clinical knowledge summarization tool. International Journal of Medical Informatics. 2015;86:126-34 PMCID: PMC4701588
Journal: Cormack J, Nath C, Milward D, Raja K, Jonnalagadda SR. Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge. Journal of Biomedical Informatics. 2015; Jul 22. pii: S1532-0464(15)00141-0. doi: 10.1016/j.jbi.2015.06.030.
Conference: Morid MA, Jonnalagadda SR, Fiszman M, Raja K, Del Fiol G. Classification of Clinically Useful Sentences in MEDLINE. AMIA Annual Symposium Proceedings. 2015.
Book Chapter: Raja K, Jonnalagadda S. Natural Language Processing and Data Mining for Clinical Text. In: Healthcare Data Analytics, published by CRC Press, USA. 2015.
Abstract: Stöber J, Heale BSE, Kim H, Fulghum K, Raja K, Del Fiol G, Jonnalagadda SR. Concept based Information Retrieval for Clinical Case Summaries. TREC Clinical Decision Support Track. 2015. [Stood second in Task A and selected for presentation at the track meeting]
Abstract: Nath C, Albaghdadi MS, Jonnalagadda SR. Automated Approach to Extract Cardiovascular Phenotypes from Echocardiography Reports. AMIA Annual Symposium Proceedings. 2015.
Abstract: Raja K, Sauer AJ, Klerer MR, Jonnalagadda SR. Automated Citation Retrieval System for Clinical Knowledge Management. AMIA Annual Symposium Proceedings. 2015.
Abstract: Cohen DG, Freed BH, Andrei AC, Jonnalagadda S, Lloyd-Jones DM, Wilcox J. Does Left Ventricular Strain Predict Outcomes Among Patients with Nonischemic Cardiomyopathy Hospitalized for Heart Failure? Annual Scientific Sessions of American Society of Echocardiography (ASE). 2015
Abstract: Wilcox J, Andrei AC, Jonnalagadda S, Lloyd-Jones DM. Myocardial Recovery After Initial Hospitalization for Heart Failure in Patients with Non-ischemic Cardiomyopathy. Circulation 130.Suppl 2 (2014): A14671-A14671 (Presented at 2014 American Heart Association’s Scientific Sessions)
Abstract: Del Fiol G, Pu D, Weir CR, Medlin R, Jonnalagadda S, Mishra R, Slager S, Mostafa J. Iterative design of an Interactive Clinical Evidence Summarization Tool. Workshop on Interactive Systems in Healthcare (WISH). Washington, DC. 2014. [Honorable Poster Award]
Conference: Jonnalagadda SR, Moosavinasab S, Nath C, Li D, Chute CG, Liu H. An Automated Approach for Ranking Journals to Help in Clinician Decision Support. AMIA Annual Symposium Proceedings. 2014. PMCID: PMC4420004
Journal: Mishra R, Bian J, Fiszman M, Weir C, Jonnalagadda S, Mostafa J, Del Fiol G. Text Summarization in the Biomedical Domain: A Systematic Review of Recent Research. Journal of Biomedical Informatics. 2014. PMCID: PMC4261035
Conference: Moosavi S, Rastegar M, Liu H, Jonnalagadda S. Towards Transforming Expert-based Content to Evidence-based Content. AMIA Clinical Research Informatics Summit 2014. PMCID: PMC4419763
Conference: Mishra R, Del Fiol G, Kilicoglu H, Jonnalagadda S, Fiszman M. Automatically Extracting Clinically Useful Sentences from UpToDate to Support Clinicians’ Information Needs. AMIA Annual Symposium 2013. [Best student paper finalist]. PMCID: PMC3900230
Journal: Wu S, Sohn S, Ravikumar KE, Wagholikar K, Jonnalagadda S, Liu H, Juhn Y. Automated chart review for asthma cohort identification using natural language processing: an exploratory study. Annals of Allergy, Asthma & Immunology. 2013; PMCID: PMC3839107
Journal: Jonnalagadda S, Cohen T, Wu S, Liu H, Gonzalez G. Using Empirically Constructed Lexical Resources for Named Entity Recognition. Biomedical Informatics Insights. 2013; PMCID: PMC3702195.
Journal: Zhang M, Del Fiol G, Grout R, Jonnalagadda S, Medlin Jr R, Mishra R, Weir C, Liu H, Mostafa J, Fiszman M. Automatic identification of comparative effectiveness research from Medline citations to support clinicians’ treatment information needs. Studies in health technology and informatics. 2013;192:846-50. [Best student paper finalist @MEDINFO] PMCID: PMC3940695
Conference: Liu H, Bielinski S, Sohn S, Murphy S, Wagholikar K, Jonnalagadda S, Ravikumar K.E., Wu S, Kullo I, Chute C. An Information Extraction Framework for Cohort Identification Using Electronic Health Records. AMIA Clinical Research Informatics Summit 2013. PMCID: PMC3845757
Journal: Sohn S, Wagholikar KB, Li D, Jonnalagadda S, Tao C, Komandur-Elayavilli R, Liu H. Comprehensive Temporal Information Detection from Clinical Text: Medical Events, Time, and TLINK Identification. Journal of the American Medical Informatics Association. 2013; PMCID: PMC3756269
Journal: Sohn S, Clark C, Halgrim S, Murphy S, Jonnalagadda S, Wagholikar K, Wu S, Chute C, Liu H. Analysis of Cross-Institutional Medication Information Annotation in Clinical Notes. Biomedical Informatics Insights. 2013; PMCID: PMC3702197
Journal: Wagholikar K, Torii M, Jonnalagadda S, Liu H. Pooling annotated corpora for clinical concept extraction. Journal of Biomedical Semantics, 2013; 4:3. doi: 10.1186/2041-1480-4-3 PMCID: PMC3599895
Workshop: Jonnalagadda S, Moosavinasab S, Li D, Abel M, Chute C, Liu H. Prioritizing journals relevant to a topic for addressing clinicians’ information needs. Proceedings of the International Workshop on Biomedical and Health Informatics in Conjunction with IEEE International Conference on Bioinformatics and Biomedicine. 2013.
Workshop: Ravikumar KE, Li D, Jonnalagadda S, Wagholikar K, Xia N, Liu H. An ensemble approach for chemical entity mention detection and indexing. BioCreative Challenge Evaluation Workshop vol. 2 (p. 140). 2013.
Conference: Li D, Liu H, Chute C, Jonnalagadda S. Towards Assigning References Using Semantic, Journal and Citation Relevance. IEEE International Conference on Bioinformatics and Biomedicine. 2013.
Workshop: Liu H, Wagholikar K, Jonnalagadda S, Sohn S. Integrated cTAKES for Concept Mention Detection and Normalization. CLEF 2013 Evaluation Labs and Workshop. 2013.
Workshop: Jonnalagadda S, Cohen T, Wu S, Liu H, Gonzalez G. Evaluating the Use of Empirically Constructed Lexical Resources for Named Entity Recognition. Computational Semantics in Clinical Text (CSCT) workshop, Potsdam, Germany. 2013.
Book Chapter: Jonnalagadda S, Topham P, Silverman E, Peeler R. Scientific collaboration networks using biomedical literature. In: Biomedical Literature Mining, Methods in Molecular Biology series published by Humana Press, USA. 2013.
Abstract: Prioritize journals relevant to a clinical topic: A survey of US cardiologists about heart failure. AMIA Annual Symposium, Washington, D.C., November 2013.
Journal: Jonnalagadda S, Del Fiol G, Medlin R, Weir C, Fiszman M, Mostafa J, Liu H. Automatically Extracting Sentences from Medline Citations to Support Clinicians’ Information Needs. Journal of American Medical Informatics Association. 2013; 20:995-1000 Published Online First: 25 October 2012 doi:10.1136/amiajnl-2012-001347 [Editor’s Choice]. PMCID: PMC3756259
Conference: Liu H, Wu S, Li D, Jonnalagadda S, Sohn S, Wagholikar K, Haug P, Huff S, Chute C. Towards a semantic lexicon for clinical natural language processing. Annual Proceedings of American Medical Informatics Association. 2012;2012:568-76. PMCID: PMC3540492.
Journal: Jonnalagadda S, Petitti D. A New Iterative Method to Reduce Workload in Systematic Review Process. International Journal of Computational Biology and Drug Design. 2013;6(1-2):5-17. PMCID: PMC3787693
Journal: Jonnalagadda S, Li D, Sohn S, Wu S, Wagholikar K, Torii M, Liu H. Coreference Analysis in Clinical Notes: A Multi-Pass Sieve With Alternate Anaphora Resolution Modules. Journal of American Medical Informatics Association. Published Online First: 16 June 2012 doi:10.1136/amiajnl-2011-000766. PMCID: PMC3422831
Journal: Jonnalagadda S, Peeler R, Topham P. Discovering opinion leaders for medical topics using news articles. Journal of Biomedical Semantics. 2012; 3 (1), 2. PMCID: PMC3338075 [Highly Accessed]
Conference: Wagholikar K, Torii M, Jonnalagadda S, Liu H. Feasibility of pooling annotated corpora for clinical concept extraction. AMIA Clinical Research Informatics Summit, 2012. 2012:38. PMCID: PMC3392069
Journal: Jonnalagadda S, Cohen T, Wu S, Gonzalez G. Enhancing clinical concept extraction with distributional semantics. Journal of Biomedical Informatics. 2012, 45(1):129-140. PMCID: PMC3272090
Conference: Emadzadeh E, Jonnalagadda S, Gonzalez G. Evaluating Distributional Semantic and Feature Selection for Extracting Relationships from Biological Text. Proceedings of the International Conference on Machine Learning Applications. 2011.
Accepted Patent: Topham P, Jonnalagadda S. Extracting and Normalizing Organization Names from Text. US Patent 8370361 B2. 2011
Conference: Jonnalagadda S, Gonzalez G. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. Annual Proceedings of American Medical Informatics Association. 2010. PMCID: PMC3041388
Journal: Jonnalagadda S, Topham P. NEMO: Extraction and normalization of organization names from PubMed affiliation strings. Journal of Biomedical Discovery and Collaboration. 2010; 5:50-75. PMCID: PMC2990275.
Conference: Jonnalagadda S, Leaman R, Cohen C, Gonzalez G. A distributional semantics approach to simultaneous recognition of multiple classes of named entities. Computational Linguistics and Intelligent Text Processing/Lecture Notes in Computer Science. 2010; 6008/2010:224-235.
Journal: Hakenberg J, Leaman R, Vo N, Jonnalagadda S, Sullivan R, Miller C, Tari L, Baral C, Gonzalez G. Efficient extraction of protein-protein interactions from full-text articles. IEEE/ACM Transactions on Computational Biology and BioInformatics. 2010. PMID: 20498514
Workshop: Jonnalagadda S, Gonzalez G. Can distributional statistics aid clinical concept extraction? Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2, 2010.
Conference: Jonnalagadda S, Tari L, Hakenberg J, Baral C, Gonzalez G. Towards effective sentence simplification for automatic processing of biomedical text. Proceedings of the North American Association of Computational Linguistics and Human Language Technology. 2009.
Workshop: Jonnalagadda S, Topham P, Gonzalez G. Towards automatic extraction of social networks of organizations in PubMed abstracts. Proceedings of the International Workshop on Graph Techniques for Biomedical Networks in Conjunction with IEEE International Conference on Bioinformatics and Biomedicine. 2009.
Conference: Jonnalagadda S, Topham P, Gonzalez G. ONER: Tool for organization named entity recognition from affiliation strings in PubMed abstracts. Proceedings of the International Symposium on Languages in Biology and Medicine. 2009.
Conference: Jonnalagadda S, Gonzalez G. Sentence simplification aids protein-protein interaction extraction. Proceedings of the International Symposium on Languages in Biology and Medicine. 2009.
Workshop: Hakenberg J, Leaman R, Vo N, Jonnalagadda S, Sullivan R, Miller C, Tari L, Baral C, Gonzalez G. Online protein interaction extraction and normalization at Arizona State University. Proceedings of the BioCreative II.5 Workshop. 2009.

Selected Presentations

Invited talks

Logical Reasoning for Task Oriented Dialogue Systems. NU Spark. 2021 (Best Project Award)
Alexa as a Friend. Alexa Principal Technologists Meeting. June 2021.
A Collaborative Approach to Conversational AI for Advancing Individual Disciplines and Social Impact. Alexa Prize Summit. April 2021
Lessons Learned and Thoughts on Advancement of AI. Gratifi Summit. July 2019.
Tutorial: NLP with Deep learning. Global AI Conference. April 2019
Future of AI. Global AI Conference. April 2019
Tutorial: NLP with Deep learning. Samsung R&D. April 2019
Future of AI. Samsung R&D. April 2019
Tutorial: NLP with Deep learning. [24]7.ai. April 2019
Future of AI. [24]7.ai. April 2019
Tutorial: NLP with Deep learning. IIT Hyderabad. March 2019
Future of AI. IIT Hyderabad. March 2019
Tutorial: NLP with Deep learning. IIT Ropar. March 2019
Future of AI. IIT Ropar. March 2019
Tutorial: NLP with Deep learning. IIT Kharagpur. March 2019
Future of AI. IIT Kharagpur. March 2019
Tutorial: NLP with Deep learning. NIT Warangal. March 2019
Future of AI. NIT Warangal. March 2019
Lectures on NLP and Deep Learning. UC Berkeley. 2018 - present
Deep Reinforcement Learning for Conversational AI. AVIOS Conversational Interaction Conference. March 2019
Deep reinforcement learning: How to avoid the hype and make it work for you. O’Reilly Artificial Intelligence Conference. London. October 2018
Showcase Company: Conversica, VentureBeat Summit: Transform - Leveraging AI & analytics for growth. August 2018
Conversational AI: What We’ve Learned from Millions of AI Conversations for Thousands of Customers. Global Big Data Conference. August 2018
Conversational AI: What We’ve Learned from Millions of AI Conversations for Thousands of Customers. O’Reilly Artificial Intelligence Conference. New York. May 2018
Conversational AI: What We’ve Learned from Millions of AI Conversations for Thousands of Customers. Global AI Conference. April 2018
Keynote Panel Moderator: AI from Investment perspective. Global AI Conference. April 2018
Keynote Panel: The Future of Artificial Intelligence. Global AI Conference. April 2018
Conversational AI for Digital Assistants. AVIOS Conversational Interaction Conference. February 2018
Natural Language Processing Exposed: The Art, the Science and the Applications. BrightTalk Webinar. December 2017.
Conversational AI – Basics and Design Considerations: Lessons learned from Millions of AI Conversations for Thousands of Customers. AI Expo. November 2017
Using Artificial Intelligence, Machine Learning and Natural Language Processing to Solve Real World Business Problems at Scale. Einstein AI, Deep Learning & SuperIntelligence Conference. November 2017
Innovation Showcase: Top 5 Most Promising AI Technologies, VentureBeat Summit: Riding the AI Wave. October 2017
Conversational AI: What we’ve learned from millions of AI conversations for thousands of customers, Keynote at AI with the Best. October 2017
Natural Language Processing and Machine Learning Applications in Biomedicine, Microsoft. May 2016.
Natural Language Processing Algorithms in Healthcare, Commercialization Clinic for Health IT, Northwestern University Innovation and New Ventures Office. March 2016.
Novel Biomedical Natural Language Processing Algorithms. Institute for Health Research and Policy, University of Illinois at Chicago. February 2016.
Natural Language Processing and Machine Learning for Biomedicine. IBM Thomas J. Watson Research Center. February 2016.
Natural Language Processing for Systematic Reviews. Cochrane Heart Group US Satellite Anniversary Meeting, Northwestern University. December 2015.
Harnessing information from biomedical text through natural language processing. Cancer Control and Survivorship (CCS) Seminar, Lurie Cancer Centre. December 2015.
Connectivity with subject matter experts as well as downstream applications: NLP for clinical trial eligibility screening. AMIA NLP Working Group Pre-Symposium, San Francisco, CA. November 2015.
Supporting the information needs of clinicians, researchers, and patients through biomedical text mining. Institute for Public Health and Medicine Thursdays at Noon seminar, Northwestern University. May 2015.
Information Extraction from Clinical Narratives: An Introduction and Current Capabilities. NM EDW Research Forum, Northwestern University. February 2015.
Using Text Mining for Reducing the Manual Effort in Systematic Review Process. Cochrane Heart Group US Satellite 1-Year Anniversary Meeting, Northwestern University. November 2014.
An Automated Approach for Ranking Journals to Help in Clinician Decision Support. AMIA Annual Symposium, Washington DC. November 2014
Biomedical Natural Language Processing: Introduction and Examples. Cardiovascular Epidemiology Seminar Series, Northwestern University. October 2014.
Text Mining: Introduction and Biomedical Examples. Joint BOB Cores Seminar Series: Sponsored by the Biostatistics Core, The Outcomes Measurement and Survey Core and the Bioinformatics Core of the Robert H. Lurie Comprehensive Cancer Center, Northwestern University, March 2014.
Towards Transforming Expert-based Content to Evidence-based Content. AMIA Clinical Research Informatics Summit, San Francisco, CA. 2014.
Evaluating the Use of Empirically Constructed Lexical Resources for Named Entity Recognition. Computational Semantics in Clinical Text (CSCT) workshop, Potsdam, Germany. 2013.
Automatically Extracting Sentences from Medline Citations to Support Clinicians’ Information Needs. Topics of Interest Cyberseminar, Veteran Affairs Health Services Research and Development Service, January 2013.
Using information retrieval, extraction and management methods for clinical question answering and decision support. Division of Health and Biomedical Informatics, Northwestern University, December 2012.
Automatically Extracting Sentences from Medline Citations to Support Clinicians’ Information Needs. IEEE Healthcare Informatics, Imaging, and Systems Biology Conference, UCSD, La Jolla, CA. September 2012
MedTagger: A Fast NLP Pipeline for Indexing Clinical Narratives. SHARPn Summit 2012, Rochester, MN. June 2012
A new iterative method to reduce workload in systematic review process. International Conference on Intelligent Biology and Medicine, Nashville, TN. March 2012
Coreference analysis in clinical notes: A multi-pass sieve with alternate anaphora resolution modules. i2b2/VA NLP shared task workshop, Washington DC. October 2011
A perspective on biomedical information retrieval, extraction and management. Weekly seminar, Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center. May 2011
A novel approach to reduce workload in systematic review. BMI Annual Symposium, Arizona State University. May 2011
Biomedical concept extraction without dictionaries. Biomedical Informatics Research Symposium, Stanford University. April 2011
Can distributional statistics aid clinical concept extraction? i2b2/VA NLP shared task workshop, Washington DC. November 2010
BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. AMIA Annual Symposium, Washington DC. November 2010
Sentence simplification aids protein-protein interaction extraction. The 3rd International Symposium on Languages in Biology and Medicine, Jeju Island, South Korea. November 8-10, 2009
Towards automatic extraction of social networks of organizations in PubMed abstracts. First International Workshop on Graph Techniques for Biomedical Networks in Conjunction with IEEE International Conference on Bioinformatics and Biomedicine, Washington DC, USA. Nov. 1-4, 2009

Service

Scientific POC for Amazon Research Awards to UC Davis and UT Austin.
Core team member of Alexa Prize
Founding core team member of Natural Turn Taking
Member, SAIF Scientific Advisory Committee, ML Compute/Data Infr, Amazon
Scientific Publication Approver, Amazon
Grant reviewer, National Institutes of Health
PC Member
- Empirical Methods in Natural Language Processing
- ACM Seventh International Workshop on Data and Text Mining in Biomedical Informatics
- ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics
- International Workshop on Mining Unstructured Big Data using Natural Language Processing
- International Workshop on Big Data, Streams and Heterogeneous Source Mining
- International Conference on Social Informatics
- Graph-based Methods for Natural Language Processing (TextGraphs)
- American Medical Informatics Association Annual Symposium
- American Medical Informatics Association Joint Summits on Translational Science
- Workshop on Data Mining in Biomedical Informatics and Healthcare (in conjunction with the Pacific-Asia Conference on Knowledge Discovery and Data Mining)
- The Second International Conference on Global Health Challenges
- IEEE International Conference on Health Informatics
- Data Mining for Medical Informatics: Predictive Analytics
- Workshop on Sentiment Analysis where AI meets Psychology (SAAIP)
- International Conference on Intelligent Biology and Medicine.
Reviewer
- Artificial Intelligence in Medicine (AIIM)
- Journal of Knowledge and Information Systems
- Journal of Medical Internet Research (JMIR)
- PLOS ONE
- Journal of American Medical Informatics Association (JAMIA)
- International Journal of Medical Informatics (IJMI)
- Journal of Biomedical Informatics (JBI)
- The Journal of Biological Databases and Curation (DATABASE)
- Biomedical Informatics Insights (BII)
- Journal of Biomedical Semantics (JBMS)
- PLOS Computational Biology
- Applied Clinical Informatics
- BMC Bioinformatics
- Methods of Information in Medicine
- International Journal of Advanced Computer Science and Applications
- Letters in Drug Design & Discovery
- AMIA Annual Symposium
- AMIA Joint Summits on Translational Science
- Intelligent Systems for Molecular Biology (ISMB)
- i2b2/VA/Cincinnati NLP Shared Task Symposium
- International Conference on Machine Learning and Applications
- Pacific Symposium on Biocomputing (PSB)
- IEEE Healthcare Informatics, Imaging and Systems Biology
- European Conference on Computational Biology (ECCB)
- IEEE International Conference on Health Informatics (ICHI)
- ACM Seventh International Workshop on Data and Text Mining in Biomedical Informatics
Reviewer, Data Science Initiative Program, Northwestern University.
Secretary, Special Committee on Faculty Salary and Benefits, Northwestern University
Panelist, Privacy in Biomedical Data Analysis: Ethical, Legal and Technological Perspectives, iDASH Privacy workshop
Co-organizer for the semi-monthly BMI webinar, Section of Medical Informatics, Mayo Clinic
Project Management Committee (PMC) Member, Apache cTAKES

Research Support

Using speech and language to identify patients at risk for hospitalizations and emergency department visits in homecare

Columbia Center of Artificial Intelligence Technology, in collaboration with Amazon

This study is the first step in exploring an emerging and previously understudied data stream - verbal communication between healthcare providers and patients. Specifically, we a) record patient-nurse communications and extract information providing clues for patient risk identification, using conventional feature extraction or end-to-end machine learning models; b) extract the terms and expressions indicating clinical risk factors, lifestyle risk factors and clinical interventions associated with risk of hospitalization and emergency department visits; c) develop machine learning models which combine information extracted from audio-recorded communications and from patient medical records to estimate the risk of emergency department visit and hospitalization.

Role: Co-Investigator

Deep Reasoning for Dialog (Natural Understanding Spark Initiative)

Amazon

Four of 139 org-wide proposals were selected for the inaugural cycle of NU Spark. In this work we propose to build on recent advances in research on logical reasoning and deep networks to bring reasoning capabilities to our dialog systems. With the vision of “enabling world-class AI-driven experiences” and “inventing the future of conversational AI”, we want to build conversational systems that can take information that is known, unite it with existing knowledge, and make inferences about information that is unknown or uncertain in order to address a customer’s need.

Role: Co-Investigator

Meeting Clinicians’ Information Needs with Highly Tailored Knowledge Summaries (4 years)

NIH R01

The goal of this research is to extract knowledge summaries automatically to assist physicians in making the best patient-care decisions. Specifically, this involves information retrieval, information extraction and multi-document summarization, in which I have significant expertise.

Role: Consortium PI

Utilizing Electronic Health Records to Measure and Improve Prostate Cancer Care (2 years)

NIH R01

We propose to assemble a robust data-mining workflow to efficiently and accurately capture treatment and outcome quality metrics from structured data and free-text in EHRs. We will put this evidence in the hands of both clinicians and patients through a web-based risk assessment tool.

Role: Consortium PI

Genomic Medicine at Northwestern: Discovery and Implementation (4 years)

NIH U01

The Electronic Medical Records and Genomics (eMERGE) Network is a National Institutes of Health (NIH)-organized and funded consortium of U.S. medical research institutions. The Network brings together researchers with a wide range of expertise in genomics, statistics, ethics, informatics, and clinical medicine from leading medical research institutions across the country to conduct research in genomics, including discovery, clinical implementation and public resource.

Role: Collaborator

Mining Unstructured Cardiovascular Data in Electronic Medical Records (2 years)

Bluhm Cardiovascular Institute

In this project, we are building NLP algorithms to better identify patients for clinical trial enrolment and improve quality of care. We will be extracting cardiovascular information from narratives such as attending notes and discharge summaries.

Role: PI

Improving the Efficiency and Efficacy in Authoring Essential Clinical FAQs (4 years)

NIH R00

Point-of-care access to relevant clinical knowledge has been observed to support decision-making, decrease medical errors, improve patient safety and reduce healthcare costs. This project aims to help physicians specialized in the area (specialists) quickly gather evidence from the literature and find citations that support or qualify their expert opinion. It will also generate answers and suggest updates to existing answers for their perusal.

Role: PI

Natural Language Processing to Enhance Screening for PARAGON Clinical Trial (1 year)

Novartis Pharmaceuticals Corporation

This research project focuses on the development, programming, testing, and validation of natural language processing (NLP)-based software that may result in a three-fold increase in the number of enrolled patients at sites where it is used. We are studying whether using an NLP-based algorithm allows clinicians across multiple sites to more efficiently identify patients matching the set of inclusion-exclusion criteria for the Novartis PARAGON-HF clinical trial.

Role: Co-PI

Machine Learning Algorithms to Reduce Errors in Clinical Workflows (1 year)

Baxter Healthcare

Molecular Determinants of Hypertensive HFpEF: Genomics, Transcriptomics, and Proteomics (1 year)

American Heart Association

Role: Co-Investigator

Pharmacovigilance Pilot (1 year)

AbbVie

Role: Co-Investigator

NLP for AskMayoExpert (Mayo Clinic’s Clinical Knowledge System) (2 years)

Mayo Clinic

Built a system for automatically answering clinical questions by analysing the question, extracting information from trusted sources, and summarizing the text answer.

Role: PI

Pilot work for use of NLP in Clinical Decision Support (1 year)

University of Utah

Using Continuity of Care Documents with text semantics (such as the predication database from Semantic Medline), we created search strategies to retrieve and summarize content from a set of popular knowledge resources. This work focused on patients with multiple chronic conditions.

Role: PI

Natural language processing for biological knowledge management (2 years)

National Science Foundation

Role: NLP Collaborator

Text mining clinical notes: porting semi-supervised techniques from biomedical literature mining (2 years)

NIH Contract

The goal of this research was to investigate novel information extraction approaches using distributional semantics and sentence simplification. Specifically, we extracted mentions of problems and treatments and relations between them as found in clinical narratives.

Role: Consortium PI