The project demonstrates how Natural Language Processing can be used to search doctors’ clinical notes for relevant clinical information.

  • Implement a natural language processing pipeline with Scala and cTAKES, a library that combines rule-based and machine learning techniques, to extract clinical information from unstructured medical text
  • Develop a big data pipeline with Scala and Spark to process terabytes of clinical data, extract clinical information, and store it in Accumulo (a key-value store similar to HBase, with cell-level security)
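
The extraction step described above can be pictured with a minimal sketch in plain Scala. It is not the project's code and does not use the cTAKES API: a small hypothetical dictionary lookup stands in for the real annotators, and the names `Note`, `Mention`, `ExtractionSketch`, and `dictionary` are assumptions made only for illustration.

```scala
// Toy stand-in for the extraction step. This is NOT the cTAKES API:
// a small dictionary lookup replaces the real rule-based and ML annotators.
final case class Note(id: String, text: String)
final case class Mention(noteId: String, category: String, term: String, begin: Int, end: Int)

object ExtractionSketch {
  // Hypothetical mini-dictionary (term -> category), for illustration only.
  val dictionary: Map[String, String] = Map(
    "hypertension" -> "disorder",
    "chest pain"   -> "sign_symptom",
    "aspirin"      -> "medication"
  )

  // Find every case-insensitive occurrence of each dictionary term in a note
  // and return the mentions ordered by their start offset.
  def extract(note: Note): Seq[Mention] = {
    val lower = note.text.toLowerCase
    val found = dictionary.toSeq.flatMap { case (term, category) =>
      // Walk through successive match positions until indexOf returns -1.
      Iterator
        .iterate(lower.indexOf(term))(i => lower.indexOf(term, i + 1))
        .takeWhile(_ >= 0)
        .map(begin => Mention(note.id, category, term, begin, begin + term.length))
        .toList
    }
    found.sortBy(_.begin)
  }
}
```

Calling `ExtractionSketch.extract(Note("n1", "Patient reports chest pain."))` yields a single `Mention` for "chest pain" with its character offsets. Because `extract` is a pure function from one note to its mentions, the same shape could plausibly be passed to `flatMap` on a Spark RDD of notes, with each resulting mention then written to Accumulo under a visibility label.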

Sample clinical note:

   

Sample clinical information extracted (Red indicates incorrect extraction):