Clinical Text Extraction
This project demonstrates how Natural Language Processing can be used to find relevant clinical information in doctors’ clinical notes.
- Implement a natural language processing pipeline with Scala and cTAKES, a library that combines rule-based and machine learning techniques, to extract clinical information from unstructured medical text
- Develop a big data pipeline with Scala and Spark to process terabytes of clinical data, extract clinical information, and store it in Accumulo (an HBase-like store with cell-level security)
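
The shape of the two steps above can be illustrated with a minimal, self-contained Scala sketch. It is not the project's code and does not call cTAKES, Spark, or Accumulo: a toy dictionary lookup stands in for the extraction step, and a plain case class stands in for an Accumulo-style cell (row, column family, column qualifier, visibility label, value). All names here (`Mention`, `Cell`, `ClinicalExtractionSketch`, the dictionary entries, the `clinician` label, and the sample sentence) are assumptions made up for illustration.

```scala
import java.util.regex.Pattern

// Hypothetical types for illustration only; not part of cTAKES or Accumulo.
final case class Mention(noteId: String, category: String, text: String, begin: Int, end: Int)

// One cell in an Accumulo-like layout, with a per-cell visibility label.
final case class Cell(row: String, family: String, qualifier: String, visibility: String, value: String)

object ClinicalExtractionSketch {
  // Assumed toy dictionary; a real pipeline would rely on a clinical vocabulary.
  val dictionary: Map[String, String] = Map(
    "hypertension" -> "Disorder",
    "chest pain"   -> "SignSymptom",
    "aspirin"      -> "Medication"
  )

  // Rule-based step: find every case-insensitive occurrence of each dictionary term
  // and return the mentions ordered by their start offset in the note.
  def extract(noteId: String, note: String): Seq[Mention] =
    dictionary.toSeq.flatMap { case (term, category) =>
      val pattern = ("(?i)" + Pattern.quote(term)).r
      pattern.findAllMatchIn(note)
        .map(m => Mention(noteId, category, m.matched, m.start, m.end))
        .toList
    }.sortBy(_.begin)

  // Storage step: map a mention to a cell keyed by note id, category, and character span.
  def toCell(m: Mention, visibility: String): Cell =
    Cell(
      row = m.noteId,
      family = m.category,
      qualifier = s"${m.begin}-${m.end}",
      visibility = visibility,
      value = m.text
    )

  def main(args: Array[String]): Unit = {
    // Invented example sentence, not taken from the project's data.
    val note = "Patient reports chest pain. History of Hypertension; takes aspirin daily."
    extract("note-001", note).map(toCell(_, "clinician")).foreach(println)
  }
}
```

In a Spark job, a function like `extract` would typically be applied to each note in a distributed collection, and the resulting cells written out by the storage layer; the sketch keeps everything in memory so it can run without a cluster.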
Sample clinical note:
Sample clinical information extracted (Red indicates incorrect extraction):