Natural Language Processing Data Scientist

Passionate about making a difference in the world of cancer genomics?

With the advent of genomic sequencing, we can finally measure and process our genetic makeup. We now have more data than ever before, but providers don't have the infrastructure or expertise to make sense of that data, let alone use their extensive patient charting to complement the data obtained through genome sequencing. Here at Tempus, we believe that a holistic approach to the detection and treatment of cancer lies in a deep understanding of molecular activity, coupled with the ability to use the latest NLP and predictive modeling techniques to extract information from the patient’s chart.

Our Natural Language Processing Data Scientists will use state-of-the-art techniques to process and analyze vast amounts of clinical data in ways that have never been done before. They’ll also help build, from the ground up, a highly scalable infrastructure to house billions of records. We’re looking for someone who will collaborate with product, research, and business development teams to build the world’s largest library of molecular and clinical data.

What you'll do:

  • Help design and develop a novel bioinformatics platform capable of ingesting large unstructured clinical data sets, separating signal from noise, and providing personalized insights at the patient level
  • Develop innovative methods for processing and storing data
  • Interrogate analytical results to assess algorithmic success, robustness, and validity


Qualifications:

  • PhD, MS, or equivalent experience in NLP, data science, computer science, bioinformatics, or a related field
  • Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction
  • Experience with knowledge databases and language ontologies
  • Significant quantitative training in probability, statistics, and machine learning
    • Classical statistical tools, machine learning algorithms, ensemble methods
    • Data exploration and visualization methods
  • Significant analytical development and programming skills
    • Python, R, JavaScript, or Lua
    • Reproducible research methods
    • Visualization tools
  • Significant database familiarity
    • MySQL, NoSQL, Cassandra, MongoDB, Elasticsearch, HBase
  • Experience working in Linux and running tasks in a cluster environment
    • Experience with cloud computing is a plus
  • Experience working with clinical documents
  • Experience in genomics is a plus, especially experience with next-generation sequencing data processing and modeling
  • Goal-oriented thinking and creative problem-solving skills
  • Self-driven and able to work well in an interdisciplinary team with minimal direction
  • Experience communicating insights and presenting concepts to a diverse audience of engineers, clinicians, laboratory scientists, and business development professionals
