Research (old)

Describes Chris Tanner’s research interests in Machine Learning, Deep Learning, and Natural Language Processing (NLP). Harvard University, Brown University, Spotify, IBM Research, IBM Watson, Johns Hopkins HLT COE, MIT Lincoln Laboratory (MITLL), Department of Defense, Google, Florida Tech, UCLA.

Professional Timeline
christanner [at] seas [dot] harvard [dot] edu


PRESENT

Hi, I’m Chris Tanner.

I am a lecturer at Harvard’s Institute for Applied Computational Science (IACS), which centers on two Master’s programs: Data Science, and Computational Science and Engineering. I research and teach data science, machine learning, and natural language processing (NLP). I teach:

  • Deep Learning for Natural Language Processing (40 students): Coming Fall 2021! I’m excited to make a new course from scratch! Graduate course concerning language models, machine translation, transformers, many NLP tasks, and a significant research project.

  • Introduction to Data Science (390 students): EDA, visualization, regression, boosting, PCA, trees, neural networks.

  • Advanced Data Science (230 students): GLMs, deep learning, CNNs, LSTMs, VAEs, GANs, Reinforcement Learning, Transformers.

  • Capstone course (40 students): Working with partner organizations, I craft real-world machine learning and data science research projects, then advise and manage students as they work in teams to research, develop, and effectively communicate solutions.

If you’re interested in applying to Harvard IACS, please read my brief description of grad programs.

RESEARCH

My research lies within natural language processing (NLP), specifically discourse, semantics, and understanding. The persistent theme in my work is trying to better understand, within any body of text, what is being said, what exactly is happening, and who is who. Toward these goals, my current projects involve entity linking, knowledge graphs, American Sign Language (ASL) translation, and coreference resolution.

CURRENT PROJECTS

Sensors to Sign Language Classification (Ali Hindy, Thomas Fouts, Julia Kreutzer, and Chris Tanner). In Submission.

We built sensors, attached them to one’s arms, signed a corpus of ASL words, and developed a model that leverages a video corpus for out-of-vocabulary classification.

American Sign Language Corpus (Thomas Fouts, Ali Hindy, and Chris Tanner). Aiming for EMNLP.

We built sensors, attached them to one’s arms, and signed 1,000 unique ASL words while capturing both the sensor values (five muscle sensors, one 6-axis gyroscope, and one accelerometer) and the video feed.

Toward Featureless Event Coreference Resolution via Conjoined Convolutional Neural Networks (Chris Tanner and Eugene Charniak). Aiming for EMNLP.

We achieved state-of-the-art (SOTA) results for event coreference on the ECB+ corpus while using almost no features.

Symbiotic Coreference Resolution for Entities and Events (Ning Hua and Chris Tanner). Aiming for EMNLP.

We demonstrate a new approach for jointly performing entity and event coreference.

HUMBLE: An Annotation Suite for Lexical Grounding (Joe Brucker, Eduardo Peynetti, Shivas Jayaram, and Chris Tanner). Aiming for EMNLP.

We aim to build the biggest, best event coreference dataset to date.

End-to-end Entity Linking (Mingyue Wei and Chris Tanner). Aiming for EMNLP.

For her Master’s Thesis at Harvard, Mingyue is researching end-to-end entity linking.

Unsupervised Coreference Resolution (Alessandro Stolfo, Mrinmaya Sachan, Vikram Gupta, and Chris Tanner)

For his Master’s Thesis at ETH Zurich, Alessandro is researching unsupervised coreference resolution.

Bringing BERT to the field: Transformer models for gene expression prediction in maize (Benjamin Levy, Zihao Xu, Liyang Zhao, Shuying Ni, Phoebe Wong, Ross Karl Kremling, Ross Altman, and Chris Tanner). Preparing for Nature Genetics Submission.

For my Capstone course, students partnered with Inari to predict gene expression. They produced great results, so we’re extending this work for a publication. [Blog Overview] [Slides] [Poster] [Poster Video]

Toward a Revamped Real Estate Index (Will Fried, Jessica Wijaya, Shucheng Yan, Yixuan Di, Zona Kostic, Andy Terrel, and Chris Tanner). Aiming for IEEE Transactions on Knowledge and Data Engineering.

For my Capstone course, students partnered with REX Real Estate to predict future housing market conditions. They produced great results, so we’re extending this work for a publication. [Blog Overview] [Slides] [Poster] [Poster Video]

My dissertation concerned entity and event coreference resolution. I was fortunate to be Dr. Eugene Charniak’s final PhD student.

CURRENT STUDENTS

  • Anita Mahinpei (Master’s Thesis)

  • Xiaohan Yang (Master’s Thesis)

  • Xin Zeng (Master’s Thesis)

  • Jack Scudder (Master’s Thesis)

  • Mingyue Wei (Master’s Thesis)

  • Xavier Evans (Harvard ’23)

  • Sun Jie (Master’s in Health Data Science)

  • Alessandro Stolfo (ETH-Zurich Master’s Thesis, co-advising with Mrinmaya Sachan)

  • Thomas Fouts (Brunswick School -> Michigan ’24)

  • Ali Hindy (Brunswick School -> Stanford ’24)

  • Ning Hua (Smith ’21 -> Harvard ’23)

PAST STUDENTS

INVITED TALKS

2021

2020

  • November 20 — Research Talk @ Florida Institute of Tech.

  • October 15 — Career Advice @ Florida Institute of Tech.

  • May 19 — Open Data Science Conference (ODSC)

  • January 23 — Sequential Data @ Harvard ComputeFest

2019

  • September 27 — PhD Alumni Panel @ Brown

  • October 27 — RDMeetsIT Panel @ MIT Media Lab + Mercedes Benz

  • March 11 — Coreference Resolution @ Invitae

  • April 1 — MIT

  • March 15 — University of Washington

  • March 6 — CMU

  • February 21 — Brown

  • February 15 — Harvard

EXPERIENCE

During my career within academia, industry, and the government, my work has concerned:

  • entity linking

  • coreference resolution

  • natural language understanding (NLU)

  • citation prediction

  • face recognition

  • topic modelling

  • machine translation

  • streaming algorithms for NLP

  • anomaly detection

  • adaptive web personalization

  • speech recognition via active learning

  • error-correcting codes

  • social network analysis

  • 2D pattern recognition

  • animats-based learning (swarm intelligence)