Biography

My name is Tommy. I’m a member of the technical staff at In-Q-Tel and a coordinator for Data Science DC. I have an MS in mathematics and statistics from Georgetown University and a BA in economics from the College of William and Mary. I am also a PhD student in the George Mason University Department of Computational and Data Sciences. I am the author of the textmineR package for the R language. I am also a Marine Corps veteran. My opinions are terrible and they are all my own.

Interests

  • Statistics / Data Science
  • Machine Learning
  • Distributional Semantics
  • R Programming
  • Mixology
  • Old Things

Education

  • PhD in Computational Science & Informatics (ongoing)

    George Mason University

  • MS in Mathematics and Statistics, 2012

    Georgetown University

  • BA in Economics, 2009

    The College of William and Mary

Projects

marginal

R package for calculating marginal effects for arbitrary prediction models.

mvrsquared

R package implementing multivariate R-squared for topic models and other multivariate outcome models.

R-squared for Topic Models

Working paper on a coefficient of determination for topic models.

textmineR

Text mining and topic modeling in R.

tidylda

R package for Latent Dirichlet Allocation using ‘tidyverse’ conventions, plus some of my own special stuff.

Recent & Upcoming Talks

Optimizing Topic Models for Classification Tasks

Using a Bayesian optimizer to build better topic models.

A Brief Introduction to Graph Theory

Actually just a 10-minute introduction to graph analytics. Very little math was involved.

Mining Texts with textmineR

An introduction to textmineR v3.x and the new features it introduced.

textmineR - NLP with R

The first time I spoke about textmineR in public.

Introduction to Topic Modeling with LDA and more

Topic models are a family of models to estimate the distribution of abstract concepts (topics) that make up a collection of documents. …

Recent Posts

Bayesian model evaluation and comparison

Suppose you have a Bayesian statistical model. How do you know it’s a good one? How do you know it isn’t fundamentally misspecified? …

Intro to clustering

Cluster analysis is a type (perhaps the most common type) of unsupervised machine learning. In cluster analysis, the goal is to assign …

Support vector machines and kernels

In my effort to blog my way through the rest of my PhD and study for comps, I present to you more on support vector machines. …

Support Vector Machines in Hastie et al.

In my effort to blog my way through the rest of my PhD and study for comps, I present to you support vector machines. This is the first …

Blogging my way through the rest of my PhD

I’ve reached a turning point in my PhD studies. Classes are behind me and what lies ahead is largely unstructured. Success or failure …

Experience

Member of the Technical Staff

In-Q-Tel

Apr 2017 – Present

Arlington, VA

Director of Data Science

3E Services, LLC

Apr 2015 – Sep 2015

McLean, VA

Director of Data Science

Impact Research

Mar 2015 – Mar 2017

Columbia, MD

Statistician

Science and Technology Policy Institute

Jan 2013 – Apr 2015

Washington, DC

Data Scientist

Decision Q

Jun 2012 – Jan 2013

Washington, DC

Research Assistant

Federal Reserve Board

Jun 2009 – Aug 2011

Washington, DC

Research Assistant

Global Research Institute | William and Mary

Jan 2008 – May 2009

Williamsburg, VA

Rifleman

United States Marine Corps

Aug 2000 – Aug 2004

Contact