About

Hello, my name is Kyle Cox. I am currently a master’s student in computer science at the University of Texas at Austin. While pursuing my master’s, I am a research associate at the IC2 Institute, where I also collaborate with the AI Health Lab. Previously, I was an undergraduate at UT Austin, where I completed my bachelor’s in mathematics. I was also a Forty Acres Scholarship Finalist and National Merit Scholar. I’m interested in machine learning, language models, AI safety, and basketball analytics.

Work, Projects, Experience

Some things I have worked on:

  • ARENA
    • From January to February 2024, I am participating in ARENA (Alignment Research Engineer Accelerator). The program covers ML fundamentals, mechanistic interpretability, and reinforcement learning.
  • IC2 Institute
    • Collaborated with the MIT Lab for Computational Physiology to build a classifier for poor health literacy from patient clinical notes
    • Using the All of Us Research Program dataset to assess risk for health outcomes, in particular analyzing the relationship between social determinants of health and clinical outcomes
    • With Craig Watkins, pitched an app protocol for behavioral health coaching with ecological momentary assessment to Texas Health Catalyst at Dell Medical School. Our pitch was selected as a Phase 2 Finalist (about an 8% acceptance rate).
  • AI Health Lab
    • Experimenting with serialization approaches for language models to do inference on health time series data
    • Showed language models are few-shot decision tree generators
    • Testing a tree-of-thoughts prompting approach for gene set summarization with large language models
  • Melange
    • Interned for Melange in Spring 2022, leading strategy initiatives for a real-money prediction markets platform
  • Directed Reading Program
    • DRP is a research training group for undergraduates in mathematics
    • I participated for three sessions (Spring 2021, Summer 2021, and Fall 2021) under the mentorship of Hunter Vallejos
    • Spring 2021: studied Markov chains and the mixing times of random card shuffles
    • Summer 2021 (funded): studied ergodic dynamics, with an application to Benford’s Law
      • Built a simple web dashboard where you can upload a dataset and evaluate whether your data follows Benford’s Law
      • Also did a Benford analysis on Detroit precinct voting data amid claims of voter fraud. See here
    • Fall 2021: did a survey of topics in machine learning, following Kevin Murphy’s Machine Learning: A Probabilistic Perspective
  • UT/Goldsberry Basketball Analytics
    • Worked on computer vision and basketball analytics problems with Kirk Goldsberry, one of which turned into the capstone paper for my applied statistics certificate
    • Did lots of cleaning and modeling of ball-tracking computer vision data, working with data from the UT men’s basketball program, the Toronto Raptors, and the University of Toronto biomechanics & sports medicine lab
  • Dalton Pathology
    • Did independent research with Dr. Leslie Dalton over summer 2021, doing deep learning for breast cancer tissue classification
  • Alegion
    • Interned for Alegion over summer 2020, where I expanded annotation tools for the data labeling platform
  • Tutoring/Teaching
    • Tutored 100+ students in math throughout undergrad, at every level from middle school to college, but most often in pre-calculus and calculus.
    • Also taught a biweekly summer math course at my old high school for advanced middle school students, and spoke to the math club a couple of times. This was a ton of fun.
  • NBA/College Basketball Projects
    • My NBA rankings and predictions can be found here. I update the college ratings every couple of weeks here.
      • Initially this was just an application of PageRank to college basketball rankings.
      • Now I focus on the NBA side, which uses a more involved efficiency-based algorithm and Monte Carlo simulations to build season projections.
    • I also do live in-game win probabilities for NBA games. Sometimes when there are a bunch of games on, I put the win-probability dashboard up on my monitor while I’m working.
    • PBP data
      • A bottleneck for NBA analytics hobbyists is good play-by-play data. Though there are many play-by-play data sources, it is harder to find play-by-play data with on-court lineups. This matters because individual player evaluation requires knowing who was on the court at every moment. However, for about 90% of games, you can remedy this by iterating through each play and back-filling lineups wherever a player is mentioned in an action. I did this for games from 1999 to 2022 and host the dataset on Kaggle so that hobbyists can try building their own player evaluation metrics.
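
A minimal sketch of the back-filling idea described above, not the actual code behind the Kaggle dataset. It assumes a simplified, hypothetical play format (the `Play` class and its fields are made up for illustration) and handles one team in one period. A player mentioned in an action before ever being substituted in must have been on the court at the start of the period. If exactly five such players are found, the lineup can be rolled forward through the substitutions; otherwise the period is left unresolved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Play:
    """Hypothetical, simplified play-by-play row (an assumption for this sketch)."""
    team: str
    player: Optional[str] = None   # player in the action, or the player leaving on a substitution
    sub_in: Optional[str] = None   # player entering; set only on substitution rows


def infer_period_starters(plays: List[Play], team: str) -> List[str]:
    """Players whose first appearance in the period is not as an incoming substitute."""
    starters: List[str] = []
    entered = set()  # players who came in via a substitution
    for play in plays:
        if play.team != team:
            continue
        name = play.player
        if name is not None and name not in entered and name not in starters:
            starters.append(name)
        if play.sub_in is not None:
            entered.add(play.sub_in)
    return starters


def backfill_lineups(plays: List[Play], team: str) -> Optional[List[frozenset]]:
    """Return the team's on-court lineup just before each play, or None if unresolved."""
    starters = infer_period_starters(plays, team)
    if len(starters) != 5:
        return None  # not enough mentions to pin down the lineup
    on_court = set(starters)
    lineups = []
    for play in plays:
        lineups.append(frozenset(on_court))
        # Apply the substitution after recording the lineup for this row.
        if play.team == team and play.sub_in is not None:
            on_court.discard(play.player)
            on_court.add(play.sub_in)
    return lineups


period = [
    Play("A", "A1"), Play("A", "A2"), Play("A", "A3"),
    Play("A", "A1", sub_in="A6"),  # A6 replaces A1
    Play("A", "A4"), Play("A", "A6"), Play("A", "A5"),
]
lineups = backfill_lineups(period, "A")
```

Here `lineups[0]` contains A1 through A5 even though A4 and A5 are not mentioned until after the substitution, which is the back-filling step. Periods where fewer than five starters are ever mentioned return `None`, corresponding to the games that cannot be recovered this way.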