• Tacotron: End-to-end Speech Synthesis

    2016 - present tags: google machine-learning sound

    General architecture of Tacotron.

    The most exciting work I’ve been involved with on the Sound Understanding team has been the development of Tacotron, a speech synthesis system that produces speech “end-to-end” from characters to waveform. The initial system took verbalized characters as input and produced a log-magnitude mel spectrogram, which we then converted to a waveform with standard signal processing methods (Griffin-Lim phase reconstruction and an inverse Short-Time Fourier Transform). In Tacotron 2, we replaced this hand-designed synthesis method with a neural vocoder, initially based on WaveNet.
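
    To make the intermediate representation concrete, here is a minimal pure-Python sketch of how one frame of linear STFT magnitudes becomes a log-magnitude mel frame: triangular filters spaced evenly on the mel scale are applied to the spectrum, and the result is log-compressed. It assumes the HTK-style mel formula (the convention documented for tf.signal.linear_to_mel_weight_matrix). The function names, bin counts, and frequency range are illustrative assumptions, not Tacotron's actual feature settings.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale (natural-log form); 1000 Hz maps to roughly 1000 mel.
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def mel_filterbank(num_mel_bins: int, num_spectrogram_bins: int,
                   sample_rate: float, lower_hz: float,
                   upper_hz: float) -> List[List[float]]:
    """Triangular filters: one row per mel bin, one column per linear FFT bin."""
    nyquist = sample_rate / 2.0
    # Frequency (Hz) of each linear spectrogram bin, from 0 to Nyquist.
    bin_hz = [i * nyquist / (num_spectrogram_bins - 1)
              for i in range(num_spectrogram_bins)]
    lo, hi = hz_to_mel(lower_hz), hz_to_mel(upper_hz)
    # num_mel_bins + 2 band edges, evenly spaced on the mel scale.
    edges = [lo + i * (hi - lo) / (num_mel_bins + 1)
             for i in range(num_mel_bins + 2)]
    weights = []
    for m in range(num_mel_bins):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            mel = hz_to_mel(f)
            rising = (mel - left) / (center - left)
            falling = (right - mel) / (right - center)
            row.append(max(0.0, min(rising, falling)))  # triangle peaking at 1.0
        weights.append(row)
    return weights


def log_mel(magnitudes: List[float], weights: List[List[float]],
            eps: float = 1e-6) -> List[float]:
    # One frame: weight the linear magnitudes by each filter, then take the log.
    return [math.log(sum(w * x for w, x in zip(row, magnitudes)) + eps)
            for row in weights]
```

    A real pipeline would build the filter matrix once and apply it to every STFT frame as a single matrix multiply.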

    One line of research on my team explores direct-to-waveform acoustic models that skip the intermediate spectrogram representation entirely. In the Wave-Tacotron paper, we published a model that does just that.

    Check out our publications page for the full list of research that my team has co-authored with the Google Brain, DeepMind, and Google Speech teams.

  • Google Sound Understanding

    2015 - present tags: google sound machine-learning research

    In 2015, I joined the Sound Understanding team within Google Perception. We focus on building systems that can both analyze and synthesize sound. Being able to work on my hobby (sound and digital signal processing) as my full-time job has been a dream come true. We operate as a hybrid research team, which means we both publish our work and deploy it to improve Alphabet’s products and services.

    I’ve had the opportunity to work on some neat tasks and projects during my time on the team, but speech synthesis has been what I’ve spent the most time working on.

  • Google TensorFlow

    2015 - present tags: google machine-learning

    TensorFlow's OG logo.

    In 2015, I joined the Sound Understanding team within Google Perception. Our main tool for machine learning research is TensorFlow. Over the years I’ve contributed a number of features to TensorFlow that have been crucial to the research work my team has done.

    The highlights include:

    • Added the tf.signal module of signal processing components.
    • Extended real and complex FFT support, with implementations for CPU (Eigen), GPU (cuFFT), and TPU.
    • Significantly expanded complex number support.
    • Bugfixes and contributions to various parts of the runtime, libraries and more.

    Check out the full list of commits on GitHub.

  • Google Inbox

    2013 - 2015 tags: google

    As part of my work on the Google Knowledge Graph, I was super happy to play a small role in the creation of Inbox, Google’s fresh take on email. I worked closely with the Inbox team to bring some beautiful imagery and Knowledge Graph smarts to Inbox’s smart email annotations. The beautiful photos you see in Inbox are partly the result of my work.

    Google Inbox

    In June 2015, we launched Trip bundles in Inbox, showing a summary of your upcoming and previous trips.

    Google Inbox Trips

  • Google Knowledge Graph

    2012 - 2015 tags: google

    Another public Google project that I have played a significant part in is the Google Knowledge Graph.

    Google Knowledge Graph

    There are many benefits to indexing human knowledge in structured form (as opposed to indexing strings for keyword search). The search features we are building on this data are reason enough to develop the Knowledge Graph, because they are already useful to searchers today. However, I think we are only at 1 percent of what is possible with a structured knowledge repository, and that is the exciting part to me.

  • Google Symptom Search

    2011 tags: google health

    Since 2011 I have been working at an obscure web startup called Google. I’ve been working on various search projects, some of which have seen the light of day.

    One project I am particularly proud of is called symptom search. A lot of people use Google to search for their symptoms. Sometimes the search may even be time-critical or life-threatening (such as a query for [chest pain]).

    To help users better understand the medical conditions associated with their queries, symptom search shows a list of the most relevant conditions whenever you appear to be searching for information about medical symptoms.

    Google Symptom Search

    We launched this in February 2012, and it has since helped many users find information about their symptoms.

  • Sana Telemedicine Platform

    2009 tags: health open-source

    Sana Telemedicine Platform

    Sana is open-source telemedicine software for Android. It allows field workers to treat or triage patients, using simple workflows encoded on Android devices to gather data and send it to doctors (typically at a partner hospital) over mobile networks. Doctors can review the uploaded data (pictures, text, audio, video, GPS, ECG, pulse oximetry, etc.) and make recommendations and requests for more data.

    I was primarily responsible for the client and server side components of Sana, as well as the integration of Sana with OpenMRS, a freely available electronic medical record system. I also integrated Sana with a Bluetooth-enabled electrocardiogram device for Sana’s cardiovascular health pilot in India.

    I started the project with Zack Anderson in 2009 (originally called Moca) and it has taken on a life of its own since then. Implementation, deployment, and research using the Sana platform is now driven by the Clinical Decision Making Group at MIT CSAIL. There is a Sana class taught at MIT: HST.936: Global Health Informatics to Improve Quality of Care.

  • Mixxx DJ Software

    2008 - present tags: sound open-source music

    Mixxx is Free and open source DJ software for Windows, Mac and Linux that gives you everything you need to perform live mixes.

    Mixxx Logo

    I got involved with this lovely open source project through Google Summer of Code in 2008. Since then I’ve contributed thousands of hours and quite a bit of code.

    In 2011, I took over as the project lead developer. Together with a handful of core developers and hundreds of contributors, we work to produce the best free DJ software available. You can check out our code on GitHub.

    As the lead developer I am involved with nearly every major change to Mixxx. Some of the major projects I’ve been heavily involved with over the years include:

    • OpenGL waveforms (my GSoC project)
    • SQLite-based music library
    • arbitrary numbers of decks (previously Mixxx was hard-coded to 2)
    • looping
    • hotcues
    • waveform scratching
    • sample decks
    • worker-thread based architecture for decoding audio
    • key detection and pitch shifting
    • dynamic / resizable UI (rewrite of the original skin system)
    • modular effects system (combined plugin-based and native effects)
    • non-constant beatgrids (allows mixing tracks that change tempo)
    • master sync (persistent syncing of decks)
    • concurrent library scanner
    • internationalization / translation support
    • tools for making evidence-based changes (performance metrics)
    • unit testing

    When I joined the project you couldn’t run Mixxx for more than a few minutes without encountering crashes. That’s what happens when you have thousands of lines of C++ written by 3 distinct teams of people over the course of a decade with close to no documentation!

    My biggest contribution to Mixxx has been restructuring the codebase to prevent common problems that lead to segfaults (e.g. reducing mutable state shared across threads, separating the realtime callback code from the rest of Mixxx, modularity, and good conservative coding practices). Stability is a feature!
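
    The idea of keeping mutable state private to the realtime thread can be sketched as message passing: other threads never touch the audio engine's parameters directly; they post messages that the audio callback drains at the start of each buffer. This is a hypothetical Python sketch (EngineControls, set_gain, and process are made-up names), not Mixxx code, which is C++. Python's queue module takes locks internally, so real audio code would typically use a lock-free single-producer/single-consumer FIFO instead.

```python
import queue
from typing import List


class EngineControls:
    """Hypothetical audio-engine state owned by the realtime thread."""

    def __init__(self) -> None:
        self._inbox = queue.SimpleQueue()
        self._gain = 1.0  # read and written only inside process()

    def set_gain(self, gain: float) -> None:
        # Called from a UI/controller thread: send a message instead of mutating _gain.
        self._inbox.put(("gain", gain))

    def process(self, buffer: List[float]) -> List[float]:
        # Called from the audio callback: apply pending messages without blocking.
        while True:
            try:
                name, value = self._inbox.get_nowait()
            except queue.Empty:
                break
            if name == "gain":
                self._gain = value
        return [sample * self._gain for sample in buffer]
```

    Because only process() ever reads or writes _gain, there is no shared mutable variable for two threads to race on; the queue is the single point of contact.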

    I enjoy working on Mixxx because it combines my love of electronic music, software engineering and product design.

  • MBTA Security Analysis

    2008

    In 2008, I worked with three other MIT students (Sam McVeety, Zack Anderson, and Alessandro Chiesa) to analyze the operational and cryptographic security of the Massachusetts Bay Transportation Authority’s farecard media: the CharlieTicket (magnetic stripe) and CharlieCard (RFID card). We discovered significant issues with the magnetic card media, the RFID fare cards, and the physical security of the system.

    Our work became something of a spectacle when the MBTA sued us before we could present it (sans many crucial details needed to replicate our work) at the DEFCON security conference. The court granted a temporary restraining order against us; unfortunately, the slides for our talk had already been distributed to conference attendees. To add to the comedy of errors, the MBTA filed a confidential document we had provided to them, containing the crucial details we had withheld from our talk, as evidence in the lawsuit, thus making it public.

    The Electronic Frontier Foundation and the American Civil Liberties Union came to our defense, arguing for our right to publish academic research. The temporary restraining order and the lawsuit were both dropped.

    See also: Zack’s writeup and cryptome.

  • Blue-Steel: Playstation 3 Raytracer

    2007

    During IAP of 2007, I took 6.189, a multicore programming course. The point of the class was to explore applications of the PlayStation 3’s Cell processor to multicore tasks. I learned a lot about the specifics of the Power architecture, and got very cosy with the PS3’s 8 slave processors.

    My project for the class was to write a parallelized realtime raytracer. In 23 days, my team put together a very decent raytracer with reflection, refraction, procedural material textures generated with Perlin noise, and material shading that combined BRDF-based and traditional shading models. Our demo, running with 3 ray bounces on a moderately complex scene, achieved over 20 frames per second on 6 active cores and showed linear speedup as cores were added.
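
    Reflection and refraction, the effects that make each extra ray bounce expensive, come down to two short vector formulas: mirror reflection r = d - 2(d.n)n, and Snell's law for the transmitted direction. The following is a plain-Python sketch of those formulas, assuming unit-length vectors and a surface normal that faces against the incoming ray; it is not code from the Blue-Steel project.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(v: Vec3, s: float) -> Vec3:
    return (v[0] * s, v[1] * s, v[2] * s)


def reflect(d: Vec3, n: Vec3) -> Vec3:
    # Mirror the incoming direction d about the unit surface normal n.
    return add(d, scale(n, -2.0 * dot(d, n)))


def refract(d: Vec3, n: Vec3, eta: float) -> Optional[Vec3]:
    """Snell's law with eta = n_incident / n_transmitted.

    Returns None on total internal reflection.
    """
    cos_i = -dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray; a tracer would reflect instead
    cos_t = math.sqrt(1.0 - sin2_t)
    return add(scale(d, eta), scale(n, eta * cos_i - cos_t))
```

    A recursive tracer calls these at each hit and stops after a fixed bounce budget, such as the 3 bounces mentioned above.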

    We won the competition, and Sony invited us to be their guest speakers at the 2007 Game Developers Conference in San Francisco. We gave an hour-long talk about our project and the uses of the PS3 in the academic community.

    Project website with source code.

  • La Fontaine du Campus Est

    2007 - 2008 tags: sound music

    During IAP 2007, Rob Gens, Zack Anderson, and I worked on building our very own Bellagio Fountain at MIT as part of the East Campus annual Bad Ideas competition. We weren’t able to finish it in time, and the project sat around for a year until the 2008 Bad Ideas competition, where we finally finished it up.

    It was a lot of fun and hard work to build, and we ended up getting help from other putzen. Check out the videos and get the source code here: La Fontaine.

  • MIDAS: Multi-Function In-Dorm Automation System

    2005

    During my freshman year at MIT, my roommate Zack Anderson and I built an automation system for our dorm. It controlled our lights, blinds, and music, and even ran our security system, which consisted of a number of internal and external security cameras. The whole thing was controllable from a secure remote interface.

    A full writeup of the project is here.

  • SCRYPTO Protocol

    2005

    The SCRYPTO (Steganographic CRYPTOgraphy) Protocol is a toy protocol I designed for secure communication on computer networks. It makes use of covert channels in the TCP protocol to steganographically transfer encrypted information.

    Steganography is the art of information hiding, while cryptography is more like information scrambling. Covert channels such as slack space in network data structures are prime candidates for covert transfer of data.
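
    As an illustration of the covert-channel idea, the sketch below hides ciphertext in the 32-bit sequence-number field of TCP SYN headers, a classic example of header bits that the sender may choose freely. The choice of field, the function names, and the zero checksum are assumptions for illustration; the SCRYPTO paper may use different channels, and this code only builds header bytes and never sends packets.

```python
import struct
from typing import List

# TCP header without options: ports, seq, ack, offset/reserved, flags, window, checksum, urgent.
TCP_HEADER = struct.Struct("!HHIIBBHHH")  # 20 bytes


def bytes_to_isns(ciphertext: bytes) -> List[int]:
    # Zero-pad to a multiple of 4 bytes; each 4-byte chunk becomes one 32-bit sequence number.
    padded = ciphertext + b"\x00" * (-len(ciphertext) % 4)
    return [struct.unpack("!I", padded[i:i + 4])[0] for i in range(0, len(padded), 4)]


def isns_to_bytes(isns: List[int], length: int) -> bytes:
    # Receiver side: reassemble the chunks and drop the padding.
    return b"".join(struct.pack("!I", isn) for isn in isns)[:length]


def build_syn_header(src_port: int, dst_port: int, isn: int) -> bytes:
    data_offset = 5 << 4  # header length of 5 32-bit words, no options
    syn_flag = 0x02
    # Checksum left as 0 here; a real sender must compute it over the IP pseudo-header.
    return TCP_HEADER.pack(src_port, dst_port, isn, 0, data_offset, syn_flag, 65535, 0, 0)
```

    Each connection attempt carries 4 bytes, so the channel is slow, but to a casual observer the traffic looks like ordinary SYN packets with random-looking sequence numbers, which is exactly what encrypted data looks like.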

    I was in high school when I wrote this. Hilariously, I implemented 3-DES and Diffie-Hellman key exchange by hand (which is a terrible idea). For some reason I was very intent on doing a “clean room” implementation using only the standard (for academic integrity :P), so I did it without looking at any other implementations. I verified it against the test cases provided in the FIPS specification.
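
    Diffie-Hellman key exchange itself fits in a few lines: each side keeps a secret exponent x, publishes g^x mod p, and both sides arrive at the same g^(ab) mod p. This toy sketch uses Python's built-in three-argument pow; the prime 2^127 - 1 and base 3 are arbitrary illustrative choices, not a vetted group, and this is not the original SCRYPTO code. Real systems should use a maintained library with standardized parameters (for example X25519).

```python
import secrets
from typing import Tuple

# Toy parameters (assumption): the Mersenne prime 2**127 - 1 and base 3.
# Far too small and unvetted for real use.
P = 2**127 - 1
G = 3


def keypair() -> Tuple[int, int]:
    private = secrets.randbelow(P - 3) + 2  # secret exponent in [2, P - 2]
    public = pow(G, private, P)  # value sent to the other party
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    # (G^b)^a mod P == (G^a)^b mod P, so both sides compute the same number.
    return pow(their_public, my_private, P)
```

    An eavesdropper sees only the two public values; recovering the shared secret from them requires solving a discrete logarithm, which is what a properly sized group makes infeasible.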

    I don’t suggest anyone actually use this for anything! (these days you shouldn’t be using 3-DES anyway). With that said, feel free to check out the paper and code.

  • The Basement

    2002 - 2007 tags: music

    I’m an electronic music fanatic, and I am fixated on most anything that glows. Around 2002 a group of my best friends and I started painting my parents’ basement with UV reactive paint. Five years later, the room was completely covered with designs and murals. A sound system, comfy couches, and DJ equipment make this room a relaxing hangout for me and my friends.

    The website (another artifact of early-aughts web design — Java applet and all) has pictures, videos, and more information.

  • AtriusMUD

    1998 - 2002

    Around 1998 I started playing MUDs. MUDs are precursors to MMORPGs — they are multiplayer text adventure games. After putting in over a month and a half of playtime on a MUD I frequented, I became interested in running my own MUD. I started one as a derivative of CircleMUD.

    I wrote the code, while a group of friends and I built the stories and the worlds of the MUD. AtriusMUD was started around 1999, and then shut down in 2001. In 2002 I started developing it again. It has been offline for quite a while now.