The most exciting work I’ve been involved with on the Sound Understanding
team has been the development of
Tacotron, an end-to-end speech synthesis system that goes directly
from characters to waveform. The initial system took
verbalized characters as input and produced a log-magnitude mel spectrogram,
which we then synthesized to a waveform via standard signal processing methods
(Griffin-Lim and an inverse short-time Fourier
transform). In Tacotron 2, we replaced this hand-designed
synthesis method with a neural vocoder, initially based on WaveNet.
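For readers who have not met Griffin-Lim, here is a minimal sketch of the idea: start from random phases, then alternate between an inverse STFT and a forward STFT, keeping the known magnitudes each time. This is not Tacotron's code. It assumes a linear-magnitude spectrogram (the mel inversion step is skipped), uses a naive DFT that is only practical for tiny frames, and every function name and parameter below is made up for illustration.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for tiny frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, keeping only the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def stft(signal, frame_len, hop):
    """Windowed frames of the signal, each transformed with the DFT."""
    win = hann(frame_len)
    return [dft([signal[s + i] * win[i] for i in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def istft(frames, frame_len, hop, length):
    """Least-squares inverse STFT: windowed overlap-add, then normalize."""
    win = hann(frame_len)
    out = [0.0] * length
    norm = [0.0] * length
    for f, spec in enumerate(frames):
        segment = idft_real(spec)
        for i in range(frame_len):
            out[f * hop + i] += segment[i] * win[i]
            norm[f * hop + i] += win[i] ** 2
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


def spectral_error(signal, magnitudes, frame_len, hop):
    """Squared distance between the signal's STFT magnitudes and the target."""
    est = stft(signal, frame_len, hop)
    return sum((abs(e) - m) ** 2
               for ef, mf in zip(est, magnitudes)
               for e, m in zip(ef, mf))


def griffin_lim(magnitudes, frame_len, hop, length, iterations=30, seed=0):
    """Estimate a waveform whose STFT magnitudes match `magnitudes`."""
    rng = random.Random(seed)
    # Start from the known magnitudes with random phases.
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in frame]
            for frame in magnitudes]
    for _ in range(iterations):
        signal = istft(spec, frame_len, hop, length)
        est = stft(signal, frame_len, hop)
        # Keep the estimated phase, restore the known magnitude.
        spec = [[m * (e / abs(e)) if abs(e) > 1e-12 else m
                 for m, e in zip(mf, ef)]
                for mf, ef in zip(magnitudes, est)]
    return istft(spec, frame_len, hop, length)


# Tiny demo: discard the phase of a sine wave, then re-estimate a waveform.
target = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
mags = [[abs(c) for c in frame] for frame in stft(target, 16, 4)]
estimate = griffin_lim(mags, 16, 4, len(target))
```

The result matches the target magnitudes only approximately, and its phase can differ from the original signal, which is part of why a learned vocoder is an attractive replacement.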
One line of research on my team is direct-to-waveform acoustic models, skipping
the intermediate spectrogram representation. In the
Wave-Tacotron paper, we published a model that does just that.
Check out our publications page for the full list of
research that my team has co-authored with the Google Brain, DeepMind, and
Google Speech teams.
In 2015, I joined the Sound Understanding team within Google
Perception. We focus on building systems that can both
analyze and synthesize sound. Being able to work on my hobby (sound and digital
signal processing) as my full-time job has been a dream come true. We operate as
a hybrid research team, which means we both publish our work
and deploy it to improve Alphabet’s products and services.
I’ve had the opportunity to work on some neat tasks and projects during my time
on the team, but speech synthesis has been what I’ve spent the most time
working on.
On the Sound Understanding team, our main tool for machine learning research is
TensorFlow. Over the years I’ve contributed a number of features
to TensorFlow that have been crucial to the research work my team has done.
The highlights include:
Added the tf.signal module of signal processing components.
Extended real and complex FFT support, with implementations for CPU (Eigen),
GPU (cuFFT), and TPU.
Significantly expanded complex number support.
Bugfixes and contributions to various parts of the runtime, libraries and more.
As part of my work on the Google Knowledge Graph, I was super happy to play a
small role in the creation of Inbox, Google’s fresh take on
email. I worked closely with the Inbox team to bring some beautiful imagery and
Knowledge Graph smarts to Inbox’s smart email annotations. The beautiful photos
you see in Inbox are partly the result of my work.
In June 2015, we launched Trip bundles in Inbox, showing a
summary of your upcoming and previous trips.
Another public Google project that I have played a significant part in is the
Google Knowledge Graph.
There are many benefits to indexing human knowledge in structured form (as
opposed to indexing strings for keyword search). The features we are adding to
search that make use of this data are reason enough to develop the Knowledge
Graph because they are very useful to searchers today. However, I think we are
only at 1 percent of what is possible with having a structured knowledge
repository and that is the exciting part to me.
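To make the contrast between structured knowledge and keyword matching concrete, here is a toy sketch. It is not how the Knowledge Graph is built or queried; the triple layout, the predicate names, and the helper functions are all assumptions made up for this example.

```python
# Toy facts stored as (subject, predicate, object) triples. Illustrative only.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Marie Curie", "field", "physics"),
]

# The same facts as plain strings, the way a keyword index would see them.
DOCUMENTS = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
]


def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(subject, *predicates):
    """Walk a chain of predicates starting from `subject`."""
    frontier = [subject]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects(node, predicate)]
    return frontier


def keyword_search(query):
    """Return documents that contain every query term as a substring."""
    terms = query.lower().split()
    return [d for d in DOCUMENTS if all(t in d.lower() for t in terms)]


# Structured data can join two facts; string matching cannot.
country = follow("Marie Curie", "born_in", "capital_of")
hits = keyword_search("Marie Curie Poland")
```

Here `follow` reaches Poland by chaining two facts, while the keyword search finds nothing because no single document mentions both Marie Curie and Poland.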
Since 2011 I have been working at an obscure web startup called
Google. I’ve been working on various search projects, some of which
have seen the light of day.
One project I am particularly proud of is called symptom search. A lot of people
use Google to search for their symptoms. Sometimes, it may even be time-critical
or life-threatening (such as a query for [chest pain]).
To help users better understand the medical conditions associated with their
search terms, symptom search shows you a list of conditions most relevant to
your search terms when you seem to be searching for information about medical symptoms.
We launched this in February 2012 and it has been
helping many users find information about their symptoms.
Sana is open-source telemedicine software for Android. It allows field
workers to treat or triage patients using simple workflows encoded on Android
devices to gather data and send them to doctors (typically a partner hospital)
over mobile networks. Doctors can review the uploaded data (pictures, text,
audio, video, GPS, ECG, pulse oximetry, etc.) and make recommendations and
requests for more data.
I was primarily responsible for the client and server side components of Sana,
as well as integration of Sana with OpenMRS — a freely available,
electronic medical record system. I also integrated Sana with a
Bluetooth-enabled electrocardiogram device for Sana’s cardiovascular health pilot in India.
I started the project with Zack Anderson in 2009 (originally called
Moca) and it has taken on a life of its own since then. Implementation,
deployment, and research using the Sana platform are now driven by the Clinical
Decision Making Group at MIT CSAIL. There is a Sana class taught at MIT:
HST.936: Global Health Informatics to Improve Quality of Care.
Mixxx is free and open source DJ software for Windows, Mac and Linux
that gives you everything you need to perform live mixes.
I got involved with this lovely open source project through
Google Summer of Code in 2008. Since then I’ve contributed thousands
of hours and quite a bit of code.
In 2011, I took over as the project lead developer. Together with a handful of
core developers and hundreds of contributors, we work to produce the
best free DJ software available. You can check out our code on
GitHub.
As the lead developer I am involved with nearly every major change to
Mixxx. Some of the major projects I’ve been heavily involved with over the years include:
OpenGL waveforms (my GSoC project)
SQLite-based music library
arbitrary numbers of decks (previously Mixxx was hard-coded to 2)
looping
hotcues
waveform scratching
sample decks
worker-thread based architecture for decoding audio
key detection and pitch shifting
dynamic / resizable UI (rewrite of the original skin system)
modular effects system (combined plugin-based and native effects)
non-constant beatgrids (allows mixing tracks that change tempo)
master sync (persistent syncing of decks)
concurrent library scanner
internationalization / translation support
tools for making evidence-based changes (performance metrics)
unit testing
When I joined the project you couldn’t run Mixxx for more than a few minutes
without encountering crashes. That’s what happens when you have thousands of
lines of C++ written by 3 distinct teams of people over the course of a decade
with close to no documentation!
My biggest contribution to Mixxx has been restructuring the codebase to prevent
common problems that lead to segfaults (i.e. reduction of mutable shared state
across threads, separation of realtime callback code from the rest of Mixxx,
modularity and good conservative code practices). Stability is a feature!
I enjoy working on Mixxx because it combines my love of electronic music,
software engineering and product design.
In 2008, I worked with three other MIT students (Sam McVeety,
Zack Anderson, and Alessandro Chiesa) to analyze the
operational and cryptographic security of the Massachusetts Bay Transportation
Authority’s farecard media: the CharlieTicket (magnetic strip) and CharlieCard
(RFID card). We discovered significant issues with the magnetic card media, the
RFID fare cards, and the physical security of the system.
Our work became something of a spectacle when the MBTA decided to sue us before
we presented it (sans many crucial details needed to replicate our work) at
the DEFCON security conference. We received a temporary restraining order —
unfortunately the slides to our talk had already been distributed to conference
attendees. To add to the comedy of errors, the MBTA filed a confidential
document we had provided to them containing the crucial details we had withheld
from our talk as evidence in the lawsuit, thus making it public.
During IAP of 2007, I took 6.189, a multicore programming course. The point of
the class was to explore applications of the Playstation 3’s Cell processor to
multicore tasks. I learned a lot about the specifics of the Power architecture,
and got very cosy with the PS3’s 8 slave processors.
My project for the class was to write a parallelized realtime raytracer. In
23 days, my team put together a very decent raytracer with reflection,
refraction, generated material texturing using Perlin noise, and enhanced
material shading with BRDF and traditional shading models. Running our demo with
3 ray bounces on a moderately complex scene achieved over 20 frames per second
with 6 active cores, along with linear speedup properties.
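For anyone curious what reflection and refraction look like in code, here is a small sketch of the two direction computations a raytracer performs at each bounce. It is not our Cell implementation. It assumes unit-length vectors stored as plain tuples, and the function names are made up for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(v, s):
    return tuple(x * s for x in v)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return scale(v, 1.0 / length)


def reflect(direction, normal):
    """Mirror a unit incoming direction about a unit surface normal."""
    return add(direction, scale(normal, -2.0 * dot(direction, normal)))


def refract(direction, normal, eta):
    """Bend a unit incoming direction using Snell's law.

    `normal` points against the incoming ray and `eta` is the ratio
    n1 / n2 of the refractive indices. Returns None on total internal
    reflection.
    """
    cos_i = -dot(direction, normal)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return add(scale(direction, eta), scale(normal, eta * cos_i - cos_t))


# A ray hitting a horizontal surface at 45 degrees.
incoming = normalize((1.0, -1.0, 0.0))
up = (0.0, 1.0, 0.0)
bounced = reflect(incoming, up)
bent = refract(incoming, up, 1.0 / 1.5)  # air into glass, assumed indices
```

A tracer with several ray bounces applies these recursively: each hit spawns a reflected ray, and for transparent materials a refracted one, up to the bounce limit.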
We won the competition, and Sony invited us to be their guest speakers at the
2007 Game Developers Conference in San Francisco. We gave an hour-long talk
about our project and the uses of the PS3 in the academic community.
During IAP 2007, Rob Gens, Zack Anderson, and I worked on building
our very own Bellagio Fountain at MIT as part of the East Campus annual Bad
Ideas competition. We weren’t able to finish it in time, and the project sat
around for a year until the 2008 Bad Ideas competition, where we finally
finished it up.
It was a lot of fun and hard work to build, and we ended up getting help from
other putzen. Check out the videos and get the source code here:
La Fontaine.
My freshman year at MIT, Zack Anderson, my roommate and I built an
automation system for our dorm. It controlled our lights, blinds, music, and
even ran our security system, which consisted of a number of internal and
external security cameras. The whole thing was controllable from a secure remote interface.
The SCRYPTO (Steganographic CRYPTOgraphy) Protocol is a toy protocol I designed
for secure communication on computer networks. It makes use of covert channels
in the TCP protocol to steganographically transfer encrypted information.
Steganography is the art of information hiding, while cryptography is more like
information scrambling. Covert channels such as slack space in network data
structures are prime candidates for covert transfer of data.
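As a rough illustration of a header-field covert channel, the sketch below hides already-encrypted bytes in the 32-bit sequence number field of a bare 20-byte TCP header. Which fields SCRYPTO actually uses is not stated here, so the choice of the sequence number, the length-prefix convention, the port numbers, and all function names are assumptions. The code only builds and parses header bytes; it does not send anything, and the checksum is left as a zero placeholder.

```python
import struct

# Source port, destination port, sequence number, acknowledgment number,
# data offset byte, flags byte, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHLLBBHHH")  # 20 bytes, no options


def build_header(seq, src_port=40000, dst_port=80):
    """Pack a SYN-style header whose sequence number carries hidden data."""
    data_offset = 5 << 4  # header length of five 32-bit words
    syn_flag = 0x02
    return TCP_HEADER.pack(src_port, dst_port, seq, 0,
                           data_offset, syn_flag, 65535, 0, 0)


def embed(ciphertext):
    """Spread ciphertext over headers, four bytes per sequence number.

    The first header carries the payload length so that the receiver
    can strip the zero padding from the final chunk.
    """
    headers = [build_header(len(ciphertext))]
    for i in range(0, len(ciphertext), 4):
        chunk = ciphertext[i:i + 4].ljust(4, b"\x00")
        headers.append(build_header(int.from_bytes(chunk, "big")))
    return headers


def extract(headers):
    """Recover the hidden bytes from the sequence number fields."""
    seqs = [TCP_HEADER.unpack(h)[2] for h in headers]
    length, chunks = seqs[0], seqs[1:]
    data = b"".join(s.to_bytes(4, "big") for s in chunks)
    return data[:length]


hidden = b"already-encrypted bytes"
carriers = embed(hidden)
recovered = extract(carriers)
```

Because a real initial sequence number is expected to look random, encrypted data blends in better than plaintext would, which is one reason to pair steganography with cryptography.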
I was in high school when I wrote this. Hilariously, I implemented
3-DES and Diffie-Hellman key exchange by hand
(which is a terrible idea). For some reason I was very intent on doing a
“clean room” implementation using only the standard (for academic integrity
:P) so I did it without looking at any other
implementations. I verified it against the test cases provided in
the FIPS specification.
I don’t suggest anyone actually use this for anything!
(these days you shouldn’t be using 3-DES anyway). With
that said, feel free to check out the paper and code.
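For context, the core of a Diffie-Hellman key exchange fits in a few lines, which is exactly why it is tempting to write by hand. The sketch below is a toy and not the SCRYPTO code: the modulus (the Mersenne prime 2^127 - 1), the generator 3, and the function names are assumptions picked for brevity, and none of it is suitable for real use.

```python
import secrets

# Toy parameters, assumed for illustration only. Real deployments use
# vetted groups and a reviewed library rather than hand-rolled code.
P = 2**127 - 1  # a Mersenne prime, far too small for real security
G = 3


def keypair():
    """Pick a random private exponent and derive the public value."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)


# Each side publishes only its public value.
alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

alice_key = shared_secret(alice_private, bob_public)
bob_key = shared_secret(bob_private, alice_public)
```

Both sides arrive at the same value because (G^a)^b and (G^b)^a are equal modulo P. The sketch leaves out everything that makes this hard in practice, such as authenticating the peer, validating received public values, and deriving keys from the shared secret.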
I’m an electronic music fanatic, and I am fixated on most anything that
glows. Around 2002 a group of my best friends and I started painting my parents’
basement with UV reactive paint. 5 years later, the room was completely covered
with designs and murals. A sound system, comfy couches, and DJ equipment make
this room a relaxing hangout for me and my friends.
The website (another artifact of early-aughts web design — Java
applet and all) has pictures, videos, and more information.
Around 1998 I started playing MUDs. MUDs are precursors to MMORPGs — they are
multiplayer text adventure games. After putting in over a month and a half of
playtime on a MUD I frequented, I became interested in running my own MUD. I
started one as a derivative of CircleMUD.
I wrote the code, while a group of friends and I built the stories and the
worlds of the MUD. AtriusMUD was started around 1999, and then shut down
in 2001. In 2002 I started developing it again. It has been offline for quite a
while now.