Data-Driven Product Development

Data Driven Products Now!

Here’s a video of this talk. In this blog post, I will be using some screenshots from the above presentation.

In this presentation deck, former Etsy developer Dan McKinley outlines two very important ideas:

  1. Data-driven strategies of project management
  2. Data-driven tactics of product management

First, the Agile-esque process of iterative development using prototyping and A/B testing at key milestones is an interesting approach. This is difficult to do in both small and large companies due to different reasons. In small companies, the resources required for the discipline of this kind of development is too big. At large companies, a single developer or product manager would not have enough control to apply this process in a realistic way, especially given constraints from other teams, QA, designers, and management.

Screenshot 2015-12-17 00.11.13.png


This is a great ideal to shoot for. But personally, I don’t know how plausible it is. It requires buy-in from everyone involved. McKinley even briefly mentions this when he talks about discussions with his designer, in which he promises to polish up the prototype in the second phase (“Refinement”) of development.

The second great concept in this talk is the tactical use of simple arithmetic and statistics to make decisions about products and features. While the previous idea’s concern is the quality of a particular product in development, this idea pertains to the nitty-gritty of picking which products to develop in the first place.

Screenshot 2015-12-17 00.15.25.png

This back-of-the-envelope calculation, he outlines, has saved him from spending weeks or months in design, development, testing, deployment, and analysis. This idea seems fundamental and possibly obvious to experienced product managers. However, it is tempting to bypass the rigor this level of analysis requires when everyone involved is swept up in the excitement of a cool new feature.

While these two ideas a vital enough in themselves, this presentation gives us this diagram as icing on the cake:

Screenshot 2015-12-17 00.21.54.png

These charts demonstrate the change in priorities that must happen as a company grows. There are two reasons for this necessity:

  1. Risk mitigation – reducing the number of moonshot ideas being implemented
  2. The absolute importance of using data to make product decisions

And, as McKinley mentions at the beginning of the talk, a dangerous pitfall is to mis-categorize these three by forming an opinion and then finding some pieces of data to back it up. The approach should have more scientific rigor: it should start with a dispassionate hypothesis based on prior data, and through the process of prototyping and A/B testing, the hypothesis should either stand and be refined, or easily discarded after a falsification.

Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset

In my previous post, Differential Privacy: The Basics, I provided an introduction to differential privacy by exploring its definition and discussing its relevance in the broader context of public data release. In this post, I shall demonstrate how easily privacy can be breached and then counter this by showing how differential privacy can protect against this attack. I will also present a few other examples of differentially private queries.

The Data

There has been a lot of online comment recently about a dataset released by the New York City Taxi and Limousine Commission. It contains details about every taxi ride (yellow cabs) in New York in 2013, including the pickup and drop off times, locations, fare and tip amounts, as well as anonymized (hashed) versions of the taxi’s license and medallion numbers. It was obtained via a FOIL (Freedom of Information Law) request earlier this year and has been making waves in the…

View original post 2,314 more words