Build a Data Science Team from Scratch

Is your organization beginning its analytics or AI journey? One of the first steps will be building a data science team to help develop and deploy machine learning models. I was lucky to have the opportunity to be the first data scientist at a tech startup. In this article, I will share how we built a data science team from scratch.

Leadership Buy-in

The importance of executive sponsorship for your data science efforts cannot be overstated.

In my current role (not the tech startup), predictive modeling is core to our company’s value proposition. As a result, we have a large data science team relative to the size of the company and access to great compute resources.

But at the tech startup, predictive modeling was viewed by many as a nice-to-have. I made business cases to product managers for why they should add predictive components to their products. Most were interested, but early on none could find time on the roadmap given other priorities.

As the company and its product offerings expanded, new senior leaders joined and key existing leaders came to believe data science was on the critical path. From there, we finally got models in production and grew the team.

If your organization doesn’t have an executive sponsor, be persistent in building your business case but expect an uphill battle early on.

Infrastructure Prerequisites

Before data scientists can be productive, you need 1) data and 2) data engineers who can make it accessible. (Read this if you’re not sure about the difference between a data engineer and a data scientist.)

When I joined as the first data scientist, data was locked in a transactional database with prohibitively slow query times. An engineering team was in the middle of building a Spark-based data platform, but it was designed for customer reporting, not data science. So to get access to the data and do my work, I sat next to the data architect and learned Scala.

It wasn’t until the business intelligence team built a data warehouse in Snowflake that life as a data scientist got easy. None of my models made it to production until this data warehouse was built. Something worth noting here: the business intelligence team doubled in size to build and maintain the data warehouse.

Org Structure

There’s no easy answer to the question of where in your organization a data science team should live. Expect to go through some trial and error with your data science org structure in the early days. I personally had 7 different bosses in 2.5 years and went through 3 re-orgs.

Nonetheless, here are some popular org structures for data science.

  • In parallel with software engineers
    • If your data science team is building models for production, customer-facing systems, it probably should report into the CTO.
    • Data science is similar to DevOps in that it is separate from but in support of software engineering.
    • The languages and tools for data science are different from those of software engineering, so there needs to be some separation. But not too much, because software engineering will consume model outputs. Coordination will need to be tight.
  • Center of excellence
    • It’s common for data science teams to be at least partially centralized, with most team members reporting into a head of data science.
    • It’s important to have at least some level of centralization so data scientists can share knowledge with one another, do peer reviews, and collaborate during the creative stages of model development.
  • Embedded with product teams
    • Early on, I sat with a product team mixed with engineers, designers, and product managers. This approach failed for me because I was forced to use software engineering tools to deploy models, rather than the data science tools I was comfortable with. I decided to leave that team and join the business intelligence team to get more of the center-of-excellence benefits described above.
  • Internal modeling
    • If your data science team is building models for internal users (more of a consulting, business intelligence-type function but with machine learning), then it might not make sense for the data science team to report into engineering. But as the team scales and use cases expand into customer-facing products, engineering becomes a more natural place to live.

Team Composition

While there’s no prescription for optimum data science team composition, you should be aware of the many backgrounds and specialties within data science (and the many similar roles often confused with data science).

  • Researcher
    • If your product or solution approaches the known boundaries of machine learning, you may want to stack your team with PhDs.
    • Examples of unsolved problems that require deep research:
  • Engineer
    • Most predictive modeling use cases do not exist on the boundaries of knowledge. They involve applying known algorithms to new contexts. Chances are, your use case falls into this category.
    • Build a team with solid foundations in statistics and programming. They will move quickly to find statistically sound solutions using known approaches, and they will be capable of deploying to production.
  • Business
    • Users and product managers can’t always anticipate what is possible when machine learning is applied to your data. Have enough team members capable of working closely with the product team to assist with ideation, feasibility studies, and rapid prototypes when needed.
    • Data visualization and business communication are also crucial. Ensure you have enough team members who can tell a compelling story with data.
  • Specialties
    • Depending on your machine learning task and the type of input data you have, you may need to hire someone who specializes in one of the following.
      • Natural Language Processing (NLP) – text data
      • Computer Vision – image or video data
      • Deep Learning – massive, unstructured data sets
      • Personalization – product recommendations
      • Reinforcement Learning – robotics, navigation, finance
      • Fraud and Anomaly Detection – finance, web security, IoT
      • Time Series – finance, event data

Recruiting

Data science has consistently ranked as one of the best jobs in the US for many years. There’s high demand for data scientists and a stubborn shortage driven by the difficulty of acquiring all the requisite skills.

In my experience building a team from 1 to 6, you should expect the recruiting process to take 3 to 6 months for each team member. The more senior the role, the longer it will take to fill.

It’s also wise to build the team incrementally rather than hiring everyone at once. Your first hire may tell you that the data infrastructure isn’t ready yet and that you’re better off hiring more data engineers.

Whatever you do, do not hire a “Researcher” type as your first data scientist. You probably want a “Business” type data scientist who can work closely with the product team, effectively communicate challenges and progress, and evangelize data science in your org. The first 2 data scientists on our team (myself and another) both had MBAs. That business sense made all the difference in the early days.

Be Patient

Data science is inherently experimental. Even once you assemble your dream team, it may take many months (6+) before they discover and deploy a solution to your business problem.

The timeline from zero (no buy-in, no data infrastructure, no data science org structure, no data science team) to production can be extremely long. Depending on the size of your org and the maturity of its relationship with data, it could be 1.5 to 2 years.

One trick to help your team deliver value sooner, and maintain buy-in from your organization, is to ask them to knock out some quick wins. That might mean deploying a baseline model, or solving a more narrowly scoped version of the problem.
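To make the idea of a baseline model concrete, here is a minimal sketch of about the simplest one possible: always predict the most common label seen in training. Everything in it is an assumption for illustration, not something from our team's actual work. The churn labels, the `MajorityClassBaseline` class, and the `accuracy` helper are all hypothetical.

```python
from collections import Counter


class MajorityClassBaseline:
    """Hypothetical baseline: predicts the most frequent training label for every input."""

    def fit(self, labels):
        if not labels:
            raise ValueError("need at least one training label")
        # Remember the single most common label seen in training.
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        # Ignore the features entirely and return the same label for each row.
        return [self.prediction_ for _ in rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up data for illustration: did a customer stay or churn?
train_labels = ["stay", "stay", "churn", "stay", "churn", "stay"]
test_rows = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
test_labels = ["stay", "churn", "stay", "stay"]

baseline = MajorityClassBaseline().fit(train_labels)
predictions = baseline.predict(test_rows)
baseline_accuracy = accuracy(test_labels, predictions)
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

A baseline like this is rarely useful on its own, but it gives the organization something running end to end and sets a number that any later, more sophisticated model has to beat.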

But at the end of the day, the road to building a data science team from scratch is long, winding, and expensive. This is why AI vendors are so numerous and valuable. The build vs. buy decision for AI products is a tough one, but I hope this article gives you a bit more of the information you need to make it.

By Jared Rand

Jared Rand is a data scientist specializing in natural language processing. He also has an MBA and is a serial entrepreneur. He is a Principal NLP Data Scientist at Everstream Analytics and founder of Skillenai. Connect with Jared on LinkedIn.
