How to Build a Data Science Strategy for Any Team Size
Create a culture and practice that's fast-paced and resilient to change
What is a strategy?
What do we mean by data science, and how does this strategy concept apply?
Building our strategic core hypothesis: Start with an AI-winning mindset
Define your target portfolio: Determine risk levels and management consistent with your strategy
Focus your portfolio on what your team is uniquely positioned to solve
Build processes around your model factory tooling and data supply chain
Architecture and organization: Structure your org for sustained success
Step back, communicate and holistically iterate
References


Photo by Maarten van den Heuvel on Unsplash

If you're a data science leader who has been asked to "build our data science strategy" with much freedom and little direction, this post will help you out. We'll cover:

  • What we mean by strategy: Is it just a plan? A roadmap? Something more, or less? In this section we'll get specific and adopt a working definition of what we're building when we build a strategy.
  • How does this idea apply to a data science team in a practical organizational context? Here we'll examine how our concept of strategy applies to data science, and get specific on what our strategy applies to.
  • How to actually author that strategy.

Throughout, we'll borrow heavily from strategy approaches to R&D, which shares key challenges with data science: the mission to innovate, and the increased uncertainty that comes with the search for discovery. By the time we conclude, you'll come away with one clear definition of strategy, and a helpful process for authoring one for an organization of any size.

If, like me, you lack a fancy MBA and have never taken a business strategy seminar, you may puzzle over what exactly someone wants when they ask you to develop a "data science strategy." And you may not find initial searches very helpful. Classic, powerful frameworks like the Three C's model (customers, competitors, company) make perfect sense at the level of an organization deciding where it should compete. Apply them to a function or team, and you end up feeling you're stretching the concepts more than they can bear.

If you're really like me, it'll send you down a fairly deep rabbit hole of reading books like Lords of Strategy and The McKinsey Way. The first is a pleasant work of business history, and the second is a helpful collection of techniques pulled from the experience of successful consultants at the distinguished firm. Neither offers a quick answer to the question. One very useful side effect of reading Lords of Strategy is learning that data scientists here aren't alone: "[I]t's easy to conflate strategy with strategic planning, but it's also dangerous. […] still today, there are many more companies that have a plan than there are that have a strategy. Scratch most plans, and you'll find some version of, 'We're going to keep doing what we've been doing, but next year, we're going to do more and/or better.'" This confusion of definitions has shown up in my experience, where several times an ask for a strategy boiled down to, "What's your plan for the next few months?"

One very helpful definition of strategy, and the one we'll adopt through the remainder of this article, comes from a working paper on R&D strategy by Gary Pisano: "A strategy is nothing more than a commitment to a pattern of behavior intended to help win a competition." The beauty of this definition is that it can apply across any and all levels and purposes of an organization. All teams, of all kinds and sizes, contribute to the organization's competitive efforts, and all teams can define and declare the patterns of behavior they use to focus those efforts.

"A strategy is nothing more than a commitment to a pattern of behavior intended to help win a competition."

—Gary Pisano

Pisano offers three requirements of strategy: consistency, coherence and alignment. A strategy should help us make consistent decisions that contribute, cumulatively, toward a desired objective; should aid all corners of an organization in cohering their far-flung tactical decisions; and should align local actions with a larger collective effort.

And finally, all strategies are founded on core hypotheses: bets about what will provide advantage in a competition. Pisano's helpful example is Apple, whose strategy "to develop easy-to-use, aesthetically pleasing products that integrate seamlessly with a broader system of devices in the consumer's digital world" rests on a core hypothesis "that customers will be willing to pay a significantly higher price for products with these attributes."

In essence, under this definition all strategies are bets that package the logic of decision-making: they give all parties a way to determine which actions aid a collective effort.

We will adopt this definition of strategy, and strive to define our own core strategic hypothesis on how data science will add value to our organization, and the patterns we'll commit to in pursuit of that value. Further, we'll assume that our parent organization has a developed strategy of its own; this input will be crucial when we apply the third test of alignment. Having defined the form our final strategy should take, we'll now turn our attention to bounding its scope.

To remind my friends how much fun I am, I sent several of them the same text message: "What do you think of when you hear 'data science strategy'?" The answers ranged from very thoughtful points on data infrastructure and MLOps, past healthy bristling at the vagueness of the question (I feel seen), to the colorful "Nonsense" and "My ideal job."

Small sample, but the varied array of responses from this group — which included experienced product managers at both startups and large firms, a data science lead, and a consultant — speaks to how muddled definitions of this term can get. Worse, data scientists suffer from a second prong of confusion: what's billed as "data science" in practice often follows from whatever skill set a firm wants to recruit for, gussied up with a title that's in vogue.

To fix one of these degrees of freedom in our analysis, we'll first adopt a standard definition of data science for the remainder of this article: the function dedicated to creating value and competitive advantage from modeling an organization's available data. That can take a few typical forms:

  • Building machine learning models that optimize customer-facing decisions in production
  • Building models that aid staff at all levels in completing their work, perhaps in customer-facing human-in-the-loop applications
  • Building interpretable models for inferences that aid business decision making

Note that we're excluding BI and analytics, solely for the sake of focus and not because they're less valuable than modeling work. Your analytics shop and your data science shop should be working together smoothly. (I've written about this here.)

Some, like my friend and Google PM Carol Skordas Walport, would suggest that data science strategy includes "how to get the data and infrastructure in a good enough state to do analysis or machine learning. I'd say it's how do you enable the team to get all the work done." We'll purposefully exclude these items of broader data strategy from scope. (Sorry, Carol.) We will, though, discuss navigating data and infrastructure limitations, and how developing your data science strategy can positively guide your broader data strategy.

Now we have bounds: we're building a set of core strategic hypotheses on how machine learning and/or AI can add maximum value to an organization with its own defined strategy or objectives, and a set of patterns a team will commit to in pursuit of that value. How do we start?

Experienced machine learning product managers, engineers and data scientists will often remark that machine learning products are different from traditional software. An organization has to account for the risk of model errors, data drift, model monitoring and refitting — hence the recent emergence of MLOps. And it's fabulously easy to commit sins of engineering that wade ML applications into swamps of technical debt. (See "Machine Learning: The High Interest Credit Card of Technical Debt" for a great read on this topic.) So with all this cost, why do we do it?

Ultimately, we consider AI solutions because sophisticated models have a demonstrated track record of detecting valuable patterns. These can be anything from clusters of customer preference that imply novel segmentations, to the latent representations a neural network finds to optimize predictions. Any given machine learning build relies on a case, or expectation, that a model can detect patterns that will improve a process, uncover actionable findings, or improve valuable predictions.

In defining the core strategic hypothesis for a data science team of any size, we can start with this McKinsey example description of how AI-enabled firms think differently. From "Winning with AI is a state of mind":

If we pick the right use cases and do them the right way, we'll learn more and more about our customers and their needs and continually improve how we serve them.

This is an enormously helpful lens in the effort to build a data science strategy: it focuses us on maximum learning, and all we have to do is land on our organization's definition of "right." But what are the "right" use cases for us?

Here Pisano is helpful again, defining four elements of an R&D strategy that carry over nicely to data science:

  • Architecture: The organizational (centralized, distributed) and geographic structure of our data science function.
  • Processes: The formalities and informalities of managing our work.
  • People: Everything from the mix of skills we seek to attract, to our value proposition to our talent.
  • Portfolio: How we allocate resources across project types, and "the criteria used to sort, prioritize and select projects."

We'll start with the last concept, and turn our focus to defining the right portfolio of projects for our organization, the mix that we can persuade ourselves will drive the most value. Given the great variation across organizations, we'll start with one challenge every organization faces: risk.

Modeling work has uncertain outcomes. "ML can do better" is an argument we often make based on history and intuition, and it often turns out to be true. But we never know how well it will work at the outset, until we prove by construction how well ML can solve a problem. Learning the answer to this question for any given use case can take variable levels of effort, and thus carry varying levels of cost. The uncertainty of that answer will also vary, based on how widely our models have been applied and how well we understand our data.

A friend and healthcare analytics product leader, John Menard, defined risk as an explicit part of data science strategy: "How are you maintaining a pipeline of small and larger bets, while maintaining healthy expectations that that's all they are? What's your strategy for killing a project when the data doesn't pan out, or pivoting the deliverable should it not meet requirements?"

It's wise for organizations to be principled and specific about the level of resourcing they can afford, and for how long. Here are a few useful questions to ask of any individual modeling effort:

  • Estimated likelihood of success: What are the chances this model use case will pan out?
  • Expected range of returns: If successful, will this project deliver a tiny improvement in a process that may produce huge returns at scale? Will a breakthrough differentiate you from competitors?
  • Expected time to discover failure: How long will it take to learn whether a project's hypothesized value prop will materialize? What's the minimum amount of resources you can spend before learning this project won't work out?

Hopefully these principles are straightforward, and all are consensus good things. The ideal project is likely to pan out, returns hugely on investment, and if it fails, fails early. This heavenly triumvirate never materializes. The art is in making tradeoffs that suit your organization.

An early-stage startup focused on disrupting a particular domain with AI may have investors, leadership and staff that accept the company as a single large bet on a particular approach. Or, it may prefer small projects that get to production fast and allow for fast pivots. Conversely, if we're in a large, established company in a well-regulated industry with ML skeptics for stakeholders, we might choose to bias our portfolio toward low-LOE projects that deliver incremental value and fail fast. This can help build initial trust, tune stakeholders to the uncertainty inherent in DS projects, and align teams around more ambitious projects. Successful early small projects can also bolster the case for larger ones around the same problem space.

Here are a few examples of how to define your target portfolio in terms of project scope, duration, and expected returns:

  • "Being early in our collective data science journey, we're focused on small, low-LOE, fail-fast use cases that will uncover opportunities without risking large amounts of staff time."
  • “We’ve identified a portfolio of three large machine learning bets, each of which could unlock tremendous value.”
  • “We aim for a balance of small-, medium- and high-effort projects, with corresponding levels of return. This lets us deliver frequent victories while pursuing game-changing potential disruption.”

As a final principle to apply across our complete portfolio, aim for a set of projects with non-correlated successes. Meaning, we want to look at our portfolio and sense that projects will succeed or fail independently. If multiple projects rest on a common assumption, if we sense that they're so closely related that they'll succeed or fail together, then we should revisit our selection.

We're done with this stage when we have:

  • Surveyed our data science and machine learning opportunities
  • Plotted them by investment, return and likelihood of success
  • Chosen a rough-cut priority list consistent with our objectives and risk tolerance (a toy scoring sketch follows this list)
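
To make that rough cut concrete, here is a small sketch of how the three questions above could be folded into a single sorting score. The project names, dollar figures and the scoring heuristic are all illustrative assumptions, not a prescribed method; the point is simply to force explicit estimates of likelihood, return, effort and time-to-failure for each candidate.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    p_success: float                 # estimated likelihood the use case pans out (0-1)
    expected_return: float           # rough annual value if it succeeds, in dollars
    effort_months: float             # investment needed to reach a go/no-go signal
    months_to_failure_signal: float  # how long before we would learn it won't work

    @property
    def rough_score(self) -> float:
        # Crude heuristic: expected value per month of effort, discounted by how
        # long resources stay committed before we learn whether the bet fails.
        return (self.p_success * self.expected_return) / (
            self.effort_months * self.months_to_failure_signal
        )

# Hypothetical opportunities with made-up estimates
candidates = [
    Candidate("Churn risk scoring", 0.6, 400_000, 3, 2),
    Candidate("Ad placement optimization", 0.4, 1_200_000, 6, 4),
    Candidate("Support ticket triage", 0.7, 150_000, 2, 1),
]

for c in sorted(candidates, key=lambda c: c.rough_score, reverse=True):
    print(f"{c.name:30s} rough score = {c.rough_score:10,.0f}")
```

However you weight the terms, the exercise is the value: it surfaces the assumptions behind each bet so they can be challenged and revisited.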

Now that we've settled on our target portfolio, we'll turn to making sure that our processes position us to identify, scope and deliver valuable projects fast.

The question of whether to build or buy is perennial, and often wades into complicated organizational dynamics. There's no shortage of vendors and startups looking to deliver AI solutions. Many are snake oil; many work. Many internal tech and DS teams view the former as a joke, the latter as competitors, and the time spent separating the two as an enormous waste. This view has merit, since time spent trying out a vendor doesn't advance a modeler's skills, and if an organization doesn't reward that effort, it's a price the data scientist pays without career reward. And this interpersonal complication compounds an already complicated business case: none of the standard software procurement concerns go away. You still have to worry about things like vendor lock-in and cloud integrations. Nevertheless, we should all be willing to buy vendor products that deliver higher ROI, and you can cut through distractions if you consider your internal team's unique advantages over boxed solutions.

In particular, your internal team can, usually, have governed access to far more of (perhaps all of) your organization's proprietary data. This means an internal team can probably understand it in more depth, and enrich it with other sources more easily, than could a single-purpose vendor solution. Given enough time and compute resources, a capable in-house team can probably beat a single-purpose vendor solution. (There's a PAC theory joke in here somewhere.) But is it worth it?

Standard ROI and alternatives analysis is essential here, with a focus on your time to internal market. Say we're optimizing ad placements on an e-commerce site. We've winnowed a list of vendors down to one front-runner that uses a multi-armed bandit, a common method among leading marketing optimization vendors at the time of this writing. We estimate the time to vendor integration at one month. Or, we could build our own MAB, and estimate that to take six. Would we expect a MAB we build to outperform the one under the vendor's hood, and sufficiently so to justify the delay?

It depends. Using Thompson sampling for a MAB buys you logarithmic bounds on expected regret, a jargon bomb meaning it explores options without leaving much value on the table. That statement stays provably true regardless of whether it's implemented by your in-house team or a vendor. Conversely, your in-house team is closer to your data, and taking a use case like this in-house amounts to a bet that you'll find rich enough signals in that data to beat a vendor product. And perhaps that your team can inject domain knowledge that an off-the-shelf solution doesn't have, providing a valuable edge. Finally, consider your in-house team's opportunity cost: is there another high-value item they could work on instead? If so, one option is to test the vendor, work on the other item, and reassess once you have measurable vendor results.
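
Since the regret claim above can feel abstract, here is a minimal, self-contained sketch of Thompson sampling for a Bernoulli (click / no-click) bandit. The click-through rates and the simulation loop are invented for illustration; a production system would feed in real impressions and rewards instead.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three ad placements with binary rewards; each arm keeps a Beta posterior.
n_arms = 3
true_ctr = [0.032, 0.041, 0.025]   # unknown in practice; simulated here
alpha = np.ones(n_arms)            # prior successes + 1
beta = np.ones(n_arms)             # prior failures + 1

for _ in range(50_000):
    # Sample a plausible CTR for each arm from its posterior, play the best one
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled))
    reward = rng.random() < true_ctr[arm]   # simulate whether the ad was clicked
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior mean CTR per arm:", np.round(alpha / (alpha + beta), 4))
print("pulls per arm:", (alpha + beta - 2).astype(int))
```

Run it and most traffic concentrates on the best arm while the others still get occasional exploration, which is the behavior the regret bound describes. That baseline is table stakes for both you and the vendor; the in-house edge, if any, comes from the data and domain knowledge you layer on top.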

We're done with this stage when we have:

  • Reviewed our opportunities from the prior step and, for each, answered, "Could we buy this?"
  • For each purchasable solution, answered whether we have a unique known or hypothetical advantage in-house
  • For each space with real trade-offs to be made, performed a trade-off analysis

Having defined our internal team's strategic competitive advantages, we'll now account for our internal processes, tooling and data capabilities.

I've discussed the subject of time-on-task with plenty of experienced data scientists, and every one cites the discovery, processing, cleansing, and movement (to a suitable compute environment) of data as the bulk of their time spent on the job. As another group of McKinsey authors write on AutoML and AI talent strategy, "Many organizations have found that 60 to 80 percent of a data scientist's time is spent preparing the data for modeling. Once the initial model is built, only a fraction of his or her time — 4 percent, according to some analyses — is spent on testing and tuning code." This isn't what draws most of us into the game. In most of our minds it's the price we pay for the joy of building models with impact. Because of this, we often talk about the "foundations" that data scientists require to be successful. In my experience, this framing can quickly get in our way, and I'm going to challenge us to think of ourselves as a model factory, subject to constraints of tooling and an elaborate, often problematic, data supply chain.

Confession: I’ve never bought into these “foundation” talking points when platforms are under discussion.

"Data and ML platforms are the foundations successful machine learning rests on," goes a bolded statement in countless slide decks and white papers. "And without a strong foundation," some consultant concludes, paternalistically, "everything falls apart."

Here's the rub, though: very few things "collapse" without machine learning. Start your house on a bad foundation and your garage might collapse on itself, and you. Start a machine learning project without the benefit of developed data and ML platforms, and your model build will…take longer. And without that fancy new machine learning model, chances are your business will persist the same way it has, albeit without some competitive advantage that ML aimed to deliver. But persisting in mediocrity isn't doomsday.

That's where this cliché loses me. It seeks to scare executives into funding platform efforts — valuable ones, it's worth stressing — as if the world will end without them, and it will not. We scream that the sky is falling, and then when a stakeholder encounters the usual rain they're used to, we lose credibility.

Nevertheless, I'd wager that firms with strong ML capabilities will outperform competitors that lack them — it's not lost on me that my career as a modeling lead is precisely such a bet — and modern data and MLOps capabilities can greatly reduce AI capabilities' time to market. Consider this excerpt from the McKinsey paper "Scaling AI like a tech native: The CEO's role," emphasis mine:

We often hear from executives that moving AI solutions from idea to implementation takes nine months to more than a year, making it difficult to keep up with changing market dynamics. Even after years of investment, leaders often tell us that their organizations aren't moving any faster. In contrast, companies applying MLOps can go from idea to a live solution in just two to 12 weeks without increasing head count or technical debt, reducing time to value and freeing teams to scale AI faster.

Your data science strategy must account for your organizational and tooling constraints, and adopt patterns that produce models or units of knowledge that are actionable within those constraints. That is, modeling projects should always have:

  1. A clear line of sight to minimum-viable modeling data. Your data science team should know where the source data is, and have a rough sketch of how it will need to be transformed.
  2. A simple and realistic path to realized value. How will you get a sufficiently performant model live, or otherwise apply model results? (A minimal intake-record sketch follows this list.)
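
As one way to operationalize these two gates, here is a minimal sketch of an intake record a team might fill out per use case before it enters the portfolio. The field names and the example project are hypothetical, not a template from any particular tool.

```python
from dataclasses import dataclass, field

@dataclass
class UseCaseIntake:
    name: str
    source_data: list[str]          # where the minimum-viable modeling data lives
    transformation_sketch: str      # rough notes on how it must be reshaped
    path_to_value: str              # how a good-enough model would actually be used
    deployment_blockers: list[str] = field(default_factory=list)

    def ready_to_scope(self) -> bool:
        # Both gates need at least a rough answer before detailed scoping begins
        return bool(self.source_data and self.transformation_sketch and self.path_to_value)

# Hypothetical example
churn = UseCaseIntake(
    name="Churn risk scoring",
    source_data=["billing.invoices", "crm.accounts"],
    transformation_sketch="Join on account_id; aggregate 12 months of usage per account",
    path_to_value="Weekly scores pushed to the retention team's CRM queue",
    deployment_blockers=["No batch scoring job exists yet in the legacy scheduler"],
)
print(churn.ready_to_scope())  # True: both gates have at least a rough answer
```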

Early-stage firms or teams with full, greenfield freedom over architecture and tooling are well positioned to adopt a modern MLOps practice, which will make it easier to quickly prototype, deploy and monitor models to gauge their impact in the real world. Teams working alongside or within longstanding legacy tech might find that it wasn't built with ML integration in mind, and that deployment is a big, heavyweight exercise. Firms in tightly regulated industries will likely find that many applications require high levels of explainability and risk control.

None of these challenges is insurmountable. We just need to be principled and savvy about timeline implications, and build this into our decision-making.

We're finished with this stage when we have:

  • Surveyed our planned use cases to determine the path to data for each to begin
  • Determined each use case's path to realized value if it were to succeed
  • Factored this into our expected investment and adjusted it from step one
  • Refined our prioritization in light of any changes we've discovered

Having refined our ideas of where to deploy data science, we'll consider working models to ensure alignment.

Pisano defines architecture as "the set of decisions around how R&D is structured both organizationally and geographically." Designing this includes mindful decisions about how to integrate our data scientists with a business unit. Are they fully centralized with a formal intake? Reporting to various business units? Centralized and embedded? Reporting structures and decision-making authorities may not be under your control, particularly if you've been tasked with building a strategy for a unit with defined reporting lines. But if these points are under discussion, here are a few things to consider in maximizing the value of DS outputs.

Will your data scientists be well supported and appropriately measured? Consider the pipeline of junior data science talent. Data scientists join the field from a variety of quantitative backgrounds, typically with a mix of theoretical and practical skills. A typical MS grad spent those early years building skills and understanding, and demonstrating that understanding to experts in their field. This doesn't generally include an abundance of training in communicating technical findings to non-experts.

Contrast this with the experience they'll have in a business setting, where they'll likely have less domain knowledge and be one of the few with methods knowledge. They'll be asked to apply techniques that few outside their function understand. Their projects will necessarily include more uncertainty than standard software builds. Their success will hinge on many more factors, many outside of the data scientist's control, and they'll have little or no experience articulating the requirements to maximize chances of success. Put all this together, and we begin to see a thrown-in-the-deep-end situation emerge.

This can lead to challenges for other functional leaders during their first experience leading data science teams. This lesson from McKinsey's "Building an R&D strategy for modern times" carries over to our field as well:

Organizations tend to favor "safe" projects with near-term returns — such as those emerging out of customer requests — that in many cases do little more than maintain existing market share. One consumer-goods company, for example, divided the R&D budget among its business units, whose leaders then used the money to meet their short-term targets rather than the company's longer-term differentiation and growth objectives.

In our field, this tends to play out with junior data scientists being asked by their non-technical supervisors to write whatever SQL query will answer the question(s) of the day. This is usually helpful, but usually not the kind of value an enterprise is hoping to drive by recruiting savvy modelers.

This problem is far more easily solved when you have leaders who've managed DS or ML projects before. Regardless of function, success hinges on having people who can listen to a problem, scope analytical and modeling approaches to solving it, and manage the risks and ambiguity. Plenty of early-career data scientists thrive in these situations. In my experience they're outliers with gifts in both communication and dealing with ambiguity. I've been lucky enough to hire a few by accident — hi Zhiyu! Bank on your ability to screen for these talents, and compete for them, at your peril.

All this may seemingly argue for centralizing your data science function. That's one approach, and it brings us to our next vital question.

Will your data scientists be close enough to the business to focus on the right problems? A central data science functional group is likely to get less exposure to the business problems you'd like solved, compared to hyper-local teams that report directly to a business team. Big, monolithic, functional teams with formal intakes can struggle to get the business input they need, largely because many stakeholders aren't really sure what to ask for. If you've heard a horror story or two about data science teams turning out "science projects nobody asked for," this is often a root cause. And again, resist the urge to stereotype: this usually isn't because the data science team has too academic a mindset, and far more often because two different functions don't know how to converse in a shared language.

What options does this leave us? It's one reason embedded models have worked in my experience. In this model, your data science team is offered access to all of the forums where you routinely discuss business problems. They're responsible for seizing this opportunity to understand the problems a business team wants to solve, and for proposing approaches that can add value. They report to data science leaders, who ensure they're doing methodologically sound work, support them in getting what their projects need for success, and mentor and coach their growth.

Sometimes data science projects fail because of shoddy methodology; often they fail because the predictive features simply aren't informative enough. Knowing the difference can be very difficult for someone outside a quantitative function.

We've finished with this step when we have:

  • Defined crisp ways of communicating the scope of data scientists or teams
  • Defined engagement patterns

As in all practical decisions, there are trade-offs everywhere and no silver bullets to be found. Completely autonomous local teams will maximize focus on different, local outcomes. A centralized function will minimize duplication, with an increased risk of deviating from practical, impactful outcomes.

Let's review what we've completed so far:

  1. Defined a strategic hypothesis, the big bet on how we'll add value with data science and machine learning.
  2. Defined a target portfolio that aligns with our organization's risk appetite, accounts for our process and tech constraints, and focuses our team on the problems we can't buy our way through.
  3. Filtered our use cases based on data access and how they'll drive value.
  4. Possibly, developed reporting structures and project sourcing methods that support our data scientists and focus their talents on their unique advantages.

More plainly, we've laid out the criteria for finding our right use cases, and filtered our use case opportunities to find the first right set.

The next things to do are:

  1. Step back and look at everything together. Viewed as a holistic whole, does it make sense?
  2. Communicate this strategy, and the initial plan that emerged from it.
  3. Communicate how would-be stakeholders can engage your functional team.
  4. Iterate: Revisit your strategy whenever the assumptions or circumstances that led to it change, and commit to a cadence for reviewing how circumstances have changed.

To conclude, this process is a sobering amount of effort. But it comes with a great reward: this strategy will deliver a clear articulation of the risks you should take, how you'll manage them, and how they'll support your target outcomes if they pay off. A clear alignment of purpose, and ease of keeping activities consistent with that purpose, is an incredibly empowering thing for a functional team. Deliver that, and results will follow.
