Are Automation and AI BS?

A couple of weeks ago, I took Steli’s click bait and read his thoughts on sales automation and AI. There isn’t much novelty in the comments or the objections presented. However, I felt compelled to write an answer. Part of the reason is that MadKudu is currently being incubated by Salesforce as part of the Einstein batch. Needless to say, the word AI is uttered every day to the point of exhaustion.

The mythical AI (aka what AI is not today)

The main concern I have around AI is that people are being confused by all the PR and marketing thrown around major projects like Salesforce’s Einstein, IBM’s Watson and others – think Infosys Nia, Tata Ignio, the list goes on.

Two months ago, at the start of the incubator, we were given a truly inspiring demo of Salesforce’s new platform. The use-case presented was to help a solar panel vendor identify the right B2C leads to reach out to. A fairly vanilla lead scoring exercise. We watched in awe as the CRM was fed Google Street View images of houses, based on the leads’ addresses, and ran them through a “sophisticated” neural network to determine whether the roof was slanted. Knowing if the roof was slanted was a key predictor of the amount of energy the panels could deliver. #DeepLearning

This reminded me of a use-case we discussed with Segment’s Guillaume Cabane. The growth-hack was to send addresses of VIP customers through Amazon’s Mechanical Turk to determine which houses had a pool, in order to send a targeted catalogue about pool furniture. Brilliant! And now this can all be orchestrated within the comfort of our CRM. Holy Moly! as my cofounder Sam would say.

To infinity and beyond, right?

Well, not really. The cold truth is that this could also have been implemented in Excel. Jonathan Serfaty, a former colleague of mine, for example, wrote a play-by-play NFL prediction algorithm entirely in VBA. The hard part is not running a supervised model; it’s the numerous iterations needed to explore the unknowns of the problem and determine which data set to present to the model.

The pragmatic AI (aka how to get value from AI)

Aside from the complexity of knowing how to configure your supervised model, there is a more fundamental question to answer whenever you consider AI: the purpose of the endeavor. What are you trying to accomplish with AI and/or automation? Amongst all of the imperfections in your business processes, which one is the best candidate to address?

Looking through history to find patterns, it appears that the obvious candidates for automation/AI are high cost, low leverage tasks. This is a point Steli and I are in agreement on: “AI should not be used to increase efficiency”. Much ink has been spilled over the search for efficiency. Henry Ward’s eShares 101 is an overall amazing read and highly relevant. One of the topics that strongly resonated with me was the illustrated difference between optimizing for efficiency vs leverage.

With that in mind, here are some examples of tasks that are perfect fits for AI in Sales:

  • Researching and qualifying
  • Email response classification (interested, not interested, not now…)
  • Email sentiment classification
  • Email follow up (to an email that had some valuable content in the first place)
  • Intent prediction
  • Forecasting
  • Demo customization to the prospect
  • Sales call reviews

So Steli is right: no, a bot will not close a deal for you, but it can tell you who to reach out to, how, why and when. This way you can use your time on tasks where you have the highest leverage: interacting with valuable prospects and helping them throughout the purchase cycle. While the recent advent of sales automation has led to an outcry against weak, gimmicky personalization, I strongly believe we are witnessing the early signs of AI being used to bring back the human aspect of selling.

Closing thoughts

AI, Big Data, Data Science, Machine Learning… have become ubiquitous in B2B. It is therefore our duty as professionals to educate ourselves as to what is really going on. These domains are nascent and highly technical, but we need to maintain an uncompromising focus on the business value any implementation could yield.

Want to learn more or discuss how AI can actually help your business? Feel free to contact us

Improve your behavioral lead scoring model with nuclear physics

According to various sources (SiriusDecisions, Spear Marketing), about 66% of B2B marketers leverage behavioral lead scoring. Nowadays we rarely encounter a marketing platform that doesn’t offer at least point-based scoring capabilities out of the box.

However, this report by Spear Marketing reveals that only 50% of those scores include an expiration scheme. A dire consequence is that once a lead has reached a certain engagement threshold, the score will not degrade. As the report puts it, “without some kind of score degradation method in place, lead scores can rise indefinitely, eventually rendering their value meaningless.” We’ve seen this at countless companies we’ve worked with. It is often a source of contention between Sales and Marketing.

So how do you go about improving your lead scores to ensure your MQLs get accepted and converted by Sales at a higher rate?

Phase 1: Standard Lead scoring

In the words of James Baldwin, “If you know whence you came, there are absolutely no limitations to where you can go”. So let’s take a quick look at how lead scoring has evolved over the past couple of years.

Almost a decade ago, Marketo revolutionized the marketing stack by giving marketers the option to build heuristic engagement models without writing a single line of code. Amazing! A marketer, no coding skills required, could configure and iterate over a function that scored an entire database of millions of leads based on specific events they performed.

Since the introduction of these scoring models, many execution platforms have risen. According to Forrester, scoring has long since become a standard capability to expect when shopping for marketing platforms.

This was certainly a good start. The scoring mechanism had, however, 2 major drawbacks over which much ink has been spilt:

  • The scores don’t automatically decrease over time
  • The scores are based on coefficients that were not determined statistically and thus cannot be considered predictive

Phase 2: Regression Modeling

The recent advent of the Enterprise Data Scientist, formerly known as the less hyped Business Analyst, started a proliferation of lead scoring solutions. These products leverage machine learning techniques and AI to compensate for the previous models’ inaccuracies. The general idea is to solve for:

Y = ∑𝞫.X + 𝞮


Y is the representation of conversion
X are the occurrences of events
𝞫 are the predictive coefficients


So really the goal of lead scoring becomes finding the optimal 𝞫. There are many more or less sophisticated implementations of regression algorithms to solve for this, from linear regression to trees, to random forests to the infamous neural networks.
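
As a rough illustration of what “finding the optimal 𝞫” means, here is a minimal sketch that fits the linear model above by gradient descent on a tiny made-up data set. The event names (form fills, page views), the data, and the function name fit_linear are assumptions for illustration only; a real scoring product would rely on a statistics library and far more data.

```python
def fit_linear(X, y, learning_rate=0.05, epochs=5000):
    """Fit y ~ intercept + sum(beta_j * x_j) by batch gradient descent."""
    n_rows, n_features = len(X), len(X[0])
    beta = [0.0] * n_features
    intercept = 0.0
    for _ in range(epochs):
        grad_beta = [0.0] * n_features
        grad_intercept = 0.0
        for row, target in zip(X, y):
            prediction = intercept + sum(b * x for b, x in zip(beta, row))
            error = prediction - target
            grad_intercept += error
            for j, x in enumerate(row):
                grad_beta[j] += error * x
        # Move each coefficient against the average gradient of the squared error.
        intercept -= learning_rate * grad_intercept / n_rows
        beta = [b - learning_rate * g / n_rows for b, g in zip(beta, grad_beta)]
    return beta, intercept


# Hypothetical leads: [form_fills, page_views], and whether each converted (1) or not (0).
X = [[0, 1], [1, 0], [0, 3], [2, 1], [1, 2], [0, 0], [2, 0], [0, 2]]
y = [0, 1, 0, 1, 1, 0, 1, 0]

beta, intercept = fit_linear(X, y)
print("form_fills coefficient:", round(beta[0], 3))
print("page_views coefficient:", round(beta[1], 3))
```

On this toy data the form-fill coefficient comes out clearly positive while the page-view coefficient stays close to zero, which is the kind of signal a hand-picked point system cannot give you.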

Mainstream marketing platforms like HubSpot are adding predictive capabilities to their manual lead scoring.

The goal here has become helping marketers configure their scoring models programmatically. Don’t we all prefer to blame a predictive model rather than a human who hand-picked coefficients?!

While this approach is greatly superior, there is still a major challenge that needs to be addressed:

  • Defining the impact of time on the scores

After how long does having “filled a form” become irrelevant for a lead? What is the “thermal inertia” of a lead, aka how quickly does a hot lead become cold?

Phase 3: Nuclear physics inspired time decay functions

I was on my way home some time ago when it struck me that there was a valid analogy between leads and nuclear physics, a subject in which my co-founder Paul holds a master’s degree from Berkeley (true story). The analogy goes as follows:
Before the lead starts engaging with (or being engaged by) the company, it is a stable atom. Each action performed by the lead (clicking on a CTA, filling a form, visiting a specific page) results in the lead gaining energy, moving it further from its stable point. The nucleus of an unstable atom will start emitting radiation to lose the gained energy. This process is called nuclear decay and is quite well understood. The time taken to free the energy is defined through the half-life (λ) of the atom. We can now compute, for each individual action, its impact on a lead over time and how long the effects last.

Putting all the pieces together we are now solving for:

Y = ∑𝞫.f(X).e^(-t(X)/λ) + 𝞮


Y is still the representation of conversion
X are the events
f are the feature functions applied to X
t(X) is the number of days since the last occurrence of X
𝞫 are the predictive coefficients
λ are the “half-lives” of the events in days
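
To make the formula concrete, here is a minimal sketch of how a decayed score could be computed for a single lead. The event names, coefficients, and decay constants are made up for illustration, and the noise term 𝞮 is left out. One caveat worth labeling: with the exponent written as -t/λ, λ is strictly the time for a contribution to fall to about 37% (1/e) of its value; the true half-life would be λ × ln 2.

```python
import math


def decayed_score(events, betas, lifetimes_days):
    """Sum beta * f(X) * exp(-t(X) / lambda) over a lead's events.

    events: list of (event_name, feature_value, days_since_last_occurrence)
    """
    score = 0.0
    for name, feature_value, days_since in events:
        decay = math.exp(-days_since / lifetimes_days[name])
        score += betas[name] * feature_value * decay
    return score


# Hypothetical coefficients and decay constants (in days), for illustration only.
betas = {"filled_form": 10.0, "visited_pricing": 4.0}
lifetimes_days = {"filled_form": 30.0, "visited_pricing": 7.0}

fresh_lead = [("filled_form", 1, 0), ("visited_pricing", 2, 1)]
stale_lead = [("filled_form", 1, 90), ("visited_pricing", 2, 60)]

print(round(decayed_score(fresh_lead, betas, lifetimes_days), 2))
print(round(decayed_score(stale_lead, betas, lifetimes_days), 2))
```

The same two events produce a much lower score for the stale lead, which is exactly the “going cold” behavior a plain point system misses. A lead that re-engages resets t(X) to zero for that event and heats up again.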


This approach yields better results (~15% increase in recall) and accounts very well for leads being reactivated or going cold over time.

top graph: linear features, bottom graph: feature with exponential decay


Next time we’ll discuss how unlike Schrödinger’s cat, leads can’t be simultaneously good and bad…



Startup vs bigco: the best career option in data science

Our fearless leader and CEO Sam Levan recently spoke at Galvanize in San Francisco about data science careers. A common student question is, “Is a job at a startup or a big company better for data scientists?”

At MadKudu we’re in a good position to answer it – the 5 of us have more than 25 years of experience in data science. We’ve built everything from the world’s largest fraud detection system to quick hacks in Google Sheets.

So what is the best career option for an aspiring data scientist? Google or

Before answering I’m going to share a secret about being a data scientist. Whether you work at a 2-person startup or CapitalOne, there is one attribute which best predicts your probability of success.

The super-duper-secret to being a great data scientist is …

… wait for it …
…… wait for it ……
……… WAIT FOR IT!!! ………

Data science is a SOCIAL skill

That’s it. That’s the big secret. Your career as a data scientist will be defined by how well you can communicate, write, listen, organize, lead, and empathize.

Do you need hard skills? Of course. You can’t do the job if you don’t know how to use R, Python, or MATLAB. You have to know how to measure the statistical significance of your results.

But unless you fancy yourself the next Will Hunting, being a brilliant Python coder won’t make you any more effective than a good one if you can’t work well with others.

Data scientists don’t work alone

Data scientists try to solve business problems with data – an iterative activity which requires working with cross-functional teams.

Suppose you’re a knowledge engineer working at a bank. On any given day you will need to:

  • Talk to regulators about the riskiest type of criminal activity.
  • Help analysts understand customer data and what behavior the bank can track.
  • Ask (beg?) the operations team to pull new data sources for you.
  • Testify in court.

The most effective data scientists are team players who make everyone else more effective.

If you want to be a lone hero, data science isn’t for you.

Communication beats code

Every morning I read Nate Silver’s analysis on FiveThirtyEight. Is Nate the world’s greatest statistician?

Of course not. Nate Silver’s brilliance is his ability to help us understand how data answers important questions in politics, sports and life.

We try to deliver the same value in our work.

This is – by far – our most popular blog post. Why was it featured on Growth Hackers, Hacker News, and Growth Hacking digest? It wasn’t a great study – just 9 sample companies. We didn’t build any amazing models – everything was done in Google Sheets.

As an example of data science it is … meh. Our customers loved it because we helped them understand what the data means and how they can use it to solve business problems.

Any idiot can have an opinion. Lots of smart people can compile numbers.

Few people can help others understand why data matters and what they should do – be one of them and you’ve got the world at your feet.

Data science careers: Startups vs Bigco

Back to the students’ questions:

“What is the difference between being a data scientist at a startup vs a Bigco?”

Specialist vs generalist

The biggest difference between being a data scientist at bigco vs a startup is your degree of specialization. At a bigco you have the opportunity to work for months and years on the same problem.

Are you excited about spending 2 years creating the world’s greatest recommendation engine for Facebook? Do you like doing primary research? Becoming an expert in building models to solve 1 problem? Becoming a master in R, Python, or MATLAB?

That’s life in a bigco. I know Knowledge Engineers who spend a career detecting violations of stock market wash sale rules.

Startups? Ha ha ha!

You won’t know what you’re working on next week, much less next year. You’ll be spending your time helping marketing, sales, and product teams answer basic questions. Since you’re constrained by data and time, you’ll do much of your work in spreadsheets or SQL.

It isn’t uncommon for a data scientist at a startup to be juggling 5 different problems at the same time. Your expertise will be your ability to quickly acquire and apply new skills – fortunately this is a great skill to have.

Support system

Unless you work at MadKudu, you may be the only data scientist at your startup. Your colleagues may not understand what you do or how you can help them. You may have to define your own objectives. On your first day you might be told to “go help the sales team find the best leads”. Does this terrify or excite you?

At bigco you will have a support system. Your boss will tell you which project you’re working on. More experienced data scientists can answer your questions. Have a problem? Ask your boss – that’s what she’s for.

Getting dirty

Data science textbook examples are fairy tales. In 20 years I’ve never encountered such simple problems. In the real world:

  • Simply getting the data is HARD.
  • People don’t agree on what columns actually mean.
  • Everything changes while you’re doing analysis.

Bigcos have teams of people to help solve these problems: server-side developers to populate the data warehouse and business analysts who write data dictionaries.

At startups … well … it is probably up to you. The developers are all too busy finishing the next release and supporting customers to run SQL queries. You have to look in the code to see how the product generates the account_activated event in Mixpanel.


At bigco you’ll have a nice salary, 401(k) plan, and benefits. You’ll work a little harder the 3 months before bonus time so you can get that new car. It feels safe – but is it?

Life at a startup is the opposite. Part of your compensation will be potential, unknown upside from stock options. Will you have a job next quarter? It depends on whether the CEO can close the B round. It feels risky – but is it?

I’ve worked for the world’s biggest, most stable employer and at any-day-we’re-dead-lets-start-stealing-office-supplies startups. I’ve had friends lose $500K starting a company and others struggle for years to find a job after being laid off. Here is how I think about risk.

Working for startups is very risky in the short run but incredibly stable in the long run.

The stability of bigcos comes at a price – you develop fewer skills, build fewer relationships, and don’t get regular experience marketing yourself.

This risk is particularly true for a data scientist who can get stuck working on the same problem … with the same tools… and the same people … for years. A major industry downturn can be economically devastating when all companies in a sector are laying off employees.

Both bigco and startup careers have risks – you just need to understand the risks you’re taking and be smart about managing them.

What’s best for you – bigco or a startup?

After reading this post you’re probably more confused than ever – because there is no one answer.

My #1 piece of advice is to go out and interview with both big and small companies. Meet the teams and ask lots of questions.

What is the #1 problem you would be solving? Why? Who else is on your team? What do they say about the problem? What tools would you be using?

It’s the only way you’re going to see what is best for you.

Best of all it will give you an opportunity to work on those social and communication skills – the most critical ones for your career in data science.

Use predictive analytics to reduce churn by 20% in 2 days – with 3rd-grade math

Most SaaS companies have 3 misconceptions about churn:

  1. They don’t realize how much churn is costing them.
  2. They think they know why customers churn.
  3. They think predicting churn with data is too hard.

If you’re not using predictive analytics to prevent churn, this hack will help reduce your churn by about 20%. It takes about 2 days of work over a few weeks and you can do it in Microsoft Excel.

We used similar techniques to help Codeship retain 72% of their at-risk users.


Download the spreadsheet to follow the example below.

You need to predict churn with data

Your customers cancel for lots of different reasons. Projects get scrapped. Users get stuck and bail. The key user takes a sabbatical to breed champion goldfish.

Quite often you can intervene before this happens and prevent it – but the primary predictors of churn are not always obvious.

For instance many SaaS marketers assume last_login_at > 30 days ago predicts churn. We almost always identify better predictors such as changing patterns in user behavior.

Let me rephrase this point a little more strongly:

If you’re not looking at data to predict churn you are almost definitely missing the fastest, easiest way to increase your MRR.

Why this hack is effective

You don’t need a data scientist. Or developer time.

As long as you have access to metrics in Mixpanel, Intercom, etc. even junior members of your marketing team can do it.

Credit card companies invest massively in predicting churn because slight improvements generate millions of dollars. You’re not Capital One – you’re a SaaS company. You don’t need to know what “entropy” is to start predicting churn.

You don’t need statistics

Can you add? This is the only math skill you need. There is one equation but we’ve already put it into the spreadsheet for you.

If addition is too complex consider outsourcing to a 3rd-grader. They’ll work for peanuts (or at least cookies).

The results are immediately actionable

We’re going to start with the data you already have in your analytics or marketing automation platform – so you can use the results to send churn-prevention emails or generate alerts for your sales team.

Step-by-Step: find the best predictors of customer churn

Download the spreadsheet

The examples are easier to understand if you spend a few minutes looking at the spreadsheet. I break down each step below.

PR Power! – our example company

I’m going to walk you through each step using examples from a fictitious SaaS startup called PR Power! we introduced in a previous post.

PR Power! helps media managers in mid-sized businesses do better PR by generating targeted media lists. Customers pay $50-$5,000/month after a free trial. Marketing Mark, the CMO, is charged with reducing monthly churn from 5% to 4%.

Step 1 – Identify predictors of churn

Try to identify predictable reasons why customers cancel.

Mark’s team spent a few hours looking at the last 20 customers who canceled and identified a few predictors. He also interviewed the sales and customer success teams about these customers.

They came up with the following events that are likely to predict why a customer cancels an account with PR Power!

Champion departs – Usually PR manager leaves the customer’s company.

Project canceled – Customer signed up for a specific PR campaign and then decides not to run the campaign.

No journalists – Customer can’t find a good journalist in PR Power! to cover a story.

Support fails – Customer contacts support a few times and the problem isn’t solved – usually indicated by support tickets open a long time.

Stale list – Customer’s media list is less useful because journalists no longer available or active.

Step 2 – Translate the churn predictors to data rules – or eliminate them

Mark’s team took these qualitative events and tried to identify existing data in Mixpanel that might predict them. 3 were straightforward; 2 took a bit of investigating.

No journalists required identifying customers who had searched for journalists but didn’t add them to the media list.

Support fails was simply too hard – the support desk data on tickets isn’t in Mixpanel so they decided to skip it.

Step 3 – Count the occurrences of each predictor

Mark put the predictors at the top of his spreadsheet and identified every customer who matched a data rule yesterday.

For instance, for User 80374, last_login_at > 30 days ago is TRUE, so he entered a 1 for Project canceled.

Step 4 – Track every customer who churns until you hit 100

Mark adds a “Canceled?” column to the spreadsheet. Each day he identifies every customer who cancels until 100 customers cancel. This takes 2 ½ weeks.

Step 5 – Count the matching events for each predictor

Now for the 3rd-grade math …

For each predictor, count every customer where the churn predictor is TRUE and the customer canceled.


Mark starts with the Project canceled rule and counts the following

Number of times last_login_at > 30 days ago is TRUE and YES, the customer canceled.

For instance, customers 80374 and 89766 fit this criterion. He counts 22 instances.
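
If you would rather not count by hand, the tally can be sketched in a few lines of Python. This is only an illustration: the rows below are made up (only the two customer IDs come from the example), and the column names project_canceled and canceled are assumptions about how the spreadsheet might be exported.

```python
from collections import Counter


def count_outcomes(customers, predictor):
    """Count customers by (predictor is TRUE, customer canceled)."""
    counts = Counter()
    for customer in customers:
        key = (bool(customer[predictor]), bool(customer["canceled"]))
        counts[key] += 1
    return counts


# Made-up rows; a real export would hold one row per tracked customer.
customers = [
    {"id": 80374, "project_canceled": 1, "canceled": True},
    {"id": 89766, "project_canceled": 1, "canceled": True},
    {"id": 10001, "project_canceled": 1, "canceled": False},
    {"id": 10002, "project_canceled": 0, "canceled": True},
    {"id": 10003, "project_canceled": 0, "canceled": False},
]

counts = count_outcomes(customers, "project_canceled")
print("predictor TRUE and canceled:", counts[(True, True)])
```

The other three combinations (TRUE but stayed, FALSE but canceled, FALSE and stayed) land in the same Counter, which is handy because the Prediction Score needs all four.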

Step 6 – Enter the results into the spreadsheet

Enter the total in the appropriate block of the 3×3 matrix to calculate the Prediction Score (this is an implementation of the Phi coefficient).

Mark enters 22 and calculates the Prediction Score for Project canceled at 0.009.
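
For the curious, the Phi coefficient behind the Prediction Score can be sketched as follows. It takes the four counts from a predictor-versus-canceled table; the function and argument names are assumptions, and the spreadsheet’s exact layout may differ.

```python
import math


def phi_coefficient(both, predictor_only, canceled_only, neither):
    """Phi coefficient for a 2x2 table of counts.

    both:           predictor TRUE and customer canceled
    predictor_only: predictor TRUE and customer stayed
    canceled_only:  predictor FALSE and customer canceled
    neither:        predictor FALSE and customer stayed
    """
    numerator = both * neither - predictor_only * canceled_only
    denominator = math.sqrt(
        (both + predictor_only)
        * (canceled_only + neither)
        * (both + canceled_only)
        * (predictor_only + neither)
    )
    # An empty row or column makes the score undefined; report 0 (no signal).
    return numerator / denominator if denominator else 0.0


# Illustrative counts only, not PR Power!'s actual numbers.
print(phi_coefficient(both=40, predictor_only=10, canceled_only=20, neither=130))
```

A score near 1 means the predictor and cancellation almost always go together, a score near 0 means the predictor tells you next to nothing, and a negative score means the predictor is associated with customers staying.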

Step 7 – Identify the biggest predictors of churn

Rules with a higher Prediction Score are better predictors of churn.

Mark compares the Prediction Score for each rule and sees an obvious pattern.


Two observations immediately jump out at Mark:

First, last_login_at > 30 days ago doesn’t tell him much about Project canceled. Since PR Power! has long-term customers who use the product periodically this isn’t surprising.

Second, No journalists is the clear winner. In hindsight, this makes sense – customers who try to find a journalist and can’t are getting no value from the product.

Step 8 – Take steps to prevent churn

Mark creates 2 rules in Mixpanel for the No journalists predictor.

Small accounts

When a customer has total_searches > 5 within last 30 days AND media_list_updated_at > 30 days ago Mark creates an auto-message inviting a customer to watch a webinar on “How to search for a journalist”.

Large Accounts

When a customer has total_searches > 5 within last 30 days AND media_list_updated_at > 30 days ago Mark creates an alert for the sales team to notify them about a customer at risk for churning.
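
Outside of Mixpanel, the same two rules could be sketched as a small function. Everything here is hypothetical: the field names total_searches_30d and mrr, the action labels, and especially the LARGE_ACCOUNT_MRR cutoff, since the example does not say where a small account ends and a large one begins.

```python
from datetime import datetime, timedelta

LARGE_ACCOUNT_MRR = 1000  # hypothetical cutoff; pick whatever fits your pricing


def churn_action(customer, now):
    """Return the action for the No journalists rule, or None if it does not fire."""
    searched_a_lot = customer["total_searches_30d"] > 5
    list_is_stale = now - customer["media_list_updated_at"] > timedelta(days=30)
    if not (searched_a_lot and list_is_stale):
        return None
    if customer["mrr"] >= LARGE_ACCOUNT_MRR:
        return "alert_sales_team"
    return "send_webinar_invite"


# Made-up customer for illustration.
now = datetime(2024, 1, 31)
small_account = {
    "total_searches_30d": 8,
    "media_list_updated_at": datetime(2023, 12, 1),
    "mrr": 50,
}
print(churn_action(small_account, now))
```

Keeping the condition in one place and branching only on account size means both the auto-message and the sales alert always fire on the same definition of “at risk”.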

An easier way – ask us to do this for you

You don’t even need 3rd-grade math.

Just take a free trial of MadKudu and let us run these calculations for you.

Cancel anytime if you don’t like it – keep whatever you learn and all the money you make from reducing your churn.


Want to learn more? Sign up for our new course.


Photo credit: Rodger Evans