Social media spam as a case study
Disclaimer: the examples in this post are for illustrative purposes and aren’t commentary on any specific content policy at any specific company. All views expressed in this text are mine and don’t reflect my employer.
Why is there any spam on social media? Nobody apart from the spammers themselves enjoys clickbait scams or phishing attempts. We’ve got many years of training data to feed machine learning classifiers. So why does spam on every major tech platform feel inevitable? After all these years, why do bot farms still exist?
The answer, in brief, is that it’s really hard to fight spam at scale, and exponentially harder to do so without harming real users and advertisers. In this post, we’ll use precision and recall as a framework for understanding the spam problem. We’ll see that eradicating 100% of spam is impractical, and that there’s some “equilibrium” spam prevalence determined by economics, regulation, and user sentiment.
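To make the framework concrete before we dive in, here is a minimal sketch of how precision and recall are computed for a binary spam classifier. The function and the labeled data below are invented for illustration, not taken from any real moderation system.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall, treating 1 as 'spam' (the positive class).

    precision = of everything we flagged, how much was actually spam?
    recall    = of all actual spam, how much did we catch?
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical ground truth and classifier verdicts: 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

The tension this example hints at is the central one: tightening the classifier to catch more spam (higher recall) tends to flag more legitimate uploads too (lower precision), and vice versa.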
Imagine we’re launching a competitor to TikTok and Instagram. (Forget that they have 1.1 billion and 2 billion monthly active users, respectively; we’re feeling ambitious!) Our key differentiator in this crowded market is that we guarantee users will see only the highest-quality videos: absolutely no “get rich quick” schemes, blatant reposts of existing content, URLs that infect your computer with malware, etc.
Attempt 1: Human Review
To achieve this quality guarantee, we’ve hired a staggering 1,000 reviewers to audit every upload before it’s allowed on the platform. Some things just need a human touch, we argue: video spam is too complex and context-dependent to trust to automated logic. A video that urges users to click on a URL might be a malicious phishing attempt or a benign fundraiser for Alzheimer’s research, for instance — the stakes are too high to…