"According to a recent study by x..."

Unfortunately, the "recent" study is often >5 years old.

I created this database for two reasons:

  1. A big part of my job in cybersecurity marketing is finding statistics to back up arguments that vendors make in their marketing material.
  2. I wanted to help my cybersecurity marketing agency's team understand trends in the market, e.g., the % increase of x malware variant, uptake of y technology, how SOCs are feeling alert fatigue, etc.

The problem: I couldn't find up-to-date statistics on search engines.

Publicly available data was (and is) being SEO'd to death. By that, I mean that the information vendors, regulators and others publish is (often) heavily distorted and misrepresented by third parties in order to rank unrelated content on Google and other search engines.

For example, when I google "cybersecurity statistics," the #1 result is a compilation of data that claims to be "recent."

However, on closer inspection:

  • None of the >100 statistics featured in the top Google result are recent, i.e., published within the current year or the one before. Most are much older and likely out of date.
  • There are no direct links to sources for any statistics - just a general list of potential sources at the end of the blog post, e.g., "IBM."
  • After checking one statistic at random, I could not find its primary source anywhere, so I suspect it was misquoted or simply wrong. Others likely were, too.

Bottom line: The #1 result is a list of statistics that are impossible to verify, likely out of date, and may have never been correct in the first place.

The same is true for more specific cybersecurity statistic compilations, e.g., "statistics about endpoint security" or "insider threat detection rates 2024."

Many stats in articles that claim to be relevant to this year or topic turn out to be several years old, or they cite another round-up article that in turn cites yet another one or a completely broken link.

We also find that wrong or misleading statistics create a compounding problem.

High-ranking blog A misquotes a statistic as being from the wrong year -> Article B references blog A in relation to a tangential topic -> Blog C references article B.

And so on. Bad data the whole way down.

AI search is only making the situation worse.

AI-powered search engines like ChatGPT Search and Perplexity take SEO content like this as input and use it to derive further data. The results you get from these search engines are basically Bing search remixed.

To help fix this problem in one industry (cybersecurity), our small team has spent hundreds of hours reading through publicly available data from vendors and regulators, including press releases, blog posts, and news articles from the last 12 months. We found the original source for as many statistics as possible, sorted them by category, and added links back to those sources for everything.

We have created an internal directory of almost 5,000 statistics (at the time of writing), which we use for the benefit of our clients as a cybersecurity marketing agency.

But now, to help people like me and grow a newsletter I want to write, I am giving a more limited version of our collated statistics directory away for free.

We don't own any of these data points, but we present each one with its original source to give the publishers full credit (something that is often lacking from third-party articles).

Sign up for our cybersecurity statistics newsletter, “CyberSecStats,” and you will get:

a) Monthly stats with original sources in your inbox.
b) Free access to a directory with >1,000 (actually recent) cybersecurity statistics.