How B2B SaaS companies get their data into the Wall Street Journal

How B2B SaaS companies like Ramp and Carta turn proprietary data into press citations, without building a full research department.

In the last year, Ramp's Business Spending Report data has shown up in the Wall Street Journal, the New York Times, the Financial Times, on CNBC, and Inc. One ongoing dataset, five citations in top-tier publications.

That kind of pickup used to be reserved for institutions like the Fed, the BLS, or a handful of think tanks. Now it's going to a payments company.

But Ramp isn't a total outlier in its success. Carta is another great example: its State of Private Markets report is the dataset that venture reporters reach for when fundraising comes up. SparkToro's zero-click search numbers get cited by the Economist, the Wall Street Journal, and the Washington Post. Ahrefs publishes a study on how AI Overviews cite sources, and within a week the study itself becomes a citation.

A small group of B2B companies has figured out how to become the source that other people quote, including, increasingly, LLMs. The Campfire Labs team set out to figure out how they did it.

Why proprietary data became an unfair advantage

Among all the disruption that we've seen in B2B content in the last few years, three shifts are important to understanding the apparent revival in data-driven content.

Search behavior changed first. SparkToro's research with Datos found that for every thousand Google searches in the US, only around 360 result in clicks to the open web. The rest stay inside Google's surface, get answered by an AI Overview, or end the session entirely. Ranking for a question and capturing the click is a smaller prize than it was three years ago.

Second, the citation economy expanded. LLMs need primary-source data to answer questions, and they cite it back when they do. Ahrefs's recent studies on which sites get cited in AI Overviews and inside ChatGPT have become reference points the same way SparkToro's zero-click numbers did. The citation economy now has a new participant: the LLM that decides what to surface in answer mode.

Third, the journalism cycle got faster. Quarterly economic data, monthly indices, weekly tracking… reporters covering markets, hiring, fundraising, and consumer behavior need numbers that are days or weeks old. Government data is slower than the news cycle now, which means that companies sitting on real-time transaction, hiring, or behavioral data have something the BLS doesn't: speed.

Ramp's Business Spending Reports draw on anonymized transaction data from more than 50,000 US businesses. Their Ramp Rate and AI Index datasets now live inside the Bloomberg Terminal and are queryable through Claude and ChatGPT. The reason the WSJ, the NYT, the FT, CNBC, and the Federal Reserve all cite them is obvious: when an economics reporter needs a number on how small businesses are spending this month, the most recent source is Ramp.

We'd argue that's the asset Ramp is actually building: a real-time economic indicator with a company logo on it. That's a world away from creating a one-off ebook based on survey data.

Four moves the companies getting cited share

They publish quarterly, not annually. Ramp, Carta, and SparkToro (in partnership with Datos) all run their flagship data on a quarterly cycle. Carta's State of Private Markets ships every quarter and pulls from live cap table and fundraising data; for example, Q3 2025 showed startups raised $27.3 billion, the highest quarterly sum in three years. By the time an annual report comes out, the numbers it reports can be up to twelve months stale. A quarterly cadence, by contrast, gives reporters something to write about every ninety days, and it gives the dataset a rolling life as a tracking series rather than a one-shot release.

The headline finding reframes the category. Stripe's annual letter reported that businesses on Stripe processed $1.9 trillion in 2025, roughly 1.6% of global GDP, with startup cohorts on the platform growing 50% faster year-over-year than the prior cohort. The number reframes how to think about the size of the internet economy. SparkToro's 1,000-to-360 ratio reframed how marketers argue about Google. Across these companies, the headline number reframes a conversation the reader is already having.

The methodology is narrow. Ahrefs's recent data studies aren't sprawling "state of the industry" reports. They're tight pieces designed around questions their readership wants answered: how AI Overviews cite sources, what schema markup actually does for AI visibility, how ChatGPT decides which sources to surface. Each one answers a single sharply-worded question with a defensible method.

The data ships in more than one format. Ramp's data is a blog post, yes, but it's also a press-pitched report, a Bloomberg Terminal feed, and a queryable surface inside LLMs. The same dataset operates as a PR asset for journalists, a market signal for finance professionals, and a citation source for generative AI.

"But we're not Ramp…"

"We don't have transaction data from fifty thousand businesses. Our dataset is small. This advice doesn't apply to us."

We hear this often, and the first part of it is accurate. The number of B2B companies sitting on Ramp's kind of financial firehose is small.

But SparkToro is also a small team (a team of B2B legends, but a small team nonetheless). The zero-click search work cited by the Economist, the Wall Street Journal, and the Washington Post came from partnering with Datos for the panel data and then doing the analytical work of identifying the number worth talking about. The underlying dataset isn't proprietary to SparkToro, but the framing and the analysis are.

The advantage comes from editorial judgment, not from dataset scale. Pick the right question, define a tight methodology, and find the number that reframes a conversation. We would argue that any SaaS content team can develop the level of editorial judgment that Ramp has, with the right kind of thinking.

Most B2B SaaS companies have publishable data they haven't audited for press potential, such as:

  • Product usage benchmarks: what does the median customer actually do with the tool?
  • Customer cohort behavior: how do retention or expansion patterns shift across segments?
  • Internal benchmarks: what does "good" look like across your user base?
  • The ability to survey your own customers.
  • Partner data nobody's analyzed editorially.
  • Pricing or contract data competitors don't have visibility into.

Not all of it is publishable, of course: some is too sensitive, some too thin. But "we don't have data" can sometimes be more accurately framed as "we haven't yet audited what we have."

How to start without building a research program

The instinct after reading this is to budget for a big annual research report. That's a bigger swing than most companies should take on a first attempt. An annual program ties up months of work before you know whether the concept holds, and it earns only one press cycle a year.

Start smaller. Pick one question your data can answer that no one outside your company can. Not "the state of [your category]": that's a sprawling report, and it'll take a year. We advise the SaaS companies we work with to pick one question, one headline number, and one ninety-day plan from kickoff to publication.

A few examples:

  • A payroll platform could publish "median salary for engineering hires by company stage, this quarter." Specific. Repeatable. News-grade.
  • A customer support tool could publish "percentage of support tickets resolved without a human in the loop, by segment." That's a number an AI-desk reporter would write about today.
  • A recruiting platform could publish "time-to-hire for senior engineering roles, by city, by quarter." That number doesn't currently exist outside private datasets.

Each concept is small enough that one analyst plus one writer can ship a first version inside a quarter.
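
For the analyst half of that pairing, here is a minimal sketch of what a first pass at one of these numbers might look like, using the payroll example. Everything in it is an assumption made for illustration: the records are invented, the field layout and the `headline_medians` helper are hypothetical, and the minimum sample size of three is a placeholder rather than a recommended privacy threshold. The point is only that a segment-level headline number, with thin segments withheld, is a small piece of work.

```python
from collections import defaultdict
from statistics import median

# Hypothetical, anonymized records: (quarter, company_stage, salary_usd).
# The values are invented for illustration and are not real data.
HIRES = [
    ("2025-Q3", "seed", 150_000),
    ("2025-Q3", "seed", 160_000),
    ("2025-Q3", "seed", 170_000),
    ("2025-Q3", "series_a", 180_000),
    ("2025-Q3", "series_a", 190_000),
    ("2025-Q3", "series_b", 210_000),
]

MIN_SAMPLE = 3  # assumed threshold; segments below it are withheld as too thin


def headline_medians(records, min_sample=MIN_SAMPLE):
    """Group records by (quarter, segment) and return the median per group.

    Groups with fewer than `min_sample` records map to None, so thin or
    potentially identifying segments are not published.
    """
    groups = defaultdict(list)
    for quarter, segment, value in records:
        groups[(quarter, segment)].append(value)
    return {
        key: (median(values) if len(values) >= min_sample else None)
        for key, values in sorted(groups.items())
    }


if __name__ == "__main__":
    for (quarter, segment), value in headline_medians(HIRES).items():
        shown = f"${value:,.0f}" if value is not None else "withheld (sample too small)"
        print(f"{quarter}  {segment:<10} {shown}")
```

Running the same function on each new quarter's records is what turns a one-off figure into a tracking series. The withholding rule is a simple stand-in for the "too sensitive" and "too thin" checks mentioned earlier, and a real program would need its own review.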

Before you commit, run the publication through two tests.

  • Would a journalist who covers your category quote this number? If not, the finding isn't news. Either find a sharper cut of the data, or pick a different question.
  • Would an LLM cite this when someone asks the underlying question? Ask ChatGPT and Claude the question your number answers. If no good answer exists yet, you're potentially creating the first canonical citation.

If neither test passes, the question isn't the right one yet. Ramp, Carta, et al. went through a similar process at the start of their data-driven content programs: they started with one number nobody else could publish, ran it regularly, and let the dataset earn citations over time. Fancy add-ons like Ramp's Bloomberg Terminal placement and the LLM integrations came after several quarters of data had built credibility.

The first move for any B2B SaaS company that wants to be in the same position is the same one. Sit down this week with whoever owns your data—analytics, product, customer ops—and ask one question:

What's a number our data shows nobody outside this company can see right now?

Write down what comes up. Some of the data will be thin or boring, and some of it will be unpublishable for privacy or competitive reasons. But some of it will be a real insight that nobody else in your category has access to.

Take the strongest candidate to your next content meeting and ask whether it could be a quarterly publication. If the answer is yes, you've started on the path to the Wall Street Journal. If the answer is no, ask the same question again next quarter. The data you collect between now and then might change the answer.

Cassie is the CEO of Campfire Labs
