According to Huntress Labs, a Shodan search for "Confluence" returns more than 200,000 results, and searches for the Confluence favicon return more than 5,000. These figures aren’t an indication of the number of vulnerable instances, but they do show how many instances are exposed to the internet.
Key Takeaways
The number of internet-facing hosts affected by a new vulnerability is a key factor in judging whether it will become a widespread or emergent threat. Are a lot of hosts affected? There’s a pretty good chance things are about to pop off. Only a few hosts? Probably less likely. But actually counting those hosts has become quite a bit more challenging.
Take, for example, CVE-2023-22527, which affects Atlassian Confluence. At the time of writing, Confluence has appeared on the CISA KEV list nine (yes, nine) times. That’s a level of exploitation that should encourage everyone to get their Confluence servers off the internet. But let’s look for ourselves. There are a number of generic Confluence Shodan queries floating around, but X-Confluence-Request-Time might be the best known; it simply checks for that HTTP response header.
That query returns about 241,000 hosts, which is a great target base for an emergent threat! But on closer examination, there’s something off about the listed hosts. For example, one of them returns the Confluence “X-Confluence-Request-Time” header.
But it also has an F5 favicon and claims to be a QNAP TS-128A. This is a honeypot, and whoever created it was somewhat clever: they mashed together the popular Shodan queries for Confluence, F5 devices, and QNAP systems to create an abomination that shows up in all three.
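One heuristic follows directly from this: a single host that matches fingerprints for several unrelated products is suspicious. The Python sketch below assumes the banner is available as plain text and uses a hypothetical, deliberately simplistic `FINGERPRINTS` table of substrings. It is not a real Shodan or vendor fingerprint set; in particular, the F5 favicon from the example would need a favicon-hash check, which is stood in for here by a text marker.

```python
# Hypothetical, simplistic fingerprints: product name -> substrings that suggest it.
# Real fingerprints (favicon hashes, full header sets, HTML markers) would be more precise.
FINGERPRINTS = {
    "Atlassian Confluence": ["X-Confluence-Request-Time"],
    "F5 BIG-IP": ["BIG-IP", "F5 Networks"],
    "QNAP": ["QNAP", "TS-128A"],
}


def matched_products(banner: str) -> set:
    """Return the products whose fingerprint substrings appear in the banner."""
    lowered = banner.lower()
    return {
        product
        for product, needles in FINGERPRINTS.items()
        if any(needle.lower() in lowered for needle in needles)
    }


def looks_like_mashup(banner: str, threshold: int = 2) -> bool:
    """Flag banners that claim to be `threshold` or more unrelated products at once."""
    return len(matched_products(banner)) >= threshold


if __name__ == "__main__":
    suspicious = (
        "HTTP/1.1 200 OK\r\n"
        "X-Confluence-Request-Time: 1696956993845\r\n"
        "Server: QNAP TS-128A\r\n"
    )
    print(sorted(matched_products(suspicious)))  # ['Atlassian Confluence', 'QNAP']
    print(looks_like_mashup(suspicious))         # True
```

A threshold of two is an arbitrary starting point; a real filter would need tuning against hosts known to be genuine.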
To avoid spraying exploits all over the internet (and getting caught quickly), some attackers use Shodan or a similar service to curate target lists. This honeypot is optimized for exactly that use case. That’s neat, but it also obscures our view of what is real. Can we filter these honeypots out of our search?
In a blog post about CVE-2023-22527, ProjectDiscovery provides this Shodan query in a Nuclei template:
http.component:"Atlassian Confluence"
The result is significantly better than the “X-Confluence-Request-Time” query, but the second and third results are still honeypots. So that won’t do.
At this point, it’s useful to look at what a real Confluence server’s HTTP response looks like (this is actually after a 302 redirect, but let’s set that aside):
HTTP/1.1 200
Cache-Control: no-store
Expires: Thu, 01 Jan 1970 00:00:00 GMT
X-Confluence-Request-Time: 1696956993845
Set-Cookie: JSESSIONID=72D881CD92E61BE1394BB6231C28A68B; Path=/; HttpOnly
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Content-Security-Policy: frame-ancestors 'self'
X-Accel-Buffering: no
Content-Encoding: gzip
Vary: User-Agent
Content-Type: text/html;charset=UTF-8
Content-Language: en-US
Transfer-Encoding: chunked
Date: Tue, 10 Oct 2023 16:56:33 GMT
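To make the header checks concrete, here is a minimal Python sketch that parses a trimmed copy of the response above and extracts the two markers this post keys off: the X-Confluence-Request-Time header and the JSESSIONID cookie. The function names are hypothetical helpers, not part of any Shodan or Atlassian API, and the parser assumes a well-formed header block without folded lines.

```python
from __future__ import annotations

# A trimmed copy of the real Confluence response shown above.
RAW_RESPONSE = """HTTP/1.1 200
Cache-Control: no-store
X-Confluence-Request-Time: 1696956993845
Set-Cookie: JSESSIONID=72D881CD92E61BE1394BB6231C28A68B; Path=/; HttpOnly
Content-Type: text/html;charset=UTF-8"""


def parse_headers(raw: str) -> dict[str, list[str]]:
    """Parse a raw response head into {lowercased header name: [values...]}."""
    headers: dict[str, list[str]] = {}
    for line in raw.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


def jsessionid(headers: dict[str, list[str]]) -> str | None:
    """Return the JSESSIONID value from the first Set-Cookie header that sets it."""
    for cookie_header in headers.get("set-cookie", []):
        # The NAME=VALUE pair comes before attributes such as Path and HttpOnly.
        name, _, value = cookie_header.split(";", 1)[0].partition("=")
        if name.strip() == "JSESSIONID":
            return value.strip()
    return None


def has_confluence_headers(headers: dict[str, list[str]]) -> bool:
    """True when both the Confluence timing header and a JSESSIONID cookie are present."""
    return "x-confluence-request-time" in headers and jsessionid(headers) is not None


if __name__ == "__main__":
    parsed = parse_headers(RAW_RESPONSE)
    print(jsessionid(parsed))              # 72D881CD92E61BE1394BB6231C28A68B
    print(has_confluence_headers(parsed))  # True
```

Keep in mind that headers are exactly what a honeypot can copy, so a check like this narrows the field but cannot prove a host is real.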
The server has a number of useful headers to key off of, but we’ll try filtering by adding “Set-Cookie: JSESSIONID=” to the query. That update brings the host count down to nearly half of what the Nuclei query returns.
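By default, Shodan combines space-separated search terms with a logical AND, so each refinement simply narrows the previous query. The tiny helper below is hypothetical and only makes that progression explicit. The exact query string behind the count above isn’t shown, so appending the cookie term to the Nuclei query is an assumption.

```python
def quote(term: str) -> str:
    """Wrap a free-text term in double quotes so it is searched as a phrase."""
    return f'"{term}"'


def build_query(*terms: str) -> str:
    """Join non-empty search terms with spaces (implicitly ANDed by Shodan)."""
    return " ".join(t for t in terms if t)


# Starting point: the query from the Nuclei template.
BASE = 'http.component:"Atlassian Confluence"'

# Assumed refinement: also require a JSESSIONID cookie in the response headers.
with_cookie = build_query(BASE, quote("Set-Cookie: JSESSIONID="))

if __name__ == "__main__":
    print(with_cookie)  # http.component:"Atlassian Confluence" "Set-Cookie: JSESSIONID="
```

Building queries from a list of terms makes it easy to record which refinement produced which host count.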
But still, there are so many honeypots! Almost none of them respond with an actual Confluence landing page. A simple way to capitalize on that is to include a snippet from the Confluence login page in our query:
html:"confluence-base-url"
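The html: filter looks for the snippet in the page body. As a local and slightly stricter check, the sketch below assumes the marker appears as a `<meta>` tag whose name or id is `confluence-base-url`, and it extracts that tag’s content attribute with Python’s standard `html.parser` module. The class and function names are hypothetical, and the example page is made up.

```python
from __future__ import annotations

from html.parser import HTMLParser


class BaseUrlFinder(HTMLParser):
    """Record the content of a <meta> tag named or id'd "confluence-base-url"."""

    def __init__(self) -> None:
        super().__init__()
        self.base_url: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "confluence-base-url" in (attributes.get("name"), attributes.get("id")):
            self.base_url = attributes.get("content")


def confluence_base_url(page: str) -> str | None:
    """Return the advertised base URL, or None if the page has no such meta tag."""
    finder = BaseUrlFinder()
    finder.feed(page)
    finder.close()
    return finder.base_url


if __name__ == "__main__":
    login_page = (
        '<html><head><meta id="confluence-base-url" name="confluence-base-url" '
        'content="https://wiki.example.com"></head></html>'
    )
    print(confluence_base_url(login_page))  # https://wiki.example.com
```

Parsing the tag instead of matching a substring means that stray text such as a comment mentioning `confluence-base-url` will not count as a hit.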
That does knock off ~17,000 hosts, and things are looking more Confluency. But there seems to be a whole bunch of entries without favicons. Let’s drill down into one and see…
It’s a honeypot, and this one is really well done. It looks just like a standard Confluence install, except that it answers requests for .css, .js, and favicon files with 302 redirects.
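If you can actively probe a host, this behavior is easy to test for. The sketch below assumes you have already fetched a handful of paths and recorded each status code; both the probing step and the example paths are hypothetical. It flags a host only when every static asset it served came back as a redirect.

```python
from __future__ import annotations

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_static_asset(path: str) -> bool:
    """Treat stylesheets, scripts, and anything named like a favicon as static assets."""
    lowered = path.lower().split("?", 1)[0]  # ignore query strings such as cache busters
    return lowered.endswith((".css", ".js")) or "favicon" in lowered


def redirects_all_static_assets(statuses: dict[str, int]) -> bool:
    """Return True when every probed static asset answered with a redirect.

    `statuses` maps a requested path to the HTTP status code it returned.
    """
    asset_codes = [code for path, code in statuses.items() if is_static_asset(path)]
    return bool(asset_codes) and all(code in REDIRECT_CODES for code in asset_codes)


if __name__ == "__main__":
    # Hypothetical probe results for a honeypot-like host.
    probe = {"/": 200, "/styles/main.css": 302, "/scripts/app.js": 302, "/favicon.ico": 302}
    print(redirects_all_static_assets(probe))  # True
```

Requiring all assets to redirect, rather than any single one, is meant to avoid flagging a real server that redirects one asset, for example through a CDN or proxy.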
Unfortunately, Shodan doesn’t provide a good way to filter out hosts without a favicon. Filtering on a known favicon is also a non-starter, because users can upload their own. So we have to find other discrepancies in these honeypots in order to filter them out. Luckily for us, they make a few mistakes, and the most obvious one is this:
They all use the exact same JSESSIONID, so we can filter out every host that returns that value.
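The same rule can be applied to a locally exported result set. The sketch below assumes each host is a dict with hypothetical `ip` and `jsessionid` keys (not Shodan’s actual banner format) and drops any host whose session ID appears on more than `max_hosts_per_id` hosts. A real server should issue a fresh random session ID per response, so a reused value is a strong honeypot signal.

```python
from __future__ import annotations

from collections import Counter


def drop_shared_session_ids(hosts: list[dict], max_hosts_per_id: int = 1) -> list[dict]:
    """Remove hosts whose JSESSIONID is shared by more than `max_hosts_per_id` hosts.

    Hosts without a JSESSIONID are kept; other filters can decide what to do with them.
    """
    counts = Counter(h["jsessionid"] for h in hosts if h.get("jsessionid"))
    return [
        h
        for h in hosts
        if not h.get("jsessionid") or counts[h["jsessionid"]] <= max_hosts_per_id
    ]


if __name__ == "__main__":
    # Hypothetical data using documentation IP ranges and made-up session IDs.
    sample = [
        {"ip": "192.0.2.10", "jsessionid": "SHARED0000"},
        {"ip": "192.0.2.11", "jsessionid": "SHARED0000"},
        {"ip": "198.51.100.7", "jsessionid": "UNIQUE1111"},
    ]
    print([h["ip"] for h in drop_shared_session_ids(sample)])  # ['198.51.100.7']
```

Counting values across the whole result set catches any reused session ID, not only the single one spotted by eye.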
A quick investigation suggests that what remains could be the complete set of real Confluence hosts (or just very, very good honeypots). That’s a reduction from 240,000 hosts all the way down to just 4,200, which implies there are approximately 236,000 Confluence honeypots on the internet, more than 50 times the number of real Confluence servers.
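The arithmetic behind those figures, using the rounded counts from the text:

```latex
\begin{aligned}
240{,}000 - 4{,}200 &= 235{,}800 \approx 236{,}000 \text{ suspected honeypots} \\
\frac{235{,}800}{4{,}200} &\approx 56.1
\end{aligned}
```

So the suspected honeypots outnumber the apparently real servers by roughly 56 to 1.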
Conclusion
A vulnerability that only impacts 4,000 hosts is much less concerning than one that impacts 240,000. Understanding the scale of an issue matters, so being precise about the number of potentially impacted hosts matters too. Those who copy inflated statistics, or who haven’t done their due diligence, make vulnerabilities appear more impactful than they truly are.
While we focused on Confluence, this particular problem has been repeated across many different targets. Honeypots are a net good for the security community. But their expanding popularity does make understanding real-world attack surfaces much more difficult for defenders, not just attackers.
About VulnCheck
VulnCheck continuously monitors the internet for high-impact vulnerabilities and tracks the potential internet-facing attack surface. We pride ourselves on providing accurate and actionable information. All signal, no noise.