Identifying MQTT Brokers in the Wild

Publicly exposed MQTT brokers are common across many industries and technology sectors. I have identified systems related to power generation, hotel management, finance, healthcare, pharmaceuticals, facilities management, surveillance, workplace safety, fleet management, marine transportation, construction, natural resource management, agriculture, smart homes and many others.

Hackers have been sounding the alarm about this for years, but the message has not reached many parts of the internet. Many of these systems are clearly involved in powerful and potentially dangerous operations, and I think it’s safe to bet that military organizations have spent years researching them and have probably discovered many weaknesses that could be used in a conflict.

Over the past year I spent some time analyzing MQTT brokers discovered on the internet. In this post, I will present some of my findings, including examples of exposed data whose owners I was able to identify, and others whose owners I was not. For a quick overview of MQTT, read my post about the connected lock.

My MQTT discovery was originally seeded with a Shodan export, supplemented by limited scanning of my own host set. My first step was developing a sweep scanner. I considered using standard Linux tools or a library such as paho-mqtt, but in the end both came with performance trade-offs and extra complexity.

Instead, I took the most direct path possible and wrote a short Python script that sends raw packets for the MQTT handshake and dumps the responses to a file. The script calls recv up to 100 times, each with a two-second timeout. The aim was to capture enough data to tell whether a broker was interesting while avoiding excessive use of resources.
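
A minimal sketch of a probe along these lines, assuming MQTT 3.1.1 over plaintext TCP on the default port 1883 (no TLS on 8883). The function names, the client ID `probe` and the choice to subscribe to the `#` wildcard are illustrative assumptions, not the script described above. MQTT 3.1.1 allows a client to send further packets right after CONNECT without waiting for CONNACK, which is why both packets go out in one write.

```python
import socket
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode MQTT's 'remaining length': 7 bits per byte, high bit = more bytes follow."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def mqtt_string(s: str) -> bytes:
    """UTF-8 string prefixed with its 2-byte big-endian length."""
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id: str, keepalive: int = 60) -> bytes:
    """CONNECT: protocol 'MQTT' level 4 (3.1.1), clean session, no username/password."""
    variable_header = mqtt_string("MQTT") + bytes([0x04, 0x02]) + struct.pack("!H", keepalive)
    body = variable_header + mqtt_string(client_id)
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def build_subscribe(topic: str = "#", packet_id: int = 1, qos: int = 0) -> bytes:
    """SUBSCRIBE to one topic filter; '#' matches every topic on the broker."""
    body = struct.pack("!H", packet_id) + mqtt_string(topic) + bytes([qos])
    return bytes([0x82]) + encode_remaining_length(len(body)) + body


def probe(host: str, port: int = 1883, max_recvs: int = 100, timeout: float = 2.0) -> bytes:
    """Send CONNECT + SUBSCRIBE and collect raw responses.

    recv() is called at most max_recvs times, each limited to `timeout` seconds;
    collection stops early on a timeout or when the broker closes the connection.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_connect("probe") + build_subscribe("#"))
        for _ in range(max_recvs):
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def anonymous_access_allowed(response: bytes) -> bool:
    """True if the response starts with a CONNACK whose return code is 0 (accepted)."""
    return len(response) >= 4 and response[0] == 0x20 and response[1] == 0x02 and response[3] == 0x00
```

For example, `open(f"{host}.bin", "wb").write(probe(host))` dumps one host's responses to a file. `probe` raises `OSError` when the port is closed or filtered, so a sweep would wrap each call in `try`/`except OSError`.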

While it is sometimes easy to identify the organization behind an open broker, as it was with U-Tech, in other cases it is extremely difficult. In general, several characteristics can help identify the owner or operator of an exposed service.

Most of this information can be obtained through IP search engines such as Shodan, BinaryEdge or Censys. The full procedure I followed to identify a server looks like this:

  1. Is there a corresponding DNS record?
  2. Is there useful WHOIS data?
  3. Are there other services that reveal details about the owner of the system or a specific application?
    1. TLS certificates – is there an FQDN or other descriptive name?
    2. HTTP responses – is there a website that identifies the system or reveals something interesting about its environment?
    3. Server banners – SMTP, FTP and some other protocols can reveal FQDNs.
  4. Do the MQTT topic names give away an organization or a product?
  5. Does the published data mean anything?
    1. Look for anything resembling make or model numbers.
    2. Search for e-mail addresses and check the referenced domains.
    3. Search for sets of numbers that could be GPS coordinates.
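
Steps 4 and 5 lend themselves to simple text heuristics. The sketch below uses hypothetical helper names and patterns of my own choosing: it counts recurring topic levels, collects e-mail domains, and flags decimal pairs that fall inside valid latitude/longitude ranges. It only surfaces candidates for manual review; it will produce false positives (version numbers, sensor readings) and will miss coordinates split across separate JSON keys.

```python
import re
from collections import Counter

# Heuristic patterns; every hit still needs manual review.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
# Two decimals with 4+ fractional digits, separated by a comma, semicolon or whitespace.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,3}\.\d{4,})\s*[,;\s]\s*(-?\d{1,3}\.\d{4,})")


def topic_levels(topics):
    """Count non-numeric topic levels; recurring ones often name an organization or product."""
    counts = Counter()
    for topic in topics:
        for level in topic.split("/"):
            if level and not level.isdigit():
                counts[level] += 1
    return counts


def email_domains(text):
    """Return the distinct e-mail domains mentioned in a payload, lower-cased and sorted."""
    return sorted({domain.lower() for domain in EMAIL_RE.findall(text)})


def candidate_coordinates(text):
    """Return (lat, lon) pairs whose values fall inside valid GPS ranges."""
    pairs = []
    for lat_s, lon_s in COORD_RE.findall(text):
        lat, lon = float(lat_s), float(lon_s)
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs
```

In practice these would be run over every topic and payload captured from a broker, with the most frequent topic levels and any unfamiliar domains fed back into the DNS, WHOIS and certificate checks above.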

This procedure was effective in identifying many hosts, but the failure rate was also high. Many of these open MQTT brokers run on public cloud servers with no meaningful indicators of the system owner. In many cases I could determine what kind of system I was looking at and/or where it was located geographically, but I had no specific contact to report it to.

Once a system was identified, I sent a template e-mail with the IP address and a short description of the problem. In some cases, when I judged the risk to be high, I also made a phone call. Overall, this did not noticeably improve my chances of reaching a satisfactory resolution.

One of the systems I identified provided safety monitoring for compressed natural gas (CNG) filling stations across a region. I sent e-mails, made phone calls and asked various CERTs for help, but it took a whole month (and a lot of personal energy) to get a response from the organization concerned. In their reply, they acknowledged the problem and said they were working on a fix, which would take some time.

They explained that fixing the problem meant rewriting part of a very old application for which they no longer had the source code. They also assured me that the data is not used in automated processes and therefore cannot directly affect the equipment being monitored.

This would mean that decisions are made by human operators, but I don’t see how they could verify the data coming from an open MQTT broker before acting on it.

Some disclosures, however, went much better, as in this case:

Figure 1: Redacted data sample from the WeWork MQTT broker

I identified this server, and others, as belonging to WeWork based on the WW naming convention and the e-mail addresses that appeared in the data. I then confirmed the attribution by matching the addresses in the MQTT data against the workspace locations WeWork publishes.

I contacted WeWork, who immediately shut down the system and explained that it was a non-production hackathon project. I didn’t ask for further details, but the exposed data suggests they may have used the S2 NetBox access control software. As far as I could tell, the data appeared to be genuine.

When my scan flagged this host as interesting, I immediately connected to it with mosquitto_sub and saw messages reporting badges being swiped, doors being opened and alarms being disarmed. The messages included full names, e-mail addresses, access times and specific details about individual doors in particular buildings.
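
Raw responses captured by a scanner like the one described earlier (a byte stream mixing CONNACK, SUBACK and PUBLISH packets) can be decoded into the topic/payload pairs that `mosquitto_sub -v` prints. This is a hypothetical parser assuming MQTT 3.1.1 framing; it skips non-PUBLISH packets and stops at a truncated final packet, which is common when a capture ends on a timeout.

```python
import struct


def decode_remaining_length(buf, pos):
    """Decode MQTT's variable-length 'remaining length'; return (value, next_pos)."""
    value, multiplier = 0, 1
    while True:
        if pos >= len(buf):
            raise ValueError("truncated remaining length")
        byte = buf[pos]
        pos += 1
        value += (byte & 0x7F) * multiplier
        if not byte & 0x80:
            return value, pos
        multiplier *= 128
        if multiplier > 128 ** 3:
            raise ValueError("remaining length longer than 4 bytes")


def iter_publishes(stream):
    """Yield (topic, payload) for each complete PUBLISH packet in a raw MQTT 3.1.1 stream.

    Other packet types (CONNACK, SUBACK, ...) are skipped; a truncated final
    packet ends the iteration.
    """
    pos = 0
    while pos < len(stream):
        header = stream[pos]
        try:
            length, body = decode_remaining_length(stream, pos + 1)
        except ValueError:
            return
        end = body + length
        if end > len(stream):
            return
        if header >> 4 == 3 and length >= 2:  # packet type 3 = PUBLISH
            qos = (header >> 1) & 0x03
            (topic_len,) = struct.unpack_from("!H", stream, body)
            topic = stream[body + 2 : body + 2 + topic_len].decode("utf-8", "replace")
            # QoS 1 and 2 messages carry a 2-byte packet identifier before the payload.
            payload_start = body + 2 + topic_len + (2 if qos else 0)
            yield topic, stream[payload_start:end]
        pos = end
```

For example, `for topic, payload in iter_publishes(open("dump.bin", "rb").read()): print(topic, payload[:80])` gives roughly the same view as `mosquitto_sub -v`; the file name here is a placeholder.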

Whether or not it was a production system, the data looked genuine: people came and went at times consistent with their local time zones, and the system appeared to be relaying data in real time from sites all over the world. I may not have been able to control the system or influence its decisions, but if the data was authentic, anyone could gain unauthorized access to an alarming level of detail about WeWork’s clients.

Most of the interesting systems I encountered related to vehicle tracking, but I was only able to gather limited information about what or who was being tracked. In one case, I found URLs linking to pictures of driving licences.

In another case, I initially took a broker to belong to a carpooling service in Mumbai, based on links to what looked like driver route plans and on advertising URLs. After an unsuccessful search for the specific organization using this server, I moved on to the next host on a list of tens of thousands.

It wasn’t until a few months later, while reviewing the data for this blog post, that I had a breakthrough. Looking more closely at the advertisement URLs, I noticed that the S3 bucket name in the URL could be linked to an organization name. I googled the acronym from the bucket name together with “Mumbai” and quickly identified the likely owner of the system.
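
Pulling the bucket name out of such URLs is mechanical once you know the two S3 URL styles: virtual-hosted (`<bucket>.s3[.<region>].amazonaws.com/<key>`) and path-style (`s3[.<region>].amazonaws.com/<bucket>/<key>`), plus the legacy `s3-<region>` host form. A minimal sketch with a hypothetical function name; it does not handle custom domains or CDNs fronting a bucket, and turning the name into an organization is still a manual search.

```python
from urllib.parse import urlparse


def s3_bucket_from_url(url):
    """Return the S3 bucket name referenced by a URL, or None if it is not an S3 URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(".amazonaws.com"):
        return None
    labels = host.split(".")
    # Search from the right so bucket names that themselves start with "s3-" still work.
    for i in range(len(labels) - 1, -1, -1):
        label = labels[i]
        if label == "s3" or label.startswith("s3-"):
            if i > 0:
                # Virtual-hosted style: everything before the s3 label is the bucket.
                return ".".join(labels[:i])
            # Path style: the bucket is the first path segment.
            segments = [s for s in parsed.path.split("/") if s]
            return segments[0] if segments else None
    return None
```

For example, `s3_bucket_from_url("https://s3.ap-south-1.amazonaws.com/acme-ads/banner.jpg")` returns `"acme-ads"`, a made-up bucket name used only for illustration.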

What I had taken for a carpooling service was actually advertising for vendors. Although I never contacted this company, the broker is now locked down, and Shodan does not show the same topic names on any new servers. I may have scanned it early in their deployment, or they may have noticed and fixed the misconfiguration themselves.

In the scan data, I found numerous systems publishing opaque or obscure data, with little or no indication of how they might be attacked or what the effects would be. The next system on my list, however, would undoubtedly interest intruders.

This system belongs to a national lottery with jackpots of up to 6 million dollars. The messages from this broker turned out to contain details of lottery ticket sales, including the kiosk where each ticket was sold, the software version running on it, the support staff working there, and other binary data I couldn’t decipher.

I assumed these details might relate to specific ticket purchases, but I had no way to confirm or rule that out. Interestingly, most lottery kiosks in this system ran older, unsupported versions of Android. I notified the lottery several times, and although I never received a response, I have confirmed that the server is no longer exposed.

Figure 2: Redacted lottery data sample

Other interesting results

In total, I contacted, or tried to contact, the owners of systems at fewer than 50 of the tens of thousands of IP addresses in my scan results. Some other interesting results are briefly described below.

ML cameras

I came across a small number of MQTT brokers exchanging structured data similar to what I had seen in earlier work on TensorFlow-based image analysis. These systems appear to be ML/AI IoT cameras installed in retail outlets, and some digging suggested that some of them are probably in-store security cameras. The published data usually contains an image URL plus metadata about the number, age and gender of the people in the image. Some also report other attributes, such as whether someone is wearing a hat or glasses or has a beard.

Energy sector

Many brokers in my scans carried turbine speed, power output in kilowatts and other telemetry relevant to power generation. The data was very generic, and the brokers were hosted almost entirely in public clouds. Some were certainly wind farms, but others could be solar farms or hydropower plants. I haven’t been able to identify any of these systems precisely and, frankly, I have no idea what their impact could be.

These systems are unlikely to be protected against spoofed or replayed telemetry. The open question is what physical consequences such an attack could have. For example, if an exposed LNG facility were discovered, could a terrorist cause an operational failure similar to the over-pressurization of Columbia Gas lines in Massachusetts that led to explosions and fires at some 40 houses?

Figure 3: Burned house in Merrimack Valley, Massachusetts (13 September 2018)
