Finite State Blog

Software Composition Analysis (SCA) vs. Java Über Jars

Written by Julius Davies | Jan 10, 2023 3:45:00 PM

Introduction

Über jars are a type of reusable Java library that applications sometimes (knowingly or not) incorporate into their systems. Über jars are particularly challenging for software composition analysis (SCA) tools to understand because their structure and organization are complex.

In this blog post, I explain what über jars are and why they exist, and I provide a mini-benchmark to see how current SCA tools deal with this type of Java library.



TLDR

Circa February 2021, both MergeBase (acquired by Finite State in June 2024) and Sonatype use deep binary analysis to measure software composition within über jars.

  • Finite State is the only SCA tool I observed with comprehensive support for über jars.
  • Sonatype also provides decent support but misses a few obvious cases. (Strangely, their legal analysis completely ignores their über jar results.)
  • OWASP Dependency-Check and JFrog Xray are not bad, but their scanning is based solely on metadata files found inside über jars.
  • Snyk, Mend (previously WhiteSource) and GitHub Dependabot currently have no ability to understand über jars at all.

 

Background

Über jars are the Java equivalent of taking everything in your fridge, throwing it all into your largest pot, and giving it a good stir. From an SCA (Software Composition Analysis) perspective, they are a bit of a nightmare.

Recall: normally, you point your SCA tool at a Java jar file (a reusable Java library), and your SCA tool responds by telling you the jar file’s name, version, and known vulnerabilities.

But what if that single Java jar file is actually an agglomeration of dozens of Java jar files? What if some maniac cracked open all your jar files and poured all their contents into a single mega jar? That’s exactly what an über jar is, and your SCA tool is going to need to reverse-engineer the contents accurately before it can say anything.

 

Why would anyone create an über jar in the first place?

Java programs are awkward to invoke. You have to tell Java the exact locations of all the jar files your program is using. Über jars are a way around that problem.

For example, a typical Java program is started like this:

java -classpath lib1.jar:lib2.jar:main.jar name.of.MainEntry

With an über jar it’s less typing, since “lib1.jar” and “lib2.jar” have been blended directly into a single “main-uber.jar” file:

java -classpath main-uber.jar name.of.MainEntry

In this way über jars make Java programs easier to distribute and easier to start. That’s the main reason why they exist.

Recall that jar files are actually just zip files. You can rename them to “.zip” and then double-click on them if you ever want to see what’s inside them. An über jar is what you get if you unzip all of your jar files and combine their contents into a single zip file instead.
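To make the "unzip everything and re-zip it" idea concrete, here is a minimal Python sketch that merges jars with the standard `zipfile` module. The jar contents and the `merge_jars` and `make_jar` helpers are hypothetical, and the first-one-wins rule for duplicate paths is an assumption; real build plugins offer more nuanced merge strategies.

```python
import io
import zipfile


def merge_jars(jar_blobs):
    """Merge several jars (given as bytes) into one über jar, returned as bytes.

    Assumption: when two jars contain the same path, the first one wins.
    """
    out = io.BytesIO()
    seen = set()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as uber:
        for blob in jar_blobs:
            with zipfile.ZipFile(io.BytesIO(blob)) as jar:
                for info in jar.infolist():
                    # Skip directory entries and paths already copied.
                    if info.is_dir() or info.filename in seen:
                        continue
                    seen.add(info.filename)
                    uber.writestr(info.filename, jar.read(info.filename))
    return out.getvalue()


def make_jar(entries):
    """Build a tiny in-memory jar from a {path: bytes} dict (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


# Two made-up libraries, each with a single fake class file.
lib1 = make_jar({"org/lib1/A.class": b"lib1-A"})
lib2 = make_jar({"org/lib2/B.class": b"lib2-B"})

uber_bytes = merge_jars([lib1, lib2])
with zipfile.ZipFile(io.BytesIO(uber_bytes)) as uber:
    uber_names = sorted(uber.namelist())

print(uber_names)  # ['org/lib1/A.class', 'org/lib2/B.class']
```

Notice that nothing in the merged archive records which original jar each file came from. That lost boundary is what an SCA tool has to reconstruct.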

 

Über jars are a challenge for SCA

Most SCA tools are geared towards providing a single succinct answer for each library they scan.

Identified Library:
Apache Commons-Collections 3.2.1

Vulnerabilities:
CVE-2017-15708, CVE-2015-7501, and CVE-2015-6420

With über jars the answer is more complicated. “Well, actually… this library is a combination of many libraries.”

At Finite State we analyze every jar file against our master database for this possibility. For example, consider “apacheds-all-1.5.5.jar”, a large über jar containing over 500,000 lines of code coming from dozens of libraries. When we compare this jar file against all known versions of “slf4j-api” here are the results:

Match Ratio    Known Library Version
 81.0%         slf4j-api@1.5.11
 90.5%         slf4j-api@1.5.8
100.0%         slf4j-api@1.5.6
 90.5%         slf4j-api@1.5.5
 90.5%         slf4j-api@1.5.4

 

These results show that version 1.5.6 of slf4j-api is contained inside the apacheds-all-1.5.5 über jar file.
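One plausible way to arrive at a match ratio like those above is to measure what share of a known library's class files appear byte-for-byte inside the über jar. The Python sketch below does exactly that with SHA-256 hashes. It is an illustration under that assumption, not a description of Finite State's actual algorithm, and all jar contents and helper names are made up.

```python
import hashlib
import io
import zipfile


def class_hashes(jar_bytes):
    """Return the set of SHA-256 digests of every .class file in a jar."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {
            hashlib.sha256(jar.read(name)).hexdigest()
            for name in jar.namelist()
            if name.endswith(".class")
        }


def match_ratio(uber_bytes, known_bytes):
    """Fraction of the known library's classes found unchanged in the über jar."""
    known = class_hashes(known_bytes)
    if not known:
        return 0.0
    return len(known & class_hashes(uber_bytes)) / len(known)


def make_jar(entries):
    """Build a tiny in-memory jar from a {path: bytes} dict (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


# Hypothetical library with two versions that share one unchanged class.
v1 = make_jar({"x/A.class": b"A-v1", "x/B.class": b"B-v1"})
v2 = make_jar({"x/A.class": b"A-v1", "x/B.class": b"B-v2"})

# The über jar embeds version 1 plus an unrelated class.
uber = make_jar({"x/A.class": b"A-v1", "x/B.class": b"B-v1", "y/C.class": b"C"})

ratio_v1 = match_ratio(uber, v1)
ratio_v2 = match_ratio(uber, v2)
print(ratio_v1, ratio_v2)  # 1.0 0.5
```

Neighbouring versions of a library usually share many unchanged classes, which would explain why several versions score high while only one reaches 100%.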

In the “slf4j-api” case there is also another hint inside the über jar. If I grep the jar’s contents for “slf4j-api” I see these two entries:

META-INF/maven/org.slf4j/slf4j-api/pom.xml
META-INF/maven/org.slf4j/slf4j-api/pom.properties

Opening the latter, I see this:

#Generated by Maven
#Fri Nov 21 14:48:07 CET 2008
version=1.5.6
groupId=org.slf4j
artifactId=slf4j-api

This gives me further confidence that my binary analysis is correct: version 1.5.6 aligns with my Finite State result. Some SCA scanners only consider this metadata when examining über jars, but philosophically I don’t agree with that approach, since metadata is not always present, as in the bouncy-castle example below. Metadata is also vulnerable to transcription mistakes and tampering.
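For reference, a metadata-only check of the kind described above can be sketched in a few lines of Python: walk the jar's entries, pick out every `pom.properties` under `META-INF/maven/`, and parse the key/value pairs. The demo jar is built in memory, the `embedded_maven_coordinates` helper is a made-up name, and the `org/bouncycastle/Example.class` path is a placeholder standing in for a library that ships without metadata.

```python
import io
import zipfile


def embedded_maven_coordinates(jar_bytes):
    """List (groupId, artifactId, version) for every pom.properties in a jar."""
    found = []
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in jar.namelist():
            if not (name.startswith("META-INF/maven/")
                    and name.endswith("/pom.properties")):
                continue
            props = {}
            for line in jar.read(name).decode("utf-8", "replace").splitlines():
                line = line.strip()
                # Ignore blank lines and "#" comments; keep key=value pairs.
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    props[key.strip()] = value.strip()
            found.append(
                (props.get("groupId"), props.get("artifactId"), props.get("version"))
            )
    return found


# Demo über jar: one library with metadata, one (placeholder) without.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr(
        "META-INF/maven/org.slf4j/slf4j-api/pom.properties",
        "#Generated by Maven\n#Fri Nov 21 14:48:07 CET 2008\n"
        "version=1.5.6\ngroupId=org.slf4j\nartifactId=slf4j-api\n",
    )
    jar.writestr("org/bouncycastle/Example.class", b"no metadata for this one")

coords = embedded_maven_coordinates(buf.getvalue())
print(coords)  # [('org.slf4j', 'slf4j-api', '1.5.6')]
```

The second library is invisible to this approach, which is the weakness of relying on metadata alone.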

You might be curious why this metadata is even present in the first place.

My own theory: it was probably present in the original “slf4j-api” jar. Über jars don’t just combine the software files – they combine all the files! And so if a metadata file is present in the original “slf4j-api” file, it will be dutifully copied into the über jar. I can download the original and see for myself. Sure enough, running “unzip -l slf4j-api-1.5.6.jar” shows both of those metadata files were in the original.

Moving on to an example without metadata, here are the results when we compare our über jar against “bcprov-jdk15”:

Match Ratio    Known Library Version
 84.7%         bcprov-jdk15@1.44
 91.3%         bcprov-jdk15@1.43
100.0%         bcprov-jdk15@1.40
 82.0%         bcprov-jdk15@1.38
 48.6%         bcprov-jdk15@1.32

 

There is no metadata available to warn consumers that the highly vulnerable version 1.40 of bcprov-jdk15 was copied into apacheds-all-1.5.5.jar. Unfortunately, bcprov-jdk15@1.40 contains more than 15 known vulnerabilities. Scanners that rely on metadata (such as JFrog Xray and OWASP Dependency-Check) will miss this. And of course scanners that lack über jar handling (such as Mend and Snyk) will also miss this.

Using our high-confidence matches we then query our known-vulnerability database for any corresponding vulnerabilities. Our technique is based on binary analysis – no metadata is involved at all, since metadata can be inaccurate. Using this technique we are able to identify dozens of sub-components encapsulated by the apacheds-all-1.5.5 über jar. Here’s a partial listing based on Finite State’s analysis:

  1. 100.0% – antlr/antlr@2.7.7
  2. 100.0% – commons-io/commons-io@1.4
  3. 100.0% – commons-lang/commons-lang@2.4
  4. 100.0% – org.apache.directory.server/apacheds-core-jndi@1.5.5
  5. 100.0% – org.apache.directory.shared/shared-ldap@0.9.15
  6. 100.0% – org.apache.mina/mina-core@2.0.0-M6
  7. 100.0% – org.bouncycastle/bcprov-jdk15@1.40
  8. 100.0% – org.slf4j/slf4j-api@1.5.6

(Etc… 25 more sub-components identified!)
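The step from match ratios to vulnerabilities can be pictured as a simple filter followed by a lookup. In the Python sketch below, the slf4j-api ratios and the commons-collections CVE IDs come from this post. The 0.99 threshold, the 100% ratio for commons-collections, the helper names, and the dictionary standing in for a vulnerability database are all assumptions made for illustration.

```python
# Assumed cut-off for treating a match as "high confidence".
THRESHOLD = 0.99

# (library, version) -> match ratio against the über jar.
match_ratios = {
    ("slf4j-api", "1.5.8"): 0.905,
    ("slf4j-api", "1.5.6"): 1.0,
    ("slf4j-api", "1.5.5"): 0.905,
    ("commons-collections", "3.2.1"): 1.0,  # assumed ratio, for illustration
}

# Toy stand-in for a known-vulnerability database.
vuln_db = {
    ("commons-collections", "3.2.1"): [
        "CVE-2017-15708",
        "CVE-2015-7501",
        "CVE-2015-6420",
    ],
}


def high_confidence(ratios, threshold=THRESHOLD):
    """Keep only the components whose match ratio meets the threshold."""
    return sorted(comp for comp, ratio in ratios.items() if ratio >= threshold)


def vulnerabilities_for(components, db):
    """Look up known vulnerabilities for each confirmed component."""
    return {comp: db[comp] for comp in components if comp in db}


confirmed = high_confidence(match_ratios)
findings = vulnerabilities_for(confirmed, vuln_db)

print(confirmed)
print(findings)
```

Near-miss versions such as slf4j-api@1.5.8 are dropped before the lookup, so only the version that is actually embedded gets checked for vulnerabilities.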

 

Quick Competitive Check

We were curious to see if competing SCA tools are able to handle über jars. What follows is a quick benchmark against a half-dozen popular SCA tools.

Methodology

For each SCA tool (Finite State, OWASP Dependency-Check, Snyk, Mend, Sonatype, etc.):

  1. Git clone: Repository
  2. Run “mvn install”.
  3. Apply each SCA tool against the built “vuln-example-apacheds-all”.
  4. Observe and compare the scan results.

Mini-Benchmark Results

As of February 2021, the apacheds-all-1.5.5 über jar contains two vulnerable sub-components. One of these (bcprov-jdk15@1.40) can only be identified using binary approaches, since it had no metadata in the first place. The other (commons-collections@3.2.1) can be identified either via binary approaches or via metadata scanning.

We group the benchmark results into 3 categories:

1. Scanners that do not support über jars at all.

Snyk and Mend appear to have no idea that “apacheds-all@1.5.5” is made by combining many jar files together. GitHub’s Dependabot has no idea about this either.

2. Scanners that support a metadata-based understanding of über jars.

OWASP Dependency-Check and JFrog Xray both detect the “commons-collections@3.2.1” metadata inside the über jar.

3. Scanners that support deep understanding of über jars.

Sonatype fails to identify any known vulnerabilities for commons-collections@3.2.1, and yet it does correctly identify that apacheds-all@1.5.5 contains bcprov-jdk15@1.40! This is a lopsided result: Sonatype clearly has a deep understanding here (otherwise it would be impossible to identify bcprov-jdk15), and yet somehow it fails to spot the easy one. We also noted that Sonatype reported the license as Apache 2.0, even though bcprov-jdk15 uses the MIT license.

Finite State identifies all vulnerabilities correctly in this case. 🙂

 

Conclusion

Über jars are a special type of Java software component made by combining several jars into a single jar. Aside from Finite State, most SCA scanners currently provide sub-par or even zero support for this component type.

One last piece of advice: have über jars? Give Finite State a closer look!