
What is CASM?

CASM (Create Assembly Segmentation Matches) is a tool created by Internet 2.0 for detecting code reuse, originally designed for use with ransomware samples. It lets malware analysts and reverse engineers determine the underlying similarities between files, using the assembly code within each file as the key reference point. To perform this analysis, CASM combines several similarity measures:

  • Jaccard similarity: a proximity measurement that scores two sets as the size of their intersection divided by the size of their union.
  • Levenshtein distance: a string metric that measures the distance between two sequences as the minimum number of insertions, deletions, and substitutions needed to turn one into the other.
  • Ssdeep string matching: ssdeep is a fuzzy hashing tool used to determine similarities in files; we use it to compare similarities in strings.
  • Sequence match ratio comparisons: a similarity formula that scores how closely two sequences match.
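
To make three of these measures concrete, here is a minimal Python sketch using only the standard library. It is not CASM's implementation: the function names and the opcode lists are made up for illustration, and using `difflib.SequenceMatcher` as the "sequence match ratio" is an assumption. Ssdeep is left out because it requires a third-party library.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of items."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def sequence_ratio(a, b):
    """Assumed stand-in for a sequence match ratio: 2 * matches / total length."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical opcode sequences from two disassembled functions.
ops_a = ["push", "mov", "sub", "call", "test", "jz", "ret"]
ops_b = ["push", "mov", "sub", "call", "cmp", "jz", "ret"]

print(jaccard(ops_a, ops_b))         # 6 shared opcodes out of 8 distinct: 0.75
print(levenshtein(ops_a, ops_b))     # one substitution (test -> cmp): 1
print(sequence_ratio(ops_a, ops_b))  # 2 * 6 / 14
```

Each measure captures something different: Jaccard ignores order, Levenshtein counts edits, and the ratio rewards long runs of matching instructions. That is one plausible reason to combine them rather than rely on any single score.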

The exact formula used in order to determine the similarity can be summed up as the following equation:


What is code reuse?

Code reuse can be summed up as “utilizing existing software components to construct new software solutions.” As an example, envision a scenario where three distinct tools are each designed to achieve a specific task. Now imagine combining these three tools into a single tool that accomplishes the same task more efficiently. This illustrates the concept of code reuse.

In the realm of malware analysis, the presence of code reuse within a sample can indicate either the usage of code from another sample or the work of the same developer reconfiguring their tools to accomplish a different task. One approach to identifying code reuse involves disassembling the file to its behavioral (assembly) code and comparing it with another sample. The following example shows this process:

Although it may initially appear daunting, the process is straightforward. The key aspect to focus on is the relative abundance of red and green: green marks similarity and red marks dissimilarity, so a comparison that is mostly green is strong evidence that the two code strings closely resemble each other. The same principle applies when conducting code reuse analysis: disassemble the binaries, consolidate the assembly code, check the code for disparities, and calculate a similarity score. This approach identifies commonalities and variations and provides a measure of similarity.
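
The comparison step can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the binaries have already been disassembled into instruction lists, the sample listings and the names `diff_listing` and `similarity_score` are hypothetical, and the score shown is a plain `difflib` ratio rather than CASM's formula. Lines marked with a space play the role of green, while `-` and `+` play the role of red.

```python
from difflib import SequenceMatcher


def diff_listing(left, right):
    """Return (tag, instruction) pairs: 'same', 'removed' (left only), 'added' (right only)."""
    rows = []
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows.extend(("same", ins) for ins in left[i1:i2])
        else:
            rows.extend(("removed", ins) for ins in left[i1:i2])
            rows.extend(("added", ins) for ins in right[j1:j2])
    return rows


def similarity_score(left, right):
    """Percentage of matching instructions, rounded to one decimal place."""
    ratio = SequenceMatcher(None, left, right, autojunk=False).ratio()
    return round(100 * ratio, 1)


# Hypothetical disassembly of the same routine in two samples.
sample_a = ["push ebp", "mov ebp, esp", "sub esp, 0x10", "call 0x401000", "leave", "ret"]
sample_b = ["push ebp", "mov ebp, esp", "sub esp, 0x20", "call 0x401000", "leave", "ret"]

markers = {"same": " ", "removed": "-", "added": "+"}
for tag, instruction in diff_listing(sample_a, sample_b):
    print(markers[tag], instruction)

print(similarity_score(sample_a, sample_b))  # 5 of 6 instructions match: 83.3
```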

What makes CASM different?

CASM stands out from other code reuse analysis tools because it focuses solely on the disassembled assembly code of the file. Unlike other tools, CASM does not take into account other factors such as code signing certificates or functionality when determining similarity. Its objective is to compare the assembly code through intelligent grouping and to apply the mathematical techniques listed above to generate an accurate similarity score. By relying on mathematically sound algorithms, CASM offers a distinct methodology for uncovering instances of code reuse within files.
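
The source does not describe how CASM groups assembly code, so the following Python sketch is purely an assumption about what grouping could look like. It splits each opcode list into segments at every `ret` and scores each segment against its best counterpart in the other file. The helper names `split_segments` and `best_match_score` are hypothetical.

```python
from difflib import SequenceMatcher


def split_segments(instructions, terminator="ret"):
    """Group a flat opcode list into segments, closing each one at a terminator opcode."""
    segments, current = [], []
    for opcode in instructions:
        current.append(opcode)
        if opcode == terminator:
            segments.append(current)
            current = []
    if current:  # keep a trailing segment that has no terminator
        segments.append(current)
    return segments


def best_match_score(segments_a, segments_b):
    """Average, over the segments of A, of the best ratio against any segment of B."""
    if not segments_a or not segments_b:
        return 0.0
    total = 0.0
    for seg_a in segments_a:
        total += max(
            SequenceMatcher(None, seg_a, seg_b, autojunk=False).ratio()
            for seg_b in segments_b
        )
    return total / len(segments_a)


# Hypothetical files containing the same two routines in a different order.
file_a = ["push", "mov", "call", "ret", "xor", "inc", "cmp", "jne", "ret"]
file_b = ["xor", "inc", "cmp", "jne", "ret", "push", "mov", "call", "ret"]

whole_file = SequenceMatcher(None, file_a, file_b, autojunk=False).ratio()
grouped = best_match_score(split_segments(file_a), split_segments(file_b))

print(whole_file)  # lower, because the routines appear in a different order
print(grouped)     # every segment has an exact counterpart: 1.0
```

The design point is that comparing segment by segment is not penalized when reused routines are merely reordered, whereas a whole-file comparison is.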

CASM in action

Comparison #1:

The first files we will analyze are two Windows System32 files whose names suggest they are closely related.

The level of similarity observed between these two files may not come as a surprise, given their similar naming schemes and the fact that they are both Windows files. However, the real intrigue lies in exploring the realm of files that are not inherently expected to be associated with one another.

Comparison #2:

The degree of similarity observed between these files is striking, and the differences between them are minimal. Even so, VT (VirusTotal) categorizes these two files as different entities, with one of them barely registering any detections. This contrast adds an element of surprise to the analysis and underlines how closely the two files are related.

Comparison #3:

The consensus within the cybersecurity community suggests that BlueSky and Conti either belong to the same group or that a new group has adopted the same ransomware. However, what often receives less attention is the remarkable similarity between BlueSky and Babuk. Upon examination of the aforementioned similarities, it becomes evident that BlueSky and Babuk share significant commonalities. This raises an intriguing question: Could it be possible that Conti and Babuk also share substantial similarities, perhaps indicating that they stem from the same developers? Exploring these potential connections sheds light on the intricate relationships that exist within the realm of ransomware development and underscores the need for further analysis and investigation.


CASM is a valuable asset for code reuse analysis, surfacing previously undiscovered commonalities within samples and shedding new light on the cybersecurity landscape. It gives analysts a fast and effective way to conduct code reuse analysis using the capabilities of Malcore. Today, by simply signing up for Malcore, users can use the full potential of CASM at no cost.