Executive summary
Binary diffing, a technique for comparing binaries, can be a highly effective tool to facilitate malware analysis and perform malware family attribution. This blog post describes how AT&T Alien Labs is leveraging binary diffing and code analysis to reduce reverse-engineering time and generate threat intelligence.
Using binary diffing for analysis is particularly effective in the IoT malware world, as most malware threats are variants of open-source malware families produced by a wide range of threat actors. Creating and maintaining static signatures for variations of IoT malware is tedious, because the assembly code often changes across variants and architectures and text strings are subject to modification. For this reason, AT&T Alien Labs created a new open-source tool, r2diaphora, which ports Diaphora as a plugin for Radare2, and included some use cases in this blog.
What is binary diffing?
Binary diffing (or program diffing) is a process where two files are compared at the instruction level, looking for differences in code. Threat actors can easily transform the assembly code of a program without modifying its actual behaviour, so the typical “line-by-line” diffing is not sufficient when looking at malware. A more advanced approach is required.
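The limitation of line-by-line diffing can be illustrated with a minimal Python sketch. The two instruction listings below are hypothetical and differ only in register allocation, and the register-normalization step is an assumption chosen for illustration; it is not how Diaphora or r2diaphora compare functions.

```python
import difflib
import re

# Two hypothetical listings of the same routine; only the register differs.
FUNC_A = ["mov eax, [ebp+8]", "add eax, 1", "mov [ebp-4], eax", "ret"]
FUNC_B = ["mov ecx, [ebp+8]", "add ecx, 1", "mov [ebp-4], ecx", "ret"]

# Matches a handful of x86 register names (illustrative, not exhaustive).
REGISTER = re.compile(r"\b(e?[abcd]x|e?[sd]i|e?[sb]p)\b")


def normalize(instruction: str) -> str:
    """Replace concrete register names with a placeholder."""
    return REGISTER.sub("REG", instruction)


def similarity(a, b) -> float:
    """Ratio of matching lines between two instruction lists (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Naive line-by-line comparison: only "ret" is identical in both listings.
raw_ratio = similarity(FUNC_A, FUNC_B)

# After normalization, the two listings are line-for-line identical.
normalized_ratio = similarity(
    [normalize(i) for i in FUNC_A],
    [normalize(i) for i in FUNC_B],
)

print(raw_ratio, normalized_ratio)
```

Even this trivial register swap drops the naive ratio sharply, while a comparison over normalized features still recognizes the two routines as the same.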
There are several binary diffing tools publicly available, such as Diaphora, BinDiff, and DarunGrim. Alien Labs uses Diaphora, as we believe it is the most advanced of the available options. Additionally, Diaphora has the added benefit of being open source, allowing Alien Labs to modify it for our needs.
How can binary diffing be used to identify malware?
Diaphora works by analyzing each function present in the binary and extracting a set of features from each analyzed function. These features are later used to compare functions across binaries and find matches. If, instead of directly comparing features, we use them to build a database of malicious functions (indicators) for identification purposes, we can then analyze incoming binaries and try to find matches between their functions and the indicator database.
If enough matches are found in the analyzed binaries, we can safely assume the analyzed sample is a malware sample. We can also note which malware family the functions belong to in the indicator database, thus obtaining family attribution for the analyzed samples.
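The indicator-database idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the feature set (a mnemonic sequence plus constants), the hashing scheme, the min_matches threshold, and the sample data are hypothetical and much simpler than the features Diaphora actually extracts.

```python
import hashlib
from collections import Counter


def feature_hash(mnemonics, constants):
    """Collapse a function's (simplified) features into a single indicator."""
    payload = " ".join(mnemonics) + "|" + ",".join(str(c) for c in sorted(constants))
    return hashlib.sha256(payload.encode()).hexdigest()


def build_indicator_db(known_functions):
    """Map each indicator to the malware family it was extracted from."""
    return {
        feature_hash(mnemonics, constants): family
        for family, functions in known_functions.items()
        for mnemonics, constants in functions
    }


def attribute(sample_functions, indicator_db, min_matches=2):
    """Return (family, matches) when enough functions match, else (None, matches)."""
    hits = Counter()
    for mnemonics, constants in sample_functions:
        family = indicator_db.get(feature_hash(mnemonics, constants))
        if family is not None:
            hits[family] += 1
    if not hits:
        return None, 0
    family, matches = hits.most_common(1)[0]
    return (family if matches >= min_matches else None), matches


# Hypothetical feature data, for illustration only.
KNOWN = {
    "Gafgyt": [
        (["push", "mov", "call", "ret"], [8080]),
        (["mov", "xor", "jmp"], [23, 2323]),
    ],
}
DB = build_indicator_db(KNOWN)

suspicious = [
    (["push", "mov", "call", "ret"], [8080]),
    (["mov", "xor", "jmp"], [2323, 23]),  # same constants, different order
    (["nop", "ret"], []),                 # unknown function, no indicator
]
benign = [(["nop", "ret"], [])]

verdict = attribute(suspicious, DB)
clean = attribute(benign, DB)
print(verdict, clean)
```

Exact hashing only finds identical feature sets; a real system would also need fuzzy matching, which is where the similarity ratios discussed later come in.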
Porting Diaphora to Radare2
Diaphora works as an IDA Pro plugin. In order to work, it needs a valid IDA license and, consequently, valid Hex-Rays licenses for each CPU architecture you may want to decompile. As the cost of these licenses is quite high, Alien Labs looked for a cheaper alternative that the community could leverage.
As such, we decided to port the existing Diaphora to the Radare2 disassembly framework. The ported version of Diaphora, named r2diaphora, is also open source and available here.
Radare2 (r2) is an open-source disassembly framework that supports a very wide range of CPU architectures. It also bundles a capable decompiler and supports the Ghidra decompiler as a plugin. As such, r2 is well suited to our goal of porting Diaphora to an open-source disassembler.
Additional changes made to the original Diaphora included swapping the SQLite3 databases for MySQL. This change was made for the malware attribution process described previously, as more than one analyst could be writing to the indicator database. With multiple analysts writing to the database, the SQLite database would need to be shared across team members and allow parallel read/write operations. SQLite databases are not made for this kind of usage, so the Alien Labs team swapped it for another database engine better designed for the task.
Installation
As r2diaphora uses Radare2 and MySQL, they need to be set up prior to its usage. Radare2 should be installed locally, while the MySQL server can be remote or local. Once the environment is set up, you can install r2diaphora with pip install r2diaphora. This pip package installs three command line utilities: r2diaphora, r2diaphora-db and r2diaphora-bulk.
- r2diaphora: The main command line utility; analyzes and compares files.
- r2diaphora-db: Performs database administration and configuration.
- r2diaphora-bulk: Analyzes binaries in batches.
Further usage options can be obtained with the -h / --help command line option in each of them.
Once the pip package is successfully installed, you can enter your database credentials with r2diaphora-db config -u
Finally, if you want to use the r2ghidra decompiler, install it with the r2pm -ci r2ghidra command, if it is not installed already.
Usage
As stated previously, r2diaphora lists all available options when executed with the -h flag.
As an example, we can execute r2diaphora on some test IoT samples. You can find the file hashes in the Related Indicators appendix.
First test – comparing two Sakura (a Gafgyt variant) samples with the same architecture:
r2diaphora 562b4c9a40f9c88ab84ac4ffd0deacd219595ab83ed23a458c5f492594a3a7ef 770363f9fd334c3f3c4ba0e05a2a0d4701f56a629b09365dfe874b2a277f4416
Figure 1. r2diaphora output for Sakura samples with the same architecture.
Observe how r2diaphora was able to identify the similarities between the two files. The tool found 40 matches out of 56 possible (71%). Additionally, the similarity ratios for the matched functions are close to 1.0, indicating a very close resemblance between the matched functions. Furthermore, the results point towards true positive matches, as the matched functions have the same name and number of basic blocks.
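The arithmetic and the true-positive heuristic described above can be written out as a short Python sketch. The Match record, the 0.9 ratio threshold, and the example rows (including the function names other than processCmd, the block counts, and the ratios) are hypothetical; they do not reproduce r2diaphora's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One matched function pair (hypothetical record layout)."""
    name_a: str
    name_b: str
    blocks_a: int
    blocks_b: int
    ratio: float


def coverage(matched: int, total: int) -> float:
    """Fraction of functions for which a match was found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matched / total


def likely_true_positive(m: Match, min_ratio: float = 0.9) -> bool:
    """Same name, same basic-block count, and a ratio close to 1.0."""
    return m.name_a == m.name_b and m.blocks_a == m.blocks_b and m.ratio >= min_ratio


# Figures from the first test: 40 matches out of 56 possible.
first_test = coverage(40, 56)
summary = f"{first_test:.0%}"

# Hypothetical rows, for illustration only.
rows = [
    Match("processCmd", "processCmd", 42, 42, 0.98),
    Match("sendUDP", "sendUDP", 17, 17, 0.95),
    Match("fcn.00008a10", "getRandomIP", 5, 9, 0.41),
]
confident = [m.name_a for m in rows if likely_true_positive(m)]
print(summary, confident)
```

Function names are only a usable signal when the samples are not stripped, so a check like this is best treated as a sanity test rather than a detection rule.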
Second test – comparing Sakura samples with different architectures:
r2diaphora 17c62e0cf77dc4341809afceb1c8395d67ca75b2a2c020bddf39cca629222161 6ce1739788b286cc539a9f24ef8c6488e11f42606189a7aa267742db90f7b18d
Figure 2. r2diaphora output for Sakura samples with different architectures.
In this case, the number of matches has decreased from the previous test. This was expected, as it is harder to match functions across different architectures. The similarity ratios have also decreased, as the assembly code differs in all the compared functions. Nevertheless, r2diaphora recognized many similarities between both samples and identified correct matches across the compared files.
Third test – comparing a Sakura sample to a Yakuza (another Gafgyt variant) sample, both samples having different architectures:
$ r2diaphora sakura/594a6b2c1e9beac3ad5f84458b71c1b7ec05ee0239808c9a63bc901040e413a3 yakuza/91392f5dbbfd4ad142956983208a484b91ac5e84c4f9a9fcb530a9b085644c93
Figure 3. r2diaphora output for Sakura and Yakuza samples with different architectures.
In this case, observe how the number of matches has decreased even further while the ratios have remained mostly steady. This is due to the samples being different variants that perform different modifications over the base Gafgyt source code.
It is also notable that the processCmd function was matched, albeit with a low ratio. processCmd is the function that parses the commands received from the Command & Control server. The low ratio in this match is due to the variants being able to handle different commands, hence their implementations being different. Nevertheless, the tool was able to match it due to a common constant present in both functions.
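Matching on shared constants can be sketched as a set intersection. The constant values below, the list of "trivial" values to ignore, and the use of a Jaccard score are all assumptions made for illustration; the actual constant that linked the two processCmd implementations is not given here, and this is not r2diaphora's matching code.

```python
# Small values that appear in almost every function carry little signal.
TRIVIAL = frozenset({0, 1, -1, 2, 4, 8})


def distinctive_constants(a, b, trivial=TRIVIAL):
    """Constants present in both functions, ignoring trivially common values."""
    return (set(a) & set(b)) - trivial


def jaccard(a, b) -> float:
    """Overlap between two constant sets (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical constants extracted from each variant's processCmd.
sakura_process_cmd = [0, 1, 0x5A17, 80]
yakuza_process_cmd = [0, 1, 0x5A17, 443, 8080]

shared = distinctive_constants(sakura_process_cmd, yakuza_process_cmd)
overlap = jaccard(sakura_process_cmd, yakuza_process_cmd)
print(shared, overlap)
```

A single distinctive constant is enough to propose a candidate pair even when the overall overlap is modest, which is consistent with a match that succeeds but carries a low ratio.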
Conclusion
Code similarity analysis is a powerful tool that can be leveraged to identify and attribute malware. While not flawless, program diffing can bypass many of the weaknesses of static signatures and thus could be used along with traditional detection methods to build a more robust detection pipeline.
Appendix
Related Indicators (IOCs)
TYPE | INDICATOR | DESCRIPTION
SHA256 | 132948bef56cc5b4d0e435f33e26632264d27ce7d61eba85cf3830fdf7cb8056 | Sakura sample, Arch: ARM, EABI4
SHA256 | 136dbd3cfa947f286b972af1e389b2a44138c0013aa8060d20c247b6bcfdd88c | Sakura sample, Arch: Intel 80386
SHA256 | 17c62e0cf77dc4341809afceb1c8395d67ca75b2a2c020bddf39cca629222161 | Sakura sample, Arch: ARM, EABI4
SHA256 | 19e0f329b5d8689b14d901b9b65c8d4fb28016360f45b3dfcec17e8340e6411e | Sakura sample, Arch: Motorola m68k
SHA256 | 4cc11ffb3681ebced1f9d88e71b70a87e6d4498abca823245c118afead67b6a5 | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | 562b4c9a40f9c88ab84ac4ffd0deacd219595ab83ed23a458c5f492594a3a7ef | Sakura sample, Arch: ARM, EABI4
SHA256 | 594a6b2c1e9beac3ad5f84458b71c1b7ec05ee0239808c9a63bc901040e413a3 | Sakura sample, Arch: x86-64
SHA256 | 5fec87479a8d2fa7f0ed7c8f6ba76eeea9e86c45123173d2230149a55dcd760d | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | 603d14671f97d12db879cc1c7cd6abfa278bf46431ac73aeb6b3a4c4c2b16b9f | Sakura sample, Arch: x86-64
SHA256 | 6b128a64a497eb123f03b77ef45e99e856282dc9620dc26ab38998627a8f3216 | Sakura sample, Arch: Renesas SH
SHA256 | 6ce1739788b286cc539a9f24ef8c6488e11f42606189a7aa267742db90f7b18d | Sakura sample, Arch: Intel 80386
SHA256 | 770363f9fd334c3f3c4ba0e05a2a0d4701f56a629b09365dfe874b2a277f4416 | Sakura sample, Arch: ARM, version 1
SHA256 | 7c8ba5f88b1c4689a64652f0b8f5e3922e83f9f73c7e165f3213de27c5fb4d05 | Sakura sample, Arch: PowerPC
SHA256 | 8090c3a1a930849df42f7f796d42e0211344e709a5ac15c2b4aca8ca41de2cd3 | Sakura sample, Arch: Intel 80386
SHA256 | 94a279397b8c19ec7def169884a096d4f85ce0e21ff9df0be3ce264ef4565ea7 | Sakura sample, Arch: x86-64
SHA256 | 96bb3e5209e083544ea6a78bc6fc4ebc456e135a786d747718d936af3b063298 | Sakura sample, Arch: ARM, EABI4
SHA256 | a079dfd60b55a7d74dd32d49a984bea43665b8b225beceae5b272944889217f6 | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | b6c2f02b1bed62a6b845d5f13d9003f5aa3f6d0da3e62fa48d9822872453de10 | Sakura sample, Arch: Renesas SH
SHA256 | cef15aa60dc2c09fe117e37e07399f0ef89dca9f930ce13ac1e29f8cf63d9a31 | Sakura sample, Arch: Motorola m68k
SHA256 | e984334bbdd1179aadbde949f7c1b0fb02b6c18cb4a56d146150853b18adfa79 | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | 2858982408bf1664b622e830ad83b871749608a7533e94672153ff90caa658a9 | Yakuza sample, Arch: ARM, EABI4
SHA256 | 2b7262cae9e192fa7921f3ec02e0f924b32de3d418842fdad9a51603589a54c7 | Yakuza sample, Arch: Intel 80386
SHA256 | 2faf7437c769abd92347d6f0a77f001523ec41c02d2bf12e3cebf5b950457ba3 | Yakuza sample, Arch: Intel 80386
SHA256 | 4fc23e8409becb028997c2f0f2041e2dc853018b71e009e3d66f33876d5d4e99 | Yakuza sample, Arch: Renesas SH
SHA256 | 6554d5edb401e2def2ef9fbb82b591351d3c8261ce0a20c431470f1c68fa3aea | Yakuza sample, Arch: ARM, version 1
SHA256 | 8005db9431013f094a2114046679ab971e62a8776639d6c2903fcc5d2fe8065c | Yakuza sample, Arch: x86-64
SHA256 | 91392f5dbbfd4ad142956983208a484b91ac5e84c4f9a9fcb530a9b085644c93 | Yakuza sample, Arch: ARM, version 1
SHA256 | b8aadb66183196868a9ff20bebd9c289fbfe2985fb409743bb0d0fea513e9caf | Yakuza sample, Arch: ARM, EABI4
SHA256 | d4f223fc5944bc06e12c675f0664509eeab527abc03cdd8c2fbd43947cc6cbab | Yakuza sample, Arch: ARM, version 1
SHA256 | f64b5f6dd7f222b7568bba9e05caa52f9e4186f9ba4856c8bf1274f4c77c653c | Yakuza sample, Arch: Intel 80386