Over the past few days, the news has been full of a report of “dormant” malware that infected millions of Android devices. (German article here on Heise.de.) The malware previously went unnoticed by laying dormant for several hours, sometimes multiple days, after installation, in some cases even requiring a reboot of the device to become active. Dynamic-analysis procedures usually only run for minutes and for efficiency reasons do not simulate situations like reboot Contrary to the current perception, though, this problem had long been identified, and in fact today with this article we are revealing Harvester, a new tool to address exactly this problem.
Harvester* uses a unique novel combination of static and dynamic analysis and code transformation to (1) identify and eliminate emulator and timeout checks from apps, and (2) that way allows for the extraction of interesting runtime values such as reflective method calls, target numbers of calls to the SMS APIs, account-data hard-wired in the malware, etc. In addition, Harvester is resilient against virtually all current cases of code obfuscation.
As we show in our Technical Report (see below), Harvester can retrieve most runtime values of interest in a short amount of time, and can be used to significantly improve the quality of existing static and dynamic analysis tools.
In the past, we have successfully used Harvester to help retrieve inside information about current threats such as the banking trojan BadAccents we reported on earlier.
How does it work?
The main idea of Harvester is to apply dynamic analysis not to the original malware app but to a version of app that has been carefully altered to allow for an easy extraction of values of interest. Consider the original app code shown to the left. This app has an emulator check, returning early if the deviceId is all zeros. Using a specialized backwards slicing, Harvester determines that the check cannot influence our value of interest, in this case the number used in the sendTextMessage call. Therefore the check is simply eliminated from the app, as is all other code not influencing the value of interest. Executing this app thus immediately triggers the malicious behavior – no improved emulation or waiting required. Conditionals that may influence the value at hand are replaced by controllable boolean variables as shown to the right. Using a limited number of fast executions of the modified code, Harvester can quickly cover all necessary branches and extract all three relevant phone numbers.
Our report is currently under submission at a major conference. We plan to make the implementation available after the paper has been accepted for publication.
Harvesting Runtime Data in Android Applications for Identifying Malware and Enhancing Code Analysis , Technical report TUD-CS-2015-0031, EC SPRIDE, 2015. [bib]
* Patent pending