The overall principle of static, dynamic, or hybrid taint-tracking approaches is to check whether there is a potential data flow from a source to a sink. As the figure below shows, the input is a list of sources and a list of sinks, and the analysis checks whether a flow between them exists (a potential data leak).
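To make the principle concrete, here is a minimal sketch of the source-to-sink check. It assumes the program has already been reduced to a simple data-flow graph: a dict mapping each value or call to the values its data flows into. The graph, its node names, and the `find_leaks` helper are hypothetical; real static, dynamic, or hybrid taint trackers obtain this information in very different and far more precise ways.

```python
from collections import deque


def find_leaks(flows, sources, sinks):
    """Return (source, sink) pairs where data can flow from the source to the sink.

    flows: dict mapping a node to the nodes its data flows into
    (a hypothetical, simplified data-flow graph).
    """
    leaks = []
    for src in sources:
        # Breadth-first search over the data-flow graph, starting at the source.
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for succ in flows.get(node, ()):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
        # Every sink reachable from this source is a potential data leak.
        leaks.extend((src, sink) for sink in sinks if sink in seen)
    return leaks


# Hypothetical app: the location ends up in an SMS, the IMEI never reaches a sink.
flows = {
    "getLastKnownLocation": ["loc"],
    "loc": ["msg"],
    "msg": ["sendTextMessage"],
    "getDeviceId": ["imei"],
}
print(find_leaks(flows, ["getLastKnownLocation", "getDeviceId"], ["sendTextMessage"]))
# [('getLastKnownLocation', 'sendTextMessage')]
```

Note that the quality of the result depends entirely on the two input lists: a source or sink that is not listed is never reported.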
There are different kinds of sensitive sources and sinks in Android security. For instance, the user’s location information or address book can be treated as a source, while the network connection or the SMS-sending facilities can be seen as sinks. In general, sources and sinks are accessed through specific API methods (e.g., getLastKnownLocation() for the user’s current location).
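Because sources and sinks are identified by API methods, a list of them is essentially a list of method signatures. The sketch below shows one way to represent and parse such signatures. It assumes the Soot-style notation `<declaring.Class: returnType methodName(paramTypes)>` commonly used by Java/Android program analysis tools; the `ApiMethod` class and `parse_signature` helper are hypothetical illustrations, not part of SuSi.

```python
import re
from dataclasses import dataclass

# Assumed Soot-style signature, e.g.
# <android.location.LocationManager: android.location.Location getLastKnownLocation(java.lang.String)>
_SIG = re.compile(r"^<([\w.$]+): ([\w.$\[\]]+) ([\w$<>]+)\(([^)]*)\)>$")


@dataclass(frozen=True)
class ApiMethod:
    declaring_class: str
    return_type: str
    name: str
    params: tuple


def parse_signature(sig: str) -> ApiMethod:
    """Split a Soot-style method signature into its components."""
    m = _SIG.match(sig.strip())
    if m is None:
        raise ValueError(f"not a method signature: {sig!r}")
    cls, ret, name, params = m.groups()
    # Parameter types are comma-separated; an empty list means no parameters.
    param_list = tuple(p.strip() for p in params.split(",") if p.strip())
    return ApiMethod(cls, ret, name, param_list)


print(parse_signature("<android.telephony.TelephonyManager: java.lang.String getDeviceId()>"))
# ApiMethod(declaring_class='android.telephony.TelephonyManager', return_type='java.lang.String', name='getDeviceId', params=())
```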
SuSi is a tool that automatically generates a list of Android sources and sinks by analyzing the complete Android source code. Our approach is version-independent and can simply be run again when a new Android version is released. This relieves security analysts from having to regularly create new lists of sources and sinks by hand.
SuSi is based on supervised machine learning. It first uses a small, hand-annotated fraction of the API to train a classifier, which then fully automatically classifies every other method in the Android API as a source, a sink, or neither. SuSi is highly accurate: ten-fold cross-validation shows a precision and recall of more than 92%.
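The train-on-a-few, classify-the-rest idea can be illustrated with a toy example. This is only a sketch under strong assumptions: the four binary features and the hand-annotated examples are hypothetical, and the classifier is a deliberately simple nearest-centroid stand-in. It is not SuSi's actual feature set or learning algorithm.

```python
def features(name, return_type, params):
    """Hypothetical binary features describing an API method."""
    return [
        int(name.startswith("get")),
        int(name.startswith(("send", "write", "put", "set"))),
        int(return_type != "void"),
        int(len(params) > 0),
    ]


def train(examples):
    """examples: list of (feature_vector, label). Returns the mean vector per label."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(centroids, vec):
    """Pick the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((c - v) ** 2 for c, v in zip(centroids[lab], vec)))


# A small hand-annotated training set (hypothetical annotations).
annotated = [
    (features("getLastKnownLocation", "android.location.Location", ["java.lang.String"]), "SOURCE"),
    (features("getDeviceId", "java.lang.String", []), "SOURCE"),
    (features("sendTextMessage", "void", ["java.lang.String"] * 3 + ["android.app.PendingIntent"] * 2), "SINK"),
    (features("write", "void", ["byte[]"]), "SINK"),
    (features("toString", "java.lang.String", []), "NEITHER"),
    (features("hashCode", "int", []), "NEITHER"),
]
centroids = train(annotated)

# Classify methods that were not annotated by hand.
print(classify(centroids, features("getSubscriberId", "java.lang.String", [])))        # SOURCE
print(classify(centroids, features("sendDataMessage", "void", ["java.lang.String"])))  # SINK
print(classify(centroids, features("equals", "boolean", ["java.lang.Object"])))        # NEITHER
```

A real classifier needs many more, and more informative, features to reach the reported accuracy; the point here is only the workflow of learning from a small labeled subset and applying the model to the rest of the API.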
Furthermore, SuSi also categorizes the sources and sinks it finds. For instance, the category “location information” groups all sources related to the user’s whereabouts (e.g., getLastKnownLocation()). We defined 14 source categories and 17 sink categories, but SuSi can be extended with further categories at any time.
The tool outputs two lists: categorized sources and categorized sinks, given as Android API method signatures. Both lists can be used directly by taint-tracking tools. Furthermore, the categories enable a demand-driven analysis of Android applications: if a user is only interested in flows from “location information” to “network”, she selects only the methods in these two categories, which significantly speeds up the analysis compared to finding paths between all possible sources and sinks.
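A demand-driven selection can be sketched as a simple filter over the categorized lists. The in-memory layout (a dict from category to signatures) and the category identifiers below are hypothetical and do not reproduce SuSi's actual output file format; `select` is an illustrative helper.

```python
# Hypothetical categorized lists: category -> method signatures.
sources = {
    "LOCATION_INFORMATION": [
        "<android.location.LocationManager: android.location.Location getLastKnownLocation(java.lang.String)>",
    ],
    "UNIQUE_IDENTIFIER": [
        "<android.telephony.TelephonyManager: java.lang.String getDeviceId()>",
    ],
}
sinks = {
    "NETWORK": ["<java.net.URL: java.net.URLConnection openConnection()>"],
    "SMS_MMS": [
        "<android.telephony.SmsManager: void sendTextMessage(java.lang.String,java.lang.String,"
        "java.lang.String,android.app.PendingIntent,android.app.PendingIntent)>",
    ],
}


def select(categorized, wanted):
    """Return the sorted, de-duplicated signatures of all requested categories."""
    return sorted({sig for cat in wanted for sig in categorized.get(cat, ())})


# Only location sources and network sinks are handed to the taint analysis.
print(select(sources, ["LOCATION_INFORMATION"]))
print(select(sinks, ["NETWORK"]))
```

The taint-tracking tool then searches only for paths between these few methods instead of between every listed source and sink.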
We ran SuSi on Android 4.2 and found many more sources and sinks than were previously known in the scientific literature. We also found that there is usually more than one method for accessing the same source or sink. For instance, the well-known getDeviceId() method in the TelephonyManager class, which returns the IMEI, can be replaced by several other methods in different classes that return the same piece of data. It is therefore very important for a program analysis tool to use comprehensive source and sink lists. Otherwise, malware can easily circumvent such a tool by using one of the less common methods to retrieve the data it wants to leak.
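The consequence of an incomplete list can be shown in a few lines. In this sketch, the accessor `HypotheticalDeviceInfo.readImei` is made up to stand in for one of the less common methods that return the IMEI; `detected_leaks` is likewise a hypothetical helper that reports a leaking call only if it appears in the source list.

```python
def detected_leaks(leaking_calls, source_list):
    """Report calls whose data reaches a sink, but only if they are listed as sources."""
    return [call for call in leaking_calls if call in source_list]


# A hypothetical malicious app reads the IMEI through a less common accessor
# (the name is made up) and sends it to the network.
leaking_calls = ["HypotheticalDeviceInfo.readImei"]

narrow_list = {"TelephonyManager.getDeviceId"}
comprehensive_list = narrow_list | {"HypotheticalDeviceInfo.readImei"}

print(detected_leaks(leaking_calls, narrow_list))         # []
print(detected_leaks(leaking_calls, comprehensive_list))  # ['HypotheticalDeviceInfo.readImei']
```

With the narrow list, the leak goes unreported even though the analysis itself is correct; only the list coverage differs.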
Are there any publications on SuSi?
Siegfried Rasthofer, Steven Arzt, and Eric Bodden. A Machine-Learning Approach for Classifying and Categorizing Android Sources and Sinks. NDSS 2014. [Slides]
Steven Arzt, Siegfried Rasthofer, and Eric Bodden. SuSi: A Tool for the Fully Automated Classification and Categorization of Android Sources and Sinks. Technical Report TUD-CS-2013-0114, EC SPRIDE, 2013.
There is also a poster on SuSi available for download.
Where can I find the source code of SuSi?
It is available on GitHub.