Filtering 101
From Digital Forensics Framework
Overview
Since DFF 1.2, powerful new features are available for finding and filtering items (nodes, in DFF jargon), from simple queries to very precise ones. These functionalities are available through the user interfaces (both graphical and console) and also directly from DFF's API. When a search is launched, matching nodes are provided as soon as they are found: you do not need to wait for processing to finish before you start working on them.
Here is a brief overview of the syntax:
- classical comparison operators >, >=, <, <=, !=, ==
- logical operators support: AND / OR / NOT
- parentheses and operator precedence
- 4 pattern syntaxes when looking for strings: fixed, wildcard, regexp, approximate (use of TRE library)
- strings, numeric (decimal and hexadecimal), boolean and datetime (ISO 8601) types are managed
- Python-like lists: (size in [0x200, 1024, 4096, 8192]) is equivalent to (size == 512 or [...] size == 8192)
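The list form can be checked against the chained form in plain Python. This is only a standalone illustration of the equivalence stated above, not DFF code; the function names are made up for the example, and the two middle comparisons are spelled out from the values of the list.

```python
# Standalone illustration in plain Python; it does not use DFF.
ALLOWED_SIZES = [0x200, 1024, 4096, 8192]  # 0x200 is 512 in decimal


def matches_list(size):
    """List form: size in [0x200, 1024, 4096, 8192]."""
    return size in ALLOWED_SIZES


def matches_or_chain(size):
    """Chained form: one equality test per value of the list."""
    return size == 512 or size == 1024 or size == 4096 or size == 8192


# Both forms agree on values inside and outside the list.
for size in (0, 512, 1000, 1024, 4096, 8192, 8193):
    assert matches_list(size) == matches_or_chain(size)
```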
Filters apply to the content of nodes as well as to all of their attributes. This means that attributes generated by modules can be used in a filter (for example, EXIF metadata from the metaexif module, email attributes provided by the mailbox module (pff), ...).
The grammar already provides generic keywords for the most common attributes and interprets any other attribute when it is written between double quotes, as shown in the following examples:
- all time attributes can be tested at once (time == 2011-09-15T00:00:00)
- or independently, such as ("MFT altered time" >= 2011-09-15T13:37:00 and "ntfs.$STANDARD_INFORMATION.MFT altered time" <= 2011-09-15T23:42:00). Attribute names can be either relative ("MFT altered time") or absolute ("ntfs.$STANDARD_INFORMATION.MFT altered time"). Be aware that relative names are slower to resolve than absolute ones, and that the first matching attribute encountered is the one used.
- looking for a specific word in content: (data == fz("password", i)) finds the word password in a case-insensitive and approximate way (if "pasword" is encountered, it is considered a match)
- mime types (based on libmagic): mime == "image" or mime in ["image/jpeg", "image/png"]
- node type: deleted == true, file == true
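To give an idea of what approximate matching means for the fz("password", i) example, here is a minimal sketch in plain Python. It is not DFF's implementation (DFF relies on the TRE library); it uses a classic dynamic-programming approach to approximate substring search, and the function name and the max_errors default of 1 are assumptions made for the illustration.

```python
def fuzzy_contains(text, pattern, max_errors=1, ignore_case=True):
    """Return True if some substring of text is within max_errors edits
    (insertion, deletion, substitution) of pattern.

    Hypothetical helper for illustration only; not part of DFF or TRE.
    """
    if ignore_case:
        text, pattern = text.lower(), pattern.lower()
    m = len(pattern)
    # column[i] = fewest edits needed to match pattern[:i] against a
    # substring of text ending at the current position.
    column = list(range(m + 1))
    if column[m] <= max_errors:
        return True  # the pattern is short enough to match the empty string
    for char in text:
        previous_diagonal = column[0]
        column[0] = 0  # a match may start at any position in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == char else 1
            best = min(previous_diagonal + cost,  # match or substitution
                       column[i] + 1,             # extra character in text
                       column[i - 1] + 1)         # missing pattern character
            previous_diagonal = column[i]
            column[i] = best
        if column[m] <= max_errors:
            return True
    return False


print(fuzzy_contains("the pasword is secret", "password"))
```

With one allowed error, "pasword" is accepted because a single missing character separates it from "password".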
You can then combine these filters, for example to find all nodes in the Portable Executable file format that have neither an .exe nor a .dll extension:
name != w("*.exe", i) and name != w("*.dll", i) and "type.mime" == "PE32"
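The logic of this query can be mimicked in plain Python to make it easier to read. The sketch below does not use DFF: it relies on the standard fnmatch module for the wildcard tests, and it assumes that comparing the type attribute with "PE32" behaves like a substring test. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard(pattern, value, ignore_case=True):
    """Rough equivalent of w(pattern, i): shell-style wildcard match."""
    if ignore_case:
        pattern, value = pattern.lower(), value.lower()
    return fnmatchcase(value, pattern)


def is_pe_without_usual_extension(name, file_type):
    """True for PE files whose name ends with neither .exe nor .dll.

    Assumption: the type comparison is treated as a substring test.
    """
    return (not wildcard("*.exe", name)
            and not wildcard("*.dll", name)
            and "PE32" in file_type)


print(is_pe_without_usual_extension("update.tmp", "PE32 executable (GUI)"))
```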
For those interested in what goes on under the hood, Filter objects use an Abstract Syntax Tree (AST) that reflects the provided query. The AST is built by a parser generated with Flex and Bison together. Flex is a lexical analyzer generator: the scanner it produces tokenises input strings and passes the tokens to Bison. Bison is a parser generator: the parser it produces reads the sequence of tokens (those provided by Flex) and decides whether it conforms to the syntax specified by the grammar. Combining Flex and Bison makes it possible to develop robust parsers quickly and makes the grammar easy to extend.
During parsing of the query, the AST is built from different types of nodes reflecting the supported expressions. When processing starts, the AST is traversed and each node compiles its arguments if needed (string nodes compile their regexp pattern, for example). The AST is then traversed for each provided node of the VFS. Depending on the logical operators (AND / OR), the nodes of the AST are visited differently. Future versions will add a processing cost to each node of the AST in order to speed up processing by choosing the faster evaluation branches.
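As a rough picture of the tokenising step, the following plain Python sketch splits a small subset of the query syntax into tokens with the standard re module. It is not the Flex scanner used by DFF: the token names and regular expressions are assumptions chosen for the illustration, and types such as datetimes are deliberately left out.

```python
import re

# Hypothetical token classes covering only a subset of the query syntax.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),
    ("NUMBER", r'0[xX][0-9a-fA-F]+|\d+'),
    ("OP", r'>=|<=|==|!=|>|<'),
    ("LOGIC", r'\b(?:and|or|not|in)\b'),
    ("IDENT", r'[A-Za-z_][A-Za-z0-9_.]*'),
    ("PUNCT", r'[()\[\],]'),
    ("SKIP", r'\s+'),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(query):
    """Return a list of (token_type, text) pairs, skipping whitespace."""
    tokens = []
    position = 0
    while position < len(query):
        match = MASTER.match(query, position)
        if match is None:
            raise ValueError("unexpected character at offset %d" % position)
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


print(tokenize('size >= 0x200 and name == "a.exe"'))
```

A parser would then consume this token list and check it against the grammar, which is the role Bison plays in DFF.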
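The compile-once, evaluate-per-node scheme described above can be sketched in a few lines of plain Python. This is a simplified model, not DFF's C++ implementation: the class names are hypothetical, VFS nodes are represented by dictionaries, and the different visiting order for AND / OR is assumed to correspond to short-circuit evaluation.

```python
import operator
import re

OPERATORS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
             ">=": operator.ge, "<": operator.lt, "<=": operator.le}


class Compare(object):
    """Leaf node: compares an attribute with a constant value."""
    def __init__(self, attribute, op, value):
        self.attribute, self.op, self.value = attribute, OPERATORS[op], value

    def compile(self):
        pass  # nothing to prepare for a plain comparison

    def evaluate(self, node):
        return (self.attribute in node
                and self.op(node[self.attribute], self.value))


class RegexMatch(object):
    """Leaf node: its pattern is compiled once, before any node is tested."""
    def __init__(self, attribute, pattern):
        self.attribute, self.pattern, self.regex = attribute, pattern, None

    def compile(self):
        self.regex = re.compile(self.pattern, re.IGNORECASE)

    def evaluate(self, node):
        return self.regex.search(node.get(self.attribute, "")) is not None


class And(object):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def compile(self):
        self.left.compile()
        self.right.compile()

    def evaluate(self, node):
        # The right branch is skipped when the left one already fails.
        return self.left.evaluate(node) and self.right.evaluate(node)


class Or(And):
    def evaluate(self, node):
        # The right branch is skipped when the left one already succeeds.
        return self.left.evaluate(node) or self.right.evaluate(node)


def run_filter(ast, nodes):
    ast.compile()  # one traversal to compile arguments
    return [node for node in nodes if ast.evaluate(node)]  # one per node


nodes = [{"name": "a.exe", "size": 512}, {"name": "b.txt", "size": 10}]
ast = And(RegexMatch("name", r"\.exe$"), Compare("size", ">=", 512))
print(run_filter(ast, nodes))
```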
Filters
The filtering functionalities are presented below through the graphical user interface, the console and the interpreter. The examples are based on a dump from the M57-Patents Scenario found on http://digital-corpora.org, available at https://domex.nps.edu/corp/scenarios/2009-m57/drives/charlie-2009-11-16.aff. The demonstration uses the raw version of the dump rather than the AFF one in order to get faster processing (both for modules and for filtering). The AFF version is 3 GB and the raw version is 10 GB.
For the rest of this tutorial, it is assumed that the dump has been loaded and that the partition and ntfs modules have been applied. For more information on how to add dumps and apply modules, see the Quick start guide.
Graphical User Interface
WIP
Console
A module named "find" provides finding / filtering functionalities in the console. It takes three required and three optional arguments:
- required:
  - filter_name: the name of the created filter. It is used by the "save_result" argument.
  - expression: the query to process.
  - root_node: the starting point of processing.
- optional:
  - recursive: if enabled, nodes are searched recursively from the provided "root_node".
  - save_result: automatically saves matching nodes by creating links under the folder /Searched items/filter_name.
  - verbose: if enabled, each matching node is displayed in the console.
The following example looks for all nodes of type Portable Executable with the .exe extension and creates links to matching nodes under the folder /Searched items/PE32 with exe extension:
find / --filter_name PE32\ with\ exe\ extension --expression "type.magic"\ ==\ re("pe32",\ i)\ and\ name\ ==\ w("*.exe",\ i) --recursive --save_result
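Because every space inside an argument has to be escaped with a backslash, building such a command by hand is error-prone. The helper below is a hypothetical convenience written in plain Python, not something shipped with DFF; it only assembles the command string shown above from unescaped values.

```python
def escape_spaces(argument):
    """Prefix each space with a backslash, as in the command above."""
    return argument.replace(" ", "\\ ")


def build_find_command(root_node, filter_name, expression,
                       recursive=True, save_result=True):
    """Assemble a 'find' command line (hypothetical helper, not a DFF API)."""
    parts = ["find", root_node,
             "--filter_name", escape_spaces(filter_name),
             "--expression", escape_spaces(expression)]
    if recursive:
        parts.append("--recursive")
    if save_result:
        parts.append("--save_result")
    return " ".join(parts)


print(build_find_command(
    "/",
    "PE32 with exe extension",
    '"type.magic" == re("pe32", i) and name == w("*.exe", i)'))
```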
Depending on the computer and the query, processing can take some time. It is nevertheless possible to move the job to the background with the Ctrl+Z key combination, which leaves the console usable while the 'find' module is processing. The 'jobs' command displays the state of modules. For example:
[3] background find
dff / > jobs
result:
pid name state info
[0] local finish
[1] partition finish
[2] ntfs finish Done
[3] find exec 2 % (matching node: 23)
dff / >
Once the module has finished, its result is displayed in the console and the links become available:
dff / >
total nodes: 28175 - 0x6e0f
total of matching nodes: 893 - 0x37d
dff / > fileinfo Searched\ items/PE32\ with\ exe\ extension/
result: name : ntoskrnl.exe
node type : file
generated by: ntfs
size: 2189184
attributes:
ntfs
$DATA
Header
Attribute number: 3 - 0x03
Compression unit size: 0 - 0x00
Content actual size: 2189184 - 0x216780
Content allocated size: 2191360 - 0x217000
Content initialized size: 2189184 - 0x216780
Ending VCN: 534 - 0x216
[...]
MFT entry number: 14885 - 0x3a25
MFT physical offset: 3236467712 - 0xc0e89400
accessed: 2009-11-11 22:06:41
altered: 2009-02-08 03:35:26
creation: 2009-02-08 03:35:26
type
magic: PE32 executable (native) Intel 80386, for MS Windows
magic mime: application/x-dosexec; charset=binary
Graphical Interpreter
This section describes how to use Filter and Search objects directly from the Python interpreter. As a reminder, the interpreter is available from the graphical user interface of DFF by clicking on its icon.
- Filter in less than 10 lines of Python
Python Interpreter
>>> v = vfs()
>>> root = v.getnode('/')
>>> f = Filter("jpeg")
>>> f.compile('mime == "jpeg"')
>>> f.process(root)
>>> nodes = f.matchedNodes()
>>> print len(nodes)
>>> for node in nodes:
... print node.absolute(), node.attributesByName("type.magic mime", ABSOLUTE_ATTR_NAME)
First, a vfs instance must be created (line 1) to get access to the existing nodes (line 2). The search will start from the "/" node, the root of the VFS. A Filter object is then instantiated with a filter name (line 3). Once the filter object is created, a query can be compiled (line 4). If the query is not well formed, an exception (RuntimeError) is thrown. If compilation succeeds, the filter object is ready to process nodes (line 5). Depending on the computer and the query, processing can take some time. Once it is finished, the matching nodes can be retrieved (line 6) and counted (line 7), and it is possible to loop over them to display, for example, their absolute name (path + name) and one of their attributes (lines 8 and 9).
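Since a malformed query raises RuntimeError, it can be convenient to wrap the compile step. The sketch below shows one possible pattern. So that it runs outside DFF, it uses a stand-in class (StubFilter, hypothetical) that only imitates the compile, process and matchedNodes calls used above; inside the DFF interpreter the real Filter class would be passed instead. The parenthesis check in the stub is a placeholder, not how DFF validates queries.

```python
class StubFilter(object):
    """Hypothetical stand-in for DFF's Filter, for illustration only."""
    def __init__(self, name):
        self.name = name
        self._matched = []

    def compile(self, query):
        # Placeholder validation: DFF uses its own parser instead.
        if query.count("(") != query.count(")"):
            raise RuntimeError("malformed query: %s" % query)

    def process(self, root):
        self._matched = list(root)  # the stub matches every node

    def matchedNodes(self):
        return self._matched


def safe_filter(filter_class, name, query, root):
    """Compile and run a query, returning [] if the query is malformed."""
    query_filter = filter_class(name)
    try:
        query_filter.compile(query)
    except RuntimeError as error:
        print("invalid query: %s" % error)
        return []
    query_filter.process(root)
    return query_filter.matchedNodes()


print(safe_filter(StubFilter, "jpeg", 'mime == "jpeg"', ["a.jpg", "b.jpg"]))
print(safe_filter(StubFilter, "broken", '(mime == "jpeg"', ["a.jpg"]))
```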
Search
The Filter object relies on the search API, which can also be used with DFF virtual files. The following example shows a way to carve JPEG headers in the first partition.
>>> node = v.getnode("/Logical files/charlie-2009-11-16.raw/partition/Partition 1")
>>> vfile = node.open()
>>> headers = Search()
>>> headers.setPatternSyntax(Search.Fixed)
>>> headers.setPattern("\xff\xd8\xff")
>>> headers.setCaseSensitivity(Search.CaseSensitive)
>>> headers.compile()
>>> hoffsets = vfile.indexes(headers)
>>> print len(hoffsets)
7516
>>> for hoffset in hoffsets:
... print hex(hoffset)
First, the "Partition 1" node is obtained (line 1) and then opened, which returns a vfile object (line 2). DFF vfile objects provide the same methods as Python file objects and also give access to search methods. Before using these methods, a Search object must be created (line 3). The search object is then configured by setting its pattern syntax (line 4), here set to Fixed, then the pattern (line 5), which is the JPEG header written as a Python hexadecimal string, and finally the case sensitivity (line 6), which has no real effect in this case and is only set to demonstrate the functionality.
The search object is now configured and has to be compiled (line 7). If there is an error, an exception is thrown. The search object is then ready to use and can be passed to the vfile search methods. Here, the "indexes" method is used (line 8); it returns all offsets in the file where the pattern matched. As always, depending on the computer and the pattern, this can take some time. Once the search is finished, the offsets are available as a list. The number of matches can be obtained with the usual Python functions (line 9) and the list can be iterated for further processing (line 10).
Besides the indexes method, the find and count methods can be used with all pattern syntaxes: find returns the first occurrence of the pattern and count returns the number of occurrences. With fixed and wildcard searches only, the last occurrence of a pattern can be found with the rfind method.
Indexes, find and rfind also support two optional arguments, start and end, corresponding to the start offset and the maximum offset at which to search for the pattern in the vfile. Count also supports an additional argument that specifies the maximum number of occurrences to find.
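For readers who want to see what a fixed-pattern "indexes" search amounts to, here is a minimal sketch in plain Python working on any binary file-like object. It is not DFF's implementation: the function name, the chunk size and the choice to report overlapping matches are assumptions. It reads the stream in chunks and keeps a small tail between chunks so that a header spanning a chunk boundary is not missed.

```python
import io


def fixed_indexes(stream, pattern, chunk_size=1 << 20):
    """Return every offset at which pattern occurs in a binary stream."""
    offsets = []
    overlap = len(pattern) - 1  # bytes carried over between chunks
    base = 0                    # absolute offset of buffer[0]
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        position = buffer.find(pattern)
        while position != -1:
            offsets.append(base + position)
            position = buffer.find(pattern, position + 1)
        # The tail is shorter than the pattern, so no match is counted twice.
        tail = buffer[-overlap:] if overlap else b""
        base += len(buffer) - len(tail)
        buffer = tail
    return offsets


data = b"\x00\xff\xd8\xff\xe0" + b"A" * 10 + b"\xff\xd8\xff\xe1"
print(fixed_indexes(io.BytesIO(data), b"\xff\xd8\xff", chunk_size=4))
```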
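The meaning of these optional arguments can be illustrated on an ordinary Python bytes object, whose find and rfind methods accept similar start and end offsets. This is only an analogy in plain Python, not the vfile API: the bounded_count helper and its parameter names and order are hypothetical, and whether DFF counts overlapping matches is not stated above.

```python
def bounded_count(data, pattern, maxcount=None, start=0, end=None):
    """Count occurrences of pattern in data[start:end], stopping at maxcount.

    Hypothetical helper; it counts overlapping occurrences.
    """
    if end is None:
        end = len(data)
    found = 0
    position = data.find(pattern, start, end)
    while position != -1 and (maxcount is None or found < maxcount):
        found += 1
        position = data.find(pattern, position + 1, end)
    return found


data = b"abcabcabc"
print(data.find(b"abc", 1))                   # first match at or after offset 1
print(data.rfind(b"abc"))                     # last match
print(bounded_count(data, b"abc"))            # all matches
print(bounded_count(data, b"abc", maxcount=2))  # stop after two matches
```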
