Filtering 101

From Digital Forensics Framework

Overview

Since DFF 1.2, powerful new features are available for finding and filtering items (nodes in DFF jargon), from simple queries to very precise ones. These functionalities are available through the user interfaces (both graphical and console) and also directly from DFF's API. When a search is launched, matching nodes are provided as soon as they are found: you don't need to wait for the end of processing to start working with them.

Here is a brief overview of the syntax:

  • classical comparison operators: >, >=, <, <=, !=, ==
  • logical operators: AND / OR / NOT
  • parentheses and operator precedence
  • 4 pattern syntaxes when looking for strings: fixed, wildcard, regexp, approximate (using the TRE library)
  • supported types: strings, numbers (decimal and hexadecimal), booleans and datetimes (ISO 8601)
  • Python-like lists: (size in [0x200, 1024, 4096, 8192]) is equivalent to (size == 512 or [...] size == 8192)
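
The list shorthand in the last item behaves like Python's own membership test. The following sketch is plain Python, independent of DFF, and its function names are made up for illustration. It only checks that the list form and the expanded form agree, and that 0x200 denotes the same value as 512.

```python
# Plain-Python illustration of the list shorthand; not DFF's implementation.
SIZES = [0x200, 1024, 4096, 8192]  # 0x200 is hexadecimal for 512


def matches_with_list(size):
    """Counterpart of the query: size in [0x200, 1024, 4096, 8192]."""
    return size in SIZES


def matches_with_or(size):
    """Counterpart of the expanded query built with == and or."""
    return size == 512 or size == 1024 or size == 4096 or size == 8192


if __name__ == "__main__":
    for size in (512, 1000, 8192):
        print(size, matches_with_list(size), matches_with_or(size))
```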

As for filtering abilities, both the content of nodes and all their attributes can be filtered. This means that attributes generated by modules can be used in filters (for example EXIF metadata from the metaexif module, or email attributes provided by the mailbox module (pff)).

The grammar already provides generic keywords for the most common attributes, and interprets any other attribute name provided between double quotes, as shown in the following examples:

  • all time attributes can be tested at once: (time == 2011-09-15T00:00:00)
  • or independently: ("MFT altered time" >= 2011-09-15T13:37:00 and "ntfs.$STANDARD_INFORMATION.MFT altered time" <= 2011-09-15T23:42:00). Note that either relative attribute names (as in the first comparison) or absolute ones (as in the second) can be used. Be aware that relative names are slower than absolute ones, and that the first matching attribute encountered is the one used.
  • looking for a specific word in content: (data == fz("password", i)) finds the word password in a case-insensitive and approximate way (if "pasword" is encountered, it is considered a match)
  • mime types (based on libmagic): mime == "image" or mime in ["image/jpeg", "image/png"]
  • node type: deleted == true, file == true
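
The difference between relative and absolute attribute names can be pictured with a nested dictionary. This is only a sketch in plain Python: the attribute tree, its values and both helper functions are hypothetical, and the depth-first walk is an assumption about how "the first encountered one" might be chosen, not a description of DFF's internals. It shows why an absolute name needs one lookup per level while a relative name may require walking the whole tree.

```python
# Hypothetical attribute tree; layout and values are assumed for illustration.
ATTRIBUTES = {
    "ntfs": {
        "$STANDARD_INFORMATION": {"MFT altered time": "2011-09-15T14:00:00"},
        "$FILE_NAME": {"MFT altered time": "2011-09-14T09:30:00"},
    },
    "type": {"magic mime": "image/jpeg"},
}


def lookup_absolute(attrs, dotted_name):
    """Follow each dot-separated component: one dictionary access per level."""
    current = attrs
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def lookup_relative(attrs, name):
    """Depth-first walk of the whole tree; the first match wins."""
    for key, value in attrs.items():
        if key == name:
            return value
        if isinstance(value, dict):
            found = lookup_relative(value, name)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    print(lookup_absolute(ATTRIBUTES, "ntfs.$FILE_NAME.MFT altered time"))
    print(lookup_relative(ATTRIBUTES, "MFT altered time"))
```

With this toy tree the relative lookup returns the value stored under $STANDARD_INFORMATION, because that branch is visited first; the value under $FILE_NAME is reachable only through its absolute name.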

All of these filters can then be combined, for example to find all nodes in Portable Executable file format whose names do not have the .exe or .dll extension:

name != w("*.exe", i) and name != w("*.dll", i) and "type.mime" == "PE32"
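
The logic of this query can be mimicked in plain Python with the standard fnmatch module. This is a sketch outside DFF: the node list is invented, wildcard_i is a hypothetical helper standing in for w("...", i), and treating the type comparison as a substring test is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical (name, type) pairs standing in for nodes.
NODES = [
    ("ntoskrnl.exe", "PE32 executable"),
    ("KERNEL32.DLL", "PE32 executable"),
    ("update.tmp", "PE32 executable"),
    ("notes.txt", "ASCII text"),
]


def wildcard_i(name, pattern):
    """Case-insensitive wildcard match, in the spirit of w("...", i)."""
    return fnmatchcase(name.lower(), pattern.lower())


def is_disguised_pe(name, file_type):
    """PE32 content whose name ends with neither .exe nor .dll."""
    return (
        not wildcard_i(name, "*.exe")
        and not wildcard_i(name, "*.dll")
        and "PE32" in file_type  # assumption: substring comparison
    )


suspicious = [name for name, file_type in NODES if is_disguised_pe(name, file_type)]
print(suspicious)
```

Only update.tmp is kept: the two names with .exe and .DLL extensions are excluded by the case-insensitive wildcards, and notes.txt is excluded by its type.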

For those interested in what goes on under the hood, Filter objects use an Abstract Syntax Tree (AST) which reflects the provided query. The AST is created by a parser generated through the combined use of Flex and Bison. Flex generates a lexical analyzer which tokenises input strings and provides the parser with tokens. Bison is a parser generator; the parser it generates reads sequences of tokens (those provided by the Flex analyzer) and decides whether each sequence conforms to the syntax specified by the grammar. Combining Flex and Bison makes it possible to quickly develop powerful and robust parsers, and also makes the grammar easier to extend.

During the parsing of the query, the AST is created with different types of nodes reflecting the supported expressions. When processing starts, the AST is traversed and each node compiles its arguments if needed (string nodes compile their regexp pattern, for example). Finally, the AST is traversed once for each node of the VFS provided to the filter. Depending on the logical operators AND / OR, nodes of the AST are visited differently. Future versions will add a processing cost to each node of the AST in order to speed up processing by choosing the faster evaluation branches first.
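
One plausible reading of "visited differently" is short-circuit evaluation: the right branch of an AND is skipped when the left one is false, and the right branch of an OR is skipped when the left one is true. The sketch below illustrates that idea in plain Python. All class names are hypothetical and this is not DFF's actual AST, which is built by the generated parser.

```python
class Compare:
    """Leaf: tests one attribute of a node against a constant."""

    def __init__(self, attribute, expected):
        self.attribute = attribute
        self.expected = expected
        self.visits = 0  # counts evaluations to make skipped branches visible

    def evaluate(self, node):
        self.visits += 1
        return node.get(self.attribute) == self.expected


class And:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, node):
        # The right branch is skipped when the left one is already false.
        return self.left.evaluate(node) and self.right.evaluate(node)


class Or:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, node):
        # The right branch is skipped when the left one is already true.
        return self.left.evaluate(node) or self.right.evaluate(node)


class Not:
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self, node):
        return not self.operand.evaluate(node)


# Tree for: deleted == true and (file == true or not (size == 0))
deleted = Compare("deleted", True)
is_file = Compare("file", True)
empty = Compare("size", 0)
tree = And(deleted, Or(is_file, Not(empty)))

nodes = [
    {"deleted": False, "file": True, "size": 10},
    {"deleted": True, "file": True, "size": 0},
    {"deleted": True, "file": False, "size": 0},
]
matches = [n for n in nodes if tree.evaluate(n)]
print(len(matches), deleted.visits, is_file.visits, empty.visits)
```

The visit counters show the effect: the leftmost test runs for every node, while the deeper ones run only when the result is still undecided. A per-node cost, as planned for future versions, would allow the cheaper branch to be placed first.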


Filters

The filtering functionalities are presented for the graphical user interface, the console and the interpreter. The examples are based on a dump from the M57-Patents Scenario found on http://digital-corpora.org, available at https://domex.nps.edu/corp/scenarios/2009-m57/drives/charlie-2009-11-16.aff. The demonstration uses the raw version of the dump rather than the AFF one in order to have faster processing (both for modules and filtering). The AFF version is 3 GB and the raw version is 10 GB.

For the rest of this tutorial, it is assumed that the dump has been loaded and that the partition and ntfs modules have been applied. For further information on how to add dumps and apply modules, have a look at the Quick start guide.

Graphical User Interface

1-display-search-engine.png Clicking on Search.png displays a new widget dedicated to search functionalities. Note that the current path is taken into account when the search widget is opened, as shown in the next figure.


2-search-engine-widget.png The search widget is divided into three main areas:
  • The top of the widget corresponds to the root path which will be provided to the search processing. Sub-trees outside of this scope won't be processed.
  • The right side is a light version of the browser and is used to display matching nodes. Right-clicking displays the same menu as in the base browser.
  • The left side is divided into two tabs. The main (and default) one is used to configure the queries provided to the search engine and displays information such as progress and the count of matching nodes. The other tab displays attributes when a node is selected on the right.


The remaining screenshots are still marked WIP and have no description yet:

  • 3-wildcard-search.png
  • 4-export-exe.png
  • 5-exported-items.png
  • 6-mime-search.png
  • 7-images-search.png
  • 8-images-searching.png
  • 8-images-searching-2.png
  • 8-images-searching-3.png
  • 9-text-and-passwod.png
  • 10-text-and-mails.png
  • 11-extended-attributes.png
  • 12-extended-attributes-enum.png
  • 13-ultimate-search.png

Console

A module named "find" has been developed to provide finding / filtering functionalities in the console. It supports three required and three optional arguments:

  • required:
    • filter_name: the name of the created filter. It is used by the "save_result" argument.
    • expression: the query to process.
    • root_node: the starting point of processing.
  • optional:
    • recursive: if enabled, nodes are searched recursively from the provided "root_node".
    • save_result: automatically saves matching nodes by creating links under the folder /Searched items/filter_name.
    • verbose: if activated, each matching node is displayed in the console.

The following example looks for all nodes of type Portable Executable with the extension .exe and creates links to matching nodes under the folder /Searched items/PE32 with exe extension:

find / --filter_name PE32\ with\ exe\ extension --expression "type.magic"\ ==\ re("pe32",\ i)\ and\ name\ ==\ w("*.exe",\ i) --recursive --save_result
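
Every space inside an argument value is escaped with a backslash in this command. When such command lines are prepared from a script, the escaping can be automated. The sketch below is plain Python: escape_spaces and build_find_command are hypothetical helpers, not part of DFF, and they use only the options shown in the command above.

```python
def escape_spaces(text):
    """Prefix every space with a backslash, as in the command above."""
    return text.replace(" ", "\\ ")


def build_find_command(root, filter_name, expression):
    """Assemble a find command line using only the options shown above."""
    return "find {} --filter_name {} --expression {} --recursive --save_result".format(
        root, escape_spaces(filter_name), escape_spaces(expression)
    )


command = build_find_command(
    "/",
    "PE32 with exe extension",
    '"type.magic" == re("pe32", i) and name == w("*.exe", i)',
)
print(command)
```

The printed string is identical to the command shown above, so it can be pasted into the console as is.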

Depending on the computer and the query, processing can take some time. Nevertheless, it is possible to move the job to the background with the Ctrl+Z combination. This makes it possible to keep using the console while the 'find' module is processing. The 'jobs' command shows the state of modules. For example:


[3] background find
dff / > jobs

result: pid     name    state   info
[0]     local   finish
[1]     partition       finish
[2]     ntfs    finish  Done
[3]     find    exec    2 % (matching node: 23)

dff / >

Once terminated, the result of the module will be displayed in the console and links become available:

dff / >
total nodes: 28175 - 0x6e0f
total of matching nodes: 893 - 0x37d
dff / > fileinfo Searched\ items/PE32\ with\ exe\ extension/
result: name :          ntoskrnl.exe
node type :             file

generated by:           ntfs
size:                   2189184
attributes:

        ntfs
                $DATA
                        Header
                                Attribute number: 3 - 0x03
                                Compression unit size: 0 - 0x00
                                Content actual size: 2189184 - 0x216780
                                Content allocated size: 2191360 - 0x217000
                                Content initialized size: 2189184 - 0x216780
                                Ending VCN: 534 - 0x216
[...]
                MFT entry number: 14885 - 0x3a25
                MFT physical offset: 3236467712 - 0xc0e89400
                accessed: 2009-11-11 22:06:41
                altered: 2009-02-08 03:35:26
                creation: 2009-02-08 03:35:26
        type
                magic: PE32 executable (native) Intel 80386, for MS Windows
                magic mime: application/x-dosexec; charset=binary

Graphical Interpreter

This section describes the use of Filter and Search objects directly within the Python interpreter. As a reminder, the interpreter is available from the graphical user interface of DFF by clicking on Open python shell.png.

  • Filter in less than 10 lines of Python
Python Interpreter
>>> v = vfs()
>>> root = v.getnode('/')
>>> f = Filter("jpeg")
>>> f.compile('mime == "jpeg"')
>>> f.process(root)
>>> nodes = f.matchedNodes()
>>> print len(nodes)
>>> for node in nodes:
...    print node.absolute(), node.attributesByName("type.magic mime", ABSOLUTE_ATTR_NAME)

First, a vfs instance must be created (line 1) in order to access the existing nodes (line 2). The search will start from the "/" node, the root of the VFS. Then a Filter object is instantiated with a filter name (line 3). Once the filter object is created, a query can be compiled (line 4). If the provided query is not well formed, an exception is thrown (RuntimeError). If the compilation succeeds, the filter object is ready to process nodes (line 5). Depending on the computer and the query, processing can take some time. Once processing is finished, the matching nodes can be obtained (line 6) and counted (line 7), and it is possible to loop over them to display, for example, their absolute name (path + name) and one of their attributes (lines 8-9).
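
Since compile raises RuntimeError on a malformed query, scripts that accept queries from users may want to catch it. The sketch below shows one way to do so. StubFilter is a hypothetical stand-in with a toy parenthesis check so that the example runs outside DFF; inside the DFF shell, the real Filter class would be passed to compile_or_none instead. compile_or_none itself is an illustrative helper, not part of the API.

```python
class StubFilter:
    """Stand-in for DFF's Filter so the sketch runs outside DFF (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.query = None

    def compile(self, query):
        # Toy check only: the real parser is far stricter.
        if query.count("(") != query.count(")"):
            raise RuntimeError("malformed query: " + query)
        self.query = query


def compile_or_none(filter_class, name, query):
    """Return a compiled filter, or None when the query is rejected."""
    candidate = filter_class(name)
    try:
        candidate.compile(query)
    except RuntimeError as error:
        print("query rejected: %s" % error)
        return None
    return candidate


ok = compile_or_none(StubFilter, "jpeg", 'mime == "jpeg"')
bad = compile_or_none(StubFilter, "broken", '(mime == "jpeg"')
```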

Search

The Filter object relies on the search API, which can also be used with DFF virtual files. The following example presents a way to carve JPEG headers in the first partition.

>>> node = v.getnode("/Logical files/charlie-2009-11-16.raw/partition/Partition 1")
>>> vfile = node.open()
>>> headers = Search()
>>> headers.setPatternSyntax(Search.Fixed)
>>> headers.setPattern("\xff\xd8\xff")
>>> headers.setCaseSensitivity(Search.CaseSensitive)
>>> headers.compile()
>>> hoffsets = vfile.indexes(headers)
>>> print len(hoffsets)
7516
>>> for hoffset in hoffsets:
...    print hex(hoffset)

First, the "Partition 1" node is obtained (line 1) and then opened, which returns a vfile object (line 2). DFF vfile objects provide the same methods as Python file objects but also give access to search methods. Before using these methods, a Search object must be created (line 3). The search object is then configured by setting its pattern syntax (line 4), here set to Fixed, then the pattern (line 5), which is the JPEG header written as a Python hexadecimal string, and finally the case sensitivity (line 6), which is not really useful in this case and is only set to present the functionality.

The search object is now configured and has to be compiled (line 7). If there is an error, an exception is thrown. The search object is then ready to use and can be passed to the vfile search methods. Here, the "indexes" method is used (line 8); it returns all offsets in the file where the pattern matched. As always, depending on the computer and the pattern, this can take some time. Once the search is finished, the offsets are available as a list. The number of matches can be obtained with standard Python functions (line 9) and the list can be iterated for further processing (line 10).

Besides the indexes method, the find and count methods can be used with all pattern syntaxes; they find the first occurrence of the pattern and count all occurrences of the pattern, respectively. For fixed and wildcard searches only, it is also possible to find the last occurrence of a pattern with the rfind method.

The indexes, find and rfind methods also support two optional arguments, start and end, corresponding to the start offset and the maximum offset at which to search for the pattern in the vfile. The count method also supports another argument, which specifies the maximum number of occurrences of the pattern to find.
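
The behaviour of these methods for a fixed pattern can be pictured with Python's built-in bytes methods on an in-memory buffer. The sketch below is written for a standalone Python 3 interpreter and does not use DFF: the indexes and count functions are illustrative reimplementations whose names mirror the vfile methods, and the sample data is invented. Reporting overlapping matches is an assumption.

```python
def indexes(data, pattern, start=0, end=None):
    """Return every offset in data[start:end] where pattern begins."""
    end = len(data) if end is None else end
    offsets = []
    position = data.find(pattern, start, end)
    while position != -1:
        offsets.append(position)
        # Step by one byte so overlapping matches are reported too.
        position = data.find(pattern, position + 1, end)
    return offsets


def count(data, pattern, maximum=None):
    """Count matches; the result is capped at `maximum` when it is given."""
    found = len(indexes(data, pattern))
    return found if maximum is None else min(found, maximum)


JPEG_HEADER = b"\xff\xd8\xff"
# Invented buffer with two headers, at offsets 4 and 16.
data = b"junk" + JPEG_HEADER + b"\xe0" + b"\x00" * 8 + JPEG_HEADER + b"\xe1tail"

print([hex(offset) for offset in indexes(data, JPEG_HEADER)])
print(data.find(JPEG_HEADER), data.rfind(JPEG_HEADER))
print(indexes(data, JPEG_HEADER, 5), indexes(data, JPEG_HEADER, 0, 10))
print(count(data, JPEG_HEADER), count(data, JPEG_HEADER, 1))
```

Here bytes.find and bytes.rfind play the role of the find and rfind methods, and the start and end arguments restrict the searched range in the same spirit as described above.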