ARCHIVE Reference Guide

The contents of a TICS quality database are determined by a so-called ARCHIVE file and by the extensions and build types specified in the SERVER.txt. The ARCHIVE file is optional. It describes a FILEFILTER, at global or project level, that determines which files in the archive are processed by TICS. This prevents unwanted files from being scanned and can improve performance considerably by skipping deep directory structures that are known not to contain any relevant files.

To use an ARCHIVE file, first create a .txt file in the configuration directory using the syntax described below. Then specify the name of the ARCHIVE file either in the SERVER.txt, to apply the filters globally, or in the PROJECTS.txt, to apply the filters to a specific project.

Syntax of the ARCHIVE file

The file contains at most four entries:

  1. 'FILE' => expr
  2. 'DIR' => expr
  3. 'EXTERNAL_FILE' => expr
  4. 'EXTERNAL_DIR' => expr
expr ::=
    expr || expr
  | expr && expr
  | !expr
  | (expr)
  | "regexp"

regexp ::= a regular expression

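Two styles of comments are allowed, so remarks can be added to the expressions: comments starting with '//' and comments starting with '#'. Both styles are used in the filtering example below.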
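The grammar above can be made concrete with a small recursive-descent evaluator. The Python sketch below is not the TICS implementation; it assumes that the quoted regexps follow Python `re` syntax (the TICS regex dialect may differ), that a regexp never contains a double quote, and that both comment styles run to the end of the line. It compiles an expression into a predicate on a file name, with || binding weaker than && and && binding weaker than !, as described under Filtering. The names `tokenize` and `parse` are hypothetical.

```python
import re
from typing import Callable, List

# Sketch only: regexps are evaluated with Python `re`, which may differ from TICS.
Predicate = Callable[[str], bool]

# One token per match: whitespace, a '//' or '#' comment (assumed to run to the
# end of the line), a double-quoted regexp, or an operator/parenthesis.
_TOKEN = re.compile(r'\s+|//[^\n]*|#[^\n]*|"[^"]*"|\|\||&&|!|\(|\)')


def _or(a: Predicate, b: Predicate) -> Predicate:
    return lambda s: a(s) or b(s)


def _and(a: Predicate, b: Predicate) -> Predicate:
    return lambda s: a(s) and b(s)


def _not(a: Predicate) -> Predicate:
    return lambda s: not a(s)


def _regexp(pattern: str) -> Predicate:
    rx = re.compile(pattern)
    return lambda s: rx.search(s) is not None  # may match anywhere in the name


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        tok = m.group()
        if not tok.isspace() and not tok.startswith(("//", "#")):
            tokens.append(tok)  # whitespace and comments are dropped
    return tokens


def parse(text: str) -> Predicate:
    """Compile one archive expression into a predicate on canonical names."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def disjunction() -> Predicate:  # '||' binds weakest
        left = conjunction()
        while peek() == "||":
            take()
            left = _or(left, conjunction())
        return left

    def conjunction() -> Predicate:  # '&&' binds stronger than '||'
        left = negation()
        while peek() == "&&":
            take()
            left = _and(left, negation())
        return left

    def negation() -> Predicate:  # '!' binds strongest
        if peek() == "!":
            take()
            return _not(negation())
        return atom()

    def atom() -> Predicate:
        tok = take()
        if tok == "(":
            inner = disjunction()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok.startswith('"'):
            return _regexp(tok[1:-1])
        raise SyntaxError(f"unexpected token {tok!r}")

    result = disjunction()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, parse('"a" || "b" && "c"') accepts the name "a", because the && clause is grouped before the ||.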

The 'FILE' versus 'DIR' archive expressions

The names 'DIR' and 'FILE' reflect the fact that 'DIR' expressions are only applied to paths that denote a directory on the file system, and 'FILE' expressions are only applied to paths that denote a file on the file system.

The most important difference between the 'DIR' and 'FILE' archive expressions is that the 'DIR' expression is evaluated for every directory during file collection, whereas the 'FILE' expression is evaluated after all files have been collected.

The (optional) 'DIR' expression can be used to restrict the search space while collecting files from large directory structures. For very large archives, where file system traversal is costly, this can speed up collection considerably. The collection phase only returns files (no directories). The 'DIR' expression should only be used to improve performance.

The (optional) 'FILE' expression is used to filter all files returned by the recursive file search from a given project. Only those files that match the expression are kept. All other files are discarded.
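The two phases can be sketched in Python with os.walk, whose top-down mode lets the caller prune subdirectories in place. This is an illustration, not the TICS collector: the predicates stand in for compiled 'DIR' and 'FILE' expressions, and the name format passed to them (branch-relative, '/'-separated, starting with '/', directories ending in '/') is an assumption. The function name `collect` is hypothetical.

```python
import os
import re
from typing import Callable, List

Predicate = Callable[[str], bool]


def collect(branch_dir: str, dir_pred: Predicate, file_pred: Predicate) -> List[str]:
    """Two-phase sketch: prune directories while walking, filter files afterwards.

    Assumed name format (illustrative only): branch-relative, '/'-separated,
    starting with '/', and directories end in '/', e.g. '/src/' and '/src/a.c'.
    """
    collected: List[str] = []
    for root, dirnames, filenames in os.walk(branch_dir, topdown=True):
        rel = os.path.relpath(root, branch_dir).replace(os.sep, "/")
        prefix = "/" if rel == "." else "/" + rel + "/"
        # Phase 1 ('DIR'): evaluated for every directory during the walk;
        # removing an entry from dirnames in place stops os.walk descending into it.
        dirnames[:] = [d for d in dirnames if dir_pred(prefix + d + "/")]
        # Collection only returns files, never directories.
        collected.extend(prefix + f for f in filenames)
    # Phase 2 ('FILE'): applied once all files have been collected.
    return sorted(name for name in collected if file_pred(name))
```

With a 'DIR' predicate that rejects "/build/", a file such as /build/gen.c is never collected, so the 'FILE' predicate never sees it; skipping that traversal is where the performance gain comes from.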

The 'EXTERNAL_FILE' and 'EXTERNAL_DIR' archive expressions

Excluding files through the ARCHIVE file may have unintended consequences. When a project contains header files that define an external API, these header files are typically included and used only by test code, not by the actual source code. Since it is best practice to exclude test code from TICS analysis via the ARCHIVE file filter, these 'external' header files become unbuildable and are therefore treated as 100% dead code. This would not have happened if the test code had been included in the TICS scope.

To mitigate this issue, such code can be added through an 'EXTERNAL_FILE' or 'EXTERNAL_DIR' expression. This ensures that these files are taken into account during pre-processing and when establishing build relations, but are not analysed themselves.
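A toy model can make the reasoning above concrete. The Python sketch below treats a header as unbuildable when no file in the build scope includes it. The include map, file names and the function `unreferenced_headers` are hypothetical, and the real Dead Code metric works on code rather than on whole files. Files matched by 'EXTERNAL_FILE' join the build scope for the include relations but are never reported.

```python
from typing import Dict, Set


def unreferenced_headers(analysed: Set[str], external: Set[str],
                         includes: Dict[str, Set[str]]) -> Set[str]:
    """Toy model: analysed headers that no file in the build scope includes.

    `includes` maps a file to the headers it includes (hypothetical input).
    External files take part in establishing include relations, but only
    files from `analysed` can ever appear in the result.
    """
    build_scope = analysed | external
    included: Set[str] = set()
    for source in build_scope:
        included |= includes.get(source, set())
    return {f for f in analysed if f.endswith(".h") and f not in included}
```

With the test file left out, src/api.h (included only by test/test_api.c) comes back as unreferenced; passing test/test_api.c as an external file makes the header referenced without adding the test file to the analysed set.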

Adding code through the 'EXTERNAL_FILE' and 'EXTERNAL_DIR' expressions only affects the Dead Code metric. This metric is affected in the following ways:

Matching file names

Each collected file is transformed into a so-called canonical form before it is matched against the archive expression. This is an OS-independent and unique representation of a file name. One characteristic of this format is that directory separators are denoted by '/'. Furthermore, on operating systems with a case-insensitive file system (such as the Win32 file systems), the file name is put in the case that is used internally by the file system.

When creating archive expressions, remember to use '/' as the directory separator. Matching is performed against the canonical name of a collected file. On case-insensitive file systems (e.g., Windows) the match is case-insensitive; otherwise, it is case-sensitive. On Windows, this allows a file extension to be specified by one clause instead of several. For example, to match C/C++ header files, a single '"\.h$"' clause suffices, instead of '"\.h$" || "\.H$"' to also catch (accidental) header files with extension '.H'. (The latter could have been simplified to '"\.[hH]$"', but this is still cumbersome: to match the '.java' extension in every capitalization, one would have to write '"\.[jJ][aA][vV][aA]$"'.)
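Both steps can be sketched in Python. The separator conversion below uses pathlib.PureWindowsPath.as_posix(); the case normalization that TICS performs on Windows needs the file system itself and is left out. Case-insensitive matching is modelled with re.IGNORECASE. The function names are hypothetical.

```python
import re
from pathlib import PureWindowsPath


def canonical_windows(path: str) -> str:
    """Separator part of the canonical form only: backslashes become '/'.

    The real canonical form also applies the case stored by the file system,
    which this string-only sketch cannot look up.
    """
    return PureWindowsPath(path).as_posix()


def clause_matches(pattern: str, canonical_name: str, case_insensitive: bool) -> bool:
    """Match one quoted regexp against a canonical name, anywhere in the name."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, canonical_name, flags) is not None
```

On a case-insensitive system the single clause "\.h$" then also accepts Foo.H, whereas a case-sensitive match needs "\.[hH]$".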

Filtering

The 'FILE' expression is a logical expression consisting of disjunctions, conjunctions, negations and quoted regular expressions that are applied to collected files. The result is a boolean value that determines whether the file is entered into the database. Parentheses can be used for grouping to override operator precedence: || binds weaker than &&, which binds weaker than !.

Example

A typical expression specifies the subset of files that should be accepted and a list of exceptions on this subset that should be disallowed.

'FILE' =>
  // allowed file extensions: note the '||'s and '(', ')'
  ( "\.h$" || "\.c$" ) &&
  !"_i\.c$" &&             # ignore certain C files
  !"_p\.c$" &&
  !"/test/"                # ignore files in test directories

This example shows a disjunction of two file extensions, followed by three negated clauses that restrict the result set. In this case, two file suffixes and a directory name are excluded by negated patterns.

Note the '\.' and '$'. The '\.' is required to specify a literal dot ('.'), since in regular expressions '.' matches any character. For example, ".c$" would also match test.cc, which looks like a C++ file rather than a C file. The '$' is required to match only at the end of a file name; otherwise, the expression could match anywhere in the name. For example, "\.c" would also match test.cpp, which again looks like a C++ file rather than a C file.

Also note the '/' characters around the 'test' directory name. They are the easiest way to specify that the whole directory name must match. Regular expression '/test' matches all directories (and files) whose name starts with 'test'; regular expression 'test/' matches all directories whose name ends with 'test'. Directory separators are denoted by '/' regardless of the OS used.

The expression above would match the following files:

test.h
test.c
notest/test.h
notest/test.c

and it would reject:

test_i.c
test_p.c
test/test.h
test/test.c
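The accept and reject lists above can be checked by transcribing the example expression into Python. In this sketch the quoted clauses become re.search calls (Python regex syntax is assumed to agree with TICS for these simple patterns), and the listed names are prefixed with a hypothetical /branch/ so that "/test/" can match a top-level test directory, as it would in a full canonical name.

```python
import re


def found(pattern: str, name: str) -> bool:
    """A quoted clause: true if the regexp occurs anywhere in the name."""
    return re.search(pattern, name) is not None


def file_expr(name: str) -> bool:
    # ( "\.h$" || "\.c$" ) && !"_i\.c$" && !"_p\.c$" && !"/test/"
    return ((found(r"\.h$", name) or found(r"\.c$", name))
            and not found(r"_i\.c$", name)
            and not found(r"_p\.c$", name)
            and not found(r"/test/", name))


# Hypothetical prefix standing in for the part of the canonical name above
# the listed paths.
BRANCH = "/branch/"
ACCEPTED = ["test.h", "test.c", "notest/test.h", "notest/test.c"]
REJECTED = ["test_i.c", "test_p.c", "test/test.h", "test/test.c"]
```

The same helper shows the pitfalls discussed above: found(r".c$", "test.cc") and found(r"\.c", "test.cpp") are both true, while "\.c$" rejects both names.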

Optimizing the file collection

The 'DIR' expression is used to search efficiently through a directory structure. If a directory is excluded by a 'DIR' expression, the search does not descend into that directory. This is especially useful when the file system contains large directories of files that do not need to be stored in the database.

Note that 'DIR' expressions are evaluated relative to the branch directory. In other words, the paths matched by the 'DIR' elements are implicitly prefixed by the branch directory.

The archive expression evaluator has two modes of operation, called legacy and new. When evaluating in legacy mode, regular expressions starting with '/' match the root of a branch. When evaluating in new mode, regular expressions starting with '/' do not match the root of a branch. Instead, to match the root of a branch, use '^'. The evaluator automatically uses the new mode if there is at least one regular expression starting with '^'.
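The mode rule can be stated in a few lines of Python. This sketch applies the rule to a single expression string; whether TICS decides the mode per expression or per ARCHIVE file is not stated here. The helper names are hypothetical, and the sketch assumes comments have been removed and that no regexp contains a double quote.

```python
import re
from typing import List


def quoted_regexps(expr: str) -> List[str]:
    """Hypothetical helper: the double-quoted regexps in one expression.
    Assumes comments have been removed and regexps contain no '"'."""
    return re.findall(r'"([^"]*)"', expr)


def evaluation_mode(expr: str) -> str:
    """'new' as soon as one regexp starts with '^', otherwise 'legacy'."""
    return "new" if any(rx.startswith("^") for rx in quoted_regexps(expr)) else "legacy"
```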

The 'DIR' expression is applied to the current root path of the search. This hardly matters for simple negated regular expressions, but for compound expressions that exclude a subdirectory of a certain directory, care must be taken not to exclude that directory altogether.

In the following example, path2 is excluded from file search:

'DIR' => !"/path2/"

Directory path2 and its subdirectories are not scanned for files. Note the '/' to avoid matching directories that start with path2, e.g., path2b.

When using '$' to match the end of a directory, be sure to precede it by a '/'. (For the evaluator, directory names are always suffixed with a '/'.)

In the following example, all subdirectories of path1 are excluded except path1/path2:

'DIR' => "/path1/$" || "/path1/path2/"
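To reason about the two examples above, it helps to fix a concrete string that a 'DIR' expression is matched against. The Python sketch below uses a hypothetical format: the branch-relative path with '/' separators and a trailing '/', plus a leading '/' in legacy mode. This format is an inference that reproduces the examples in this section, not documented TICS behavior, and regexps are evaluated with Python's re.search.

```python
import re


def dir_subject(rel_dir: str, mode: str) -> str:
    """Hypothetical string a 'DIR' expression is matched against.

    rel_dir is branch-relative with '/' separators ('' for the branch root).
    Directory names always end in '/'. In legacy mode the path also starts
    with '/', so a regexp starting with '/' can match at the branch root;
    in new mode it cannot.
    """
    rel = rel_dir.strip("/")
    if not rel:
        return "/"  # branch root (assumed), matched by "^/$" in new mode
    return ("/" + rel + "/") if mode == "legacy" else (rel + "/")


def found(pattern: str, subject: str) -> bool:
    return re.search(pattern, subject) is not None


# 'DIR' => !"/path2/"   (legacy mode: no regexp starts with '^')
def skip_path2(rel_dir: str) -> bool:
    return not found(r"/path2/", dir_subject(rel_dir, "legacy"))


# 'DIR' => "/path1/$" || "/path1/path2/"   (legacy mode)
def keep_path1_path2(rel_dir: str) -> bool:
    subject = dir_subject(rel_dir, "legacy")
    return found(r"/path1/$", subject) or found(r"/path1/path2/", subject)
```

Under this model, skip_path2 rejects path2 but accepts path2b, and keep_path1_path2 accepts path1, path1/path2 and anything below path1/path2, while rejecting path1/path3.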

To accept only a certain root path, specify all prefixes of that root path. For example, to accept only path1/path2 and its subdirectories, specify:

'DIR' => "^/$" || "^path1/$" || "^path1/path2/"

Here, "^/$" matches the branch root. The '^' operator anchors a path at the root of the branch. So the expression above matches branch/path1/path2, but not branch/path0/path1/path2.

'DIR' => "^path0/$" || "/path1/$" || "/path1/path2/"

Conversely, the expression above matches branch/path0/path1/path2, but not branch/path1/path2 (since, once the branch prefix is removed, the path no longer starts with a '/').

The difference between regular expressions starting with '^' or '/' is probably best explained by some examples.

Examples

Assume the following (Windows) path exists on the file system for some project: C:\branch\path0\path1\path2. Furthermore, assume the branch starts in C:\branch. Below, some archive expressions and their results are given.

The following expression matches C:\branch\path0 (new mode):

'DIR' => "^path0/$"

The following expression matches C:\branch\path0 (legacy mode):

'DIR' => "/path0/$"

The following expression does not match C:\branch\path0 (new mode):

'DIR' => "/path0/$" || "^path1/$"

The following expression matches both C:\branch\path0 and C:\branch\path0\path1 (legacy mode):

'DIR' => "/path0/$" || "/path1/$"

The following expression matches both C:\branch\path0 and C:\branch\path0\path1 (new mode):

'DIR' => "^path0/$" || "/path1/$"

The following expression matches C:\branch\path0 but not C:\branch\path0\path1 (new mode). It would match C:\branch\path1 if such a path existed:

'DIR' => "^path0/$" || "^path1/$"

The following expression matches C:\branch\path0\path1\path2 (legacy mode):

'DIR' => "/path0/$" || "/path1/$" || "/path2/$"

The following expression matches C:\branch\path0\path1\path2 (new mode):

'DIR' => "^path0/$" || "/path1/$" || "/path2/$"

The following expressions all match C:\branch\path0\path1\path2 (some matches are more strict than others):

'DIR' => "/path0/$" || "/path0/path1/$" || "/path0/path1/path2/$" # legacy
'DIR' => "^path0/$" || "^path0/path1/$" || "^path0/path1/path2/$" # new
'DIR' => "^path0/$" || "^path0/path1/$" || "/path1/path2/$"
'DIR' => "^path0/$" || "/path1/" || "/path2/$"

The following expressions would not match C:\branch\path0\path1\path2 (the collector would not get deep enough in the directory structure):

'DIR' => "/path0/path1/$" || "/path0/path1/path2/$" # fails at C:\branch\path0
'DIR' => "^path0/path1/path2/$"                     # fails at C:\branch\path0
'DIR' => "^path0/$" || "/path1/path2/$"             # fails at C:\branch\path0\path1
'DIR' => "/path2/$"                                 # fails at C:\branch\path0

Formally, for 'DIR' expressions to successfully match a root path, the archive expression must also match all prefixes of that root path (including the branch root, "^/$"). For negated expressions this is automatically the case: if a string does not contain a certain pattern, its prefixes do not contain that pattern either, except for patterns ending with '$'.
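The prefix rule can be checked mechanically. The Python sketch below walks the prefixes of a branch-relative target directory and reports the first one the 'DIR' expression rejects, which corresponds to the "fails at" comments in the examples above. It only models expressions that are disjunctions of quoted regexps (as all examples in this section are), uses a hypothetical subject format (a '/'-terminated branch-relative path with an extra leading '/' in legacy mode), and assumes the branch root itself is always entered, since none of the "fails at" comments names the root. The function names are hypothetical.

```python
import re
from typing import List, Optional


def mode_of(alternatives: List[str]) -> str:
    # New mode as soon as one regexp starts with '^'.
    return "new" if any(rx.startswith("^") for rx in alternatives) else "legacy"


def dir_subject(rel_dir: str, mode: str) -> str:
    # Hypothetical subject format: '/'-terminated, branch-relative path,
    # with an extra leading '/' in legacy mode.
    return ("/" + rel_dir + "/") if mode == "legacy" else (rel_dir + "/")


def first_failing_prefix(alternatives: List[str], target: str) -> Optional[str]:
    """Return the first prefix of `target` rejected by the disjunction of
    `alternatives`, or None if the collector would reach `target`.

    Only disjunctions of quoted regexps are modelled, and the branch root
    itself is assumed to be entered unconditionally.
    """
    mode = mode_of(alternatives)
    parts = target.strip("/").split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        subject = dir_subject(prefix, mode)
        if not any(re.search(rx, subject) for rx in alternatives):
            return prefix  # collection stops here; deeper levels are never seen
    return None
```

For instance, first_failing_prefix(["^path0/$", "/path1/path2/$"], "path0/path1/path2") returns "path0/path1", matching the C:\branch\path0\path1 failure listed above.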

Determining the root of a branch

To use the '^' operator successfully, one must know where the root of a branch starts. Each ARCHIVE file is referenced by a project or a branch in the SERVER.txt. The branch directories of existing projects can be queried with the TICSMaintenance tool as follows:

TICSMaintenance -project project -listbranches