ARCHIVE Reference Guide

The TICS analysis scope is determined, among other things, by a so-called ARCHIVE file. (Other contributing factors are the file extensions for languages and build types specified in the SERVER.yaml.) The ARCHIVE file is optional and describes a FILEFILTER on a global or project level that specifies which files in the archive should be processed by TICS. This prevents unwanted files from being scanned and can provide considerable performance improvements by skipping deep directory structures that are known not to contain any relevant files.

To use an ARCHIVE file, first create a .txt file in the configuration directory using the syntax described below. The name of the ARCHIVE file has to be specified either in the SERVER.yaml, to apply the filters globally, or in the PROJECTS.yaml, to apply the filters to a specific project.

Syntax of the ARCHIVE file

The file can contain the following entries (each at most once):

  1. 'FILE' => expr
  2. 'DIR' => expr
  3. 'TESTCODE_FILE' => expr
  4. 'TESTCODE_DIR' => expr
  5. 'EXTERNAL_FILE' => expr
  6. 'EXTERNAL_DIR' => expr

expr ::=
    expr || expr
  | expr && expr
  | !expr
  | (expr)
  | "regexp"

regexp ::= a regular expression

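Two styles of comments are allowed, so one can add remarks to the expressions: comments introduced by '//' and comments introduced by '#'. Both styles appear in the example in the Filtering section.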

Allowed regular expressions

The flavor of regular expressions allowed is Perl regular expressions (also known as Perl Compatible Regular Expressions, PCRE). The most important bits are summarized below.

Meta characters outside character class

char  meaning
\     escape or special
^     start of string
$     end of string
.     any character
*     match zero or more times (longest match)
+     match one or more times (longest match)
?     match zero or one times
*?    match zero or more times (shortest match)
+?    match one or more times (shortest match)
|     alternative
[]    character class

Meta characters inside character class

char  meaning
\     escape or special
^     negate the class (only if the first character)
-     range of characters

Special notations

special  meaning
\b       word boundary (zero-width assertion)
\B       non-word boundary (zero-width assertion)
\w       word character
\W       non-word character
\d       digit
\D       non-digit
\s       whitespace
\S       non-whitespace
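
As a rough illustration of a few entries from the tables above, the following sketch uses Python's re module as a stand-in for PCRE. This is an assumption made for illustration only: the two engines agree on these basic constructs, but TICS itself does not use Python, and the sample paths are made up.

```python
import re

# Escaped dot plus end-of-string anchor: only a literal ".c" suffix matches.
c_suffix = re.compile(r"\.c$")
print(bool(c_suffix.search("src/main.c")))    # True
print(bool(c_suffix.search("src/main.cpp")))  # False

# Longest (greedy) versus shortest (lazy) "one or more" match.
text = "/a/b/c/"
greedy = re.search(r"/.+/", text).group(0)
lazy = re.search(r"/.+?/", text).group(0)
print(greedy)  # /a/b/c/
print(lazy)    # /a/

# Negated character class: [^/]+ spans exactly one path component.
one_level = re.compile(r"/java/[^/]+/test/")
print(bool(one_level.search("/java/Logging/test/")))      # True
print(bool(one_level.search("/java/Logging/sub/test/")))  # False
```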

Examples follow below.

The 'FILE' versus 'DIR' archive expressions

All 'FILE' and 'DIR' expressions are optional. When omitted, any path is accepted.

Use 'FILE' to declare boolean predicates that describe whether a file path is accepted (predicate returns true) or rejected (predicate returns false).

The 'FILE' expression is used to filter files returned by the transitive file search from a given project. Only those files that match the expression are kept. All other files are discarded.

Use 'DIR' to prune the search space. This is useful to speed up the traversal in large file/directory trees. The predicates expressed here are evaluated during directory traversal. Only directories matching the predicate are included in the search. Directories that do not match the predicate are not traversed any further. Therefore, no files in such directory subtrees are included.

Restricting the search space in this way is most useful for very large archives, where file system traversal is very costly. The 'DIR' expression should only be used to improve performance.

For example, although one can prevent TICS from analyzing files in '.svn' directories by using 'FILE' => !"/\.svn/", this 'FILE' expression does not prevent TICS from scanning all files and folders in such directories transitively, which can be costly depending on the number of files in the directory structure and the file system's performance. Excluding such directories with a 'DIR' expression avoids that traversal.

An important difference between the 'DIR' and 'FILE' archive expressions is that the 'DIR' expression is evaluated for every (intermediate) directory during directory traversal, whereas the 'FILE' expression is evaluated on the complete file path.

Use 'FILE' and 'DIR' to set the scope of the project. Use 'TESTCODE_FILE' and 'EXTERNAL_FILE' to label the Code Type within the scope set by 'FILE' and 'DIR' (see below). Avoid using 'TESTCODE_DIR' and 'EXTERNAL_DIR' since these do not work well as labelling mechanisms.

The 'TESTCODE_FILE' and 'TESTCODE_DIR' archive expressions

It is possible to distinguish between production code and test code. Use 'TESTCODE_FILE' and 'TESTCODE_DIR' for this. Normally, one may want to exclude test code and concentrate on production code. Using 'TESTCODE_FILE' and 'TESTCODE_DIR' makes it possible to additionally include and analyze test code. The viewer can show these code types separately or together.

'TESTCODE_FILE' =>
  "/java/[^/]+/test/"

'TESTCODE_DIR' =>

Advice: Only use 'TESTCODE_FILE' to specify Test Code. The search scope is already limited by the general 'DIR' predicate. 'TESTCODE_DIR' is preserved for compatibility reasons.

The 'EXTERNAL_FILE' and 'EXTERNAL_DIR' archive expressions

The exclusion of files through the ARCHIVE file may have some unintended consequences. When a project contains header files that define an external API, these header files are typically only included and used by test code and not by the actual source code. Normally, one does not want to analyze test code with production code. TICS offers two possible solutions: 'EXTERNAL_FILE'/'EXTERNAL_DIR' and 'TESTCODE_FILE'/'TESTCODE_DIR' (see above). Choose one of these solutions to avoid these 'external' header files becoming unbuildable (and, therefore, 100% dead code).

The difference between Test Code and External is that the former is still interesting to show in the viewer (where it is presented as an additional Code Type), whereas the latter has no useful properties to show in the viewer.

To include test code so that (API) header files can be tested, without analyzing the test code itself, use 'EXTERNAL_FILE' or 'EXTERNAL_DIR' expressions. This ensures these files are accounted for while pre-processing and establishing build relations, but are not analyzed themselves.

Advice: Only use 'EXTERNAL_FILE' to specify External Code. The search scope is already limited by the general 'DIR' predicate. 'EXTERNAL_DIR' is preserved for compatibility reasons.

Adding code through the 'EXTERNAL_FILE' and 'EXTERNAL_DIR' expressions only affects the Dead Code metric. This metric is affected in the following ways:

Matching file names

Each collected file is transformed into a so-called canonical form before it is matched against the archive expression. This is an OS-independent and unique representation of a file name. One of the characteristics of this format is that directory separators are denoted by '/'. Furthermore, on operating systems with a case-insensitive file system (such as the Win32 file systems), the file name is put in the case that is internally used by the file system.

When creating archive expressions, remember to use '/' as the directory separator. For matching, the canonical name of a collected file is used. On case-insensitive file systems (e.g., Windows) the match is performed case-insensitively. Otherwise, the match is case-sensitive. For Windows, this allows file extensions to be specified by one clause instead of several. For example, to match C/C++ header files a single '"\.h$"' clause suffices, instead of '"\.h$" || "\.H$"' to also match any (accidental) header files with extension '.H'. (Note that the latter could have been simplified to '"\.[hH]$"', but this is still cumbersome. To match the '.java' extension one would have to write '"\.[jJ][aA][vV][aA]$"' to capture all possibilities.)
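
The effect of the canonical form on matching can be sketched as follows. The functions canonical and matches are hypothetical helpers written for this illustration, not part of TICS; the sketch only normalizes separators and does not model how TICS recovers the file system's internal case.

```python
import re

def canonical(path):
    """Hypothetical normalization: use '/' as the only directory separator."""
    return path.replace("\\", "/")

def matches(pattern, path, case_insensitive_fs):
    """Match case-insensitively only when the file system is case-insensitive."""
    flags = re.IGNORECASE if case_insensitive_fs else 0
    return re.search(pattern, canonical(path), flags) is not None

# On a case-insensitive file system, one clause covers '.h' and '.H'.
print(matches(r"\.h$", r"inc\Util.H", case_insensitive_fs=True))     # True
# On a case-sensitive file system, '.H' needs its own clause or a class.
print(matches(r"\.h$", "inc/Util.H", case_insensitive_fs=False))     # False
print(matches(r"\.[hH]$", "inc/Util.H", case_insensitive_fs=False))  # True
```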

Filtering

The 'FILE' expression is a logical expression consisting of disjunction, conjunction, negation and strings of regular expressions that are applied to collected files. The result is a boolean value, determining whether the file should be analyzed by TICS or not. Grouping is used to influence the operator precedence (|| binds weaker than &&, which binds weaker than !).
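
Python's or, and and not happen to have the same relative precedence as ||, && and ! described above, so the effect of grouping can be tried out in a small sketch. The helper m and the sample path are made up for illustration.

```python
import re

def m(pattern, path):
    """True when the regular expression matches anywhere in the path."""
    return re.search(pattern, path) is not None

path = "/test/util.h"

# ( "\.h$" || "\.c$" ) && !"/test/"
grouped = (m(r"\.h$", path) or m(r"\.c$", path)) and not m(r"/test/", path)

# "\.h$" || "\.c$" && !"/test/"  is read as  "\.h$" || ( "\.c$" && !"/test/" )
ungrouped = m(r"\.h$", path) or m(r"\.c$", path) and not m(r"/test/", path)

print(grouped)    # False: the exclusion applies to both extensions
print(ungrouped)  # True: the exclusion only applies to the ".c" clause
```

Without the parentheses, header files in test directories would slip through the filter.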

Example

A typical expression specifies the subset of files that should be accepted and a list of exceptions on this subset that should be disallowed.

'FILE' =>
  // allowed file extensions: note the '||'s and '(', ')'
  ( "\.h$" || "\.c$" ) &&
  !"_i\.c$" &&             # ignore certain C files
  !"_p\.c$" &&
  !"/test/"                # ignore files in test directories

This example shows a disjunction of two file extensions followed by three negated clauses that restrict the result set. In this case, some file suffixes and a directory name are excluded by negated patterns.

Note the '\.' and '$'. The \. is required to specify a literal dot ('.'), since in regular expressions . means match any character. For example, ".c$" would also match test.cc, which looks like a C++ file instead of a C file. The $ is required to match only at the end of a file name. Otherwise, the expression could match anywhere in a file name. For example, "\.c" would also match test.cpp, which looks like a C++ file instead of a C file.

Also note the '/' characters that delimit the 'test' directory name. This is the easiest way to specify that a pattern should match the whole directory name. Regular expression '/test' matches all directories (and files) that start with 'test'. Regular expression 'test/' matches all directories (files) that end with 'test'. Note that directory separators are denoted by '/' regardless of the OS used.

The expression above would match the following files:

test.h
test.c
notest/test.h
notest/test.c

and it would reject:

test_i.c
test_p.c
test/test.h
test/test.c
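
The accepted and rejected lists above can be reproduced with a small sketch. It assumes that each path is presented to the expression relative to the branch root with a leading '/', and it uses Python's re module as a stand-in for the regular expression engine; accept and m are hypothetical names.

```python
import re

def m(pattern, path):
    return re.search(pattern, path) is not None

def accept(path):
    """Python rendering of the 'FILE' expression from the example."""
    return ((m(r"\.h$", path) or m(r"\.c$", path))
            and not m(r"_i\.c$", path)
            and not m(r"_p\.c$", path)
            and not m(r"/test/", path))

paths = [
    "/test.h", "/test.c", "/notest/test.h", "/notest/test.c",
    "/test_i.c", "/test_p.c", "/test/test.h", "/test/test.c",
]
accepted = [p for p in paths if accept(p)]
rejected = [p for p in paths if not accept(p)]
print(accepted)
print(rejected)
```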

Optimizing the file collection

'DIR' is used to efficiently search through a directory structure. If a directory is excluded by a 'DIR' expression, the search does not recurse into that directory. This is especially useful when the file system contains large directories of files that do not need to be stored in the database.

Note that 'FILE' and 'DIR' expressions are evaluated against the root of the branch directory. This makes it possible to match the start of a file path with '^'.

For example, assume branch directory /var/lib/jenkins and file /var/lib/jenkins/a/b/c.c. It is possible to filter on "^/a/b/c\.c$" since the /var/lib/jenkins is not taken into consideration. The following "^a/b/c\.c$" and "/a/b/c\.c$" would also work. With regard to the former, note that a / at the start of the branch is implied. This makes it easier to denote file paths. The latter expression is more liberal since it would also match /var/lib/jenkins/d/a/b/c.c; note the extra intermediate /d/ directory.
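
A minimal sketch of this branch-relative matching, assuming the path handed to the expression is the file path with the branch directory stripped and a leading '/' kept. The helper branch_relative is hypothetical, and the sketch does not model the implied leading '/' that makes "^a/b/c\.c$" work in TICS.

```python
import re
from pathlib import PurePosixPath

def branch_relative(branch_dir, file_path):
    """Strip the branch directory and keep a leading '/' (assumed form)."""
    rel = PurePosixPath(file_path).relative_to(branch_dir)
    return "/" + rel.as_posix()

branch = "/var/lib/jenkins"
p1 = branch_relative(branch, "/var/lib/jenkins/a/b/c.c")
p2 = branch_relative(branch, "/var/lib/jenkins/d/a/b/c.c")
print(p1)  # /a/b/c.c
print(p2)  # /d/a/b/c.c

anchored = r"^/a/b/c\.c$"  # tied to the branch root
floating = r"/a/b/c\.c$"   # may match at any depth

print(bool(re.search(anchored, p1)), bool(re.search(anchored, p2)))  # True False
print(bool(re.search(floating, p1)), bool(re.search(floating, p2)))  # True True
```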

The 'DIR' expression is applied to the search's current root path. This is not extremely relevant for simple negated regular expressions, but for compound expressions specifying the exclusion of a certain directory's subdirectory, care must be taken not to exclude the directory altogether.

In the following example, path2 is excluded from file search:

'DIR' => !"/path2/"

Directory path2 and its subdirectories are not scanned for files. Note the '/' to avoid matching directories that start with path2, e.g., path2b.

In the following example, only path1/path2 is included:

'DIR' => "/path1/path2/"

This is equivalent to the following (longer) expression:

'DIR' => "/path1/$" || "/path1/path2/"

To only accept a certain root path, all prefixes of that root path must also match. To accept only path1/path2 and its subdirectories, specify:

'DIR' => "^/path1/path2/"

This is equivalent to the following (longer) expression:

'DIR' => "^/path1/$" || "^/path1/path2/"

The '^' operator matches a path from the root of the branch. So, given branch branch and path branch/path1/path2, the expression above matches, but branch/path0/path1/path2 does not.

'DIR' => "^/path0/$" || "/path1/$" || "/path1/path2/"

The expression above matches branch/path0/path1/path2, but also branch/path1/path2.

This is equivalent to the following (shorter) expression:

'DIR' => "^/path0/$" || "/path1/path2/"

The difference between regular expressions starting with '^' or '/' is probably best explained by some examples.

Examples

Assume the following (Windows) path exists on the file system for some project: C:\branch\path0\path1\path2. Furthermore, assume the branch starts in C:\branch. Below, some archive expressions and their results are given.

The following expressions match C:\branch\path0:

'DIR' => "^path0/"
'DIR' => "^/path0/"
'DIR' => "/path0/"

The following expressions do not match C:\branch\path0:

'DIR' => !"/path0/"
'DIR' => "^/path1/"
'DIR' => !"/path0/" && !"/path1/"

The following expressions match both C:\branch\path0 and C:\branch\path1:

'DIR' => "/path0/" || "/path1/"
'DIR' => "^/path0/$" || "^/path1/"
'DIR' => "^path0/$" || "^path1/"

The following expressions match both C:\branch\path0 and C:\branch\path0\path1:

'DIR' => "^/path0/$" || "/path1/"
'DIR' => "^/path0/path1/"
'DIR' => "/path0/$" || "/path1/"
'DIR' => "/path0/path1/"

The following expressions match C:\branch\path0 but not C:\branch\path0\path1. They would also match C:\branch\path1 if such a path existed:

'DIR' => "^path0/$" || "^path1/$"
'DIR' => "^/path0/$" || "^/path1/$"
'DIR' => "/path0/$" || "^/path1/$"

The following expressions match C:\branch\path0\path1\path2:

'DIR' => "/path0/$" || "/path1/$" || "/path2/"
'DIR' => "/path0/path1/path2/"
'DIR' => "^/path0/path1/path2/"
'DIR' => "^/path0/$" || "^/path0/path1/$" || "^/path0/path1/path2/"

The following expressions do not match C:\branch\path0\path1\path2 (the collector does not get deep enough into the directory structure):

'DIR' => "/path0/path1/$"   # fails at C:\branch\path0\path1\path2
                            # due to $
'DIR' => "^/path1/path2/$"  # fails at C:\branch\path0
                            # path1 does not match at the branch root
'DIR' => "/path1/path2/$"   # fails at C:\branch\path0
                            # TICS does not know about path0
'DIR' => "/path2/$"         # fails at C:\branch\path0
                            # TICS does not know about path0

More Examples

Consider the following archive on disk.

.
|-- .git
|-- inc
|   |-- h1.h
|   |-- h2.h
|   `-- h3.h
|-- res
|   |-- r1.rc
|   |-- r2.rc
|   `-- r3.rc
|-- src
|   |-- s1.c
|   |-- s2.c
|   `-- s3.c
`-- tst
    |-- t1.c
    |-- t2.c
    `-- t3.c

First, we exclude folders .git and res.

'DIR' =>
  !"/\.git/" &&
  !"/res/"

Result:

.
|-- inc
|   |-- h1.h
|   |-- h2.h
|   `-- h3.h
|-- src
|   |-- s1.c
|   |-- s2.c
|   `-- s3.c
`-- tst
    |-- t1.c
    |-- t2.c
    `-- t3.c
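
The pruning behavior can be sketched with os.walk. This is only an illustration for simple negated expressions such as the one above: it assumes directory paths are presented as '/name/' relative to the branch root, it does not model the prefix handling of positive 'DIR' expressions, and the file .git/HEAD is added to the sample tree purely to show that the directory is never entered.

```python
import os
import re
import tempfile

def dir_accepted(rel_dir):
    """Python rendering of  'DIR' => !"/\\.git/" && !"/res/"."""
    return (re.search(r"/\.git/", rel_dir) is None
            and re.search(r"/res/", rel_dir) is None)

def collect(root):
    found = []
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root).replace(os.sep, "/")
        prefix = "/" if rel == "." else "/" + rel + "/"
        # Prune in place so os.walk never descends into rejected directories.
        dirs[:] = sorted(d for d in dirs if dir_accepted(prefix + d + "/"))
        found.extend(prefix + name for name in sorted(files))
    return found

layout = {
    ".git": ["HEAD"],  # hypothetical content, not shown in the tree above
    "inc": ["h1.h", "h2.h", "h3.h"],
    "res": ["r1.rc", "r2.rc", "r3.rc"],
    "src": ["s1.c", "s2.c", "s3.c"],
    "tst": ["t1.c", "t2.c", "t3.c"],
}
with tempfile.TemporaryDirectory() as root:
    for directory, names in layout.items():
        os.makedirs(os.path.join(root, directory))
        for name in names:
            open(os.path.join(root, directory, name), "w").close()
    collected = collect(root)

print(collected)
```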

Add Test Code.

'TESTCODE_FILE' =>
  "/tst/"

This gives two code types: Production and Test code:

Production:

.
|-- inc
|   |-- h1.h
|   |-- h2.h
|   `-- h3.h
`-- src
    |-- s1.c
    |-- s2.c
    `-- s3.c

Test:

.
`-- tst
    |-- t1.c
    |-- t2.c
    `-- t3.c

Suppose that, for some reason, all files with suffix 3 must be excluded.

'FILE' =>
  !"3\."

Production:

.
|-- inc
|   |-- h1.h
|   `-- h2.h
`-- src
    |-- s1.c
    `-- s2.c

Test:

.
`-- tst
    |-- t1.c
    `-- t2.c
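
The 'FILE' and 'TESTCODE_FILE' steps above can be sketched as a filter followed by a labelling pass over the files that survived the 'DIR' pruning. The function names are hypothetical, and the leading '/' on each path is an assumption about the form in which paths are matched.

```python
import re

def m(pattern, path):
    return re.search(pattern, path) is not None

def file_accepted(path):
    """'FILE' => !"3\\."  sets the scope."""
    return not m(r"3\.", path)

def is_test_code(path):
    """'TESTCODE_FILE' => "/tst/"  labels the Code Type within that scope."""
    return m(r"/tst/", path)

files = [
    "/inc/h1.h", "/inc/h2.h", "/inc/h3.h",
    "/src/s1.c", "/src/s2.c", "/src/s3.c",
    "/tst/t1.c", "/tst/t2.c", "/tst/t3.c",
]
in_scope = [f for f in files if file_accepted(f)]
production = [f for f in in_scope if not is_test_code(f)]
test_code = [f for f in in_scope if is_test_code(f)]

print(production)
print(test_code)
```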

Resulting in the following ARCHIVE:

'FILE' =>
  !"3\."

'DIR' =>
  !"/\.git/" &&
  !"/res/"

'TESTCODE_FILE' =>
  "/tst/"

Even More Examples

Exclude all variations of the jquery library.

'FILE' =>
  !"[/._-]jquery.*\.js$"

This matches (and excludes) the following files (among others):

jquery.tmpl.min.js
jquery.ui.core.js
adblock-jquery.js
nwmatcher-jquery.js
tree.jquery.js
vendor_jquery.js
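
A quick way to check such a pattern before putting it in the ARCHIVE file is to run it over a list of sample names. The sketch below uses Python's re module and a made-up /lib/ directory; the two counter-examples at the end are also made up.

```python
import re

jquery = re.compile(r"[/._-]jquery.*\.js$")

names = [
    "jquery.tmpl.min.js", "jquery.ui.core.js", "adblock-jquery.js",
    "nwmatcher-jquery.js", "tree.jquery.js", "vendor_jquery.js",
]
excluded = [n for n in names if jquery.search("/lib/" + n)]
print(excluded == names)  # True: every listed file is matched and excluded

# The character before "jquery" must be one of / . _ -
print(bool(jquery.search("/lib/myjquery.js")))  # False
# Only ".js" files are affected.
print(bool(jquery.search("/lib/jquery.css")))   # False
```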

Exclude certain subdirectories on a certain level.

'DIR' =>
  !"/java/[^/]+/build/" &&
  !"/java/[^/]+/classes/" &&
  !"/java/[^/]+/reports/"

This matches (and excludes) the following folders (among others):

java/CucumberTests/build/
java/Logging/build/
java/SeleniumModels/build/
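
The role of [^/]+ in these clauses, which pins the excluded directory to exactly one level below java, can be sketched as follows. The function dir_accepted is a hypothetical helper, the leading '/' on the paths is an assumption, and the last two sample paths are made up.

```python
import re

excluded_patterns = [
    r"/java/[^/]+/build/",
    r"/java/[^/]+/classes/",
    r"/java/[^/]+/reports/",
]

def dir_accepted(rel_dir):
    """A directory is accepted when none of the negated clauses match."""
    return not any(re.search(p, rel_dir) for p in excluded_patterns)

print(dir_accepted("/java/CucumberTests/build/"))  # False: pruned
print(dir_accepted("/java/Logging/src/"))          # True
# Two levels below java: [^/]+ cannot span the extra '/', so no clause matches.
print(dir_accepted("/java/Logging/sub/build/"))    # True
```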

Exclude all Python code.

'FILE' =>
  !"\.py$"

Exclude certain non-source directories.

'DIR' =>
  !"/stubs/" &&
  !"/data/" &&
  !"^/components/idevs7/resources/" &&
  !"^/deploy/" &&
  !"^/make/"

Determining the root of a branch

To successfully use the '^' operator, one must know where the root of a branch starts. Each ARCHIVE file is referenced by a project or a branch in the PROJECTS.yaml. These branch directories can be queried for existing projects via the TICSMaintenance tool as follows:

TICSMaintenance -project project -info

Look for "dir" in the output.

      "branches" : [
         {
            "baselines" : [],
            "calculate" : 1,
            "dir" : "/var/lib/jenkins",    # <- HERE
            "id" : "1",
            "name" : "main",
            "visible" : 1
         }
      ],
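
When the output is long, the "dir" entries can be pulled out with a short script. The sketch below is an assumption-laden convenience, not a TICS feature: it only relies on the "dir" : "..." lines having the shape shown above, does not parse the full output, and does not unescape any backslashes in the value. The function name branch_dirs is hypothetical.

```python
import re

def branch_dirs(info_output):
    """Return every value that appears as  "dir" : "..."  in the given text."""
    return re.findall(r'"dir"\s*:\s*"([^"]*)"', info_output)

# Fragment in the shape shown above; in practice this text would be the
# captured output of the TICSMaintenance command.
sample = '''
      "branches" : [
         {
            "calculate" : 1,
            "dir" : "/var/lib/jenkins",
            "name" : "main"
         }
      ],
'''
print(branch_dirs(sample))  # ['/var/lib/jenkins']
```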