The contents of a TICS quality database are determined by a so-called ARCHIVE file and by the extensions and build types specified in the SERVER.txt. The ARCHIVE file is optional. It describes a FILEFILTER, on a global or project level, that specifies which files in the archive should be processed by TICS. This prevents unwanted files from being scanned, and it can considerably improve performance by skipping deep directory structures that are known not to contain any relevant files.
To use an ARCHIVE file, first create a .txt file in the configuration directory using the syntax described below. Then specify the name of the ARCHIVE file in either the SERVER.txt, to apply the filters globally, or the PROJECTS.txt, to apply the filters to a specific project.
The file can contain the following entries (each at most once):
'FILE' => expr
'DIR' => expr
'EXTERNAL_FILE' => expr
'EXTERNAL_DIR' => expr
'TESTCODE_FILE' => expr
'TESTCODE_DIR' => expr
expr ::= expr || expr | expr && expr | !expr | (expr) | "regexp"
regexp ::= a regular expression
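To make the grammar and its precedence concrete, here is a minimal recursive-descent evaluator written in Python. It is a sketch, not TICS's implementation. The names tokenize and parse are hypothetical. The sketch assumes that a regexp never contains a double quote, and that a regexp "matches" when it is found anywhere in the path, so '^' and '$' act as anchors.

```python
import re

# Hypothetical sketch, not TICS code. Grammar:
#   expr ::= expr || expr | expr && expr | !expr | (expr) | "regexp"
# Precedence, weakest to strongest: ||, &&, !
# Assumption: a regexp never contains a double quote.
TOKEN = re.compile(r'\s*(\|\||&&|!|\(|\)|"[^"]*")')


def tokenize(text):
    """Split an expression into operators, parentheses and quoted regexps."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(text):
    """Return a predicate path -> bool for an archive expression."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # weakest operator: ||
        left = parse_and()
        while peek() == "||":
            take()
            right = parse_and()
            left = lambda s, a=left, b=right: a(s) or b(s)
        return left

    def parse_and():  # && binds tighter than ||
        left = parse_not()
        while peek() == "&&":
            take()
            right = parse_not()
            left = lambda s, a=left, b=right: a(s) and b(s)
        return left

    def parse_not():  # strongest: !, grouping and quoted regexps
        tok = take()
        if tok == "!":
            inner = parse_not()
            return lambda s, f=inner: not f(s)
        if tok == "(":
            inner = parse_or()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok.startswith('"'):
            rx = re.compile(tok[1:-1])
            return lambda s, r=rx: r.search(s) is not None
        raise SyntaxError(f"unexpected token {tok!r}")

    predicate = parse_or()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return predicate


is_source = parse(r'( "\.h$" || "\.c$" ) && !"_i\.c$" && !"/test/"')
print(is_source("/src/main.c"), is_source("/src/main_i.c"), is_source("/test/main.c"))
# True False False
```

Each precedence level gets its own function, and the call chain reproduces the grammar's binding order: || calls &&, && calls !.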
Two styles of comments are allowed, so that remarks can be added to the expressions: single-line comments, which start with // and occupy a whole line, and end-of-line comments, which start with # and continue up to the end of the line. Note that end-of-line comments can be used as single-line comments, but not vice versa (this would lead to a TICS runtime error). The 'FILE' example below uses both styles.
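The two comment rules can be expressed as a small pre-processing step that runs before an expression is parsed. The following Python sketch is hypothetical (strip_comments is not a TICS function). It assumes that // is only recognized at the start of a line, that a # inside a quoted regexp is not a comment, and that regexps never contain a double quote.

```python
# Hypothetical helper, not part of TICS: strip both comment styles before parsing.
# Assumptions: '//' only starts a comment at the beginning of a line,
# '#' inside a quoted regexp is not a comment, and regexps contain no '"'.
def strip_comments(text):
    kept = []
    for line in text.splitlines():
        if line.lstrip().startswith("//"):
            continue  # single-line comment: the whole line is dropped
        cut, in_string = len(line), False
        for i, ch in enumerate(line):
            if ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i  # end-of-line comment: drop from '#' onwards
                break
        kept.append(line[:cut].rstrip())
    return "\n".join(line for line in kept if line)


example = """\
// allowed file extensions
( "\\.h$" || "\\.c$" ) &&
!"_i\\.c$" && # ignore certain C files
!"_p\\.c$" &&
!"/test/" # ignore files in test directories
"""
print(strip_comments(example))
```

Because only the start of a line is checked for //, a // placed after an expression is not treated as a comment. This matches the rule that single-line comments cannot be used as end-of-line comments.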
The names 'DIR' and 'FILE' reflect the fact that 'DIR' expressions are only applied to paths that denote a directory on the file system, and 'FILE' expressions are only applied to paths that denote a file on the file system.
The most important difference between the 'DIR' and 'FILE' archive expressions is that the 'DIR' expression is evaluated for every directory during file collection, whereas the 'FILE' expression is evaluated after all files have been collected.
The (optional) 'DIR' expression can be used to restrict the search space during collection on large directory structures. This speeds up collection for very large archives, where file system traversal is very costly. The collection phase only returns files (no directories). The 'DIR' expression should only be used to improve performance.
The (optional) 'FILE' expression is used to filter all files returned by the recursive file search for a given project. Only those files that match the expression are kept. All other files are discarded.
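The two-phase behaviour described above can be sketched in Python with os.walk. 'DIR' is evaluated for each directory while walking, and removing a directory from dirnames stops the descent. 'FILE' is applied once, to the complete list of collected files. The function collect and the predicate arguments dir_ok and file_ok are hypothetical. The sketch also assumes that paths are branch-relative with a leading '/' and that directories end in '/'.

```python
import os
import tempfile


def collect(branch_root, dir_ok, file_ok):
    """Sketch of two-phase collection: DIR prunes while walking, FILE filters afterwards."""
    collected = []
    for current, dirnames, filenames in os.walk(branch_root):
        rel = os.path.relpath(current, branch_root).replace(os.sep, "/")
        prefix = "/" if rel == "." else "/" + rel + "/"
        # Phase 1: evaluate the DIR predicate on every subdirectory; pruning
        # dirnames in place stops os.walk from descending into rejected ones.
        dirnames[:] = [d for d in dirnames if dir_ok(prefix + d + "/")]
        collected.extend(prefix + f for f in filenames)  # files only, no directories
    # Phase 2: the FILE predicate is applied once, after all files are collected.
    return sorted(p for p in collected if file_ok(p))


# Usage: build a small throwaway tree and collect from it.
with tempfile.TemporaryDirectory() as root:
    for rel in ["src/a.c", "src/b.txt", "build/deep/x.c", "c.c"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    result = collect(root,
                     dir_ok=lambda d: "/build/" not in d,  # like 'DIR' => !"/build/"
                     file_ok=lambda f: f.endswith(".c"))   # like 'FILE' => "\.c$"
print(result)  # ['/c.c', '/src/a.c']
```

build/deep/x.c never reaches the 'FILE' phase, because its directory is pruned before it is visited. This is where the performance gain of 'DIR' comes from.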
The exclusion of files through the ARCHIVE file may have some unintended consequences. When a project contains header files that define an external API, these header files are typically only included and used by test code, not by the actual source code. Normally, however, one does not want to analyze test code together with production code. TICS offers two possible solutions: 'EXTERNAL_FILE'/'EXTERNAL_DIR' and 'TESTCODE_FILE'/'TESTCODE_DIR' (see below). Choose one of these solutions to prevent these 'external' header files from becoming unbuildable (and, therefore, being reported as 100% dead code).
To include test code so that (API) header files are exercised, without analyzing the test code itself, use 'EXTERNAL_FILE' or 'EXTERNAL_DIR' expressions. This ensures that these files are taken into account during pre-processing and when establishing build relations, but they are not analyzed themselves.
Adding code through the 'EXTERNAL_FILE' and 'EXTERNAL_DIR' expressions only affects the Dead Code metric. This metric is affected in the following ways:
'TESTCODE_FILE' and 'TESTCODE_DIR' make it possible to distinguish between production code and test code. Normally, you may want to exclude test code and concentrate on production code. With 'TESTCODE_FILE' and 'TESTCODE_DIR', test code can additionally be included and analyzed. The viewer can show these code types separately or together.
For example:
'TESTCODE_FILE' => "\.java$"
'TESTCODE_DIR' => "/java/$" || "/java/[^/]+/$" || "/java/[^/]+/test/"
Each collected file is transformed into a so-called canonical form before it is matched against the archive expression. The canonical form is an OS-independent and unique representation of a file name. One of its characteristics is that directory separators are denoted by '/'. Furthermore, on operating systems with a case-insensitive file system (such as the Win32 file systems), the file name is put in the case that is used internally by the file system.
When creating archive expressions, remember to use '/' as the directory separator. Matching is done against the canonical name of a collected file. On case-insensitive file systems (e.g., Windows) the match is performed case-insensitively. Otherwise, the match is case-sensitive. On Windows, this allows a file extension to be specified by one clause instead of several. For example, to match C/C++ header files a single '"\.h$"' clause suffices, instead of '"\.h$" || "\.H$"' to also match any (accidental) header files with the extension '.H'. (Note that the latter could have been simplified to '"\.[hH]$"', but this is still cumbersome. To match the '.java' extension one would have to write '"\.[jJ][aA][vV][aA]$"' to capture all possibilities.)
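The effect of the canonical form and of case-insensitive matching can be approximated in Python. The functions canonical and matches are hypothetical. canonical only converts Windows separators to '/' using pathlib. It does not apply the file system's internal case, which the real canonical form also does.

```python
import re
from pathlib import PureWindowsPath


def canonical(path):
    # Sketch only: converts '\' separators to '/'. The real canonical form also
    # puts the name in the case used internally by the file system.
    return PureWindowsPath(path).as_posix()


def matches(pattern, path, case_insensitive):
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, canonical(path), flags) is not None


print(canonical(r"C:\src\Util.H"))                                    # C:/src/Util.H
print(matches(r"\.h$", r"C:\src\Util.H", case_insensitive=True))      # True
print(matches(r"\.h$", r"C:\src\Util.H", case_insensitive=False))     # False
print(matches(r"\.[hH]$", r"C:\src\Util.H", case_insensitive=False))  # True
```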
The 'FILE' expression is a logical expression built from disjunctions, conjunctions, negations, and quoted regular expressions that are applied to collected files. The result is a boolean value that determines whether the file should be entered into the database. Parentheses can be used to group subexpressions and override operator precedence (|| binds weaker than &&, which binds weaker than !).
A typical expression specifies the subset of files that should be accepted and a list of exceptions on this subset that should be disallowed.
'FILE' =>
// allowed file extensions: note the '||'s and '(', ')'
( "\.h$" || "\.c$" ) &&
!"_i\.c$" && # ignore certain C files
!"_p\.c$" &&
!"/test/" # ignore files in test directories
This example shows a disjunction of two file extensions followed by three negated clauses that restrict the result set. In this case, some file suffixes and a directory name are excluded by negated patterns. Note the '\.' and the '$'. The '\.' is required to specify a literal dot ('.'), since in regular expressions '.' matches any character. For example, ".c$" would also match test.cc, which looks like a C++ file instead of a C file. The '$' is required to match only at the end of a file name. Otherwise, the expression could match anywhere in a file name. For example, "\.c" would also match test.cpp, which also looks like a C++ file instead of a C file. Also note the '/' delimiters around the 'test' directory name. They are the easiest way to specify that the pattern should match a whole directory name. The regular expression '/test' matches all directories (and files) whose names start with 'test'. The regular expression 'test/' matches all directories whose names end with 'test' (and the files in them). Note that directory separators are denoted by '/' regardless of the OS used. The expression above would match the following files:
test.h
test.c
notest/test.h
notest/test.c
and it would reject:
test_i.c
test_p.c
test/test.h
test/test.c
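The accept and reject lists above, and the two regular-expression pitfalls, can be checked with Python's re module. The sketch assumes file names are branch-relative canonical paths with a leading '/'. Under that assumption, "/test/" can match a top-level test directory.

```python
import re


def file_filter(path):
    # 'FILE' => ( "\.h$" || "\.c$" ) && !"_i\.c$" && !"_p\.c$" && !"/test/"
    s = lambda p: re.search(p, path) is not None
    return ((s(r"\.h$") or s(r"\.c$"))
            and not s(r"_i\.c$") and not s(r"_p\.c$") and not s(r"/test/"))


# Assumption: names are branch-relative canonical paths with a leading '/'.
accepted = ["/test.h", "/test.c", "/notest/test.h", "/notest/test.c"]
rejected = ["/test_i.c", "/test_p.c", "/test/test.h", "/test/test.c"]
print(all(file_filter(p) for p in accepted), any(file_filter(p) for p in rejected))  # True False

# Why the '\.' and the '$' matter:
print(bool(re.search(r".c$", "test.cc")))   # True: '.' matches any character
print(bool(re.search(r"\.c", "test.cpp")))  # True: without '$' it matches mid-name
print(bool(re.search(r"\.c$", "test.cc")), bool(re.search(r"\.c$", "test.cpp")))  # False False
```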
'DIR' is used to search through a directory structure efficiently. If a directory is excluded by a 'DIR' expression, the search does not recurse into that directory. This is especially useful when the file system contains large directories of files that do not need to be stored in the database.
Note that 'DIR' expressions are relative to the branch directory. In other words, the paths in a 'DIR' expression are implicitly prefixed by the branch directory.
The archive expression evaluator has two modes of operation, called legacy and new. When evaluating in legacy mode, regular expressions starting with '/' match the root of a branch. When evaluating in new mode, regular expressions starting with '/' do not match the root of a branch. Instead, use '^' to match the root of a branch. The evaluator automatically uses the new mode if there is at least one regular expression starting with '^'.
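One way to model the two modes is to change the string that a directory's regexps are matched against. The following Python sketch is based only on the rules in this section and the worked examples further below. It is an assumption, not a description of TICS internals. In legacy mode, a leading '/' is added so that '/name' can match a directory directly below the branch root. In new mode there is no leading '/', so '^name' anchors at the branch root. The functions uses_new_mode and subject are hypothetical.

```python
def uses_new_mode(regexps):
    # New mode is selected automatically when at least one regexp starts with '^'.
    return any(r.startswith("^") for r in regexps)


def subject(rel_dir, new_mode):
    """Directory path as matched by the evaluator (assumed form, sketch only).

    rel_dir is branch-relative without slashes at either end, '' for the branch root.
    Directory names always end in '/'.
    """
    if rel_dir == "":
        return "/"  # the branch root, matched by "^/$"
    return rel_dir + "/" if new_mode else "/" + rel_dir + "/"


print(subject("path0/path1", new_mode=False))  # /path0/path1/
print(subject("path0/path1", new_mode=True))   # path0/path1/
```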
The 'DIR' expression is applied to the search's current root path. This hardly matters for simple negated regular expressions. For compound expressions that exclude a certain directory's subdirectory, however, care must be taken not to exclude the directory altogether.
In the following example, path2 is excluded from the file search:
'DIR' => !"/path2/"
Directory path2 and its subdirectories are not scanned for files. Note the trailing '/', which avoids matching directories that start with path2, e.g., path2b.
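Evaluating the negated pattern on a few directory paths shows the effect of the trailing '/'. The Python sketch below assumes legacy-mode paths, that is, branch-relative with a leading and trailing '/'. The function keep is hypothetical.

```python
import re


# Sketch: evaluate 'DIR' => !"/path2/" on a few directory paths.
# Assumption: legacy-mode form, branch-relative with leading and trailing '/'.
def keep(path, pattern=r"/path2/"):
    return re.search(pattern, path) is None  # negation: keep if there is no match


dirs = ["/path1/", "/path2/", "/path2/sub/", "/path2b/"]
results = {d: keep(d) for d in dirs}
print(results)
# {'/path1/': True, '/path2/': False, '/path2/sub/': False, '/path2b/': True}
# Without the trailing '/', path2b would be excluded as well:
print(keep("/path2b/", pattern=r"/path2"))  # False
```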
When using '$' to match the end of a directory, be sure to precede it by a '/'. (For the evaluator, directory names are always suffixed with a '/'.)
In the following example, all subdirectories of path1 are excluded except path1/path2:
'DIR' => "/path1/$" || "/path1/path2/"
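The following Python sketch evaluates this expression on some directory paths. It assumes legacy-mode paths with a leading and trailing '/'. The function accept is hypothetical.

```python
import re


# Sketch: which directories pass 'DIR' => "/path1/$" || "/path1/path2/" ?
# Assumption: legacy-mode form, branch-relative with leading and trailing '/'.
def accept(path):
    return bool(re.search(r"/path1/$", path) or re.search(r"/path1/path2/", path))


dirs = ["/path1/", "/path1/path2/", "/path1/path2/deep/", "/path1/other/", "/path3/"]
results = {d: accept(d) for d in dirs}
print(results)
# {'/path1/': True, '/path1/path2/': True, '/path1/path2/deep/': True,
#  '/path1/other/': False, '/path3/': False}
```

Under this model, /path3/ is rejected too. The expression accepts nothing outside path1, not only the other subdirectories of path1.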
To only accept a certain root path, specify all prefixes of that root path. To accept only path1/path2 and its subdirectories, specify:
'DIR' => "^/$" || "^path1/$" || "^path1/path2/"
Here, "^/$" matches the branch root.
The '^' operator anchors a match at the root of the branch. So, for branch/path1/path2, the expression above matches, but for branch/path0/path1/path2 it does not.
'DIR' => "^path0/$" || "/path1/$" || "/path1/path2/"
Conversely, the above expression matches branch/path0/path1/path2, but not branch/path1/path2: once the branch is removed, the remaining path path1/path2 no longer starts with a '/'.
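Both expressions can be compared on the two target directories with a short Python sketch. It assumes the new-mode form, that is, branch-relative paths with a trailing '/' and no leading '/'. It only evaluates the target directory itself, not its prefixes. The function any_match is hypothetical.

```python
import re


# Sketch: evaluate the two 'DIR' expressions above on the target directory only.
# Assumption: new-mode form, branch-relative with a trailing '/' and no leading '/'.
def any_match(patterns, path):
    return any(re.search(p, path) for p in patterns)


only_path1_path2 = [r"^/$", r"^path1/$", r"^path1/path2/"]
under_path0 = [r"^path0/$", r"/path1/$", r"/path1/path2/"]

for target in ["path1/path2/", "path0/path1/path2/"]:
    print(target, any_match(only_path1_path2, target), any_match(under_path0, target))
# path1/path2/ True False
# path0/path1/path2/ False True
```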
The difference between regular expressions starting with '^' or '/' is probably best explained by some examples. Assume the following (Windows) path exists on the file system for some project: C:\branch\path0\path1\path2. Furthermore, assume the branch starts in C:\branch. Below, some archive expressions and their results are given.
The following expression matches C:\branch\path0 (new mode):
'DIR' => "^path0/$"
The following expression matches C:\branch\path0 (legacy mode):
'DIR' => "/path0/$"
The following expression does not match C:\branch\path0 (new mode):
'DIR' => "/path0/$" || "^path1/$"
The following expression matches both C:\branch\path0 and C:\branch\path0\path1 (legacy mode):
'DIR' => "/path0/$" || "/path1/$"
The following expression matches both C:\branch\path0 and C:\branch\path0\path1 (new mode):
'DIR' => "^path0/$" || "/path1/$"
The following expression matches C:\branch\path0 but not C:\branch\path0\path1 (new mode). It would match C:\branch\path1 if such a path existed:
'DIR' => "^path0/$" || "^path1/$"
The following expression matches C:\branch\path0\path1\path2 (legacy mode):
'DIR' => "/path0/$" || "/path1/$" || "/path2/$"
The following expression matches C:\branch\path0\path1\path2 (new mode):
'DIR' => "^path0/$" || "/path1/$" || "/path2/$"
The following expressions all match C:\branch\path0\path1\path2 (some are stricter than others):
'DIR' => "/path0/$" || "/path0/path1/$" || "/path0/path1/path2/$" # legacy
'DIR' => "^path0/$" || "^path0/path1/$" || "^path0/path1/path2/$" # new
'DIR' => "^path0/$" || "^path0/path1/$" || "/path1/path2/$"
'DIR' => "^path0/$" || "/path1/" || "/path2/$"
The following expressions would not match C:\branch\path0\path1\path2 (the collector would not get deep enough into the directory structure):
'DIR' => "/path0/path1/$" || "/path0/path1/path2/$" # fails at C:\branch\path0
'DIR' => "^path0/path1/path2/$" # fails at C:\branch\path0
'DIR' => "^path0/$" || "/path1/path2/$" # fails at C:\branch\path0\path1
'DIR' => "/path2/$" # fails at C:\branch\path0
Formally, for a 'DIR' expression to successfully match a root path, the archive expression must also match all prefixes of that root path (including the branch root "^/$"). For negated expressions this is automatically the case: if a string does not contain a certain pattern, its substrings will not contain that pattern either (barring patterns ending with '$').
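The prefix rule can be turned into a small Python check that reproduces the verdicts in the worked examples above. The function reaches is hypothetical. It handles only pure disjunctions of regexps, which is all those examples use, and it picks the mode with the '^' rule. Like the worked examples, it checks only the directories below the branch root, not the root itself. Paths are assumed to be branch-relative, with a leading '/' in legacy mode and a trailing '/' in both modes.

```python
import re


def reaches(disjuncts, target):
    """Sketch: can a 'DIR' disjunction of regexps lead the collector to target?

    target is branch-relative, e.g. "path0/path1/path2". Every directory on the
    way (the branch root itself is not checked) must match, or the search stops.
    Returns (True, None) or (False, first directory where the search stops).
    """
    new_mode = any(r.startswith("^") for r in disjuncts)
    parts = target.split("/")
    for depth in range(1, len(parts) + 1):
        rel = "/".join(parts[:depth]) + "/"
        subject = rel if new_mode else "/" + rel  # legacy mode adds a leading '/'
        if not any(re.search(r, subject) for r in disjuncts):
            return False, "/".join(parts[:depth])
    return True, None


good = [
    [r"/path0/$", r"/path0/path1/$", r"/path0/path1/path2/$"],  # legacy
    [r"^path0/$", r"^path0/path1/$", r"^path0/path1/path2/$"],  # new
    [r"^path0/$", r"^path0/path1/$", r"/path1/path2/$"],
    [r"^path0/$", r"/path1/", r"/path2/$"],
]
bad = [
    [r"/path0/path1/$", r"/path0/path1/path2/$"],
    [r"^path0/path1/path2/$"],
    [r"^path0/$", r"/path1/path2/$"],
    [r"/path2/$"],
]
target = "path0/path1/path2"
print([reaches(e, target) for e in good])     # all (True, None)
print([reaches(e, target)[1] for e in bad])  # ['path0', 'path0', 'path0/path1', 'path0']
```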
To use the '^' operator successfully, one must know where the root of a branch starts. Each ARCHIVE file is referenced by a project or a branch in the SERVER.txt. The branch directories of existing projects can be queried with the TICSMaintenance tool as follows:
TICSMaintenance -project project -listbranches