The TICS analysis scope is, among other things, determined by a so-called ARCHIVE file. (Other contributing factors are the file extensions specified for languages and the build types specified in the SERVER.yaml.) The ARCHIVE file is optional and describes a FILEFILTER at global or project level: a description of which files in the archive should be processed by TICS. This prevents unwanted files from being scanned and can considerably improve performance by skipping deep directory structures that are known not to contain any relevant files.
To use an ARCHIVE file, first create a .txt file in the configuration directory using the syntax described below. The name of the ARCHIVE file has to be specified either in the SERVER.yaml, to apply the filters globally, or in the PROJECTS.yaml, to apply the filters to a specific project.
The file can contain the following entries (each at most once):

```
'FILE'           => expr
'DIR'            => expr
'TESTCODE_FILE'  => expr
'TESTCODE_DIR'   => expr
'EXTERNAL_FILE'  => expr
'EXTERNAL_DIR'   => expr
'GENERATED_FILE' => expr

expr   ::= expr || expr | expr && expr | !expr | (expr) | "regexp"
regexp ::= a regular expression
```
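The following minimal Python sketch parses and evaluates ARCHIVE entries, which makes the grammar and its operator precedence concrete. It is not the TICS implementation. It rests on these assumptions:

- Python's re module stands in for PCRE. The two agree on the constructs described in this section.
- Quoted regular expressions never contain a double quote.
- Entry names are upper-case words in single quotes.
- '//' and '#' comments run to the end of the line.
- Every predicate receives a canonical path starting with '/'.

TICS-specific behavior, such as the implied leading '/' for '^' patterns, is not modeled. The names tokenize, Parser and parse_archive are hypothetical.

```python
import re

# Whitespace and comments match no named group and are skipped by tokenize().
# Assumption: a quoted regular expression never contains a double quote.
TOKEN = re.compile(r"""
      \s+ | //[^\n]* | \#[^\n]*      # whitespace, '//' and '#' comments
    | (?P<name>'[A-Z_]+')            # entry name, e.g. 'FILE'
    | (?P<op>=>|\|\||&&|!|\(|\))     # arrow, operators and parentheses
    | "(?P<rx>[^"]*)"                # quoted regular expression
""", re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:pos + 20]!r}")
        pos = m.end()
        if m.lastgroup is not None:
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
    return tokens


def _either(a, b):
    return lambda path: a(path) or b(path)


def _both(a, b):
    return lambda path: a(path) and b(path)


class Parser:
    """Recursive descent: '||' binds weaker than '&&', which binds weaker than '!'."""

    def __init__(self, tokens, flags=0):
        self.tokens, self.pos, self.flags = tokens, 0, flags

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind=None, value=None):
        tok = self.peek()
        if tok[0] is None or (kind and tok[0] != kind) or (value and tok[1] != value):
            raise SyntaxError(f"expected {value or kind or 'a token'}, got {tok[1]!r}")
        self.pos += 1
        return tok

    def expr(self):  # expr ::= term ('||' term)*
        left = self.term()
        while self.peek() == ("op", "||"):
            self.take()
            left = _either(left, self.term())
        return left

    def term(self):  # term ::= factor ('&&' factor)*
        left = self.factor()
        while self.peek() == ("op", "&&"):
            self.take()
            left = _both(left, self.factor())
        return left

    def factor(self):  # factor ::= '!' factor | '(' expr ')' | "regexp"
        kind, value = self.take()
        if (kind, value) == ("op", "!"):
            inner = self.factor()
            return lambda path: not inner(path)
        if (kind, value) == ("op", "("):
            inner = self.expr()
            self.take("op", ")")
            return inner
        if kind == "rx":
            rx = re.compile(value, self.flags)
            return lambda path: rx.search(path) is not None
        raise SyntaxError(f"unexpected token {value!r}")


def parse_archive(text, ignore_case=False):
    """Map each entry name (e.g. 'FILE') to a predicate over canonical paths."""
    parser = Parser(tokenize(text), re.IGNORECASE if ignore_case else 0)
    entries = {}
    while parser.peek()[0] is not None:
        _, name = parser.take("name")
        parser.take("op", "=>")
        entries[name.strip("'")] = parser.expr()
    return entries


archive = parse_archive(r'''
'FILE' =>
// allowed file extensions
( "\.h$" || "\.c$" ) &&
!"_i\.c$" && # ignore certain C files
!"_p\.c$" &&
!"/test/"    # ignore files in test directories
''')
print([p for p in ["/test.h", "/test_i.c", "/notest/test.c", "/test/test.c"]
       if archive["FILE"](p)])  # ['/test.h', '/notest/test.c']
```

Each entry becomes a plain function from a path to a boolean. A directory walker could therefore apply the 'FILE' and 'DIR' predicates independently.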
Two styles of comments are allowed, so one can add remarks to the expressions: single-line comments starting with '//' and end-of-line comments starting with '#', both continuing up to the end of the line. Note that end-of-line comments can be used as single-line comments, but not vice versa (this would lead to a TICS runtime error). The 'FILE' example further below uses both styles.

The regular expression flavor is that of Perl (Perl Compatible Regular Expressions, PCRE). The most important constructs are summarized below.
Outside a character class:

char | meaning |
---|---|
`\` | escape or special |
`^` | start of string |
`$` | end of string |
`.` | any character |
`*` | match zero or more times (longest match) |
`+` | match one or more times (longest match) |
`?` | match zero or one time |
`*?` | match zero or more times (shortest match) |
`+?` | match one or more times (shortest match) |
`\|` | alternative |
`[]` | character class |

Inside a character class:

char | meaning |
---|---|
`\` | escape or special |
`^` | negate the class (only if the first character) |
`-` | range of characters |

Special sequences:

special | meaning |
---|---|
`\b` | word boundary (zero-width assertion) |
`\B` | non-word boundary (zero-width assertion) |
`\w` | word character |
`\W` | non-word character |
`\d` | digit |
`\D` | non-digit |
`\s` | whitespace |
`\S` | non-whitespace |
Examples follow below.
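The following Python snippet illustrates a few of these constructs. It uses Python's re module rather than PCRE. The two behave the same for the constructs shown, but this is an illustration, not the TICS matcher, and the paths are made up.

```python
import re

# Greedy '*' takes the longest match, lazy '*?' the shortest.
greedy = re.search(r"/.*/", "/src/app/main.c").group()
lazy = re.search(r"/.*?/", "/src/app/main.c").group()
print(greedy, lazy)  # /src/app/ /src/

# A negated class: '[^/]+' matches exactly one path component.
one_level = re.compile(r"/java/[^/]+/test/")
print(bool(one_level.search("/java/core/test/A.java")))      # True
print(bool(one_level.search("/java/core/sub/test/A.java")))  # False

# '\b' is a zero-width word boundary, '\d' a digit.
print(bool(re.search(r"\bv\d+\b", "/lib/v2/x.c")))    # True
print(bool(re.search(r"\bv\d+\b", "/lib/dev2/x.c")))  # False
```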
All 'FILE' and 'DIR' expressions are optional. When omitted, any path is accepted.
Use 'FILE' to declare boolean predicates that describe whether a file path is accepted (predicate returns true) or rejected (predicate returns false). The 'FILE' expression filters the files returned by the transitive file search from a given project. Only those files that match the expression are kept; all other files are discarded.
Use 'DIR' to prune the search space. This is useful to speed up the traversal of large file/directory trees. The predicates expressed here are evaluated during directory traversal. Only directories matching the predicate are included in the search. Directories that do not match the predicate are not traversed any further, so no files in such directory subtrees are included.
The 'DIR' expression can be used to restrict the search space during collection on large directory structures. This speeds up the process for very large archives, where file system traversal is very costly. The 'DIR' expression should only be used to improve performance.
For example, one can prevent TICS from analyzing files in '.svn' directories by using 'FILE' => !"/\.svn/". However, this 'FILE' expression does not prevent TICS from transitively scanning all files and folders in such directories. Depending on the number of files in the directory structure and the performance of the file system, this can be costly. A 'DIR' expression avoids this cost by not traversing such directories at all.
An important difference between the 'DIR' and 'FILE' archive expressions is that the 'DIR' expression is evaluated for every (intermediate) directory during directory traversal, whereas the 'FILE' expression is evaluated on the complete file path.
Use 'FILE' and 'DIR' to set the scope of the project. Use 'TESTCODE_FILE' and 'EXTERNAL_FILE' to label the Code Type within the scope set by 'FILE' and 'DIR' (see below). Avoid using 'TESTCODE_DIR' and 'EXTERNAL_DIR', since these do not work well as labelling mechanisms.
It is possible to distinguish between production code and test code. Use 'TESTCODE_FILE' and 'TESTCODE_DIR' for this. Normally, one may want to exclude test code and concentrate on production code. Using 'TESTCODE_FILE' and 'TESTCODE_DIR' makes it possible to additionally include and analyze test code. The viewer can distinguish between these code types or show them together.
```
'TESTCODE_FILE' => "/java/[^/]+/test/"
'TESTCODE_DIR'  =>
```
Advice: Only use 'TESTCODE_FILE' to specify Test Code. The search scope is already limited by the general 'DIR' predicate. 'TESTCODE_DIR' is preserved for compatibility reasons.
It is possible to distinguish between handwritten code and generated code. Use 'GENERATED_FILE' for this. Usually, generated code is less interesting to check than handwritten code, and less actionable: you may not have control over the code that is being generated. Using 'GENERATED_FILE' makes it possible to additionally include and analyze generated code. The viewer can distinguish between these code types or show them together.
```
'GENERATED_FILE' => "/gen/" || "/generated/"
```
Often, it is not interesting to check the code of external libraries (sometimes also called third-party code) with TICS. You have no control over the violations reported, and their history is very unlikely to be interesting. There are, however, also reasons to check external code: for instance, because some external headers are necessary to correctly compile your code, or to find out the quality of the libraries you are using. In that case, code can be marked as external with the 'EXTERNAL_FILE' and 'EXTERNAL_DIR' expressions. The viewer then shows it as external code.
```
'EXTERNAL_FILE' => "/3rdparty/" || "/extern/"
'EXTERNAL_DIR'  =>
```
Advice: Only use 'EXTERNAL_FILE' to specify External Code. The search scope is already limited by the general 'DIR' predicate. 'EXTERNAL_DIR' is preserved for compatibility reasons.
Each collected file is transformed into a so-called canonical form before it is matched against the archive expression. This is an OS-independent and unique representation of a file name. One characteristic of this format is that directory separators are denoted by '/'. Furthermore, on operating systems with a case-insensitive file system (such as the Win32 file systems), the file name is put in the case that is used internally by the file system.

When creating archive expressions, remember to use '/' as the directory separator. Matching is performed on the canonical name of a collected file. On case-insensitive file systems (e.g., Windows), the match is case insensitive; otherwise, it is case sensitive. On Windows, this allows a file extension to be specified by one clause instead of several. For example, to match C/C++ header files, a '"\.h$"' clause suffices. There is no need for '"\.h$" || "\.H$"' to also match any (accidental) header files with extension '.H'. (The latter could be simplified to '"\.[hH]$"', but that is still cumbersome: to capture all possibilities for the '.java' extension, one would have to write '"\.[jJ][aA][vV][aA]$"'.)
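This case rule can be sketched in Python, with re.IGNORECASE standing in for TICS's case-insensitive matching. That substitution is an assumption, and compile_clause is a hypothetical helper.

```python
import re


def compile_clause(regexp, case_insensitive_fs):
    """Compile one quoted clause; ignore case only on a case-insensitive file system."""
    return re.compile(regexp, re.IGNORECASE if case_insensitive_fs else 0)


header = r"\.h$"
print(bool(compile_clause(header, True).search("/inc/Util.H")))       # True  (e.g. Windows)
print(bool(compile_clause(header, False).search("/inc/Util.H")))      # False (case-sensitive)
print(bool(compile_clause(r"\.[hH]$", False).search("/inc/Util.H")))  # True
```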
The 'FILE' expression is a logical expression consisting of disjunctions, conjunctions, negations and strings of regular expressions that are applied to collected files. The result is a boolean value that determines whether TICS should analyze the file. Grouping with parentheses is used to influence operator precedence ('||' binds weaker than '&&', which binds weaker than '!').
A typical expression specifies the subset of files that should be accepted and a list of exceptions on this subset that should be disallowed.
```
'FILE' =>
// allowed file extensions: note the '||'s and '(', ')'
( "\.h$" || "\.c$" ) &&
!"_i\.c$" && # ignore certain C files
!"_p\.c$" &&
!"/test/"    # ignore files in test directories
```
This example shows a disjunction of two file extensions, followed by three negated clauses that restrict the result set. In this case, some file suffixes and a directory name are excluded by negated patterns.

Note the '\.' and '$'. The '\.' is required to specify a literal dot ('.'), since in regular expressions '.' matches any character. For example, ".c$" would also match test.cc, which looks like a C++ file instead of a C file. The '$' is required to match only at the end of a file name; otherwise, the expression could match anywhere in a file name. For example, "\.c" would also match test.cpp, which looks like a C++ file instead of a C file.

Also note the '/' delimiters around the 'test' directory name. This is the easiest way to specify that the whole directory name should match. Regular expression '/test' matches all directories (and files) whose name starts with 'test'. Regular expression 'test/' matches all directories whose name ends with 'test'. Note that directory separators are denoted by '/' regardless of the OS used. The expression above would match the following files:
```
test.h
test.c
notest/test.h
notest/test.c
```
and it would reject:
```
test_i.c
test_p.c
test/test.h
test/test.c
```
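The effect of '\.', '$' and the '/' delimiters can be checked with Python's re module as a stand-in for PCRE. The matches helper and the extra paths are made up for illustration.

```python
import re


def matches(regexp, path):
    return re.search(regexp, path) is not None


print(matches(r".c$", "test.cc"))            # True: '.' matches any character
print(matches(r"\.c$", "test.cc"))           # False
print(matches(r"\.c", "test.cpp"))           # True: no '$' anchor
print(matches(r"\.c$", "test.cpp"))          # False
print(matches(r"/test/", "/notest/test.c"))  # False: 'notest' is not 'test'
print(matches(r"/test", "/testdata/a.c"))    # True: names starting with 'test'
print(matches(r"test/", "/notest/a.c"))      # True: names ending with 'test'
```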
'DIR' is used to efficiently search through a directory structure. If a 'DIR' expression excludes a directory, the search does not recurse into that directory. This is especially useful when the file system contains large directories of files that do not need to be stored in the database.
Note that 'FILE' and 'DIR' expressions are evaluated relative to the root of the branch directory. This makes it possible to match the start of a file path with '^'.

For example, assume branch directory /var/lib/jenkins and file /var/lib/jenkins/a/b/c.c. It is possible to filter on "^/a/b/c\.c$", since the /var/lib/jenkins prefix is not taken into consideration. The expressions "^a/b/c\.c$" and "/a/b/c\.c$" would also work. With regard to the former, note that a '/' at the start of the branch is implied, which makes it easier to denote file paths. The latter expression is more liberal, since it would also match /var/lib/jenkins/d/a/b/c.c; note the extra intermediate /d/ directory.
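The following Python sketch shows how a branch-relative canonical path makes '^' meaningful. It assumes the relative path is written with a leading '/'. It does not model the implied leading '/' that lets "^a/b/c\.c$" work in TICS. The name branch_relative is hypothetical.

```python
import re
from pathlib import PurePosixPath


def branch_relative(branch_dir, file_path):
    """Canonical path relative to the branch root, written with a leading '/'."""
    return "/" + PurePosixPath(file_path).relative_to(branch_dir).as_posix()


branch = "/var/lib/jenkins"
p1 = branch_relative(branch, "/var/lib/jenkins/a/b/c.c")    # '/a/b/c.c'
p2 = branch_relative(branch, "/var/lib/jenkins/d/a/b/c.c")  # '/d/a/b/c.c'

for rx in (r"^/a/b/c\.c$", r"/a/b/c\.c$"):
    print(rx, bool(re.search(rx, p1)), bool(re.search(rx, p2)))
# ^/a/b/c\.c$ True False
# /a/b/c\.c$ True True
```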
The 'DIR' expression is applied to the search's current root path. This hardly matters for simple negated regular expressions. For compound expressions that exclude a subdirectory of a certain directory, however, care must be taken not to exclude the directory altogether.
In the following example, path2 is excluded from the file search:

```
'DIR' => !"/path2/"
```
Directory path2 and its subdirectories are not scanned for files. Note the trailing '/', which avoids matching directories that merely start with path2, e.g., path2b.
In the following example, only path1/path2 is included:

```
'DIR' => "/path1/path2/"
```

This is equivalent to the following (longer) expression:

```
'DIR' => "/path1/$" || "/path1/path2/"
```
To accept only a certain root path, all prefixes of that root path must also match. To accept only path1/path2 and its subdirectories, specify:

```
'DIR' => "^/path1/path2/"
```

This is equivalent to the following (longer) expression:

```
'DIR' => "^/path1/$" || "^/path1/path2/"
```
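The following Python sketch checks the longer, explicit form of this expression against a few directories. It assumes that a directory is only reached if it and every ancestor directory match the 'DIR' expression, each written in canonical form as '/a/b/'. It uses only the explicit form, because the text above does not describe how TICS lets the shorter "^/path1/path2/" accept its prefix /path1/. The names DIR_RX, dir_accepted and reachable are hypothetical.

```python
import re

# 'DIR' => "^/path1/$" || "^/path1/path2/"  (the explicit form from the text)
DIR_RX = [re.compile(r"^/path1/$"), re.compile(r"^/path1/path2/")]


def dir_accepted(dir_path):
    return any(rx.search(dir_path) for rx in DIR_RX)


def reachable(dir_path):
    """A directory is visited only if it and all of its ancestors are accepted."""
    parts = dir_path.strip("/").split("/")
    return all(dir_accepted("/" + "/".join(parts[:i]) + "/") for i in range(1, len(parts) + 1))


for d in ["/path1/", "/path1/path2/", "/path1/path2/x/", "/path1/other/", "/path0/path1/path2/"]:
    print(d, reachable(d))
# /path1/ True
# /path1/path2/ True
# /path1/path2/x/ True
# /path1/other/ False
# /path0/path1/path2/ False
```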
The '^' operator matches a path from the root of the branch. So, given branch 'branch' and path branch/path1/path2, the expression above matches, but branch/path0/path1/path2 does not.
```
'DIR' => "^/path0/$" || "/path1/$" || "/path1/path2/"
```

The expression above matches branch/path0/path1/path2, but also branch/path1/path2.

This is equivalent to the following (shorter) expression:

```
'DIR' => "^/path0/$" || "/path1/path2/"
```
The difference between regular expressions starting with '^' or '/' is probably best explained by some examples. Assume the following (Windows) path exists on the file system for some project: C:\branch\path0\path1\path2. Furthermore, assume the branch starts in C:\branch. Below, some archive expressions and their results are given.
The following expressions match C:\branch\path0:

```
'DIR' => "^path0/"
'DIR' => "^/path0/"
'DIR' => "/path0/"
```
The following expressions do not match C:\branch\path0:

```
'DIR' => !"/path0/"
'DIR' => "^/path1/"
'DIR' => !"/path0/" && !"/path1/"
```
The following expressions match both C:\branch\path0 and C:\branch\path1:

```
'DIR' => "/path0/" || "/path1/"
'DIR' => "^/path0/$" || "^/path1/"
'DIR' => "^path0/$" || "^path1/"
```
The following expressions match both C:\branch\path0 and C:\branch\path0\path1:

```
'DIR' => "^/path0/$" || "/path1/"
'DIR' => "^/path0/path1/"
'DIR' => "/path0/$" || "/path1/"
'DIR' => "/path0/path1/"
```
The following expressions match C:\branch\path0 but not C:\branch\path0\path1. They would match C:\branch\path1 if such a path existed:

```
'DIR' => "^path0/$" || "^path1/$"
'DIR' => "^/path0/$" || "^/path1/$"
'DIR' => "/path0/$" || "^/path1/$"
```
The following expressions match C:\branch\path0\path1\path2:

```
'DIR' => "/path0/$" || "/path1/$" || "/path2/"
'DIR' => "/path0/path1/path2/"
'DIR' => "^/path0/path1/path2/"
'DIR' => "^/path0/$" || "^/path0/path1/$" || "^/path0/path1/path2/"
```
The following expressions do not match C:\branch\path0\path1\path2 (the collector does not get deep enough into the directory structure):

```
'DIR' => "/path0/path1/$"   # fails at C:\branch\path0\path1\path2
                            # due to $
'DIR' => "^/path1/path2/$"  # fails at C:\branch\path0
                            # path1 does not match at the branch root
'DIR' => "/path1/path2/$"   # fails at C:\branch\path0
                            # TICS does not know about path0
'DIR' => "/path2/$"         # fails at C:\branch\path0
                            # TICS does not know about path0
```
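Several of these results can be reproduced in Python by converting the Windows path to canonical form and requiring every directory on the way down to match at least one clause of a disjunction. This is a sketch under stated assumptions. Canonical directory paths are taken to be '/a/b/' relative to the branch root. Only expressions whose clauses match each intermediate directory outright are modeled; TICS's handling of expressions such as "/path0/path1/path2/" on their own, which the lists above also show matching, is not. The names dir_chain and collector_reaches are hypothetical.

```python
import re
from pathlib import PureWindowsPath


def dir_chain(branch_dir, dir_path):
    """Canonical form ('/a/b/') of every directory from the branch root down to dir_path."""
    parts = PureWindowsPath(dir_path).relative_to(branch_dir).parts
    return ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts) + 1)]


def collector_reaches(clauses, branch_dir, dir_path):
    """True if each directory on the way matches at least one clause (a disjunction)."""
    return all(any(re.search(rx, d) for rx in clauses)
               for d in dir_chain(branch_dir, dir_path))


target = r"C:\branch\path0\path1\path2"
print(dir_chain(r"C:\branch", target))
# ['/path0/', '/path0/path1/', '/path0/path1/path2/']
print(collector_reaches([r"/path0/$", r"/path1/$", r"/path2/"], r"C:\branch", target))  # True
print(collector_reaches([r"^/path1/path2/$"], r"C:\branch", target))                    # False
print(collector_reaches([r"/path2/$"], r"C:\branch", target))                           # False
```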
Consider the following archive on disk:

```
.
|-- .git
|-- inc
|   |-- h1.h
|   |-- h2.h
|   `-- h3.h
|-- res
|   |-- r1.rc
|   |-- r2.rc
|   `-- r3.rc
|-- src
|   |-- s1.c
|   |-- s2.c
|   `-- s3.c
`-- tst
    |-- t1.c
    |-- t2.c
    `-- t3.c
```
First, exclude the folders .git and res:

```
'DIR' => !"/\.git/" && !"/res/"
```
Result:

```
.
|-- inc
|   |-- h1.h
|   |-- h2.h
|   `-- h3.h
|-- src
|   |-- s1.c
|   |-- s2.c
|   `-- s3.c
`-- tst
    |-- t1.c
    |-- t2.c
    `-- t3.c
```
Add Test Code:

```
'TESTCODE_FILE' => "/tst/"
```

This gives two code types, Production and Test code:

Production:

```
.
|-- inc
|   |-- h1.h
|   |-- h2.h
|   `-- h3.h
`-- src
    |-- s1.c
    |-- s2.c
    `-- s3.c
```

Test:

```
.
`-- tst
    |-- t1.c
    |-- t2.c
    `-- t3.c
```
Suppose that, for some reason, all files with suffix 3 must be excluded:

```
'FILE' => !"3\."
```

Production:

```
.
|-- inc
|   |-- h1.h
|   `-- h2.h
`-- src
    |-- s1.c
    `-- s2.c
```

Test:

```
.
`-- tst
    |-- t1.c
    `-- t2.c
```
This results in the following ARCHIVE:

```
'FILE'          => !"3\."
'DIR'           => !"/\.git/" && !"/res/"
'TESTCODE_FILE' => "/tst/"
```
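The following Python sketch applies these three entries to the example archive and reproduces the Production and Test results above. It checks each file's ancestor directories against 'DIR' instead of walking a real disk, and it assumes canonical paths of the form '/src/s1.c' and '/src/'. The file "/.git/HEAD" is hypothetical because the tree does not list the contents of .git, and the helper names are made up.

```python
import re

# Files of the example archive as canonical paths relative to the branch root.
# "/.git/HEAD" is hypothetical: the tree above does not list the contents of .git.
FILES = ["/.git/HEAD",
         "/inc/h1.h", "/inc/h2.h", "/inc/h3.h",
         "/res/r1.rc", "/res/r2.rc", "/res/r3.rc",
         "/src/s1.c", "/src/s2.c", "/src/s3.c",
         "/tst/t1.c", "/tst/t2.c", "/tst/t3.c"]


def file_ok(path):      # 'FILE' => !"3\."
    return re.search(r"3\.", path) is None


def dir_ok(dir_path):   # 'DIR' => !"/\.git/" && !"/res/"
    return re.search(r"/\.git/", dir_path) is None and re.search(r"/res/", dir_path) is None


def is_test(path):      # 'TESTCODE_FILE' => "/tst/"
    return re.search(r"/tst/", path) is not None


def parent_dirs(path):
    """Canonical form of each directory above path, e.g. ['/src/'] for '/src/s1.c'."""
    parts = path.strip("/").split("/")[:-1]
    return ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts) + 1)]


in_scope = [p for p in FILES if all(dir_ok(d) for d in parent_dirs(p)) and file_ok(p)]
production = [p for p in in_scope if not is_test(p)]
test_code = [p for p in in_scope if is_test(p)]
print(production)  # ['/inc/h1.h', '/inc/h2.h', '/src/s1.c', '/src/s2.c']
print(test_code)   # ['/tst/t1.c', '/tst/t2.c']
```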
Exclude all variations of the jquery library:

```
'FILE' => !"[/._-]jquery.*\.js$"
```

This matches (and excludes) the following files (among others):

```
jquery.tmpl.min.js
jquery.ui.core.js
adblock-jquery.js
nwmatcher-jquery.js
tree.jquery.js
vendor_jquery.js
```
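The jquery pattern can be checked with Python's re module against the listed names. The "/js/" directory prefix and the two non-matching names are hypothetical. Because the 'FILE' expression is negated, a match here means the file is excluded.

```python
import re

JQUERY = re.compile(r"[/._-]jquery.*\.js$")

names = ["jquery.tmpl.min.js", "jquery.ui.core.js", "adblock-jquery.js",
         "nwmatcher-jquery.js", "tree.jquery.js", "vendor_jquery.js"]
# Canonical paths contain '/', so a bare "jquery..." file name is still preceded by '/'.
excluded = [n for n in names if JQUERY.search("/js/" + n)]
print(excluded == names)                            # True
print(bool(JQUERY.search("/js/myjquery.js")))       # False: 'y' is not in [/._-]
print(bool(JQUERY.search("/js/jquery-README.md")))  # False: does not end in .js
```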
Exclude certain subdirectories on a certain level:

```
'DIR' => !"/java/[^/]+/build/" && !"/java/[^/]+/classes/" && !"/java/[^/]+/reports/"
```

This matches (and excludes) the following folders (among others):

```
java/CucumberTests/build/
java/Logging/build/
java/SeleniumModels/build/
```
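The following Python sketch shows why '[^/]+' restricts the exclusion to one level below java/. It assumes directories are presented in canonical form ('/java/Logging/'). The directory names Logging/src/build and the helper dir_ok are hypothetical.

```python
import re

BUILD_DIRS = [re.compile(r"/java/[^/]+/build/"),
              re.compile(r"/java/[^/]+/classes/"),
              re.compile(r"/java/[^/]+/reports/")]


def dir_ok(dir_path):
    """Accept a directory unless any of the negated clauses matches it."""
    return not any(rx.search(dir_path) for rx in BUILD_DIRS)


for d in ["/java/CucumberTests/build/", "/java/Logging/classes/",
          "/java/Logging/", "/java/Logging/src/build/"]:
    print(d, dir_ok(d))
# /java/CucumberTests/build/ False
# /java/Logging/classes/ False
# /java/Logging/ True
# /java/Logging/src/build/ True   ('[^/]+' spans exactly one directory level)
```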
Exclude all Python code:

```
'FILE' => !"\.py$"
```
Exclude certain non-source directories:

```
'DIR' => !"/stubs/" && !"/data/" && !"^/components/idevs7/resources/" && !"^/deploy/" && !"^/make/"
```
To use the '^' operator successfully, one must know where the root of a branch starts. Each ARCHIVE file is referenced by a project or a branch in the PROJECTS.yaml. For existing projects, these branch directories can be queried with the TICSMaintenance tool as follows:

```
TICSMaintenance -project project -info
```

Look for "dir" in the output:

```
"branches" : [
  {
    "baselines" : [],
    "calculate" : 1,
    "dir" : "/var/lib/jenkins",   # <- HERE
    "id" : "1",
    "name" : "main",
    "visible" : 1
  }
],
```