Metrics Explained

This page gives an overview of the various metrics that can be measured by TICS. Note that it is not an exhaustive treatment of all the intricacies of each metric's definition. Instead, it is a broad overview of how the metrics are defined and which details are relevant when interpreting their values.

Code Coverage

Code coverage is a metric of test suite quality. Intuitively, it is the percentage of code that is covered by (automated) unit tests. These tests can automatically invoke parts of the software and evaluate whether their results are correct. Since automated tests are much cheaper to run and more reliable than manual tests, they enable software developers to test their code often, which helps to maintain high-quality software. Code coverage can be evaluated at a low level (a single file or method), but it is usually computed over an entire project.

Statement Coverage

There are different types of coverage, which vary in terms of granularity and precision. The simplest form of code coverage is statement coverage (also known as line coverage). This is a metric for the percentage of statements that are executed by a unit test.

Consider the following example:

int foo(int a, int b) {
    int c = b;
    
    if (a > 3 && b > 0) {
        c = a;
    }
    
    return c * a;
}

This is an example of a method that can be tested using unit tests. A unit test consists of an input (such as foo(0,0) or foo(4,2)) and an expected result for each test case (0 and 16 respectively for the given examples). This program has a total of 4 statements (two assignments, an if statement and a return statement). The test case foo(0,0) covers 3 of these statements (75% coverage), while the test case foo(4,2) covers all statements (100% coverage).
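As a minimal sketch of what such unit tests could look like, consider this Python translation of foo, with the two test cases expressed as plain assertions:

def foo(a, b):
    # Python translation of the C example above
    c = b
    if a > 3 and b > 0:
        c = a
    return c * a

# Each unit test pairs an input with an expected result.
assert foo(0, 0) == 0   # skips the if-body: 3 of 4 statements run (75%)
assert foo(4, 2) == 16  # enters the if-body: all 4 statements run (100%)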

Branch Coverage

While statement coverage can give an indication of the quality of a test suite, it may be too easy to achieve. For example, if we only use the second test, we achieve a perfect coverage of 100%. However, it could be that for cases where the condition of the if statement evaluates to false, the result is incorrect. The notion of branch coverage alleviates this issue. It requires every branch of every conditional or loop (including if, case, while, and for statements) to be tested at least once. This means that to achieve 100% branch coverage, we need a test for which a > 3 and b > 0 and one for which a ≤ 3 or b ≤ 0. The two test cases presented above together form a test suite with 100% branch coverage for this method.
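In practice, branch coverage is measured by a coverage tool. As an illustration (not part of TICS), the coverage.py package for Python can measure both statement and branch coverage of the test suite above:

import coverage

cov = coverage.Coverage(branch=True)  # track branches in addition to statements
cov.start()
foo(0, 0)  # assumes the Python foo from the statement coverage example
foo(4, 2)
cov.stop()
cov.report(show_missing=True)  # reports uncovered statements and branches per file

Running only foo(4, 2) would show 100% statement coverage but a missed branch, since the if-condition never evaluates to false.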

Decision Coverage

As branches get more complex, branch coverage may be insufficient as a metric as well. After all, we only have one test case for each branch. Since a conditional may be composed of an arbitrary number of boolean variables through conjunctions and disjunctions, there are many paths through the code that can end up untested even when 100% branch coverage is achieved. A third type of coverage, decision coverage, requires a test case for each of the composed conditions. That means that in our example, we need at least a case for a > 3 and a ≤ 3 as well as cases for b > 0 and b ≤ 0. Again, the provided test cases satisfy this.

Function Coverage

Finally, some tools measure function coverage. This is a very simple metric that determines the proportion of functions (or methods) that have been invoked by test cases. This type of coverage is too lenient to serve as a serious indicator of quality, but may be used to identify parts of the code that are not tested at all.

Code Coverage Tool Comparison

The types of coverage that TICS takes into account depend on the tool that is used to compute code coverage. The value that is used for the TQI is the average of the available coverage metrics. The only exception is function coverage, which is always excluded. The table below provides an overview of the coverage metrics that are supported by a variety of tools.

Tool                Statement    Function    Branch    Decision
BullseyeCoverage
Squish Coco
CTC
gcov/lcov
PureCoverage
C++Test
VectorCAST
NCover
OpenCover
Jacoco
Cobertura

TQI Code Coverage

The TQI Code Coverage sub-metric [2] is defined as follows:

score = min(0.75 * test_coverage + 32.5, 100)
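Evaluated at a few representative values (a minimal sketch, not TICS code), the formula behaves as follows:

def tqi_code_coverage(test_coverage):
    # TQI Code Coverage score for a coverage percentage between 0 and 100
    return min(0.75 * test_coverage + 32.5, 100)

print(tqi_code_coverage(0))   # 32.5 -- even 0% coverage scores above zero
print(tqi_code_coverage(50))  # 70.0
print(tqi_code_coverage(90))  # 100  -- 90% coverage or more yields the maximum score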

Abstract Interpretation

Abstract interpretation is a technique in which the computations of a program are evaluated on abstract objects, with the goal of deriving results without executing the program. This is typically implemented by translating source code to a model instead of an executable program. This model is then examined for patterns that reveal potential errors. Abstract interpretation abstracts from the explicit value of a variable and instead considers all possible values it can have, mapped onto some abstract domain.

Consider the following example:

if (obj == null) {
    obj = getObject();
}
obj.method();

One problem abstract interpretation may detect in this example is that the obj variable may have the value null even after line 2. This would cause a null dereference in the final line, which results in an error and potentially the termination of the program. A human programmer could very well overlook this fact; they will likely assume the getObject() method returns an actual object rather than a null value. An abstract interpretation tool would detect that null is one of the potential values of obj and flag a warning.

Abstract interpretation works by considering the range of possible values a variable can take and restricting this range based on statements in the program. This means an abstract interpretation tool may be able to deduce that obj can never be null at the point of the dereference. Conversely, if null remains among the values it could take, something is wrong with the program.
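The following toy sketch illustrates this idea on the example above, using a three-valued nullability domain. It demonstrates the principle only; actual tools use far more sophisticated domains and analyses:

# Abstract domain: "null", "not-null", or "maybe" (either is possible).
def join(a, b):
    # Merge the abstract values of two execution paths.
    return a if a == b else "maybe"

obj = "maybe"  # before the if-statement, nothing is known about obj

# Then-branch: obj == null held, and obj was reassigned by getObject().
# Without knowledge of getObject(), its result must be assumed "maybe".
then_branch = "maybe"
# Else-branch: the condition was false, so obj is known to be non-null.
else_branch = "not-null"

obj = join(then_branch, else_branch)  # abstract state after the if-statement
if obj != "not-null":
    print("warning: possible null dereference at obj.method()")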

TQI Abstract Interpretation

Abstract interpretation tools report their findings as violations. The compliance factor [3] is used to compute a score based on these violations.

The TQI Abstract Interpretation sub-metric [2] is defined as follows:

score = max(compliance_factor(abstract_interpretation_violations) * 2 - 100, 0)

Cyclomatic Complexity

Definition

Cyclomatic complexity [1] (also known as McCabe complexity) is a complexity metric for imperative methods. It is computed by representing the possible paths of execution of the program as a graph and computing its cyclomatic number (i.e., the number of regions in a planar embedding of the graph). The idea is that methods with many conditional statements or loops are more complex. High complexity hurts the maintainability of the program, since the code is harder to understand for new developers working on it. It also makes it harder to test the code properly, since reaching a sufficient level of coverage requires more test cases.

Since the original definition of Cyclomatic Complexity is quite abstract, the definition used by TIOBE is, loosely:

The decision count plus one

The decision count is incremented for control flow statements that introduce a branch in the code, e.g., selection statements (if-statement), iteration statements (while-statement) and exception handling (try-catch-statement).

The Cyclomatic Complexity of a function (or method) is the number of decisions in the function's body plus one. So, the minimum cyclomatic complexity of a function definition is one.

The Cyclomatic Complexity of a file is the number of decisions of all function definitions in that file plus one. In other words, this is the sum of the cyclomatic complexities of all function definitions in the file, minus the number of function definitions, plus one.

The Average Cyclomatic Complexity is the cyclomatic complexity per function of a file. So, this is the sum of the cyclomatic complexities of all function definitions, divided by the number of function definitions in the file.

Aggregation

The Average Cyclomatic Complexity is aggregated over multiple files as follows: sum the cyclomatic complexities of all files, subtract the number of files, divide by the total number of function definitions in these files combined, and add one.
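A small worked example (with hypothetical complexities, chosen for illustration) shows that this aggregation is equivalent to averaging the cyclomatic complexities of all function definitions:

# Two files with per-function cyclomatic complexities [1, 3] and [2, 2, 4].
file_ccs = [
    1 + 3 - 2 + 1,      # file 1: sum of function CCs - #functions + 1 = 3
    2 + 2 + 4 - 3 + 1,  # file 2: = 6
]
total_functions = 2 + 3

avg_cc = (sum(file_ccs) - len(file_ccs)) / total_functions + 1
print(avg_cc)  # (9 - 2) / 5 + 1 = 2.4, i.e., (1 + 3 + 2 + 2 + 4) / 5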

TQI Cyclomatic Complexity

The TQI Cyclomatic Complexity sub-metric [2] is defined as follows:

score = 6400 / (cyclomatic_complexity^3 - cyclomatic_complexity^2 - cyclomatic_complexity + 65)
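A quick evaluation (a sketch, not TICS code) shows how quickly the score falls as the average complexity rises:

def tqi_cyclomatic_complexity(cc):
    # TQI score for a given average cyclomatic complexity
    return 6400 / (cc**3 - cc**2 - cc + 65)

print(tqi_cyclomatic_complexity(1))  # 100.0 -- straight-line code scores full marks
print(tqi_cyclomatic_complexity(3))  # 80.0
print(tqi_cyclomatic_complexity(5))  # 40.0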

Cyclomatic Complexity Tool Comparison

There are a (great) number of free and commercial tools available that can be used to measure the (average) cyclomatic complexity of source files. Here, we will take a look at a few available options.

SourceMonitor

SourceMonitor is a freeware program to obtain cyclomatic complexity for many programming languages. It runs on Windows. The results reported by SourceMonitor are quite reasonable, especially when compared to e.g. CCCC (see below). A disadvantage is that it does not support all programming languages for which TIOBE wishes to compute cyclomatic complexity. Another drawback is that it can only be used on Windows, and not on e.g. Linux.

CCCC

CCCC is an open source code counter for C, C++ and Java source files. It is reasonably cross-platform and can run on Windows and Linux. Unfortunately, its results are very unreliable. Moreover, it supports just a few programming languages.

TICSpp

TICSpp is a TIOBE proprietary cyclomatic complexity measurement tool that supports a wide variety of programming languages (including JavaScript, Python, Objective-C and Scala). It runs on Windows, Linux and Solaris SPARC.

There are many more tools available to measure cyclomatic complexity. Some are commercial or part of a commercial package; others are specific to a certain programming language. We will not discuss these here.

At TIOBE, we compared the results of CCCC, SourceMonitor and TICSpp on a great number of (customer) source files and found that TICSpp produces better results than SourceMonitor, which in turn produces better results than CCCC. These findings, together with its portability and support for a wide range of programming languages, make TICSpp a logical default choice for our customers.

Cyclomatic Complexity and TICS

Configuration

Be sure to use a default configuration, or specify 'METRICS'=>['TICSpp'], for your language in SERVER.txt.

Running TICS

The metric to run with TICS for Average Cyclomatic Complexity is AVGCYCLOMATICCOMPLEXITY.

To run the TICS client, use TICS -calc AVGCYCLOMATICCOMPLEXITY. To run TICSQServer, use TICSQServer -calc AVGCYCLOMATICCOMPLEXITY.

Investigating High Cyclomatic Complexity in the TICS viewer

To quickly see "bad" files (with respect to cyclomatic complexity) in the TICS viewer, one can use the treemap and select Average Cyclomatic Complexity as the color metric and Total Cyclomatic Complexity as the area metric.

The big, dark red files are the ones to go after first.

Compiler Warnings

Compiler warnings are potential errors in the source code that are discovered during compilation. These include syntactically valid constructs whose semantics are easily misinterpreted, portability issues and type errors. They can therefore be seen as a metric for the quantity and impact of potential bugs.

Consider the following example:

if (var = 0) {
    var++;
}

Most C-like languages use the = operator for assignment, while the condition was presumably meant to be the comparison var == 0. Note that var = 0 is not a boolean expression testing the equality of var and 0, but an assignment of 0 to var. The assignment is executed and its resulting value (0, i.e., false) is used as the condition, so the body is never executed. A compiler will recognize that this is likely unintended and warn the programmer. Other warnings could relate to arithmetic operations that are inconsistent across systems with different bitness, which hurts the portability of the code.

TQI Compiler Warnings

The compliance factor [3] is used to compute a score based on compiler warnings.

The TQI Compiler Warnings sub-metric [2] is defined as follows:

score = max(100 - 50 * log10(101 - compliance_factor(compiler_warnings)), 0)

Coding Standards

Coding standards provide rules for software developers to follow while writing code. The most important reason for coding standards is maintainability; a clear set of rules makes it easier for programmers to understand the meaning of code. A subset of coding standards can be used as a predictor for finding fault-related lines of code.

Examples of coding standards include naming conventions (e.g., using lower case for variable names and capitals for class names), rules for indentation (e.g., use of spaces instead of tabs, indenting the body following an if-statement) and the exclusion of language features (e.g., disallowing use of goto statements).

To measure adherence to a coding standard automatically, rules can be programmed into a tool that automatically identifies violations of the standard. This requires rules to be sufficiently precise. A meaningful metric for adherence to a coding standard depends on the quality of the standard and a suitable classification of the weight of each of its rules.

TQI Coding Standards

The compliance factor [3] is used to compute a score based on coding standard violations.

The TQI Coding Standards sub-metric [2] is defined as follows:

score = compliance_factor(coding_standard_violations)

Code Duplication

Code Duplication is a software metric that indicates the amount of source code that occurs more than once in a program. Code Duplication is undesirable because it is associated with higher maintenance costs and can be indicative of bad design.

A duplication is a consecutive set of source code lines (code fragment) that is similar to another code fragment, possibly inside the same file. A code fragment might be part of more than one duplication.

TICS uses the CPD tool to detect code duplication. CPD finds duplicated strings of source tokens, not lines. At least 100 tokens should be logically identical to qualify as code duplication. TICS translates these tokens to lines: a line is considered duplicate code if it contains at least one token that is part of a duplication. Note that TICS runs CPD on a single project and does not detect duplications over different projects.
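The translation from duplicated tokens to duplicated lines could be sketched as follows. This illustrates the rule described above, not CPD's or TICS's actual implementation:

# Each token is a (line_number, text) pair; a line counts as duplicated
# when at least one of its tokens lies inside a detected duplication.
def duplicated_lines(tokens, duplicated_token_ranges):
    lines = set()
    for start, end in duplicated_token_ranges:  # inclusive token indices
        for line_no, _ in tokens[start:end + 1]:
            lines.add(line_no)
    return lines

tokens = [(1, "int"), (1, "c"), (1, "="), (2, "b"), (2, ";")]
print(duplicated_lines(tokens, [(2, 3)]))  # {1, 2}: both lines touch the range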

The TICS Viewer defines the following basic metrics related to Code Duplication:

Code Duplication (LOC)
The number of lines that are part of a duplication.
Code Duplication (%)
The number of lines that are duplicated as a percentage of the total number of lines.
Code Duplication Metric Coverage
The percentage of lines that could be analyzed by the code duplication tool, ideally 100%.
TQI Code Duplication
The TQI score for Code Duplication, a value between 0% and 100%, obtained using the formula:
min(-30 * log10(Code Duplication (%)) + 70, 100) * Code Duplication Metric Coverage
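Evaluated at a few values, the formula behaves as follows. This sketch assumes the Metric Coverage factor is applied as a fraction and that the duplication percentage is strictly positive:

import math

def tqi_code_duplication(duplication_pct, metric_coverage_pct):
    # duplication_pct must be > 0 for log10 to be defined
    score = min(-30 * math.log10(duplication_pct) + 70, 100)
    return score * metric_coverage_pct / 100

print(tqi_code_duplication(0.1, 100))  # 100.0 -- 0.1% duplication or less scores full marks
print(tqi_code_duplication(10, 100))   # 40.0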

TICS Viewer version 8.5 introduced several advanced Code Duplication metrics that identify different duplication types based on where the source and target fragment of a duplication are located:

File-Internal Code Duplication
The source and target fragments are located in the same file.
File-External Code Duplication
The source and target fragments are located in different files.
Scope-Internal Code Duplication
The source and target files are in the same selected scope.
Scope-External Code Duplication
The target file is outside of the selected scope.

Each of the four types has a relative (%) and an absolute (LOC) variant. The scope-based metrics require a scope that is chosen by the user and can be a directory, component, or other subsystem. The scope is indicated in the breadcrumb trail.

The sum of Internal and External Code Duplication (LOC) might be larger than the value given by the overall Code Duplication (LOC) metric. The reason is that the same fragment can be part of two duplications, one of which is internal and one external. Such a fragment is only counted once for Code Duplication (LOC).

Using these advanced metrics, users can filter duplications by the type that they are interested in. For instance, file-internal duplications could be considered easier to solve because both fragments are located in the same file. On the other hand, file-external duplications might be harder for developers to keep track of, so one might want to solve them first. The scope-internal metric can be useful for developers to limit the duplications they see to the component for which they are responsible or that they are allowed to change.

Fan Out

Software programs are structured in terms of modules or components. These modules and components "use" each other. The fan out metric indicates how many different modules are used by a certain module. If modules need a lot of other modules to function correctly (high fan out), there is a high interdependency between modules, which makes code less modifiable. Hence, fan out is related to the "Maintainability" ISO attribute.

Fan out is measured by counting the number of imports per module. The specific measurement is language dependent.

C/C++

For C and C++ the number of include directives is used.

C#

The situation is more complex for C# because it uses a different import mechanism. The using statement in C# imports a complete namespace, which could consist of hundreds of classes, whereas only a few of these are actually used. That is why for C# the actual number of unique dependencies per file is counted.

Java

For Java, the number of import statements is counted. Wildcard imports are difficult to count precisely because a single statement imports several classes from a package at once. That is why we choose to count each wildcard import statement as 5.

JavaScript

For EcmaScript 6, the number of import statements is counted.

ES6 modules are stored in files. There is exactly one module per file and one file per module.

Therefore, each module mentioned in the from clause counts as 1 irrespective of the number of elements imported. E.g.,

import localName from 'src/my_lib';
import * as my_lib from 'src/my_lib';
import { name1, name2 } from 'src/my_lib';

all count as 1 since each statement imports one module.

Python

For Python, the number of modules mentioned in import statements is counted. A from-import statement counts as 1.
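A rough sketch of this counting rule (for illustration only; TICS uses its own parser):

def python_fan_out(source):
    # 'import a, b' counts each module; 'from m import x, y' counts as 1
    fan_out = 0
    for line in source.splitlines():
        line = line.strip()
        if line.startswith("import "):
            fan_out += len(line[len("import "):].split(","))
        elif line.startswith("from "):
            fan_out += 1
    return fan_out

print(python_fan_out("import os, sys\nfrom re import match, search"))  # 3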

Swift

For Swift, the number of modules mentioned in import declarations is counted.

Average Fan Out

Average Fan Out is the average number of imports per module. It is aggregated over multiple files as follows: sum the fan outs of all files and divide by the number of files.

TQI Fan Out

The TQI Fan Out sub-metric [2] is defined as follows:

score = min(max(120 - 5 * fan_out_avg, 0), 100)

Dead Code

Dead code is code that does not serve any purpose. As such, it hurts the maintainability of software.

Measuring dead code is restricted to unused files and methods. Dead code is measured in terms of LOC. This means that an unreachable method or file is counted more heavily when it is larger.

TQI Dead Code

The TQI Dead Code sub-metric [2] is defined as follows:

score = max((100 - 2 * dead_code), 0)

Lines Of Code

Lines Of Code (LOC) counts the physical lines in each source file, including comment lines and blank lines, but excluding generated lines. Generated lines of code are those regions in source files that are automatically generated by development environments and are outside the control of software developers. TICS ships with a default set of recognized markers for generated code for commonly used development environments. See the section on GENERATED code for languages in SERVER.txt.

Lines Of Code is the primary measurement of code size used by TICS. All measurements are related to LOC for comparison. E.g., for the compliance factor, violations are weighted by LOC. For treemaps, LOC is the default area metric.

Effective Lines Of Code

Effective Lines Of Code (ELOC) counts the physical lines in each source file, excluding comment lines and blank lines, and excluding generated lines.

Effective Lines Of Code is a measure of actual code size; those lines that are not affected by formatting and style conventions but are necessary for the required functionality of the program being written.

Effective Lines Of Code is a subset of Lines Of Code.

Generated lines included Lines Of Code

Generated lines included Lines Of Code (GLOC) counts the physical lines in each source file, including comment lines and blank lines, and including generated lines.

GLOC indicates the absolute amount of code of the application. This metric is not used as a submetric for any other (aggregated) measurement.

GLOC is a superset of LOC. The difference between GLOC and LOC gives the number of generated lines in the code.
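The relation between the three size metrics can be sketched as follows. This is a simplified illustration; real comment and generated-code detection is language dependent:

def count_lines(lines, generated):
    # lines: the physical lines of one source file
    # generated: set of (0-based) indices recognized as generated code
    gloc = len(lines)  # all physical lines, including generated ones
    loc = sum(1 for i in range(len(lines)) if i not in generated)
    eloc = sum(1 for i, line in enumerate(lines)
               if i not in generated
               and line.strip()                        # not blank
               and not line.strip().startswith("//"))  # not a full-line comment
    return gloc, loc, eloc

lines = ["// header comment", "", "int x = 1;", "int gen = 0;"]
print(count_lines(lines, generated={3}))  # (4, 3, 1): GLOC >= LOC >= ELOC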

Change Metrics

Change metrics (also known as Churn metrics) measure the amount of changes in the project. These changes can give insight into how actively a project is developed, which components are changed most often and which parts are being added and/or removed.

Change Rate

The change rate metric measures the amount of changes that were made since the previous measurement. This change is expressed in terms of lines of code.

The change rate includes code that has been added, deleted or changed. These three categories are stored as separate metrics; their sum is the total change rate.

Accumulative Change Rate

In addition to change rate, TICS also stores accumulative change rate. This metric refers not just to the changes that were made since the last measurement, but to all of the changes that have been recorded historically.

Other change metrics

The following metrics are also available:

Date Modified
The last time a file was changed.
Changed Files (#)
The number of added and changed files.
Changed, non-New Files (#)
The number of changed files, excluding added files.
New Files (#)
The number of new files.
Deleted Files (#)
The number of deleted files.

References

  1. T. J. McCabe Sr., "A complexity measure," IEEE Trans. Software Eng., vol. 2, no. 4, pp. 308-320, 1976.
  2. P. Jansen. (2012). The TIOBE Quality Indicator: A pragmatic way of measuring code quality, [Online]. Available: http://www.tiobe.com/files/TIOBEQualityIndicator_v3_11.pdf.
  3. P. Jansen, R. L. Krikhaar, and F. Dijkstra. (2007). Towards a single software quality metric, [Online]. Available: http://www.tiobe.com/files/DefinitionOfConfidenceFactor.pdf.