Metrics Explained

Cyclomatic Complexity

Definition

Cyclomatic Complexity is the software metric used to indicate the complexity of a program as originally described by McCabe (see the Wikipedia entry for a reference).

Since the original definition of Cyclomatic Complexity is quite abstract (in terms of graphs), the definition used by TIOBE is loosely

The decision count plus one

The decision count is incremented for control flow statements that introduce a branch in the code, e.g., selection statements (if-statement), iteration statements (while-statement) and exception handling (try-catch-statement).

The Cyclomatic Complexity of a function (or method) is the number of decisions in the function's body plus one. So, the minimum cyclomatic complexity of a function definition is one.

The Cyclomatic Complexity of a file is the number of decisions of all function definitions in that file plus one. In other words, this is the sum of the cyclomatic complexities of all function definitions in the file, minus the number of function definitions, plus one.

The Average Cyclomatic Complexity is the cyclomatic complexity per function of a file. So, this is the sum of the cyclomatic complexities of all function definitions, divided by the number of function definitions in the file.

Aggregation

The Average Cyclomatic Complexity is aggregated over multiple files as follows. Sum the cyclomatic complexities of all files, minus the number of files, divided by the total number of function definitions in these files combined, plus one.

TQI Cyclomatic Complexity

The TQI Cyclomatic Complexity submetric is described in The TIOBE Quality Indicator document, Section 5.

score = 6400 / (cyclomatic_complexity^3 - cyclomatic_complexity^2 - cyclomatic_complexity + 65)

Cyclomatic Complexity Tool Comparison

There are a (great) number of free and commercial tools available that can be used to measure the (average) cyclomatic complexity of source files. Here, we will take a look at a few available options.

SourceMonitor

SourceMonitor is a freeware program to obtain cyclomatic complexity for many programming languages. It runs on Windows. The results reported by SourceMonitor are quite reasonable, especially when compared to e.g. CCCC (see below). A disadvantage is that it does not support all programming languages for which TIOBE wishes to compute cyclomatic complexity. Another drawback is that it can only be used on Windows, and not on e.g. Linux.

CCCC

CCCC is an open source code counter for C, C++ and Java source files. It is reasonably cross-platform and can run on Windows and Linux. Unfortunately, its results are very unreliable. Moreover, it supports just a few programming languages.

TICSpp

TICSpp is a TIOBE proprietary cyclomatic complexity measurement tool that supports a wide variety of programming languages (including JavaScript, Python, Objective-C and Scala). It runs on Windows, Linux and Solaris SPARC.

There are many more tools available to measure cyclomatic complexity. Some commercial, or part of a commercial package, some specific to a certain programming language. We will not discuss these here.

At TIOBE, we compared the results of CCCC, SourceMonitor and TICSpp on a great number of (customer) source files and found that TICSpp produces better results than SourceMonitor, which in turn produces better results than CCCC. These findings, together with its portability and support for a wide range of programming languages makes TICSpp a logical default choice for our customers.

Cyclomatic Complexity and TICS

Configuration

Be sure to use a default configuration, or specify 'METRICS'=>['TICSpp'], for your language in SERVER.txt.

Running TICS

The metric to run with TICS for Average Cyclomatic Complexity is AVGCYCLOMATICCOMPLEXITY.

To run the TICS client, use TICS -calc AVGCYCLOMATICCOMPLEXITY. To run TICSQServer, use TICSQServer -calc AVGCYCLOMATICCOMPLEXITY.

Investigating High Cyclomatic Complexity in the TICS viewer

To quickly see "bad" files (wrt cyclomatic complexity) in TICS viewer, one can use the treemap and select the Average Cyclomatic Complexity as the color and Total Cyclomatic Complexity for the area metric.

The big, dark red files are the ones to go after first.

Fan Out

Software programs are structured in terms of modules or components. These modules and components "use" each other. The fan out metric indicates how many different modules are used by a certain module. If modules need a lot of other modules to function correctly (high fan out), there is a high interdependency between modules, which makes code less modifiable. Hence, fan out is related to the "Maintainability" ISO attribute.

Fan out is measured by counting the number of imports per module. The specific measurement is language dependent.

C/C++

For C and C++ the number of include directives is used.

C#

The situation is even more complex for C# because it uses a different import mechanism. The using statement in C# imports a complete namespace, which could consist of hundreds of classes, whereas only a few of these are actually used. That is why for C# the actual number of unique dependencies per file is counted.

Java

For Java, the number of import statements is counted. Wildcards in Java import statements appear to be difficult because these statements import several classes from a package at once. That is why we choose to count these statements as 5.

JavaScript

For EcmaScript 6, the number of import statements is counted.

ES6 modules are stored in files. There is exactly one module per file and one file per module.

Therefore, each module mentioned in the from clause counts as 1 irrespective of the number of elements imported. E.g.,

import localName from 'src/my_lib';
import * as my_lib from 'src/my_lib';
import { name1, name2 } from 'src/my_lib';

all count as 1 since each statement imports one module.

Python

For Python, the number of modules mentioned in the import statement are counted. from-import counts as 1.

Swift

For Swift, the number of modules mentioned in import declarations is counted.

Average Fan Out

Average Fan Out is the number of imports per module. Aggregation is as follows. Sum the fan outs of all files, divide by the number of files.

TQI Fan Out

score = min(max(120 - 5 * fan_out_avg, 0), 100)

Code Duplication

Code Duplication is a software metric that indicates the amount of source code that occurs more than once in a program. Code Duplication is undesirable because it is associated with higher maintenance costs and can be indicative of bad design.

A duplication is a consecutive set of source code lines (code fragment) that is similar to another code fragment, possibly inside the same file. A code fragment might be part of more than one duplication.

TICS uses the CPD tool to detect duplicated code. CPD finds duplicated strings of source tokens, not lines. At least 100 tokens should be logically identical to qualify as duplicated code. TICS translates these tokens to lines: a line is considered duplicate code if it contains at least one token that is part of a duplication. Note that TICS runs CPD on a single project and does not detect duplications over different projects.

The TICS Viewer defines the following basic metrics related to Code Duplication:

Code Duplication (LOC)
The number of lines that are part of a duplication.
Code Duplication (%)
The number of lines that are duplicated as a percentage of the total number of lines.
Code Duplication Metric Coverage
The percentage of lines that could be analyzed by the code duplication tool, ideally 100%.
TQI Code Duplication
The TQI score for Code Duplication, a value between 0% and 100%, obtained using the formula:
min(-30 * log10(Code Duplication (%)) + 70, 100) * Code Duplication Metric Coverage

TICS Viewer version 8.5 introduced several advanced Code Duplication metrics that identify different duplication types based on where the source and target fragment of a duplication are located:

File-Internal Code Duplication
The source and target fragments are located in the same file.
File-External Code Duplication
The source and target fragments are located in different files.
Scope-Internal Code Duplication
The source and target files are in the same selected scope.
Scope-External Code Duplication
The target file is outside of the selected scope.

Each of the four types has a relative (%) and an absolute (LOC) variant. The scope-based metrics require a scope that is chosen by the user and can be a directory, component, or other subsystem. The scope is indicated in the breadcrumb trail.

The sum of Internal and External Code Duplication (LOC) might be larger than the value given by the overall Code Duplication (LOC) metric. The reason is that the same fragment can be part of two duplications, of which one internal and one external. Such a fragment is only counted once for Code Duplication (LOC).

Using these advanced metrics, users can filter duplications by the type that they are interested it. For instance, file-internal duplication could be considered easier to solve because they are located in the same file. On the other hand, file-external duplications might be harder to keep track of by developers and one might want to solve them first. The scope-internal metric can be useful for developers to limit the duplications they see to the component for which they are responsible or are allowed to make changes to.

Lines Of Code

Lines Of Code (LOC) counts the physical lines in each source file, including comment lines and blank lines, but excluding generated lines. Generated lines of code are those regions in source files that are automatically generated by development environments and are outside the control of software developers. TICS ships with a default set of recognized markers for generated code for commonly used development environments. See the section on GENERATED code for languages in SERVER.txt.

Lines Of Code is the primary measurement of code size used by TICS. All measurements are related to LOC for comparison. E.g., for the compliance factor violations are weighed by LOC. For treemaps, LOC is the default area metric.

Effective Lines Of Code

Effective Lines Of Code (ELOC) counts the physical lines in each source file, excluding comment lines and blank lines, and excluding generated lines.

Effective Lines Of Code is a measure of actual code size; those lines that are not affected by formatting and style conventions but are necessary for the required functionality of the program being written.

Effective Lines Of Code is a subset of Lines Of Code.

Generated lines included Lines Of Code

Generated lines included Lines Of Code (GLOC) counts the physical lines in each source file, including comment lines and blank lines, and including generated lines.

GLOC indicates the absolute amount of code of the application. This metric is not used as a submetric for any other (aggregated) measurement.

GLOC is a superset of LOC. The difference between LOC and GLOC gives the number of generated lines of code in the code.