Complexity Metrics and Models

Halstead's Software Science

The Software Science [Halstead 77] developed by M.H.Halstead principally attempts to estimate the programming effort.

The measurable and countable properties are :

From these metrics Halstead defines :
  1. the vocabulary n as n = n1 + n2
  2. the implementation length N as N = N1 + N2
Operators can be "+" and "*" but also an index "[...]" or a statement separation "..;..". The number of operands consists of the numbers of literal expressions, constants and variables.

Length Equation
It may be necessary to know about the relationship between length N and vocabulary n. Length Equation is as follows. " ' " on N means it is calculated rather than counted :

N ' = n1log2n1 + n2log2n2

It is experimentally observed that N ' gives a rather close agreement to program length.

Quantification of Intelligence Content
The same algorithm needs more consideration in a low level programming language. It is easier to program in Pascal rather than in assembly. The intellegence Content determines how much is said in a program.

In order to find Quantification of Intelligence Content we need some other metrics and formulas :

Program Volume : This metric is for the size of any implementation of any algorithm.

V = Nlog2n

Program Level : It is the relationship between Program Volume and Potential Volume. Only the most clear algorithm can have a level of unity.

L = V* / V

Program Level Equation : is an approximation of the equation of the Program Level. It is used when the value of Potential Volume is not known because it is possible to measure it from an implementation directly.

L ' = n*1n2 / n1N2

Intelligence Content

I = L ' x V = ( 2n2 / n1N2 ) x (N1 + N2)log2(n1 + n2)

In this equation all terms on the right-hand side are directly measurable from any expression of an algorithm. The intelligence content is correlated highly with the potential volume. Consequently, because potential volume is independent of the language, the intelligence content should also be independent.

Programming Effort

The programming effort is restricted to the mental activity required to convert an existing algorithm to an actual implementation in a programming language.
In order to find Programming effort we need some metrics and formulas :

Potential Volume : is a metric for denoting the corresponding parameters in an algorithm's shortest possible form. Neither operators nor operands can require repetition.

V ' = ( n*1 + n*2 ) log2 ( n*1 + n*2 )

Effort Equation
The total number of elementary mental discriminations is :

E = V / L = V2 / V '

If we express it : The implementation of any algorithm consists of N selections ( nonrandom > of a vocabulary n. a program is generated by making as many mental comparisons as the program volume equation determines, because the program volume V is a measure of it. Another aspect that influences the effort equation is the program difficulty. Each mental comparison consists of a number of elementary mental discriminations. This number is a measure for the program difficulty.

Time Equation
A concept concerning the processing rate of the human brain, developed by the psychologist John Stroud, can be used. Stroud defined a moment as the time required by the human brain to perform the most elementary discrimination. The Stroud number S is then Stroud's moments per second with 5 <= S <= 20. Thus we can derive the time equation where, except for the Stroud number S, all of the parameters on the right are directly measurable :

T ' = ( n1N2( n1log2n1 + n2log2n2) log2n) / 2n2S

Advantages of Halstead :
  1. Do not require in-depth analysis of programming structure.
  2. Predicts rate of error.
  3. Predicts maintenance effort.
  4. Useful in scheduling and reporting projects.
  5. Measure overall quality of programs.
  6. Simple to calculate.
  7. Can be used for any programming language.
  8. Numerous industry studies support the use of Halstead in predicting programming effort and mean number of programming bugs.
Drawbacks of Halstead :
  1. It depends on completed code.
  2. It has little or no use as a predictive estimating model. But McCabe's model is more suited to application at the design level.

McCabe's Cyclomatic number

A measure of the complexity of a program was developed by [McCabe 1976]. He developed a system which he called the cyclomatic complexity of a program. This system measures the number of independent paths in a program, thereby placing a numerical value on the complexity. In practice it is a count of the number of test conditions in a program.

The cyclomatic complexity (CC) of a graph (G) may be computed according to the following formula:

CC(G) = Number (edges) - Number (nodes) + 1

The results of multiple experiments (G.A. Miller) suggest that modules approach zero defects when McCabe's Cyclomatic Complexity is within 7 ± 2.

A study of PASCAL and FORTRAN programs (Lind and Vairavan 1989) found that a Cyclomatic Complexity between 10 and 15 minimized the number of module changes.

Thomas McCabe who is the invitor of cyclomatic complexity has founded a metrics company, McCabe & Associates There are some other metrics that are inspritted from cyclomatic complexity. You can find them at McCabe metrics

Advantages of McCabe Cyclomatic Complexity :
  1. It can be used as a ease of maintenance metric.
  2. Used as a quality metric, gives relative complexity of various designs.
  3. It can be computed early in life cycle than of Halstead's metrics.
  4. Measures the minimum effort and best areas of concentration for testing.
  5. It guides the testing process by limiting the program logic during development.
  6. Is easy to apply.
Drawbacks of McCabe Cyclomatic Complexity :
  1. The cyclomatic complexity is a measure of the program's control complexity and not the data complexity
  2. the same weight is placed on nested and non-nested loops. However, deeply nested conditional structures are harder to understand than non-nested structures.
  3. It may give a misleading figure with regard to a lot of simple comparisons and decision structures. Whereas the fan-in fan-out method would probably be more applicable as it can track the data flow
Fan-In Fan-Out Complexity - Henry's and Kafura's

Henry and Kafura (1981) [from Sommerville 1992] identified a form of the fan in - fan out complexity which maintains a count of the number of data flows from a component plus the number of global data structures that the program updates. The data flow count includes updated procedure parameters and procedures called from within a module.

Complexity = Length x (Fan-in x Fan-out)2

Length is any measure of length such as lines of code or alternatively McCabe's cyclomatic complexity is sometimes substituted.

Henry and Kafura validated their me tric using the UNIX system and suggested that the measured complexity of a component allowed potenetially faulty system components to be identified. They found that high values of this metric were often measured in components where there had historically been a high number of problems.

Advantages of Henry's and Kafura's Metic
  1. it takes into account data-driven programs
  2. it can be derived prior to coding, during the design stage
Drawbacks of Henry's and Kafura's Metic
  1. it can give complexity values of zero if a procedure has no external interactions

Sencer's Home Software Measurement Page
sencer@hun.edu.tr
This page is maintained by Ümit Karakaş and Sencer Sultanoğlu
Last modified Oct 12, 98