Talk:Cyclomatic complexity

Alternative Way is not helpful

The "Alternative Way" section is not helpful, as it uses the undefined term "number of closed loops". The author might be getting at the reason behind the term "clclomatic", which is that if the (directed) control flow graph is viewed as an undirected graph, the cyclomatic number is the minimum number of edges that must be removed to remove all loops from the undirected graph. However, this calculation is hardly any easier than counting the number of branches (conditionals). I recommend removing the "Alternative Way" section or replacing it by describing the connection to the cyclomatic number in undirected graphs. Greg sullivan 19:52, 23 August 2006 (UTC)[reply]

The "Alternative Way" section seems to have been removed, but the second example still refers to it; it uses a formula that doesn't have the 2 in it. X10 (talk) 15:13, 21 June 2018 (UTC)[reply]

Material merge

Ok, I merged in the material that was in the erroneously named Cyclometric complexity. I didn't do it all too perfectly, partly because I am not all that familiar with the subject. Some of it seems to conflict a bit, possibly the definitions, so if someone knowledgeable could properly fact check it using the external links or other good resources that would be great. But I thought I should at least do what I can. - Taxman 22:50, Feb 24, 2005 (UTC)

Clarifications

http://c2.com/cgi/wiki?CyclomaticComplexityMetric has the definition:

   E = number of edges in the graph.
   N = number of nodes in the graph.
   P = number of nodes that are exit points
       (last instruction, return, exit, etc.)

Then:

   Cyclomatic complexity = E - N + P

That's a simpler explanation of 'P' than 'connected nodes'. It really means 'ways in which this module can branch out to other modules'.

The article says, "It directly measures the number of linearly independent paths through a program's source code." Wouldn't it be better to say that it gives an upper bound on the number of linearly independent paths? For example, in the first control flow graph given in the margin, there are as few as two or as many as three linearly independent paths.

75.87.186.225 (talk) 17:15, 23 October 2009 (UTC)[reply]

Rule of thumb

That rule of thumb is a bit unbelievable. I have a simple frame of C code with 4 functions: main(), parse_options(), print_version() and print_usage(). There are only two available options: "help" and "version". The file is 89 lines long, but only 67 of them are real lines of code. The cyclomatic number for such a simple file is equal to 8, according to a measurement done by cccc. So is the file close to being "error-prone"? Is the article wrong, is the rule of thumb wrong, is cccc wrong, or do I just not understand something? Could someone explain this, please? LukMak 18:20, 28 March 2006 (UTC)

Your code is only error-prone if you have not handled all conditionals. We use cyclomatic complexity for two purposes.

During the initial development cycle, when designing test frames using decision tables, all pathways must be addressed. Cyclomatic complexity used at this point indicates how many test frames are needed, and also highlights conditions that the programmer or architect has missed.

The second purpose is to identify functions (and programs) that use deeply nested conditionals. This is not good, because most humans have trouble reading deeply nested conditionals and may add code during the maintenance cycle that does not address all pathways.
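To make that concrete, here is a sketch of my own (counting CC as decision points plus one, the usual shortcut; both functions are hypothetical):

   # Deeply nested: CC = 4 (three decisions + 1). Every extra level
   # multiplies the pathways a maintainer must keep in mind.
   def classify_nested(n):
       if n is not None:
           if n >= 0:
               if n % 2 == 0:
                   return "even"
               return "odd"
           return "negative"
       return "missing"

   # Same behavior with guard clauses: CC is still 4, but each pathway
   # is visible on its own, which is the readability point above.
   def classify_flat(n):
       if n is None:
           return "missing"
       if n < 0:
           return "negative"
       if n % 2 == 0:
           return "even"
       return "odd"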

Hope this helps.

--Bill Buckels 12:26, 6 November 2007 (UTC)[reply]

Discuss: Merge article with "essential complexity"

Against
My two cents: merging this article ("cyclomatic complexity") with "essential complexity" doesn't make sense, because cyclomatic complexity is a method of measuring complexity (of a body of code / instructions), and the complexity which this method measures isn't implicitly categorized as either "essential" or "accidental". That's one of the strengths of cyclomatic complexity measurement, frankly: it's just one objective measurement of complexity, and it doesn't try to assert whether anything can or should be done about the complexity that it measures. Cyclomatic complexity and essential complexity are effectively in different categories that I would name something like "measures of complexity" and "interpretation / meaning of complexity", respectively. Mlibby 16:20, 17 March 2007 (UTC)[reply]
Against
Cyclomatic complexity is one measure of the total complexity of a computer program; essential complexity is a kind of complexity found in a problem. Sometimes people write programs to solve problems, and in that case the essential complexity of the problem often finds its way into the program, but there's always other complexity involved as well. In short, cyclomatic complexity is very different from essential complexity, and the two are only loosely related. Kragen Sitaker 02:34, 1 April 2007 (UTC) Update: McCabe's paper defining cyclomatic complexity also defines a concept called "essential complexity", but it is also unrelated to the concept in the Essential Complexity Wikipedia page. Perhaps this explains how this otherwise bizarre proposal for merging arose. Kragen Sitaker 03:46, 1 April 2007 (UTC)[reply]

I agree that this proposal is bizarre. The Essential Complexity article is clearly talking about the notion from Fred Brooks' "No Silver Bullet" paper; that and cyclomatic complexity are very distinct concepts. Glv 19:30, 10 April 2007 (UTC)[reply]

Also signed for against; the two articles are a different focus. --BlueNovember 16:17, 20 May 2007 (UTC)[reply]

I am STRONGLY against merging this article with the Essential Complexity Wikipedia page, and I am actually shocked that this was suggested. This comment should be removed from this page altogether, because it makes no sense to anyone wanting a concise definition of cyclomatic complexity, as I have understood it over my last 12 years as a software developer.

--Bill Buckels 12:17, 6 November 2007 (UTC)[reply]

What about functions?

This article neglects to explain how functions are handled. If a function is called from ten places, then control can flow from each of those ten places to the beginning of that function, and from the function's end back to the point after each of those ten places. Are these 20 arcs part of the control-flow graph being analyzed, or not?

Consider, for example, this Python program to calculate a sum of some squares:

   total = 0
   for ii in [2, 4, 8, 10]:
       total += ii * ii
   print(total)

This program clearly has cyclomatic complexity M = 2, which is unsurprising since it contains one loop and no conditionals; note, though, that the E - N + P formula above gives only 4 - 4 + 1 = 1 (see the update below about the 2P version).

Now, we can rewrite this with a function as follows:

   total = 0
   def n(ii):
       global total
       total += ii * ii
   n(2); n(4); n(8); n(10)
   print(total)

If we treat the "def n(ii):" line as a compile-time declaration, as it is in most languages, rather than a run-time statement, which it is in Python, then we can get a couple of different measures. There are 8 statements (7 if you exclude "global") and either 6 edges and 2 connected components, if we exclude the edges from the calls to n to the top of the body of n and from the bottom of the body back to the next statement after the call, or 14 edges and 1 connected component, if we include those edges. This gives 6 - 8 + 2 = 0 or 14 - 8 + 1 = 7, both of which seem like implausible answers.

I'm reading McCabe's paper now to see what he says about this (if anything!), and then perhaps I can clarify the article.

Update: McCabe's paper uses the formula M = E - N + 2P throughout most of it, although it states the M = E - N + P version up front; that gives 6 - 8 + 4 = 2 and 14 - 8 + 2 = 8 for the two readings above, and fixes the original loop version to 4 - 4 + 2 = 2. At least the first of those numbers seems like a plausible value, not changing the essential complexity of the program.
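Here is a quick sketch of my own for checking those numbers (the node names are just shorthand for the statements above):

   def m(n_nodes, edges, p):
       return len(edges) - n_nodes + 2 * p  # M = E - N + 2P

   # Loop version: total=0 -> for -> body -> (back to for) -> print.
   loop = [("t0", "for"), ("for", "body"), ("body", "for"), ("for", "print")]
   print(m(4, loop, 1))  # 4 - 4 + 2 = 2

   # Function version, call/return edges excluded: two components.
   no_call = [("t0", "n2"), ("n2", "n4"), ("n4", "n8"), ("n8", "n10"),
              ("n10", "print"), ("global", "add")]
   print(m(8, no_call, 2))  # 6 - 8 + 4 = 2

   # Function version, call and return edges included: one component.
   call = (no_call
           + [(c, "global") for c in ("n2", "n4", "n8", "n10")]
           + [("add", r) for r in ("n4", "n8", "n10", "print")])
   print(m(8, call, 1))  # 14 - 8 + 2 = 8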

So far I haven't run across anything in the paper about functions, but maybe Section IV explains it. What about methods or higher-order functions, I wonder? The first formula could treat them consistently; the second would be even more completely hosed.

Update: Section III suggests limiting each module (in FORTRAN, a subroutine) to a cyclomatic complexity of 10, and, as I had suspected, section IV explains that McCabe treats separate subroutines as independent graphs. This is sensible. There's an error on p.310 where a straight-line program is claimed to have V(G) = 2 instead of 1 (would be 0 by E - N + P), but so far the other values I've seen are consistent. I also cleaned up my code above a bit.

Kragen Sitaker 03:09, 1 April 2007 (UTC)[reply]

Update: Although the basic idea seems good, and the paper is still better than the current version of this Wikipedia entry, this is really rather a poor paper, aside from the errors I pointed out above, some of which seem to have caused some long-lasting confusion (e.g. the error of omitting the 2 in the formula the first time it's presented).

The author debases Dijkstra's term "structured", which Dijkstra coined to describe programs that were comprehensible and had mathematically provable behavior, to mean "using only alternation, iteration, and sequencing as control-flow operations". Indeed, at one point (p.315, section VI, second paragraph) he characterizes Knuth's article "Structured Programming With GO TO Statements" as advocating the use of "unstructured goto". That is exactly the opposite of Knuth's point in that article, which is that a slightly wider range of control structures is needed to structure programs comprehensibly; that is, structured programming is occasionally mutually exclusive with the debased meaning of "structured programming" that McCabe uses in this paper. (I've read Knuth's article, but you can begin to see the problem just from its title. It's currently available at http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf)

It's possible that the error on p.310 is a result of a problem in the digitization; perhaps an arrow is missing.

There are several other errors:

  • the diagram of "if" in footnote 4 on p.315 has an arrow going the wrong way;
  • the diagram of the "following four control structures" on p.315 is missing one of the four;
  • the descriptions of those control structures on p.316 exchange (c) and (d), or at any rate disagree with the diagram above about which ones they are;
  • the references misspell Henry Ledgard's name as "Legard" and "genealogy" (in the title of Ledgard and Marcotty's paper) as "generalogy";
  • they also misspell the title of Knuth's paper slightly.

This is not an exhaustive list.

Kragen Sitaker 04:09, 1 April 2007 (UTC)[reply]

A couple of points re McCabe's paper: first, in 1976 the term "structured programming" was a very specific term that didn't have the broad meaning we tend to read into it today. The meaning McCabe used was a common interpretation of the phrase. Knuth's article was, effectively, advocating a broadening of this definition. It was therefore reasonable for McCabe to describe what he was advocating as "unstructured": his ideas had not become mainstream at that point. Programs with breaks from loops were not considered structured at the time. Neither were programs that contained functions with multiple returns. Some of the uses of go to that Knuth suggested were acceptable are still not considered part of structured programming today (e.g. using goto to jump to error handlers).
The "+P" rather than "+2P" appears to be due to the suggestion of adding an additional edge from the exit to the entrance of the function. This additional edge, of which there will be one for each connected component of the graph, eliminates the need to use 2P. The graph shown on the first page includes such an edge; most of the rest of the graphs do not, hence the use of 2P in the rest of the article. This could have been explained better, but it is not an error.
You appear to have a rather poor scan of the article. The copy I've linked to shows that the program you refer to on p.310 has a loop from block 2 back to itself, and that it does have all four control structures on p.315. JulesH (talk) 20:48, 24 March 2009 (UTC)[reply]

Functional vs Imperative Programming

Is it possible to apply cyclomatic complexity to code written in a functional programming language? If not, then the article should specify the paradigms to which cyclomatic complexity applies. If something similar is possible for functional programming languages, it would be interesting if the article discussed it. Mgsloan 04:37, 2 October 2007 (UTC)[reply]

Exceptions anyone?

In the presence of exceptions, which many modern languages feature, exceptions need to be treated as possible exit points too. Looking at the example code provided, this means that you cannot test all the code with two test cases. Rather, you need:

  • c1==true, f1() throws
  • c1==true, f1() returns
  • c1==false, f2() throws
  • c1==false, f2() returns
  • c2==true, f3() throws
  • c2==true, f3() returns
  • c2==false, f4() throws
  • c2==false, f4() returns

Some people might argue that f3 or f4 throwing or returning doesn't change anything, so they shouldn't be treated as different cases. However, I wouldn't do that, because they exit the containing function through different exit points. The question remains what the official definition has to say on this topic.
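As a sketch of my own (the names c1, f1, etc. follow the test cases above; the program's shape and the harness are my assumptions, not quoted from the article):

   class Thrown(Exception):
       pass

   def make_f(name, throws):
       def f():
           if throws:
               raise Thrown(name + " threw")
       return f

   def program(c1, c2, f1, f2, f3, f4):
       if c1:
           f1()
       else:
           f2()
       if c2:
           f3()
       else:
           f4()

   # Eight cases instead of two: every call site is both a normal and
   # an exceptional exit point.
   for name, cond in [("f1", True), ("f2", False), ("f3", True), ("f4", False)]:
       for throws in (True, False):
           fs = {n: make_f(n, throws and n == name)
                 for n in ("f1", "f2", "f3", "f4")}
           try:
               program(cond, cond, fs["f1"], fs["f2"], fs["f3"], fs["f4"])
               print(name, cond, throws, "returned")
           except Thrown as e:
               print(name, cond, throws, "raised:", e)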

Attempting to improve clarity of the article

But I have a few concerns that I can't really answer myself. First: the "formal definition" section goes beyond my understanding of this topic by straying into a field of mathematics that I know little about. But I can find no sources that discuss the topic in these terms. A Google search for '"cyclomatic complexity" "relative homology"' turns up nothing except Wikipedia mirrors. Is this original research?

There appears to be more than one formal definition of cyclomatic complexity: for example see Henderson-Sellers doi:10.1007/BF00403560. Henderson-Sellers claims that one metric is better suited to module-level testing while the other is more appropriate for system-level, integration testing. The article should at least mention other definitions. Here are some more relevant articles:
  • Baker, A.L. and Zweben, S.H. (1980) A comparison of measures of control flow complexity. IEEE Transactions on Software Engineering, 6(6), 506–512.
  • Feghali, I. and Watson, A.H. (1994) Clarification concerning modularization and McCabe's cyclomatic complexity. Communications of the ACM, 37(4), 91–92.
  • Henderson-Sellers, B. (1992) Modularization and McCabe's cyclomatic complexity. Communications of the ACM, 35(12), 17–19.
  • Shepperd, M. (1988) A critique of cyclomatic complexity as a software metric. Software Engineering Journal, 3, 30–36.
  • Weyuker, E.J. (1988) Evaluating software complexity measures. IEEE Transactions on Software Engineering, 14(9), 1357–1365.
Hope that helps. pgr94 (talk) 22:34, 24 March 2009 (UTC)[reply]
Well, the original McCabe proposal for CC rested on the definition of the cyclomatic number in graph theory; Berge defined it in Graphs and Hypergraphs. But the same number can also be defined in a branch of that theory called topological graph theory, and in that case the definition is exactly the same as the one reported in the wiki page. Of course the first is simpler and more practically useful, but that is the subtle relation between them. --Ciosbel (talk) 20:30, 8 January 2010 (UTC)[reply]

As for the "prediction ability" of CC, are we talking about predicting number of defects in a module based on its complexity, or something else? Is there a paper associated with this keynote speech? I've seen a set of slides for it, but can find nothing about CC in those. JulesH (talk) 19:23, 24 March 2009 (UTC)[reply]

I think "prediction ability" is intended to mean predicting future program failures (or defects) from complexity. I've found the slides you mention, and only slide 17 matters. I've also found an article by Les Hatton comparing defects against a wide variety of well-known static measures (including CC), looking for correlations. The analysis was done on the NAG Fortran library already cited in the slides. As it turned out, he found a correlation between CC and the number of executable lines of code; quoting from the article: "[...] it is extremely likely that there will be a decision about every 3-4 executables lines of code [...]", and he even stated an equation for it. So I think the original author meant that "CC has the same prediction ability as lines of code". Here is the article. Ciosbel (talk) 19:21, 8 January 2010 (UTC)[reply]

I removed "However there are certain types of modules that one would expect to have a high complexity number, such as user interface (UI) modules containing source code for data validation and error recovery.[citation needed]". Obviously the author is not familiar with the Humble View design pattern: http://martinfowler.com/eaaDev/OrganizingPresentations.html

Example where P>1

The article about Connected component does not deal with directed graphs very well. If it is applicable to interpret the flow graph as undirected, there can almost never be a P > 1 (a function usually has only one entry point, and all nodes are reachable from it). Can anyone give an example where P > 1? -- 130.236.56.173 (talk) 11:00, 12 January 2010 (UTC)[reply]

Yes, one should consider the flow graph as undirected for purposes of computing the cyclomatic number. A case with P > 1 could arise if one considers a code library rather than a program, as each exported function then has its own entry point. Multithreaded programs are perhaps another case (although, now that I think about it, that is perhaps more properly dealt with by forming a product of the flow graphs for the different threads). 130.239.119.132 (talk) 13:02, 11 February 2011 (UTC)[reply]
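A sketch of my own for the library case (two hypothetical exported functions, each a single if/else, giving two connected components):

   def m(n_nodes, edges, p):
       return len(edges) - n_nodes + 2 * p  # M = E - N + 2P

   f = [("f.if", "f.then"), ("f.if", "f.else"),
        ("f.then", "f.exit"), ("f.else", "f.exit")]
   g = [("g.if", "g.then"), ("g.if", "g.else"),
        ("g.then", "g.exit"), ("g.else", "g.exit")]
   # 8 - 8 + 4 = 4: the sum of the two functions' individual complexities.
   print(m(8, f + g, 2))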

The definition of P is unclear. Take the simple case bool isOdd(int n) { if (n % 2 == 0) return false; else return true; }. According to formulation 1: N=3, E=2, so M = E1 - N1 + 2P1 = 2P1 - 1; according to formulation 2: N=3, E=4, so M = E2 - N2 + P2 = 1 + P2. If P1 = P2, then P = 2 (the number of returns) (-> M=3), but that is not the number of connected components (which is necessarily 1 in formulation 2... which would mean P2=1 -> M=2 -> P1=3/2?!) Jarod42 (talk) 12:12, 26 October 2013 (UTC)[reply]
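For what it's worth, one common way to reconcile this (an assumption of mine, not something the article states) is to route both returns into a single exit node, so the graph has one connected component and both formulations agree:

   edges = [("if", "ret_true"), ("if", "ret_false"),
            ("ret_true", "exit"), ("ret_false", "exit")]
   nodes = {v for e in edges for v in e}
   print(len(edges) - len(nodes) + 2 * 1)  # E - N + 2P = 4 - 4 + 2 = 2
   print(len(edges) + 1 - len(nodes) + 1)  # with an exit->entry edge: 5 - 4 + 1 = 2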

Example at the end of "Implications for Software Testing" misleading

The example of a function that requires more than simply branch coverage to test accurately at the end of "Implications for Software Testing" is a bit misleading.

It first shows that two tests can miss a bug. Then it says that by increasing the number of tests to the cyclomatic complexity, the bug is always caught. It is true in this case that the cyclomatic complexity and the number of tests required to always find that bug are equal. However, the example suggests that there is a general link between the cyclomatic complexity and the number of tests required to always catch that bug.

As the purpose of the example is to show that branch coverage is not always enough, I suggest scrapping everything after "Neither of these cases exposes the bug.".

Homotopy

The fundamental group of a 1-dimensional CW complex should be the free group on n generators, not Z^n, unless I'm very much mistaken. Additionally, are we including the loop from end to start when computing the fundamental group? In the example under Implications For Software Testing, the complexity is given as 3, but using the graph next to it (with the end->start connection) would give the free group on 3 generators as the fundamental group, and it says earlier that the value is then one more than that. I think it is more intuitive to use the looped graph and say that if π1(X) is the free group on n generators, then n is the cyclomatic complexity; either way, what we're taking the fundamental group of should be made clearer, I believe. Cyrapas (talk) 22:10, 7 May 2013 (UTC)[reply]

Etymology/Naming section contains confusing statements

This statement:

A better name for popular usage would be conditional complexity, as "it has been found to be more convenient to count conditions instead of predicates when calculating complexity".[6]

is confusing. Cyclomatic complexity can easily be calculated by counting the number of branches, not conditions or predicates. The formula is then:

CC = 1 + sum_{nodes n: n has outdegree > 1} (outdegree(n)-1)

In fact, if you don't use switch-case-type instructions (you can always change one into a sequence of IF statements), the cyclomatic complexity is one plus the number of places where a decision (that is, a PREDICATE!) is made. A decision is a predicate, not a condition! A condition is part of a predicate and has nothing to do with cyclomatic complexity. In fact, it is more convenient to use predicates, not conditions, when calculating CC (so you can argue with the quoted McCabe statement!)
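To illustrate, here is a sketch of my own; the adjacency list reconstructs the article's first example graph from the paths discussed further down this page:

   graph = {1: [2], 2: [3], 3: [4, 7], 4: [5, 8],
            5: [6], 6: [], 7: [2], 8: [5]}

   cc = 1 + sum(len(succ) - 1 for succ in graph.values() if len(succ) > 1)
   print(cc)  # 3, matching E - N + 2P = 9 - 8 + 2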

Someone should change this fragment, as it's very confusing. — Preceding unsigned comment added by 89.72.75.243 (talk) 22:05, 25 November 2013 (UTC)[reply]

Symmetric Difference Wikipedia entry linked is for sets, not graphs

Re this line:

The set of all even subgraphs of a graph is closed under symmetric difference

The symmetric difference of subgraphs of a graph is not discussed in the Wikipedia page for symmetric difference that is linked. — Preceding unsigned comment added by Scott Allen Brown (talk · contribs) 00:09, 13 December 2013 (UTC)[reply]

The immediately preceding paragraph identifies an Eulerian subgraph with the set of its edges, so symmetric difference would seem to apply here. What I do not understand is the phrase: "which is equivalent to only considering those even subgraphs which contain all vertices of the full graph." I cannot understand why this phrase (which seems to have been there since the beginning) is there; nothing seems to rely upon it (in fact it seems to be an error, since it would spoil the closure of the Eulerian subgraphs under symmetric difference if it were true). Zteve (talk) 15:19, 28 August 2015 (UTC)[reply]

The symmetric difference is an operation on the set of edges, in this context (according to the preceding paragraph). When taking such a symmetric difference, the vertex set doesn't change. It always includes all vertices of the original graph. Equivalently, we can completely ignore the vertices. None of the mathematical definitions actually need a vertex set. 205.189.0.108 (talk) 07:48, 3 September 2019 (UTC)[reply]
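A small illustration of that closure, as a sketch of my own (the square-with-a-diagonal graph is a made-up example): the symmetric difference of two 3-cycles sharing an edge is the outer 4-cycle, which is again even.

   from itertools import chain

   # Two triangles sharing the diagonal bd of a square abcd.
   c1 = {frozenset("ab"), frozenset("bd"), frozenset("da")}
   c2 = {frozenset("bc"), frozenset("cd"), frozenset("db")}

   diff = c1 ^ c2  # symmetric difference of the edge sets

   def degrees(edge_set):
       verts = set(chain.from_iterable(edge_set))
       return {v: sum(v in e for e in edge_set) for v in verts}

   print(sorted("".join(sorted(e)) for e in diff))  # ['ab', 'ad', 'bc', 'cd']
   print(all(d % 2 == 0 for d in degrees(diff).values()))  # True: still even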

Wrong CC number in example in "Implications for software testing"

The example shown has cyclomatic complexity 3, not 4 (9 - 7 + 1 for the strongly connected graph). — Preceding unsigned comment added by 169.145.89.205 (talk) 00:17, 27 June 2014 (UTC)[reply]

"Conditional complexity" OR??[edit]

I thought I found a book that calls it this way [1] but it turns out they are citing this Wikipedia page!!! 188.27.81.64 (talk) 22:03, 16 July 2014 (UTC)[reply]

It was a "new name proposal" made via Wikipedia [2]. 188.27.81.64 (talk) 22:35, 16 July 2014 (UTC)[reply]

The very first example is wrong

"M = E − N + 2P" and "This graph has 9 edges, 8 nodes, and 1 connected component, so the cyclomatic complexity of the program is 9 - 8 + 2*1 = 3."

In reality, the cyclomatic complexity of the first graph is 4 as there are four "linearly independent paths through a program's source code":

1-2-3-4-5-6 / 1-2-3-7-2-3-4-5-6 / 1-2-3-4-8-5-6 / 1-2-3-7-2-3-4-8-5-6

I don't know what is wrong with your formula, but it gives the wrong result.

There's no explanation of "linearly independent". I believe it might refer to paths that have at least one edge found in no other path. In your example, the first path "1-2-3-4-5-6" contains no edges that aren't also found in "1-2-3-7-2-3-4-5-6", so the first path isn't linearly independent of the others. --Doradus (talk) 15:04, 11 August 2019 (UTC)[reply]
The example is not wrong. "Paths that have at least one edge found in no other path" is not the correct definition of "linearly independent paths", and the correct definition has since been explained on this talk page, in a message added 07:38, 3 September 2019 (UTC). The incorrect definition was added to the article at 13:48, 22 January 2020, and a second incorrect edit was made at 09:20, 31 March 2021, so I intend to remove these changes and possibly replace them with a more accurate definition.
The way I personally like to think of "linearly independent" as a software engineer is to represent each path as a vector whose scalar elements are booleans, with addition being XOR (that is, arithmetic in GF(2)), though that's not necessarily the explanation the article has to use.
The paths 1-2-3-4-5-6 / 1-2-3-7-2-3-4-5-6 / 1-2-3-4-8-5-6 / 1-2-3-7-2-3-4-8-5-6 would then be:
v1 = [1 1 1 1 1 1 0 0]
v2 = [1 1 1 1 1 1 1 0]
v3 = [1 1 1 1 1 1 0 1]
v4 = [1 1 1 1 1 1 1 1]
v4 is not linearly independent because it can be formed as a linear combination of the other 3.
Keeping in mind that with XOR as addition, 1+1 = 0, and therefore 1+1+1 = (1+1)+1 = 0+1 = 1:
v1 + v2 + v3 = [(1+1+1) (1+1+1) (1+1+1) (1+1+1) (1+1+1) (1+1+1) (0+1+0) (0+0+1)] = [1 1 1 1 1 1 1 1]
This also shows why "paths that have at least one edge found in no other path" cannot be correct. v2 and v3 both clearly contain every edge in v1, so by that definition there could be at most 2 linearly independent paths.
Worse yet, if v4 is chosen first, it already contains every edge, so that definition incorrectly implies the number of linearly independent paths depends on the choice of paths. This led to another incorrect edit at 09:20, 31 March 2021, adding the word "maximum" to say "maximum number of linearly independent paths", when in reality the number of linearly independent paths is a single well-defined quantity, with no minimum or maximum.
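A quick sketch of my own to check the arithmetic:

   # Addition in GF(2) is exclusive-or, applied componentwise.
   v1 = [1, 1, 1, 1, 1, 1, 0, 0]
   v2 = [1, 1, 1, 1, 1, 1, 1, 0]
   v3 = [1, 1, 1, 1, 1, 1, 0, 1]
   v4 = [1, 1, 1, 1, 1, 1, 1, 1]

   print([a ^ b ^ c for a, b, c in zip(v1, v2, v3)] == v4)  # True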
--RitchieThai (talk) 9:15, 10 November 2021 (UTC)

What are linearly independent paths?

This article uses the phrase "linearly independent paths" several times without defining it. What does it mean? --Doradus (talk) 14:56, 11 August 2019 (UTC)[reply]

I agree wholeheartedly. A quick search for "graph linearly independent path" only brings up articles about software testing and this page itself, so it would appear that "linearly independent paths" are not a standard graph-theory term. However, once I read past the Definition section, the Explanation in terms of algebraic topology section actually does define them, in three different ways. I think this section might be better called Formalization in mathematical terms, and it should be reworded from "this is another way to look at it" to "these are formal definitions of linearly independent paths".

  1. First, the graph theory / linear algebra definition describes how linearly independent paths form a basis for the vector space of even subgraphs, where the addition operation is symmetric difference of the edge sets. This definition is the easiest to comprehend, which is kind of sad.
  2. Next the homology definition, which I admit to not comprehending.
  3. Finally, the fundamental group definition, which could use more color. It's cool to think of constructing a homotopy equivalence from a control flow graph to a set of loops all connected at a single point. Also, the n+1 only makes sense if the exit node is not connected to the entry node, which is an essential part of the graph theory definition.

205.189.0.108 (talk) 07:38, 3 September 2019 (UTC)[reply]