It is often useful to know how much it costs to develop a single line of source code. For example, you may need to estimate how much it would cost to replace an application with 1M SLOC.
Here I would suggest two methods. Of course, they are not very scientific, but they are good enough for practical purposes.
Method #1. I have come across research by Cutter Consortium which suggests that, overall, a typical development team (including testers and managers) delivers 10-15K lines of code (SLOC) per year per member. The best teams can achieve something like 20K lines per year per member. I assume those numbers are for high-level languages like Java or C++.
To use this you need to find the average cost of a man-hour in your organization and the average number of hours worked per year. The former must include all expenses, not only salary (cost of office maintenance etc.). Normally each organization has such numbers. If you don’t have this number, take an average developer salary as you know it and multiply it by an “overhead” factor of, e.g., 2.4-3 (use your own judgment, though). With this approach, given an average salary of $60K and 1850 hours worked per year, you’ll get an hourly cost of $78-$97.
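As a quick check of that arithmetic, here is a minimal Python sketch. The function name and its parameters are my own, chosen for illustration; the inputs are the salary, overhead factors and hours quoted above.

```python
def loaded_hourly_cost(annual_salary, overhead_factor, hours_per_year):
    """Hourly cost of one team member, with overhead folded into the salary."""
    return annual_salary * overhead_factor / hours_per_year


# Figures from the text: $60K salary, overhead factor 2.4-3, 1850 hours per year.
low = loaded_hourly_cost(60_000, 2.4, 1850)
high = loaded_hourly_cost(60_000, 3.0, 1850)

print(f"Hourly cost: ${low:.0f}-${high:.0f}")  # Hourly cost: $78-$97
```

Swap in your own salary, overhead factor and hours to get a figure for your organization.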
Let’s assume that the cost of a man-hour is $100 and that on average an employee works 1850 hours per year, i.e. $185K per team member per year. At 10-15K lines per year, the cost of a single line of code will then be between $12.33 and $18.50.
This would mean that for your organization the replacement cost of an application consisting of 1M SLOC would be roughly $12M-$18M, provided your new application is going to have a similar set of features.
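The whole of Method #1 fits in a few lines of Python. This is only a sketch under the assumptions used above ($100 per man-hour, 1850 hours per year, 10-15K SLOC per member per year); the function and variable names are mine.

```python
def cost_per_line(hourly_cost, hours_per_year, sloc_per_year):
    """Cost of one line of code for a team member with the given yearly output."""
    return hourly_cost * hours_per_year / sloc_per_year


HOURLY_COST = 100      # dollars per man-hour, assumed in the text
HOURS_PER_YEAR = 1850  # hours worked per year, assumed in the text
APP_SLOC = 1_000_000   # size of the application to be replaced

# Higher productivity (15K lines/year) gives the cheaper line, and vice versa.
cheap_line = cost_per_line(HOURLY_COST, HOURS_PER_YEAR, 15_000)
expensive_line = cost_per_line(HOURLY_COST, HOURS_PER_YEAR, 10_000)

print(f"Cost per line: ${cheap_line:.2f}-${expensive_line:.2f}")
print(f"Replacement cost: ${cheap_line * APP_SLOC / 1e6:.1f}M-"
      f"${expensive_line * APP_SLOC / 1e6:.1f}M")
```

Note that the result scales linearly with the hourly cost, so the overhead factor you pick matters as much as the productivity figure.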
Method #2. There are published software replacement cost estimates for known pieces of software and sizes of their code bases. We can use them.
For example, let’s take a look at the numbers for the Linux kernel. The article says that the re-development cost of Linux kernel 2.6.0 is $612M. The kernel consists of 5.9M lines of code. This gives us a cost per line of roughly $103.
Another example is a study of the cost of the Fedora Linux code. It gives an estimate of $52 per line. It also contains a lot of other useful info, so you may want to read the document.
Why are those estimates so different from the ones we got using method #1? I think this is because of complexity, e.g. the Linux kernel contains a lot of assembler code. But this is just a guess.
So the method is clear: find a published replacement cost for software that is technologically similar to what you want to develop, and use it. Wikipedia.org is a good place to look for such info.
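Method #2 boils down to one division and one multiplication. Here is a rough Python sketch using the Linux kernel 2.6.0 figures quoted above, taking the published re-development cost as $612 million (which is what the per-line figure of about $103 implies) and the size as the rounded 5.9M lines. The function names are hypothetical.

```python
def published_cost_per_line(replacement_cost, sloc):
    """Cost per line implied by a published replacement-cost estimate."""
    return replacement_cost / sloc


def estimate_replacement_cost(sloc, cost_per_line):
    """Replacement cost of your own code base at a borrowed per-line cost."""
    return sloc * cost_per_line


# Linux kernel 2.6.0: about $612M for about 5.9M lines (rounded inputs).
kernel_per_line = published_cost_per_line(612e6, 5.9e6)
print(f"Kernel: ${kernel_per_line:.2f} per line")

# Apply the kernel figure and the Fedora figure ($52 per line) to a 1M SLOC application.
for label, per_line in [("kernel-like", kernel_per_line), ("Fedora-like", 52)]:
    total = estimate_replacement_cost(1_000_000, per_line)
    print(f"{label}: ${total / 1e6:.1f}M")
```

With the rounded inputs the kernel figure comes out slightly under $104 per line; the exact result depends on how precisely the line count is stated.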
[Update from 11/10/2011. The book Software Estimation: Demystifying the Black Art by Steve McConnell has a chart on p.56, derived from the Cocomo II estimation model, which plots effort in man-months vs. size of the project in SLOC. Based on the chart, a single member of a development team (including everyone: managers, business analysts, quality assurance folks etc.) produces around 6K-7K lines of source code per year. This is roughly half of the Cutter Consortium estimate for a typical team and about a third of its figure for the best teams.]
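To see what the lower productivity figure does to Method #1, here is a small Python sketch that reuses the assumptions from earlier in the post ($100 per man-hour, 1850 hours per year). Comparing the midpoints of the ranges is my own simplification, not something taken from either source.

```python
ANNUAL_COST = 100 * 1850  # dollars per team member per year, as assumed above

cutter_typical = (10_000, 15_000)  # SLOC per member per year, Cutter Consortium
cutter_best = 20_000               # SLOC per member per year, best teams
mcconnell = (6_000, 7_000)         # SLOC per member per year, read off the chart


def midpoint(bounds):
    """Middle of a (low, high) range."""
    return sum(bounds) / 2


# How many times lower the chart-based figure is than the Cutter figures.
ratio_typical = midpoint(cutter_typical) / midpoint(mcconnell)
ratio_best = cutter_best / midpoint(mcconnell)
print(f"Typical team: {ratio_typical:.1f}x, best teams: {ratio_best:.1f}x")

# Cost per line at 6K-7K lines per year.
cost_low = ANNUAL_COST / mcconnell[1]
cost_high = ANNUAL_COST / mcconnell[0]
print(f"Cost per line: ${cost_low:.2f}-${cost_high:.2f}")
```

Under these assumptions the cost per line lands between the Method #1 range and the Fedora figure from Method #2.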
There are so many types of code these days, ranging from scripts in any one of bash, python, perl, ruby, etc., to high-level apps (Java, C++) such as those referenced above, to firmware in C, a lot of which is very low level (e.g., UART, clock tree setup, etc.), to RTL in Verilog or VHDL.
One of the biggest problems I see with trying to put a dollar cost on any given code is that it needs to be specified not only by language type and target but also by quality level, and that is an almost impossible metric. To me, good code would be written so that anyone looking at the code could understand it immediately. This means extremely well written comments and well chosen variable names along with well thought out algorithms. The comments should thoroughly explain the algorithm and then point out any nuances or tricks required for performance or for circumventing some limitation. I have almost never seen code like this except my own.
Poorly written code costs more to maintain and debug than re-writing the code from scratch. As the number of lines increases, the rate at which code can be produced starts to fall off. This is probably because the code becomes ever more difficult to understand, and adding features to an already written section becomes difficult because of obfuscation, an inflexible structure, or both.
Interesting article! Although the line metric is indeed very susceptible to code language and style, it’s fun to know. I wonder how many books I’ve written in code language by now (: