But it can be hard to find, especially in a large project. So we wrote a utility - CPD - to find it for us. It's been through three major incarnations:
Each rewrite made it much faster, and now it can process the JDK java.* packages in about 4 seconds (on my Linux workstation, at least).
Here's a screenshot of CPD after running on the JDK java.lang package.
Note that CPD works with Java, C, C++, and PHP code.
If you have Java Web Start, you can run CPD by clicking here.
Here are the duplicates CPD found in the JDK 1.4 source code.
Here are the duplicates CPD found in the APACHE_2_0_BRANCH branch of Apache (just the httpd-2.0/server/ directory).
Andy Glover wrote an Ant task for CPD; here's how to use it:
<target name="cpd">
    <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask" />
    <cpd minimumTokenCount="100" outputFile="/home/tom/cpd.txt">
        <fileset dir="/home/tom/tmp/ant">
            <include name="**/*.java"/>
        </fileset>
    </cpd>
</target>
Also, you can get verbose output from this task by running ant with the -v flag; i.e., ant -v -f mybuildfile.xml cpd.
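To give a feel for what a setting like minimumTokenCount controls, here is a minimal Java sketch. It is not CPD's actual algorithm or API: the class TokenWindowDuplicates and the method findRepeatedWindows are hypothetical names made up for illustration, and the approach is a naive sliding window that assumes the input is already split into tokens containing no spaces. It reports every run of minimumTokenCount consecutive tokens that appears more than once, along with the token offsets where each run starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenWindowDuplicates {

    // Hypothetical helper, not part of CPD. Maps each window of
    // minimumTokenCount consecutive tokens that occurs at least twice
    // to the list of token offsets where that window starts.
    static Map<String, List<Integer>> findRepeatedWindows(List<String> tokens,
                                                          int minimumTokenCount) {
        if (minimumTokenCount <= 0) {
            throw new IllegalArgumentException("minimumTokenCount must be positive");
        }
        Map<String, List<Integer>> starts = new LinkedHashMap<>();
        // Slide a fixed-size window over the token stream and record
        // where each distinct window begins.
        for (int i = 0; i + minimumTokenCount <= tokens.size(); i++) {
            // Assumption: tokens contain no spaces, so joining is unambiguous.
            String key = String.join(" ", tokens.subList(i, i + minimumTokenCount));
            starts.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        // Keep only windows seen more than once, i.e. the duplicates.
        starts.values().removeIf(offsets -> offsets.size() < 2);
        return starts;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("int x = 0 ; foo ( ) ; int x = 0 ;".split(" "));
        // Print the repeated five-token windows and their start offsets.
        System.out.println(findRepeatedWindows(tokens, 5));
    }
}
```

In this toy model, a larger minimumTokenCount reports fewer, longer duplicates, while a smaller one reports more, shorter ones. Overlapping windows are reported separately here; a real tool would merge them into maximal matches.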
There's also a JavaSpaces version available for splitting the CPD effort across a farm of machines. I usually post news on that here, and the releases are here. This project is pretty much dead, though, since the current code is fast enough to run on a single machine.
Suggestions? Comments? Post them here. Thanks!