Simian runs under any Java 2 (1.4 or higher) Java Virtual Machine (JVM) and any .NET 1.1 or higher environment, meaning Simian can be run on anything from Windows, macOS and Linux to z/OS.
The distribution contains everything you need to be up and running in minutes:
Aslak Hellesoy has kindly donated a Maven plugin.
Neil Bartlett has kindly donated an Eclipse plugin.
Simian fully supports the following languages:
with partial support for the following languages:
If the file is not of a supported type, it is treated as plain text. This means that you can usually run Simian on just about any type of human-readable file with good results.
Ignores whitespace, curly braces, comments, imports, includes, package declarations, etc.
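As a rough illustration of what ignoring these elements can mean in practice, the following Python sketch reduces a source file to its "significant" lines before any comparison takes place. It is a toy under stated assumptions, not Simian's actual implementation: the `SKIP` pattern and the `significant_lines` function are hypothetical names, and the rules cover only single-line `//` comments, lone curly braces, and `import`, `package` and `#include` lines.

```python
import re

# Hypothetical patterns for lines a duplicate checker might skip.
# These are illustrative only and are not Simian's real rules.
SKIP = re.compile(r"//.*|[{}]+|import\s.*|package\s.*|#include\s.*")


def significant_lines(text):
    """Return (line_number, normalised_text) pairs for lines worth comparing."""
    result = []
    for number, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        # Drop blank lines and anything matching a skip pattern.
        if not stripped or SKIP.fullmatch(stripped):
            continue
        # Crude whitespace handling: remove every whitespace character so
        # that differences in indentation and spacing do not matter.
        result.append((number, re.sub(r"\s+", "", stripped)))
    return result


if __name__ == "__main__":
    sample = "package a;\nimport b;\n\n// c\nint  x = 1;\n{\n}\n  return x ;\n"
    for number, line in significant_lines(sample):
        print(number, line)
```

Keeping the original line number next to each normalised line is what would let a report point back at the raw source, even though blank and ignored lines never take part in matching.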
Supports the following processing options:
option | languages | default | possible values | description |
---|---|---|---|---|
formatter | all | none | plain, xml, emacs, vs (visual studio), yaml | Specifies the format in which processing results will be produced. |
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines. |
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml | Assumes all files are in the specified language. |
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected. |
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports. |
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers. |
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby | false | boolean | Curly braces are ignored. |
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby | false | boolean | Completely ignores all identifiers. |
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby | true | boolean | Matches identifiers irrespective of case, e.g. MyVariableName and myvariablename would both match. |
ignoreRegions | C# | false | boolean | Ignore lines between #region/#endregion. |
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | Ignores the contents of string literals, e.g. "abc" and "def" would both match. |
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | true | boolean | "Hello, World" and "HELLO, WORLD" would both match. |
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | int x = 1; and int x = 576; would both match. |
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby | false | boolean | 'A' and 'Z' would both match. |
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby | true | boolean | 'A' and 'a' would both match. |
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | 'A', "one" and 27.8 would all match. |
ignoreSubtypeNames | Java, C | false | boolean | BufferedReader, StringReader and Reader would all match. |
ignoreModifiers | Java, C#, C, C++, JavaScript | true | boolean | Ignores modifiers such as public, protected, static, etc. |
ignoreVariableNames | Java, C | false | boolean | Completely ignores variable names (field, parameter and local), e.g. int foo = 1; and int bar = 1; would both match. |
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one. |
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one. |
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one. |
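To make the effect of a few of these options concrete, the sketch below rewrites a line of code so that the differences an option ignores simply disappear before comparison. It is a simplified illustration, not Simian's actual algorithm: the `normalise` function and its keyword arguments are hypothetical names, and the regular expressions stand in for a real tokenizer. The keyword defaults mirror the defaults in the table for `ignoreNumbers`, `ignoreStrings` and `ignoreStringCase`.

```python
import re

# Double-quoted string literal, allowing backslash escapes inside it.
STRING = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')
# Integer or decimal literal that is not part of an identifier such as x1.
NUMBER = re.compile(r"\b\d+(\.\d+)?\b")


def normalise(line, ignore_numbers=False, ignore_strings=False,
              ignore_string_case=True):
    """Return the line with ignored differences rewritten to a common form."""

    def fix_string(match):
        body = match.group(1)
        if ignore_strings:
            return '""'  # every string literal becomes the empty string
        if ignore_string_case:
            return '"' + body.lower() + '"'
        return match.group(0)

    line = STRING.sub(fix_string, line)
    if ignore_numbers:
        # Simplification: digits inside string literals are replaced too;
        # a real tokenizer would leave them alone.
        line = NUMBER.sub("0", line)
    return line


if __name__ == "__main__":
    print(normalise('s = "Hello, World";'))
    print(normalise("int x = 576;", ignore_numbers=True))
```

Two lines are then treated as matching when their normalised forms are equal, which is why `int x = 1;` and `int x = 576;` compare equal only once number handling is switched on.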
Recognises the following file extensions/language options:
language | extensions |
---|---|
java | java |
c sharp | cs, c#, csharp |
c | c, h, m |
cpp | cpp, c++, hpp, cplusplus |
ruby | rb, ruby |
cobol | cobol |
abap | abap |
xml | xml, xsl, xsd |
jsp | jsp |
asp | asp |
javascript | js, javascript |
html | html, htm |
vb | vb, bas, cls, frm |
lisp | lisp, lsp |
text | this is the default when no appropriate language can be determined |
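The table above amounts to a lookup from file extension to language, with plain text as the fallback. A minimal Python sketch of that lookup follows. The names `EXTENSIONS` and `language_for` are hypothetical, and the case-insensitive matching of extensions is an assumption that the table does not confirm.

```python
import os

# Language -> extensions, copied from the table above.
EXTENSIONS = {
    "java": ["java"],
    "c sharp": ["cs", "c#", "csharp"],
    "c": ["c", "h", "m"],
    "cpp": ["cpp", "c++", "hpp", "cplusplus"],
    "ruby": ["rb", "ruby"],
    "cobol": ["cobol"],
    "abap": ["abap"],
    "xml": ["xml", "xsl", "xsd"],
    "jsp": ["jsp"],
    "asp": ["asp"],
    "javascript": ["js", "javascript"],
    "html": ["html", "htm"],
    "vb": ["vb", "bas", "cls", "frm"],
    "lisp": ["lisp", "lsp"],
}

# Inverted index: extension -> language.
LANGUAGE_BY_EXTENSION = {
    ext: language for language, exts in EXTENSIONS.items() for ext in exts
}


def language_for(filename):
    """Guess the language from the extension, falling back to plain text."""
    # Assumption: extensions are compared case-insensitively.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return LANGUAGE_BY_EXTENSION.get(ext, "text")


if __name__ == "__main__":
    for name in ("Main.java", "widget.hpp", "notes.md"):
        print(name, "->", language_for(name))
```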
Here is an example of the standard output produced by Simian (version 2.1.2) when run against the JDK 1.4.2_03 source code:

    Similarity Analyser 2.1.2 - http://www.redhillconsulting.com.au/products/simian/index.html
    Copyright (c) 2003-04 RedHill Consulting, Pty. Ltd. All rights reserved.
    Simian is not free unless used solely for non-commercial or evaluation purposes.
    {ignoreCurlyBraces=true, ignoreModifiers=true, ignoreStringCase=true, threshold=9}
    Loading (recursively) *.java from /var/tmp/jdksrc
    Found 9 duplicate lines in the following files:
     Between lines 65 and 76 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicSliderUI.java
     Between lines 71 and 82 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthSliderUI.java
    Found 9 duplicate lines in the following files:
     Between lines 37 and 49 in /var/tmp/jdksrc/com/sun/java/swing/plaf/motif/MotifCheckBoxMenuItemUI.java
     Between lines 43 and 55 in /var/tmp/jdksrc/com/sun/java/swing/plaf/motif/MotifRadioButtonMenuItemUI.java
     Between lines 36 and 48 in /var/tmp/jdksrc/com/sun/java/swing/plaf/motif/MotifMenuItemUI.java
    Found 9 duplicate lines in the following files:
     Between lines 391 and 435 in /var/tmp/jdksrc/org/apache/xml/dtm/ref/DTMDocumentImpl.java
     Between lines 1533 and 1577 in /var/tmp/jdksrc/org/apache/xml/dtm/ref/dom2dtm/DOM2DTM.java
    Found 9 duplicate lines in the following files:
     Between lines 1744 and 1758 in /var/tmp/jdksrc/javax/swing/plaf/metal/MetalFileChooserUI.java
     Between lines 1995 and 2009 in /var/tmp/jdksrc/com/sun/java/swing/plaf/windows/WindowsFileChooserUI.java
     Between lines 849 and 863 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/GTKFileChooserUI.java
    Found 9 duplicate lines in the following files:
     Between lines 47 and 59 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicMenuBarUI.java
     Between lines 55 and 67 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthMenuBarUI.java
    ...
    Found 285 duplicate lines in the following files:
     Between lines 42 and 599 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicTableUI.java
     Between lines 43 and 600 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthTableUI.java
    Found 285 duplicate lines in the following files:
     Between lines 471 and 1123 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicComboPopup.java
     Between lines 468 and 1120 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthComboPopup.java
    Found 334 duplicate lines in the following files:
     Between lines 1950 and 2461 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthTabbedPaneUI.java
     Between lines 2199 and 2710 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicTabbedPaneUI.java
    Found 384 duplicate lines in the following files:
     Between lines 739 and 1660 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthListUI.java
     Between lines 710 and 1631 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicListUI.java
    Found 435 duplicate lines in the following files:
     Between lines 84 and 545 in /var/tmp/jdksrc/org/apache/xalan/res/XSLTErrorResources_ko.java
     Between lines 121 and 579 in /var/tmp/jdksrc/org/apache/xalan/res/XSLTErrorResources.java
    Found 68412 duplicate lines in 3143 blocks in 953 files
    Processed a total of 414712 significant (1295861 raw) lines in 4136 files
    Processing time: 24.916sec

To see the full results* for the JDK 1.4.2_03 source code, download either the 350k plain text or a 43k compressed version.
* Results may vary depending on factors such as hardware used, number of duplicate lines, etc.
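For readers who want to post-process a plain report like the sample above, here is a small Python sketch that groups the "Found N duplicate lines" headers with the "Between lines" entries that follow them. It assumes one entry per line, with wording exactly as in the sample. `parse_report` and the dictionary keys are hypothetical names, and the sketch has not been checked against other Simian versions or formatters.

```python
import re

# Patterns derived from the sample output; the layout is an assumption.
FOUND = re.compile(r"Found (\d+) duplicate lines in the following files:")
BETWEEN = re.compile(r"Between lines (\d+) and (\d+) in (.+)")


def parse_report(text):
    """Return a list of duplicate blocks with their file locations."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        found = FOUND.fullmatch(line)
        if found:
            # A new block of duplicated lines starts here.
            blocks.append({"lines": int(found.group(1)), "locations": []})
            continue
        between = BETWEEN.fullmatch(line)
        if between and blocks:
            start, end, path = between.groups()
            blocks[-1]["locations"].append((path, int(start), int(end)))
    return blocks


if __name__ == "__main__":
    report = (
        "Found 9 duplicate lines in the following files:\n"
        " Between lines 47 and 59 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicMenuBarUI.java\n"
        " Between lines 55 and 67 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthMenuBarUI.java\n"
    )
    for block in parse_report(report):
        print(block["lines"], len(block["locations"]))
```

Banner, summary and timing lines match neither pattern and are skipped, so the summary line "Found 68412 duplicate lines in 3143 blocks in 953 files" does not start a block of its own.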
Java and all Java-based marks are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and other countries.
.NET and all .NET-based marks are trademarks or registered trademarks of Microsoft® in the United States and
other countries.
Copyright (c) 2003-07 RedHill Consulting Pty. Ltd. All rights reserved.