PNetMark FAQ

Rhys Weatherley, rweather@southern-storm.com.au.
Last Modified: $Date: 2003/02/12 10:48:24 $

Copyright © 2002, 2003 Southern Storm Software, Pty Ltd.
Permission to distribute unmodified copies of this work is hereby granted.

Index

1. What is PNetMark?
2. Where can I get PNetMark?
3. How do I build PNetMark?
4. How do I run PNetMark?
5. What do the benchmarks indicate?
6. Vendor X's engine has a higher score than Vendor Y's. What does that mean?
7. Can I publish PNetMark results?
8. How do I add a new benchmark?
9. What is the theory underlying the benchmarks?

1. What is PNetMark?

PNetMark is a benchmarking tool for Common Language Runtime (CLR) environments. The original version was loosely based on the techniques used by the CaffeineMark to benchmark Java Virtual Machines. Since then, two other standard floating-point benchmarks have been included: SciMark and Linpack.

The primary purpose of this tool is to identify areas of Portable.NET that may need further optimization. It can be used to compare Portable.NET with other CLR implementations, but some care must be taken (see Question 6).

2. Where can I get PNetMark?

The latest version of PNetMark can always be found at the following site:
http://www.southern-storm.com.au/portable_net.html

3. How do I build PNetMark?

To build and run PNetMark, you will need some kind of C# compiler and Common Language Runtime engine. The build system has been tested with Portable.NET, Mono, Rotor, and the .NET Framework SDK. Under Windows, a GNU-compatible shell environment such as Cygwin is required.

Normally you will unpack and build PNetMark as follows:

$ gunzip -c pnetmark-VERSION.tar.gz | tar xvf -
$ cd pnetmark-VERSION
$ ./configure
$ make

This assumes that you already have Portable.NET installed on your system. If you did not do a "make install" on Portable.NET, then you can specify the location of the Portable.NET build trees as follows:

$ ./configure --with-pnet=../pnet-0.5.0 --with-pnetlib=../pnetlib-0.5.0

A number of configure options can be supplied to change the C# compiler and Common Language Runtime that are used to build and run the benchmarks:

--with-cscc
Use Portable.NET's C# compiler, cscc.
--with-csc
Use Microsoft's C# compiler, csc.
--with-mcs
Use Mono's C# compiler, mcs.
--with-ilrun
Use Portable.NET's runtime engine, ilrun.
--with-ms
Use Microsoft's commercial runtime engine.
--with-clix
Use Rotor's runtime engine, clix.
--with-mono
Use Mono's JIT-based runtime engine, mono.
--with-mint
Use Mono's interpreter-based runtime engine, mint.

If you supply a compiler option, but not a runtime engine option, then configure will choose the runtime engine that corresponds to the compiler (e.g. --with-cscc and --with-ilrun). The same applies if you supply a runtime engine option, but not a compiler option.

If you supply neither a compiler option nor a runtime engine option, then configure will use the first of the Portable.NET, Mono, or Microsoft tools (in that order), depending upon what is installed on your system.

4. How do I run PNetMark?

Once you have built PNetMark, you can run it from the build tree as follows:

$ make check

PNetMark can also be run manually by invoking the runtime engine directly on the *.exe files. For example:

$ ilrun pnetmark.exe

5. What do the benchmarks indicate?

PNetMark can be used to compare two different versions of a runtime engine, running on the same machine, to determine whether a particular code change makes the engine perform better or worse. Higher numbers mean better performance.

6. Vendor X's engine has a higher score than Vendor Y's. What does that mean?

Using PNetMark to compare dissimilar CLR implementations may give bogus results if not done carefully. Are they running on different machines? Were the binaries compiled with the same compiler and optimization flags? Are they using different implementation technologies? Have the engines been optimised to fool the benchmark? There are a million reasons why two engines can give different results.

Never believe benchmark numbers for engines whose source code you cannot inspect. Vendors have been known to modify their engines specifically to get high scores on benchmarks. If you cannot see what hacks they've added to the code to lie to the benchmark, then you shouldn't believe the numbers.

For example, some JITs get a disproportionately large value for loop benchmarks, but that is probably due to the JIT optimising the entire benchmark away, or aggressively unrolling it. Real-world applications don't normally benefit from such optimizations.
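
To illustrate the point, consider a hypothetical loop of the kind such a benchmark might time. This fragment is not taken from PNetMark itself; it simply shows why a loop score can be meaningless:

static void LoopBody()
{
    // The result is never used, so an aggressive JIT may legitimately
    // delete the entire loop as dead code, or unroll it far beyond what
    // real-world code would ever see. Either way, the reported "score"
    // reflects the optimiser's cleverness, not useful work.
    int sum = 0;
    for(int i = 0; i < 1000000; ++i)
    {
        sum += i;
    }
}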

A JIT that lies to a benchmark may appear to be faster, but in fact will be slower. Lying JITs spend extra time checking for optimisations that will not be needed in real applications. This extra checking slows the JIT down, and so real applications run slower than on an honest engine.

You must be careful with comparisons between engines from different vendors. Different implementation techniques lead to different trade-offs, and benchmarks don't always measure those trade-offs accurately.

It is sometimes possible for an interpreter to out-perform a JIT if the application is I/O bound, and the interpreter has been optimised for I/O operations. A JIT that can perform fantastic loop optimisations will be useless on such an application.

As a general guide, when comparing the performance of dissimilar CLRs, make sure that you compile the binaries with the same compiler and optimization flags. For example, compile with cscc and then run the same binary with both ilrun and mono. You want to measure the optimizations in the engine, not the optimizations in the compiler.

Finally, remember that most applications spend the bulk of their time waiting for the user to press a key or move the mouse, or waiting for a remote process to respond to a request. No amount of fancy optimisations can speed up the user or the remote machine. Benchmarks are a guide to performance, but never an absolute indicator.

7. Can I publish PNetMark results?

If you like. But keep in mind the issues discussed in Question 6 above. The numbers are really only meaningful for comparing different versions of the same CLR running on the same machine.

You may be tempted to run PNetMark against the Microsoft CLR. If you do, you cannot tell the author of the benchmark, or anyone else for that matter, what the results are. The following is an excerpt from Microsoft's End User License Agreement (EULA) for their .NET Framework SDK:

Performance or Benchmark Testing. You may not disclose the results of any benchmark test of either the Server Software or Client Software to any third party without Microsoft's prior written approval.

Thus, you can run the benchmark if you like, but you must keep the results to yourself.

8. How do I add a new benchmark?

There are two ways to add a new benchmark: a new application binary, or a modification to pnetmark.exe. The first is the recommended approach, because calibrating pnetmark.exe properly can be quite difficult.

New benchmarks in pnetmark.exe are provided by classes that implement the "IBenchmark" interface. The following methods and properties must be supplied:

Initialize
Initialize the benchmark. This is called once per benchmark.
Name
Name of the benchmark, for reporting purposes.
Magnification
An integer value that is used to scale the number of runs to produce a usable score value.
Run
Run the benchmark once. This will be called repeatedly until the caller is satisfied that it has collected sufficient information.
CleanUp
Clean up any temporary data that was used by the benchmark. This is called once per benchmark.

You also need to add a line to "PNetMark.Run" to run the benchmark when requested, e.g.

Run(new FooBenchmark());

Finally, add the names of your new source files to "Makefile.am" and rebuild.
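
To make the above concrete, here is a minimal sketch of such a class. The exact signatures of the IBenchmark members are assumptions based on the descriptions above; consult the existing benchmark sources in the PNetMark tree for the authoritative interface.

public class FooBenchmark : IBenchmark
{
    // Called once per benchmark, before any runs.
    public void Initialize()
    {
        data = new int[1000];
    }

    // Name of the benchmark, for reporting purposes.
    public string Name
    {
        get { return "Foo"; }
    }

    // Integer scale factor used to turn the number of runs into a
    // usable score (the value here is illustrative only).
    public int Magnification
    {
        get { return 1000; }
    }

    // Run the benchmark once; called repeatedly by the harness.
    public void Run()
    {
        for(int i = 0; i < data.Length; ++i)
        {
            data[i] = data[i] * 3 + 1;
        }
    }

    // Called once per benchmark, after all runs have completed.
    public void CleanUp()
    {
        data = null;
    }

    private int[] data;
}

With a class like this in place, the "Run(new FooBenchmark());" line in "PNetMark.Run" and the entry in "Makefile.am" are all that is needed for it to be picked up on the next build.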

9. What is the theory underlying the benchmarks?

It can be very difficult to build a benchmark that accurately measures what you are trying to inspect, while factoring out unimportant environmental effects.

We are deliberately avoiding this question of benchmark "validity". Instead, we have tried to faithfully reproduce well-known benchmarks that have been used elsewhere and are generally well understood (the one exception is the use of C#'s 2D array facility in the "Float" benchmark - the CaffeineMark uses nested 1D arrays).
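
For readers unfamiliar with the distinction, the following fragment contrasts the two array styles. It is illustrative only and is not the actual benchmark code:

// C#'s rectangular 2D array facility, as used by the "Float" benchmark.
double[,] rect = new double[100, 100];
rect[3, 4] = 1.0;

// Nested 1D ("jagged") arrays, as used by the original CaffeineMark.
double[][] jagged = new double[100][];
for(int i = 0; i < 100; ++i)
{
    jagged[i] = new double[100];
}
jagged[3][4] = 1.0;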

For example, the SciMark and Linpack benchmarks are direct ports of the Java versions to C#. Other than syntactic differences, the code is identical. Questions about benchmarking theory are best sent to the original authors of those benchmarks.

You are welcome to submit patches if you detect a discrepancy in our porting efforts.


Copyright © 2002, 2003 Southern Storm Software, Pty Ltd.
Permission to distribute unmodified copies of this work is hereby granted.