Coral Blocks » CoralBits

Java Development without GC

cb — Sat, 03 May 2014 03:26:10 +0000

All products developed by Coral Blocks have the very important feature of leaving ZERO garbage behind. Because the latency imposed by the Java Garbage Collector (GC) is unacceptable for high-performance systems and because it is impossible to turn off the GC, the best option for real-time systems in Java is to not produce any garbage at all so that the GC never kicks in. Imagine a high-performance matching engine operating at the microsecond level, sending and receiving hundreds of thousands of messages per second. If at any given time the GC decides to kick in with its 1+ millisecond latencies, the disruption to the system will be huge. Therefore, if you want to develop real-time systems in Java with minimal variance and latency, the best option is to do it right and create no garbage for the GC. In this article we discuss best practices and how you can use Coral Blocks’ MemorySampler utility class to help you accomplish this critical goal.

The MemorySampler utility class

Garbage or dereferenced Java instances can show up in three different places: third-party libraries, JDK classes and your code. Before we discuss each one of them, let’s take a break to introduce Coral Blocks’ MemorySampler utility class which will allow you not just to tell your boss you are producing no trash but to prove it. (Note: MemorySampler is part of the CoralBits component)

		MemorySampler.start();
		// do nothing!
		MemorySampler.end();
		MemorySampler.printSituation();
		
		/* == OUTPUT:
		 Memory allocated on last pass: 0
		 Memory allocated total: 0
		 */
		
		MemorySampler.start();
		// allocate nothing...
		int x = 10;
		MemorySampler.end();
		MemorySampler.printSituation();
		
		/* == OUTPUT:
		 Memory allocated on last pass: 0
		 Memory allocated total: 0
		 */
		
		MemorySampler.start();
		// allocate an object...
		String s = new String("trash");
		MemorySampler.end();
		MemorySampler.printSituation();
		
		/* == OUTPUT:
		 Memory allocated on last pass: 24
		 Memory allocated total: 24

		 Stack Trace for java.lang.String
    		com.coralblocks.coralutils.gcutils.Basics.test1(Basics.java:20)
    		com.coralblocks.coralutils.gcutils.Basics.main(Basics.java:33)
		 */
		
		MemorySampler.start();
		// allocate 10 objects...
		for(int i = 0; i < 10; i++) new String("trash!");
		MemorySampler.end();
		MemorySampler.printSituation();
		
		/* == OUTPUT:
		 Memory allocated on last pass: 240
		 Memory allocated total: 264

		 Stack Trace for java.lang.String
    		com.coralblocks.coralutils.gcutils.Basics.test1(Basics.java:26)
    		com.coralblocks.coralutils.gcutils.Basics.main(Basics.java:33)
		 */

		Map map = new HashMap();
		String key = "key";
		String value = "value";
		
		MemorySampler.start();
		map.put(key, value);
		MemorySampler.end();
		MemorySampler.printSituation();
		
		/* == OUTPUT:
		 Memory allocated on last pass: 112
		 Memory allocated total: 376

		 Stack Trace for [Ljava.util.HashMap$Entry;
    		java.util.HashMap.inflateTable(HashMap.java:320)
    		java.util.HashMap.put(HashMap.java:492)
    		com.coralblocks.coralutils.gcutils.Basics.test1(Basics.java:69)
    		com.coralblocks.coralutils.gcutils.Basics.main(Basics.java:76)

		 Stack Trace for java.util.HashMap$Entry
    		java.util.HashMap.createEntry(HashMap.java:901)
    		java.util.HashMap.addEntry(HashMap.java:888)
    		java.util.HashMap.put(HashMap.java:509)
    		com.coralblocks.coralutils.gcutils.Basics.test1(Basics.java:69)
    		com.coralblocks.coralutils.gcutils.Basics.main(Basics.java:76)
		 */

As you can see by the output above, MemorySampler can tell you some important things:

The amount of memory allocated on the last pass
The total memory allocated so far
Who allocated the memory in the last pass with the source code line number
The stack trace leading to the allocation call
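
To make the idea of measuring allocation between two points concrete, here is a minimal sketch. It is not how MemorySampler is implemented (that is not shown in this article) and it does not produce stack traces. It assumes a HotSpot-based JVM, where the thread MXBean can be cast to com.sun.management.ThreadMXBean to read the bytes allocated by a thread. The class name AllocationProbe and the method allocatedBytes are made up for this illustration.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {

    // Assumption: a HotSpot-based JVM, where the platform bean implements
    // the com.sun.management.ThreadMXBean extension.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Returns the number of bytes the current thread allocated while running the task. */
    public static long allocatedBytes(Runnable task) {
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        task.run();
        long after = THREADS.getThreadAllocatedBytes(threadId);
        return after - before;
    }

    public static void main(String[] args) {
        // The sink keeps the array reachable so the allocation cannot be optimized away.
        final byte[][] sink = new byte[1][];
        long bytes = allocatedBytes(() -> sink[0] = new byte[1024]);
        System.out.println("Bytes allocated by the task: " + bytes);
    }
}
```

A probe like this only reports how much was allocated, not by whom, so it complements rather than replaces a tool that reports the allocating stack trace.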

Of course, the fact that code allocates memory does not necessarily mean it is creating garbage, since instances can be pooled for re-use. For example, the code below:

		Map map = new HashMap();
		
		for(int i = 0; i < 100; i++) {

			MemorySampler.start();
			
			map.put("key", "value");
			map.remove("key");
			
			MemorySampler.end();
			if (MemorySampler.wasMemoryAllocated(true)) { // true => ignore the first pass (init)
				MemorySampler.printSituation();
			}
		}

Prints the output below 99 times:

         Stack Trace for java.util.HashMap$Entry
             java.util.HashMap.createEntry(HashMap.java:901)
             java.util.HashMap.addEntry(HashMap.java:888)
             java.util.HashMap.put(HashMap.java:509)
             com.coralblocks.coralutils.gcutils.Basics.test2(Basics.java:104)
             com.coralblocks.coralutils.gcutils.Basics.main(Basics.java:116)

The total memory allocated grows linearly with the number of iterations. In other words, the memory allocated per pass is always 32 bytes and the total memory allocated increases from 144 to 3280:

Memory allocated on last pass: 32
Memory allocated total: 144

(...)

Memory allocated on last pass: 32
Memory allocated total: 3280

At this point it is clear that java.util.HashMap.createEntry is not pooling its objects: it is creating garbage, and it is just a matter of enough iterations before it triggers the GC. We will soon see how to fix that.

Warming up, Checking the GC and Sampling

The key to making sure your system is not creating any garbage is to warm up your critical path from start to finish a couple of million times and then check for memory allocation for another couple of million iterations. If allocated memory grows linearly with the number of iterations, the code is most likely creating garbage and you should use the stack trace provided by MemorySampler to investigate further. This sounds more complicated than it really is: as you iterate doing the same thing, it is usually straightforward to see the same object being allocated over and over again, which indicates a garbage leak, as was the case with java.util.HashMap.

Real-time applications usually have a critical loop that is executed non-stop by a high-priority thread. You can plug the MemorySampler into that loop to cover your whole application. For example, CoralReactor, which is a high-performance asynchronous network library, has the following code at the top of its critical selector (i.e. reactor) thread:

			while (isRunning) {

				if (traceAllocation) {

					MemorySampler.end();

					if (MemorySampler.wasMemoryAllocated()) {
						MemorySampler.printSituation();
					}

					MemorySampler.start();
				}

				// that's the critical path, in other words, all branches of your application
				// can be reached from this point

				// (...)
			}

When using CoralReactor or any other application with a MemorySampler configured in the critical path as above, the standard procedure to detect garbage creation or Java garbage leaks is:

Turn off the MemorySampler and run your application with -verbose:gc
Send a couple of million messages or exercise your code the same way a couple of million times and check if the GC kicks in.
If it doesn’t, you most likely do not have a garbage leak, but it does not hurt to turn on the MemorySampler and check for any memory allocation.
If it does, then you must turn on the MemorySampler and investigate to find the culprit.
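
The "check if the GC kicks in" step can also be done from inside the JVM instead of reading the -verbose:gc log. The sketch below assumes nothing beyond the standard java.lang.management API. The class name GcWatch and its methods are hypothetical names chosen for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatch {

    /** Sums the collection counts reported by every collector in this JVM. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Runs the task the given number of times and returns how many collections happened meanwhile. */
    public static long collectionsDuring(Runnable task, int iterations) {
        long before = totalCollections();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        long[] counter = new long[1];
        // Stand-in for "exercise your critical path a couple of million times".
        long collections = collectionsDuring(() -> counter[0]++, 2_000_000);
        System.out.println("Collections during the run: " + collections);
    }
}
```

The count is JVM-wide, so collections triggered by other threads are included. A non-zero result is a reason to investigate, not proof that the measured code is the culprit.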

A good system will have the MemorySampler on and not allocate anything after warming up. At this point you can be sure that at least the most important branches of your application that you warmed up are free from garbage leaks and you can execute them a billion times without any GC overhead.

Getting rid of the trash

Once you have determined that your system has a garbage leak, you have to fix it. As we said in the beginning, there are three scenarios where you can find a garbage leak:

In a third-party library

If you are using a third-party library that is producing a lot of garbage you should consider writing your own, contacting the author or the company or fixing the code yourself if it is an open-source project. If that cannot be done, then you should start looking for a better alternative or another way to perform the same task in a more efficient way. Some libraries claim they are real-time libraries that produce no garbage. You should favor these ones and of course test them to see if they are really leaving no trash behind.

In the JDK classes

Please contact us for more information on how to eliminate the garbage produced by JDK classes.

In your own code

When writing your own code, you must have the discipline, and the right libraries and tools, not to leave a mess for the GC to clean up. For example, you should know that both autoboxing and varargs produce garbage. Some JDK data structures, like java.util.HashMap, produce garbage as we saw in an earlier example, and are not very efficient at storing primitive keys. You must have your own set of clean, fast, high-performance data structures if you want to do real-time Java development efficiently. You can find open-source real-time libraries or improve the ones that are not real-time by patching their code.
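
As an illustration of a data structure that stores primitive keys without boxing, here is a minimal sketch of a fixed-capacity map from int to long that uses open addressing with linear probing. It is not a Coral Blocks class. IntLongMap and its methods are hypothetical, and removal and resizing are left out to keep the example short. After construction, put and get allocate nothing.

```java
public class IntLongMap {

    private final int[] keys;
    private final long[] values;
    private final boolean[] used;
    private final int mask;
    private int size;

    /** The capacity must be a power of two; one slot is always kept free. */
    public IntLongMap(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new int[capacity];
        values = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    // Returns the slot holding the key, or the free slot where it would be stored.
    // Terminates because put() never lets the table fill up completely.
    private int slotFor(int key) {
        int h = key * 0x9E3779B9; // cheap multiplicative hash to spread the bits
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    public void put(int key, long value) {
        int i = slotFor(key);
        if (!used[i]) {
            if (size == mask) {
                throw new IllegalStateException("map is full");
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the value stored for the key, or missingValue if the key is absent. */
    public long get(int key, long missingValue) {
        int i = slotFor(key);
        return used[i] ? values[i] : missingValue;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        IntLongMap map = new IntLongMap(16);
        map.put(42, 1000L);
        System.out.println(map.get(42, -1L));
    }
}
```

Because keys and values live in primitive arrays, there is no Integer or Long instance and no entry object per mapping, which is the property the paragraph above asks for.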

Another important tool to have in hand is a good, clean and fast object pool. By making your objects mutable and pooling them for re-use you can eliminate most of the garbage leaks in Java. For example, it is not difficult to pool the entry objects in the java.util.HashMap to make it garbage-free.
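
A minimal sketch of such a pool follows. It is an illustration under stated assumptions, not the Coral Blocks implementation: the ObjectPool name, the constructor arguments and the LIFO array layout are choices made for this example, and the class is meant to be used from a single thread.

```java
import java.util.function.Supplier;

public class ObjectPool<T> {

    private final Object[] free;
    private final Supplier<T> factory;
    private int count;

    /** Preallocates all instances up front so the steady state allocates nothing. */
    public ObjectPool(int capacity, Supplier<T> factory) {
        this.free = new Object[capacity];
        this.factory = factory;
        for (int i = 0; i < capacity; i++) {
            free[i] = factory.get();
        }
        this.count = capacity;
    }

    /** Takes an instance from the pool; allocates only if the pool is exhausted. */
    @SuppressWarnings("unchecked")
    public T get() {
        if (count == 0) {
            return factory.get();
        }
        T obj = (T) free[--count];
        free[count] = null;
        return obj;
    }

    /** Returns an instance to the pool; the caller must not use it afterwards. */
    public void release(T obj) {
        if (count < free.length) {
            free[count++] = obj;
        }
        // If the pool is already full, the instance is dropped and becomes garbage.
    }

    public static void main(String[] args) {
        ObjectPool<StringBuilder> pool = new ObjectPool<>(4, StringBuilder::new);
        StringBuilder sb = pool.get();
        sb.setLength(0); // mutable objects must be reset before re-use
        sb.append("order-42");
        System.out.println(sb);
        pool.release(sb);
    }
}
```

The pool should be sized for the worst case on the critical path, because an exhausted pool falls back to allocation and brings the garbage back.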

Conclusion

Java Development without GC overhead is very possible and you can use a memory sampler to make sure your application is not leaving any garbage behind. By using real-time libraries you can build a high-performance system from the ground up with minimal variance and latency. If you are in a hurry to build your real-time codebase foundation, you can count on Coral Blocks to help you. We have all the libraries and tools to build any real-time ultra-low-latency system from the ground up. You can use one of our components and we can provide you with our utility classes and data structures that will not just simplify your applications but leave them shiny and clean of garbage. They will never see the GC again.

Jitter: A C++ and Java Comparison

cb — Wed, 26 Aug 2015 18:22:24 +0000

In this article we write two equivalent programs in C++ and in Java that perform the same mathematical calculations in a loop and proceed to measure their jitters.

As you can see from the results below, the main source of jitter is the OS itself, not the choice between C++ and Java. C++ exhibits jitter comparable to Java’s; in other words, the JVM is not introducing variance on top of the OS jitter. This holds for Java programs that produce zero garbage (no GC jitter) and are properly warmed up (no JIT jitter).

Note: We used the same isolated cpu core for all tests through thread pinning.
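
The result lines in the sections that follow split the sorted measurements at a percentile and report the average, standard deviation and boundary value of each side. The sketch below shows one way to compute such a split in plain Java. It is not the Benchmarker class used in this article, whose source is not shown here. JitterStats, its methods and the output format are hypothetical, and no thread pinning is done.

```java
import java.util.Arrays;

public class JitterStats {

    /** Average of sorted[from, to). */
    public static double average(long[] sorted, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) sum += sorted[i];
        return (double) sum / (to - from);
    }

    /** Population standard deviation of sorted[from, to). */
    public static double stdev(long[] sorted, int from, int to) {
        double avg = average(sorted, from, to);
        double acc = 0;
        for (int i = from; i < to; i++) acc += (sorted[i] - avg) * (sorted[i] - avg);
        return Math.sqrt(acc / (to - from));
    }

    /** Number of samples that fall in the best (fastest) part for the given percentile. */
    public static int cutIndex(int n, double perc) {
        return (int) Math.round(perc * n);
    }

    /** Describes the best perc of the samples and the remaining worst part. */
    public static String split(long[] sorted, double perc) {
        int n = sorted.length;
        int cut = cutIndex(n, perc);
        if (cut <= 0 || cut >= n) throw new IllegalArgumentException("one side would be empty");
        return String.format("%s%% (%d) = [avg: %.0f, stdev: %.2f, max: %d] - worst (%d) = [avg: %.0f, stdev: %.2f, min: %d]",
                perc * 100, cut, average(sorted, 0, cut), stdev(sorted, 0, cut), sorted[cut - 1],
                n - cut, average(sorted, cut, n), stdev(sorted, cut, n), sorted[cut]);
    }

    // Same kind of arithmetic load as the programs in this article.
    private static long doSomething(int load, int i) {
        long x = 0;
        for (int j = 0; j < load; j++) {
            long pow = (i % 8) * (i % 16);
            if (i % 2 == 0) x += pow; else x -= pow;
        }
        return x;
    }

    public static void main(String[] args) {
        int warmup = 100_000, measured = 1_000_000, load = 1000;
        long[] samples = new long[measured]; // preallocated: no garbage while measuring
        long x = 0;
        for (int i = 0; i < warmup + measured; i++) {
            long start = System.nanoTime();
            x += doSomething(load, i);
            long elapsed = System.nanoTime() - start;
            if (i >= warmup) samples[i - warmup] = elapsed;
        }
        Arrays.sort(samples);
        System.out.println("Value computed: " + x);
        System.out.println(split(samples, 0.75));
        System.out.println(split(samples, 0.99));
        System.out.println(split(samples, 0.999));
    }
}
```

Storing every sample in a preallocated array keeps the measurement loop free of allocation, at the cost of memory proportional to the number of iterations.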

Java Version

java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)

Java Jitter

Iterations: 9,000,000
Avg Time: 45.64 nanos
StDev: 41.98 nanos
Min Time: 17 nanos
Max Time: 23531 nanos
75% (6,750,000) = [avg: 38, stdev: 19.85, max: 67] - 25% (2,250,000) = [avg: 68, stdev: 72.02, min: 67]
90% (8,100,000) = [avg: 43, stdev: 21.2, max: 68] - 10% (900,000) = [avg: 69, stdev: 113.86, min: 68]
99% (8,910,000) = [avg: 45, stdev: 21.45, max: 68] - 1% (90,000) = [avg: 79, stdev: 359.88, min: 68]
99.9% (8,991,000) = [avg: 45, stdev: 21.47, max: 70] - 0.1% (9,000) = [avg: 179, stdev: 1133.2, min: 70]
99.99% (8,999,100) = [avg: 45, stdev: 21.47, max: 77] - 0.01% (900) = [avg: 1113, stdev: 3445.62, min: 77]
99.999% (8,999,910) = [avg: 45, stdev: 21.69, max: 911] - 0.001% (90) = [avg: 9340, stdev: 6545.94, min: 911]
99.9999% (8,999,991) = [avg: 45, stdev: 38.11, max: 15185] - 0.0001% (9) = [avg: 17427, stdev: 2851.13, min: 15197]
99.99999% (8,999,999) = [avg: 45, stdev: 41.24, max: 20875] - 0.00001% (1) = [avg: 23531, stdev: 0.0, min: 23531]

C++ Jitter (-O3)

Iterations: 9,000,000
Avg Time: 205 nanos
Stdev: 50.38 nanos
Min Time: 203 nanos
Max Time: 23656 nanos
75% (6,750,000) = [avg: 205, stdev: 0.55, max: 206] - 25% (2,250,000) = [avg: 207, stdev: 100.74, min: 206]
90% (8,100,000) = [avg: 205, stdev: 0.56, max: 206] - 10% (900,000) = [avg: 208, stdev: 159.27, min: 206]
99% (8,910,000) = [avg: 205, stdev: 0.65, max: 207] - 1% (90,000) = [avg: 229, stdev: 503.18, min: 207]
99.9% (8,991,000) = [avg: 205, stdev: 0.68, max: 210] - 0.1% (9,000) = [avg: 426, stdev: 1577.60, min: 210]
99.99% (8,999,100) = [avg: 205, stdev: 0.69, max: 215] - 0.01% (900) = [avg: 2364, stdev: 4550.97, min: 215]
99.999% (8,999,910) = [avg: 205, stdev: 18.32, max: 13111] - 0.001% (90) = [avg: 14923, stdev: 1900.00, min: 13120]
99.9999% (8,999,991) = [avg: 205, stdev: 46.53, max: 16273] - 0.0001% (9) = [avg: 19409, stdev: 1918.96, min: 17846]
99.99999% (8,999,999) = [avg: 205, stdev: 49.77, max: 21292] - 0.00001% (1) = [avg: 23656, stdev: 0.00, min: 23656]

Java Source Code

package com.coralblocks.coralthreads.sample;

import com.coralblocks.coralbits.bench.Benchmarker;
import com.coralblocks.coralbits.util.SystemUtils;
import com.coralblocks.coralthreads.Affinity;

public class TestJitter {

    // To execute: java -server -verbose:gc -cp coralthreads-all.jar -DbenchWorstPercs=true -DbenchTotals=true -DbenchStdev=true -DbenchMorePercs=true -DdetailedBenchmarker=true -DprocToBind=1 -DexcludeNanoTimeCost=true com.coralblocks.coralthreads.sample.TestJitter 10000000 1000000 1000

    public static void main(String[] args) {

        int iterations = Integer.parseInt(args[0]);
        int warmup = Integer.parseInt(args[1]);
        int load = Integer.parseInt(args[2]);

        int procToBind = SystemUtils.getInt("procToBind", -1);

        if (procToBind != -1) {
            Affinity.set(procToBind);
        }

        Benchmarker bench = Benchmarker.create(warmup);

        long x = 0;

        for(int i = 0; i < iterations; i++) {
            bench.mark();
            x += doSomething(load, i);
            bench.measure();
        }

        System.out.println("Value computed: " + x);
        bench.printResults();
    }

    /*
     * For speed, it is important to extract the hot code (i.e. the code executed in a loop) to its own method so the JIT can inline/optimize/compile.
     *
     * Note that the main() method above is executed only once.
     */
    private final static long doSomething(int load, int i) {

        long x = 0;

        for(int j = 0; j < load; j++) {
            long pow = (i % 8) * (i % 16);
            if (i % 2 == 0) {
                x += pow;
            } else {
                x -= pow;
            }
        }

        return x;
    }
}

C++ Source Code

#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
#include <map>
#include <limits>
#include <algorithm>
#include <cmath>
#include <time.h>
#include <sched.h>

using namespace std;

// TO COMPILE: g++ TestJitter.cpp -o TestJitter -std=c++11 -O3
// TO EXECUTE: ./TestJitter 10000000 1000000 1000 1

static const bool MORE_PERCS = true;
static const bool INCLUDE_WORST_PERCS = true;
static const bool INCLUDE_TOTALS = true;
static const bool INCLUDE_RATIOS = false;
static const bool INCLUDE_STDEV = true;

static const bool EXCLUDE_NANO_TS_COST = true;

long get_nano_ts(timespec* ts) {
    clock_gettime(CLOCK_MONOTONIC, ts);
    return ts->tv_sec * 1000000000 + ts->tv_nsec;
}

static const long NANO_COST_ITERATIONS = 10000000;

// Estimates the average cost of one timestamp call so it can be subtracted from each measurement
static long calc_nano_ts_cost() {

    struct timespec ts;

    long start = get_nano_ts(&ts);

    long finish = start;

    for (long i = 0; i < NANO_COST_ITERATIONS; i++) {
        finish = get_nano_ts(&ts);
    }

    finish = get_nano_ts(&ts);

    return (finish - start) / NANO_COST_ITERATIONS;
}

struct mi {
    long value;
};

// Appends the stats for the best "perc" of the measurements (top) and for the remaining worst ones (bottom).
// The map goes from measured time to number of occurrences and is iterated in ascending order of time.
void add_perc(stringstream& ss, int size, double perc, map<int, mi*>* map) {

    if (map->empty()) return;

    int max = -1;
    int minBottom = -1;

    long x = round(perc * size);
    long i = 0;
    long iBottom = 0;

    long sum = 0;
    long sumBottom = 0;

    bool trueForTopFalseForBottom = true;
    bool flag = false;

    const int arraySize = 1024 * 1024 * 10;
    int* tempData = new int[arraySize];
    double stdevTop = -1;

    for(auto iter = map->begin(); iter != map->end(); iter++) {

        if (flag) break;

        int time = iter->first;
        long count = (iter->second)->value;

        for(int a = 0; a < count; a++) {

            if (trueForTopFalseForBottom) {

                tempData[i] = time;

                i++;
                sum += time;

                if (i == x) {

                    max = time;

                    if (INCLUDE_STDEV) {

                        double avg = (double) sum / (double) i;
                        double temp = 0;

                        for(int b = 0; b < i; b++) {
                            int t = tempData[b];
                            temp += (avg - t) * (avg - t);
                        }

                        stdevTop = sqrt(((double) temp / (double) i));
                    }

                    if (INCLUDE_WORST_PERCS) {
                        trueForTopFalseForBottom = false;
                    } else {
                        flag = true;
                        break;
                    }
                }

            } else {

                tempData[iBottom] = time;

                iBottom++;
                sumBottom += time;

                if (minBottom == -1) {
                    minBottom = time;
                }
            }
        }
    }

    ss << " | " << fixed << setprecision(5) << (perc * 100) << "%";
    if (INCLUDE_TOTALS) ss << " (" << i << ")";
    ss << " = [avg: " << (sum / i);
    if (INCLUDE_STDEV) ss << ", stdev: " << fixed << setprecision(2) << stdevTop;
    ss << ", max: " << max << "]";

    if (INCLUDE_WORST_PERCS) {

        ss << " - " << fixed << setprecision(5) << ((1 - perc) * 100) << "%";
        if (INCLUDE_TOTALS) ss << " (" << (iBottom > 0 ? iBottom : 0) << ")";
        ss << " = [avg: " << (iBottom > 0 ? (sumBottom / iBottom) : -1);

        if (INCLUDE_STDEV) {

            ss << ", stdev: ";

            if (iBottom <= 0) {
                ss << "?";
            } else {

                double avgBottom = (sumBottom / iBottom);
                double temp = 0;

                for(int b = 0; b < iBottom; b++) {
                    long t = tempData[b];
                    temp += (avgBottom - t) * (avgBottom - t);
                }

                double stdevBottom = sqrt((double) temp / (double) iBottom);

                ss << fixed << setprecision(2) << stdevBottom;
            }
        }

        ss << ", min: " << (minBottom != -1 ? minBottom : -1) << "]";

        if (INCLUDE_RATIOS) {
            ss << " R: ";
            ss << fixed << setprecision(2) << (iBottom > 0 ? (((sumBottom / iBottom) / (double) (sum / i)) - 1) * 100 : -1);
            ss << "%";
        }
    }

    delete[] tempData;
}

int main(int argc, char* argv[]) {

    int iterations = stoi(argv[1]);
    int warmup = stoi(argv[2]);
    int load = stoi(argv[3]);
    int proc = stoi(argv[4]);

    // pin this thread to the given cpu core
    cpu_set_t my_set;
    CPU_ZERO(&my_set);
    CPU_SET(proc, &my_set);
    sched_setaffinity(0, sizeof(cpu_set_t), &my_set);

    long nanoTimeCost = EXCLUDE_NANO_TS_COST ? calc_nano_ts_cost() : 0;

    struct timespec ts;

    long long x = 0;
    long long totalTime = 0;
    int minTime = numeric_limits<int>::max();
    int maxTime = numeric_limits<int>::min();

    map<int, mi*>* results = new map<int, mi*>();

    for(int i = 0; i < iterations; i++) {

        long start = get_nano_ts(&ts);

        for(int j = 0; j < load; j++) {
            long p = (i % 8) * (i % 16);
            if (i % 2 == 0) {
                x += p;
            } else {
                x -= p;
            }
            asm(""); // so that the loop is not removed by -O3
        }

        long end = get_nano_ts(&ts);

        int res = end - start - nanoTimeCost;

        if (res <= 0) res = 1;

        if (i >= warmup) {

            totalTime += res;
            minTime = min(minTime, res);
            maxTime = max(maxTime, res);

            auto iter = results->find(res);

            if (iter != results->end()) {
                (iter->second)->value = (iter->second)->value + 1;
            } else {
                mi* elem = new mi();
                elem->value = 1;
                (*results)[res] = elem;
            }
        }
    }

    int count = iterations - warmup;

    double avg = totalTime / count;

    cout << "Value computed: " << x << endl;
    cout << "Nano timestamp cost: " << nanoTimeCost << endl;

    stringstream ss;

    ss << "Iterations: " << count << " | Avg Time: " << avg;

    if (INCLUDE_STDEV) {

        long temp = 0;
        long x = 0;

        for(auto iter = results->begin(); iter != results->end(); iter++) {

            int time = iter->first;
            long count = (iter->second)->value;

            for(int a = 0; a < count; a++) {
                temp += (avg - time) * (avg - time);
                x++;
            }
        }

        double stdev = sqrt( temp / x );

        ss << " | Stdev: " << fixed << setprecision(2) << stdev;
    }

    if (count > 0) {
        ss << " | Min Time: " << minTime << " | Max Time: " << maxTime;
    }

    add_perc(ss, count, 0.75, results);
    add_perc(ss, count, 0.90, results);
    add_perc(ss, count, 0.99, results);
    add_perc(ss, count, 0.999, results);
    add_perc(ss, count, 0.9999, results);
    add_perc(ss, count, 0.99999, results);

    if (MORE_PERCS) {
        add_perc(ss, count, 0.999999, results);
        add_perc(ss, count, 0.9999999, results);
    }

    cout << ss.str() << endl << endl;

    delete results;

    return 0;
}

Performance Analysis: comparing C++ and Java

cb — Thu, 23 Jun 2022 19:40:12 +0000

In this article we write two equivalent programs in C++ and in Java, in exactly the same way, to do exactly the same thing: the (in)famous bubble sort algorithm. We then measure the latency. In this experiment, Java was faster than C++ even with the -O3 compiler option.

Note: We used the same isolated cpu core for all tests through thread pinning.

Java Version

java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)

Java

Iterations: 9,000,000
Avg Time: 825.97 nanos
StDev: 156.4 nanos
Min Time: 781 nanos
Max Time: 107335 nanos
75% (6,750,000) = [avg: 801, stdev: 5.2, max: 876] - 25% (2,250,000) = [avg: 899, stdev: 301.03, min: 876]
90% (8,100,000) = [avg: 816, stdev: 32.78, max: 893] - 10% (900,000) = [avg: 915, stdev: 475.5, min: 893]
99% (8,910,000) = [avg: 823, stdev: 39.0, max: 904] - 1% (90,000) = [avg: 1077, stdev: 1493.9, min: 904]
99.9% (8,991,000) = [avg: 824, stdev: 39.62, max: 915] - 0.1% (9,000) = [avg: 2610, stdev: 4439.05, min: 915]
99.99% (8,999,100) = [avg: 824, stdev: 47.73, max: 13448] - 0.01% (900) = [avg: 15264, stdev: 3653.04, min: 13484]
99.999% (8,999,910) = [avg: 825, stdev: 141.2, max: 16186] - 0.001% (90) = [avg: 19503, stdev: 10169.29, min: 16187]
99.9999% (8,999,991) = [avg: 825, stdev: 149.83, max: 25498] - 0.0001% (9) = [avg: 38328, stdev: 24606.03, min: 25787]
99.99999% (8,999,999) = [avg: 825, stdev: 152.32, max: 34315] - 0.00001% (1) = [avg: 107335, stdev: 0.0, min: 107335]

C++ (-O3)

Iterations: 9,000,000
Avg Time: 1229 nanos
Stdev: 157.94 nanos
Min Time: 1141 nanos
Max Time: 35318 nanos
75% (6,750,000) = [avg: 1183, stdev: 7.59, max: 1197] - 25% (2,250,000) = [avg: 1364, stdev: 273.80, min: 1197]
90% (8,100,000) = [avg: 1203, stdev: 63.25, max: 1431] - 10% (900,000) = [avg: 1458, stdev: 393.68, min: 1431]
99% (8,910,000) = [avg: 1225, stdev: 91.58, max: 1462] - 1% (90,000) = [avg: 1595, stdev: 1236.27, min: 1462]
99.9% (8,991,000) = [avg: 1227, stdev: 94.04, max: 1484] - 0.1% (9,000) = [avg: 2732, stdev: 3721.22, min: 1484]
99.99% (8,999,100) = [avg: 1227, stdev: 95.25, max: 2305] - 0.01% (900) = [avg: 12314, stdev: 5985.26, min: 2305]
99.999% (8,999,910) = [avg: 1228, stdev: 146.17, max: 16629] - 0.001% (90) = [avg: 19763, stdev: 3783.30, min: 16631]
99.9999% (8,999,991) = [avg: 1229, stdev: 155.39, max: 23087] - 0.0001% (9) = [avg: 29166, stdev: 4156.72, min: 24677]
99.99999% (8,999,999) = [avg: 1229, stdev: 157.53, max: 34189] - 0.00001% (1) = [avg: 35318, stdev: 0.00, min: 35318]

Java Source Code

package com.coralblocks.coralthreads.sample;

import java.util.Arrays;

import com.coralblocks.coralbits.bench.Benchmarker;
import com.coralblocks.coralbits.util.OSUtils;
import com.coralblocks.coralbits.util.SystemUtils;
import com.coralblocks.coralthreads.Affinity;

public class TestPerformance {

    // java -server -verbose:gc -cp ./target/classes:./target/coralthreads-all.jar:coralthreads-all.jar -DcoralThreadsVerbose=false -DbenchWorstPercs=true -DbenchTotals=true -DbenchStdev=true -DbenchMorePercs=true -DdetailedBenchmarker=true -DprocToBind=1 -DexcludeNanoTimeCost=true com.coralblocks.coralthreads.sample.TestPerformance 10000000 1000000 60

    private static int[] HEAP_ARRAY;

    public static void main(String[] args) {

        int iterations = Integer.parseInt(args[0]);
        int warmup = Integer.parseInt(args[1]);
        int arraySize = Integer.parseInt(args[2]);

        int procToBind = SystemUtils.getInt("procToBind", -1);

        if (procToBind != -1 && OSUtils.isLinux()) {
            Affinity.set(procToBind);
        }

        HEAP_ARRAY = new int[arraySize];

        Benchmarker bench = Benchmarker.create(warmup);

        long x = 0;

        for(int i = 0; i < iterations; i++) {

            bench.mark();
            doSomething(HEAP_ARRAY, HEAP_ARRAY.length);
            bench.measure();

            for(int j = 0; j < HEAP_ARRAY.length; j++) {
                x += HEAP_ARRAY[j];
            }
        }

        System.out.println("Value computed: " + x);
        System.out.println("Array: " + Arrays.toString(HEAP_ARRAY));

        bench.printResults();
    }

    private static void swapping(int[] array, int x, int y) {
        int temp = array[x];
        array[x] = array[y];
        array[y] = temp;
    }

    private static void bubbleSort(int[] array, int size) {
        for(int i = 0; i < size; i++) {
            int swaps = 0; // flag to detect any swap is there or not
            for(int j = 0; j < size - i - 1; j++) {
                if (array[j] > array[j + 1]) { // when the current item is bigger than next
                    swapping(array, j, j + 1);
                    swaps = 1;
                }
            }
            if (swaps == 0) break; // No swap in this pass, so array is sorted
        }
    }

    /*
     * For speed, it is important to extract the hot code (i.e. the code executed in a loop) to its own method so the JIT can inline/optimize/compile.
     *
     * Note that the main() method above is executed only once.
     */
    private final static void doSomething(int[] array, int size) {

        for(int z = 0; z < size; z++) {
            array[z] = size - z;
        }

        bubbleSort(array, size);
    }
}

C++ Source Code

#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
#include <map>
#include <limits>
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <time.h>
#include <sched.h>

using namespace std;

// TO COMPILE: g++ TestPerformance.cpp -o TestPerformance -std=c++11 -O3
// TO EXECUTE: ./TestPerformance 10000000 1000000 60 1

static const bool MORE_PERCS = true;
static const bool INCLUDE_WORST_PERCS = true;
static const bool INCLUDE_TOTALS = true;
static const bool INCLUDE_RATIOS = false;
static const bool INCLUDE_STDEV = true;

static const bool EXCLUDE_NANO_TS_COST = true;

long get_nano_ts(timespec* ts) {
    clock_gettime(CLOCK_MONOTONIC, ts);
    return ts->tv_sec * 1000000000 + ts->tv_nsec;
}

static const long NANO_COST_ITERATIONS = 10000000;

// Estimates the average cost of one timestamp call so it can be subtracted from each measurement
static long calc_nano_ts_cost() {

    struct timespec ts;

    long start = get_nano_ts(&ts);

    long finish = start;

    for (long i = 0; i < NANO_COST_ITERATIONS; i++) {
        finish = get_nano_ts(&ts);
    }

    finish = get_nano_ts(&ts);

    return (finish - start) / NANO_COST_ITERATIONS;
}

struct mi {
    long value;
};

// Appends the stats for the best "perc" of the measurements (top) and for the remaining worst ones (bottom).
// The map goes from measured time to number of occurrences and is iterated in ascending order of time.
void add_perc(stringstream& ss, int size, double perc, map<int, mi*>* map) {

    if (map->empty()) return;

    int max = -1;
    int minBottom = -1;

    long x = round(perc * size);
    long i = 0;
    long iBottom = 0;

    long sum = 0;
    long sumBottom = 0;

    bool trueForTopFalseForBottom = true;
    bool flag = false;

    const int arraySize = 1024 * 1024 * 10;
    int* tempData = new int[arraySize];
    double stdevTop = -1;

    for(auto iter = map->begin(); iter != map->end(); iter++) {

        if (flag) break;

        int time = iter->first;
        long count = (iter->second)->value;

        for(int a = 0; a < count; a++) {

            if (trueForTopFalseForBottom) {

                tempData[i] = time;

                i++;
                sum += time;

                if (i == x) {

                    max = time;

                    if (INCLUDE_STDEV) {

                        double avg = (double) sum / (double) i;
                        double temp = 0;

                        for(int b = 0; b < i; b++) {
                            int t = tempData[b];
                            temp += (avg - t) * (avg - t);
                        }

                        stdevTop = sqrt(((double) temp / (double) i));
                    }

                    if (INCLUDE_WORST_PERCS) {
                        trueForTopFalseForBottom = false;
                    } else {
                        flag = true;
                        break;
                    }
                }

            } else {

                tempData[iBottom] = time;

                iBottom++;
                sumBottom += time;

                if (minBottom == -1) {
                    minBottom = time;
                }
            }
        }
    }

    ss << " | " << fixed << setprecision(5) << (perc * 100) << "%";
    if (INCLUDE_TOTALS) ss << " (" << i << ")";
    ss << " = [avg: " << (sum / i);
    if (INCLUDE_STDEV) ss << ", stdev: " << fixed << setprecision(2) << stdevTop;
    ss << ", max: " << max << "]";

    if (INCLUDE_WORST_PERCS) {

        ss << " - " << fixed << setprecision(5) << ((1 - perc) * 100) << "%";
        if (INCLUDE_TOTALS) ss << " (" << (iBottom > 0 ? iBottom : 0) << ")";
        ss << " = [avg: " << (iBottom > 0 ? (sumBottom / iBottom) : -1);

        if (INCLUDE_STDEV) {

            ss << ", stdev: ";

            if (iBottom <= 0) {
                ss << "?";
            } else {

                double avgBottom = (sumBottom / iBottom);
                double temp = 0;

                for(int b = 0; b < iBottom; b++) {
                    long t = tempData[b];
                    temp += (avgBottom - t) * (avgBottom - t);
                }

                double stdevBottom = sqrt((double) temp / (double) iBottom);

                ss << fixed << setprecision(2) << stdevBottom;
            }
        }

        ss << ", min: " << (minBottom != -1 ? minBottom : -1) << "]";

        if (INCLUDE_RATIOS) ss << " R: " << fixed << setprecision(2) << (iBottom > 0 ? ( ((double) (sumBottom / iBottom) / (double) (sum / i) ) - 1) * 100 : -1) << "%";
    }

    delete[] tempData;
}

void swapping(int &a, int &b) { // swap the content of a and b
    int temp;
    temp = a;
    a = b;
    b = temp;
}

void display(int *array, int size) {
    for(int i = 0; i < size; i++) {
        cout << array[i] << " ";
    }
    cout << endl;
}

void bubbleSort(int *array, int size) {
    for(int i = 0; i < size; i++) {
        int swaps = 0; // flag to detect any swap is there or not
        for(int j = 0; j < size - i - 1; j++) {
            if (array[j] > array[j+1]) { // when the current item is bigger than next
                swapping(array[j], array[j+1]);
                swaps = 1; // set swap flag
            }
        }
        if (!swaps) break; // No swap in this pass, so array is sorted
    }
}

void doSomething(int *array, int size) {

    for(int z = 0; z < size; z++) {
        array[z] = size - z;
    }

    bubbleSort(array, size);
}

int main(int argc, char* argv[]) {

    int iterations = stoi(argv[1]);
    int warmup = stoi(argv[2]);
    int arraySize = stoi(argv[3]);
    int proc = stoi(argv[4]);

    // pin this thread to the given cpu core
    cpu_set_t my_set;
    CPU_ZERO(&my_set);
    CPU_SET(proc, &my_set);
    sched_setaffinity(0, sizeof(cpu_set_t), &my_set);

    long nanoTimeCost = EXCLUDE_NANO_TS_COST ? calc_nano_ts_cost() : 0;

    struct timespec ts;

    long long x = 0;
    long long totalTime = 0;
    int minTime = numeric_limits<int>::max();
    int maxTime = numeric_limits<int>::min();

    map<int, mi*>* results = new map<int, mi*>();

    int * array = (int*) malloc(arraySize * sizeof(int));

    for(int i = 0; i < iterations; i++) {

        long start = get_nano_ts(&ts);

        doSomething(array, arraySize);

        long end = get_nano_ts(&ts);

        for(int j = 0; j < arraySize; j++) {
            x += array[j];
        }

        int res = end - start - nanoTimeCost;

        if (res <= 0) res = 1;

        if (i >= warmup) {

            totalTime += res;
            minTime = min(minTime, res);
            maxTime = max(maxTime, res);

            auto iter = results->find(res);

            if (iter != results->end()) {
                (iter->second)->value = (iter->second)->value + 1;
            } else {
                mi* elem = new mi();
                elem->value = 1;
                (*results)[res] = elem;
            }
        }
    }

    int count = iterations - warmup;

    double avg = totalTime / count;

    cout << "Value computed: " << x << endl;

    display(array, arraySize);

    cout << "Nano timestamp cost: " << nanoTimeCost << endl;

    free(array);

    stringstream ss;

    ss << "Iterations: " << count << " | Avg Time: " << avg;

    if (INCLUDE_STDEV) {

        long temp = 0;
        long x = 0;

        for(auto iter = results->begin(); iter != results->end(); iter++) {

            int time = iter->first;
            long count = (iter->second)->value;

            for(int a = 0; a < count; a++) {
                temp += (avg - time) * (avg - time);
                x++;
            }
        }

        double stdev = sqrt( temp / x );

        ss << " | Stdev: " << fixed << setprecision(2) << stdev;
    }

    if (count > 0) {
        ss << " | Min Time: " << minTime << " | Max Time: " << maxTime;
    }

    add_perc(ss, count, 0.75, results);
    add_perc(ss, count, 0.90, results);
    add_perc(ss, count, 0.99, results);
    add_perc(ss, count, 0.999, results);
    add_perc(ss, count, 0.9999, results);
    add_perc(ss, count, 0.99999, results);

    if (MORE_PERCS) {
        add_perc(ss, count, 0.999999, results);
        add_perc(ss, count, 0.9999999, results);
    }

    cout << ss.str() << endl << endl;

    delete results;

    return 0;
}