dotNoted

Observations of .Net development in the wild

Development performance hack

Got a laptop? Do development? Mind wander between rebuilds and the app recycle/JIT and page display?
 
Mine too… I haven’t eliminated it, but I’ve found a setting which may keep me from getting 30km down the road of a day dream before IE draws the page.
 
<disclaimer>Warning: this will ruin your life if you follow this advice. Every bad thing that can happen, will. I am not responsible for you trashing your machine because you didn’t read and understand this.</disclaimer>
 
Also, don’t do this if you are running Windows Server, unless you do it in reverse. Look at the TechNet entry for more details.
 
In the registry, under the HKLM hive, open the SYSTEM\CurrentControlSet\Control\PriorityControl key and locate the Win32PrioritySeparation value (should be the only one in that key). Change it to 0x01.
 
The full description of why this helps make background services more responsive on your dev box is rather lengthy, so I won’t repeat it here, merely link to it. Basically, though, it changes the unit of time (the quantum) the scheduler gives to a thread depending on whether its owning process is in the foreground or not. 0x02 is the default, and it gives foreground processes a quantum three times larger than the background quantum. 0x01 reduces the ratio to two times. 0x18 changes the scheduling entirely to behave like Windows Server, but only if you are running XP (or 2000).
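If you’d rather make the change from code than from regedit, here’s a minimal C# sketch; it assumes you run it from an administrator account, and it simply writes the value described above (take the disclaimer seriously before running it):

using System;
using Microsoft.Win32;

class PriorityTweak
{
    static void Main()
    {
        // Open the PriorityControl key for writing; this needs administrative rights.
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(
            @"SYSTEM\CurrentControlSet\Control\PriorityControl", true))
        {
            // Show the current value first (0x02 is the usual workstation default).
            Console.WriteLine("Current Win32PrioritySeparation: 0x{0:X}",
                key.GetValue("Win32PrioritySeparation"));

            // Shrink the foreground quantum boost from 3x to 2x the background quantum.
            key.SetValue("Win32PrioritySeparation", 0x01, RegistryValueKind.DWord);
        }
    }
}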
 
This gives IIS and the other server processes you use while building your app relatively more time to get their work done. It may boost the responsiveness of those services when you’re debugging things like web apps on the local machine.
 
Or not.
 
Let me know because I’m interested.
 
 

Filed under: Administration

Cyclomatic Complexity and Program Analysis

Cyclomatic complexity is a well-established software engineering metric that measures, in seemingly straightforward terms, how complex a program is, based on the number of branches within a module (function, procedure, subsystem, etc.) and how many modules it references. The many definitions in circulation essentially borrow from the same understanding, either an intuitive grasp of what it means or the requisite math. Both appear to be valid, but as usual, a more formal understanding contributes to a more carefully constructed cognitive model, lets us move from the general to the specific without logic errors, and thus yields more value. You can get the intuitive version by simply saying "the number of paths you can take through a module without duplication" without seeming to confuse anyone; we all know what a "path" and a "module" are, right? Probably not exactly. The formal version takes a little work. Since the definitions of cyclomatic complexity I’ve found rarely give a formal definition of "linearly independent" alongside them, let’s see if we can raise the level of dialog and infuse a bit more rigor into the art of measuring. I won’t give equations, just narrative, unless sufficiently provoked.
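Before the narrative, here is a tiny made-up method to hang the intuition on; the method and its numbers are my own illustration, nothing canonical, and it will serve as a running example of "paths through a module":

using System;

class PathIntuition
{
    // A made-up method with two independent decisions, so there is more than
    // one way control can flow from entry to exit.
    static string Classify(int x)
    {
        string label = "small";

        if (x > 100)        // decision 1: magnitude
            label = "large";

        if (x % 2 == 0)     // decision 2: parity
            label += ", even";

        // Four distinct execution paths can reach this return, but as we'll see
        // below, only three of them are needed to form a basis.
        return label;
    }

    static void Main()
    {
        Console.WriteLine(Classify(7));    // "small"
        Console.WriteLine(Classify(102));  // "large, even"
    }
}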

 

A program, consisting of a set of statements and data, can be thought of as a graph: a mathematical construct which relates a "vertex" or "node" (an element in a set of objects, conceptual or physical, it doesn’t matter) to other vertexes or nodes (nodes from now on, since I want to avoid the contention between "vertices" and "vertexes"). These relations are "edges". In graphs, edges can be "directed" or "undirected". We’re interested in directed edges between nodes, since programs have "flow": the CPU marches forward, pushed by the voltage given it. The set of all edges is, effectively, the program. If we start at the starting node (public void Main()) and follow directed edges until the end, the program completes (or should, since a program which never reaches a terminating state is, in practical terms, an error). Finally, an edge exists between two nodes (statements in the program) if they are logically adjacent, that is, if the CPU can execute one immediately after the other. This includes branching and function calls.
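As a throwaway sketch of that idea (the node labels are names I invented for the Classify example above, and I’m treating basic blocks as the nodes), a control-flow graph is nothing more than a list of directed edges:

using System;
using System.Collections.Generic;

class FlowGraphSketch
{
    static void Main()
    {
        // Each row is a directed edge: control can flow from the first
        // statement (basic block) to the second. These are the blocks of the
        // hypothetical Classify method above.
        string[,] edges =
        {
            { "entry / test x > 100",  "label = \"large\""    },
            { "entry / test x > 100",  "test x % 2 == 0"      },
            { "label = \"large\"",     "test x % 2 == 0"      },
            { "test x % 2 == 0",       "label += \", even\""  },
            { "test x % 2 == 0",       "return"               },
            { "label += \", even\"",   "return"               }
        };

        // Count the distinct nodes the edges mention.
        List<string> nodes = new List<string>();
        for (int i = 0; i < edges.GetLength(0); i++)
            for (int j = 0; j < 2; j++)
                if (!nodes.Contains(edges[i, j]))
                    nodes.Add(edges[i, j]);

        Console.WriteLine("Nodes: {0}, Edges: {1}", nodes.Count, edges.GetLength(0)); // 5 and 6
    }
}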

 

Another way to look at the set of directed edges is as a set of vectors. A vector is a mathematical construct which relates certain quantities together in a structure, an ordered set, basically. A physical force, for example, is commonly modeled by a vector because it has more than one component (an amount, or magnitude, and a direction), and thus can’t be measured by a "scalar", or single, value. So each edge in our program graph is a vector quantity whose magnitude and direction give us the next statement, from any starting statement, in our program space.

 

Now, take the set of all vectors we can make in our program space and use only the ones which don’t repeat edges in the graph. (Looked at another way, repeating an edge would mean executing the same two statements in succession in the same program state, which really can’t happen in a computer program anyway, barring some external influence on the instruction pointer from a debugger or an exploit: even in a loop, while the same statements are being executed, the loop counter is incrementing or a stream is being read and exhausted, so the program state is changing.) If the set of vectors we choose also covers all the nodes in our program graph, then we have a set of "linearly independent" edges, or paths, through our program. This is also known as a basis for the program space.
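To see "linearly independent" in action on the Classify example, write each of its four execution paths as a vector counting how many times it uses each of the six edges from the sketch above, in that order (my own hand-worked bookkeeping). Three of the paths form a basis; the fourth is a linear combination of them:

using System;

class BasisSketch
{
    static void Main()
    {
        // Edge-incidence vectors for the four execution paths of the hypothetical
        // Classify method: each component counts how often one of its six
        // control-flow edges is traversed.
        int[] p1 = { 0, 1, 0, 0, 1, 0 }; // x <= 100, odd
        int[] p2 = { 1, 0, 1, 0, 1, 0 }; // x >  100, odd
        int[] p3 = { 0, 1, 0, 1, 0, 1 }; // x <= 100, even
        int[] p4 = { 1, 0, 1, 1, 0, 1 }; // x >  100, even

        // p4 is not independent: it is a linear combination of the other three,
        // so {p1, p2, p3} forms a basis for this little program space.
        for (int i = 0; i < p4.Length; i++)
            Console.WriteLine("{0} == {1}", p4[i], p2[i] + p3[i] - p1[i]);
    }
}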

 

Once we have a basis, or set of linearly independent paths through the program, we can proceed to evaluate the complexity of the now well-determined system. The added formalism could buy us a lot: reducing the problem to a set of linear equations could let us develop optimizers for both compile time and run time, or just help us root out our own inefficiencies. The term "linearly independent" by itself doesn’t deliver that insight (Math 341 isn’t a prerequisite for CompSci at most schools, though most developers are comfortable with matrix math); it just makes our work appear to have "truthiness" rather than trustworthiness. So if you talk about cyclomatic complexity, please explain "linearly independent" carefully, or make sure your references do.
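For what it’s worth (consider me sufficiently provoked), the standard formula, usually attributed to McCabe, computes the size of that basis directly from the graph: edges minus nodes plus twice the number of connected components. Plugging in the hand-counted numbers for the Classify example:

using System;

class CyclomaticSketch
{
    static void Main()
    {
        // Hand-built counts for the hypothetical Classify method above.
        int nodes = 5;       // the five basic blocks
        int edges = 6;       // the possible transfers of control between them
        int components = 1;  // one connected piece: a single method

        // McCabe's formula: the number of linearly independent paths (the basis size).
        int complexity = edges - nodes + 2 * components;

        Console.WriteLine("V(G) = {0}", complexity); // prints 3: two decisions plus one
    }
}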

Filed under: Software Engineering

2^128 is a really phenomenally large number

I’ve been playing around with WinFS beta 1 and I noticed that the item ids are GUIDs. Now, I’ve been told that it is nearly impossible to come up with 2 GUIDs that are identical, so very nearly that, practically speaking, it is impossible and always will be. I believed this, since 2^128 seems large. Still, I couldn’t help wondering: with all the databases that use GUIDs as primary keys, all the devices, all the transactions, etc., etc., and now WinFS, isn’t it just slightly possible that we could "run out" and start having collisions?
 
In a word: not a chance. I did some quick calculations to see how many GUIDs could be assigned to each man, woman and child on the planet now, and even to cover population estimates over the next 50 years, and the answer turns out to be another very large, mind-staggeringly proportioned number. Even if that many people had been using GUIDs since the beginning of the universe, we still wouldn’t come close to exhausting the space.
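Here’s the back-of-the-envelope version of that calculation; the population and age-of-the-universe figures are rough assumptions of mine, and I’m treating all 128 bits as usable, as above:

using System;
using System.Numerics;

class GuidHeadroom
{
    static void Main()
    {
        // Total number of possible GUID values, treating all 128 bits as usable.
        BigInteger totalGuids = BigInteger.Pow(2, 128);

        // Rough assumptions: ~9 billion people (a 50-year population estimate)
        // and ~13.8 billion years since the beginning of the universe.
        BigInteger people = 9000000000;
        BigInteger secondsSinceBigBang = new BigInteger(13.8e9 * 365.25 * 24 * 3600);

        BigInteger guidsPerPerson = totalGuids / people;
        BigInteger guidsPerPersonPerSecond = guidsPerPerson / secondsSinceBigBang;

        Console.WriteLine("GUIDs available per person:           {0:E3}", (double)guidsPerPerson);
        Console.WriteLine("...per person, per second, since t=0: {0:E3}", (double)guidsPerPersonPerSecond);
        // Roughly 3.8e28 per person, or about 8.7e10 per person per second
        // since the big bang. We are not going to run out.
    }
}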

Filed under: Software Engineering

.Net memory copy performance

I couldn’t find any articles on the performance of .Net memory copy techniques, so I crafted a very rough test to see for myself. Note: this is a very rough test, in case I hadn’t mentioned it. As such, there are lots of problems with it. I don’t really care, other than to acknowledge that dealing with them is the hallmark of a well-done test, and this isn’t one. It is a hack to convince me that Array.Copy isn’t an evil performance hog. This is what I have to work with, and it’s an improvement.

 

I thought of 4 different ways to copy memory which don’t use the Marshal class, and wrote a quick routine to test the speed of each one. You’ll find the code below. Here are the results of a typical test:

 

Array.Copy took: 0.0200288 seconds

ArrayList took: 3.6051840 seconds

MemoryStream took: 0.6709648 seconds

P/Invoke copy took: 0.0600864 seconds

Array.Copy took: 0.0300432 seconds

ArrayList took: 4.3462496 seconds

MemoryStream took: 0.0300432 seconds

P/Invoke copy took: 0.0300432 seconds

Array.Copy took: 0.0400576 seconds

ArrayList took: 4.9971856 seconds

MemoryStream took: 0.0100144 seconds

P/Invoke copy took: 0.0400576 seconds

Each test is run ten times on a new set of data. Oh, and the data is a byte array of 10,000,000 randomly chosen values. It gets recreated on each run so that caching has less of an effect. So what are we seeing here? Well, the Array.Copy method is about as fast as both MemoryStream (which was surprising: I thought wrapping a stream around a data buffer and copying in the data would take a while) and P/Invoke. All of them are pretty quick for normal application use.

Copying the data into an ArrayList took anywhere from 100 to 1000 times longer than any of the other methods. This is due, obviously, to the boxing of each byte as it goes in: a big hit. If you let the test run a couple of times, you’ll also perhaps see some big (~2x) variations in the ArrayList time, since there is possibly some GC activity due to memory pressure; it depends on your system (which is one of the problems with this test, since I ran it on my regular workstation). Another interesting thing is that the MemoryStream method took an order of magnitude more time on the first run. I’d chalk this up to the instruction cache on the Intel chip I’m running, but I’m speculating, and since that isn’t really the question I set out to answer, I’m good with it for now. If you have an answer, post a comment and I’ll concede ignorance if your explanation is convincing.

Finally, the times that MemoryStream and P/Invoke took are surprisingly (to me) short. I already mentioned MemoryStream, but I also thought that the extra baggage of a switch into unmanaged code for P/Invoke would be more significant. As it is, there doesn’t seem to be any significant difference among these methods and Array.Copy (which is another problem with the test: no measurement of variance or significance levels).

I’m now rather convinced that using any of these methods to push bytes around in memory is OK, and I’ll proceed to use whichever lends itself to the problem at hand, which will overwhelmingly be Array.Copy.

As mentioned, here’s the whole test:

using System;

namespace MemcopyTests
{
    /// <summary>
    /// Rough timing of four ways to copy a byte array (plus appending one byte).
    /// </summary>
    class Class1
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            TimeSpan startTime, endTime;

            for (int i = 0; i < 10; i++)
            {
                byte[] data = createTestData();

                // 1. Array.Copy into a slightly larger array.
                startTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                byte[] newData = new byte[data.Length + 1];
                Array.Copy(data, 0, newData, 0, data.Length);
                newData[newData.Length - 1] = (byte)1;
                endTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                System.Diagnostics.Debug.WriteLine(String.Format(
                    "Array.Copy took: {0:N7} seconds", endTime.TotalSeconds - startTime.TotalSeconds));

                // 2. ArrayList, which boxes every byte it stores.
                startTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                System.Collections.ArrayList dataList =
                    new System.Collections.ArrayList(data);
                dataList.Add((byte)1);
                endTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                System.Diagnostics.Debug.WriteLine(String.Format(
                    "ArrayList took: {0:N7} seconds", endTime.TotalSeconds - startTime.TotalSeconds));

                // 3. MemoryStream: copy the buffer into the stream, then append a byte.
                startTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                System.IO.MemoryStream dataStream =
                    new System.IO.MemoryStream(data.Length);
                dataStream.Write(data, 0, data.Length);
                dataStream.Capacity += 1;
                dataStream.Position = dataStream.Capacity - 1;
                dataStream.WriteByte((byte)1);
                endTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                System.Diagnostics.Debug.WriteLine(String.Format(
                    "MemoryStream took: {0:N7} seconds", endTime.TotalSeconds - startTime.TotalSeconds));

                // 4. P/Invoke to RtlMoveMemory (requires compiling with unsafe code enabled).
                startTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                byte[] newData2 = new byte[data.Length + 1];
                unsafe
                {
                    fixed (void* pData = data)
                    fixed (void* pNewData2 = newData2)
                        MoveMemory(new IntPtr(pNewData2), new IntPtr(pData), data.Length);
                }
                newData2[newData2.Length - 1] = (byte)1;
                endTime = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
                System.Diagnostics.Debug.WriteLine(String.Format(
                    "P/Invoke copy took: {0:N7} seconds", endTime.TotalSeconds - startTime.TotalSeconds));
            }

            Console.ReadLine();
        }

        [System.Runtime.InteropServices.DllImport(
            "Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
        static extern void MoveMemory(IntPtr dest, IntPtr src, int size);

        private static byte[] createTestData()
        {
            byte[] data = new byte[10000000];
            Random random = new Random();
            random.NextBytes(data);
            return data;
        }
    }
}

Filed under: Code Kaizen