Using Group-Object With Expressions

Some background
I was tasked with verifying all DNS records on a server with ~50000 A records. For every forward record, we wanted to check whether the IP had a reverse lookup record in the reverse zone pointing back to the same hostname, and log the results.

This was painfully slow: it took hours using DNS lookups through the .NET object or nslookup, and it put unnecessary load on the DNS server (not much, but still…).

So I decided to go with the option of downloading the zones and comparing them in memory to gain performance.

I won’t go through the process of downloading and parsing the DNS zones now (if you want me to, tell me!); I just want to share something I learned about the “Group-Object” cmdlet. Even when I did the comparison locally in memory, searching through arrays with ~50000 records was very slow, so I thought I could use multithreading to speed things up and run every zone as a separate job simultaneously.
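To make the slowness concrete, here is a minimal sketch of what such an in-memory check could look like. The sample data, the $ForwardRecords and $ReverseRecords names and the status labels are all assumptions for illustration; the real arrays would come from the parsed zones. Every forward record scans the whole reverse array, which is why it gets slow at ~50000 records.

```powershell
# Hypothetical sample data; the real arrays would come from the parsed zone files.
$ForwardRecords = @(
    [pscustomobject]@{ IP = '10.10.1.5'; Hostname = 'web01.example.com' }
    [pscustomobject]@{ IP = '10.10.1.6'; Hostname = 'web02.example.com' }
    [pscustomobject]@{ IP = '10.20.3.7'; Hostname = 'db01.example.com' }
)
$ReverseRecords = @(
    [pscustomobject]@{ IP = '10.10.1.5'; Hostname = 'web01.example.com' }
    [pscustomobject]@{ IP = '10.10.1.6'; Hostname = 'old-web02.example.com' }
)

$Results = foreach ($Record in $ForwardRecords) {
    # Linear search: each forward record walks the entire reverse array.
    $Ptr = $ReverseRecords | Where-Object { $_.IP -eq $Record.IP } | Select-Object -First 1

    # Assumed status labels, only for this example.
    $Status = if (-not $Ptr) {
        'MissingPTR'
    } elseif ($Ptr.Hostname -eq $Record.Hostname) {
        'OK'
    } else {
        'Mismatch'
    }

    [pscustomobject]@{
        IP       = $Record.IP
        Hostname = $Record.Hostname
        Status   = $Status
    }
}

$Results | Format-Table
```

The smaller the array each search has to walk, the faster this loop gets, which is the reason for splitting the records into chunks.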

So how to do this?

Using the Where-Object cmdlet
One way of doing it is to simply find all the IPs in the array that start with a specific address. For example “10.10.”

That would look something like this:

$MyNetwork = $AllMyIPs | Where-Object { $_.IP -like '10.10.*' }

That command takes ~2.5 seconds to execute on my server, and you have to run it once for every network you have (or at least for every thread you want to start). You want the chunks to be as small as possible to speed up the search for reverse records later, but you don’t want to pay the penalty of splitting the records up too many times.
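Repeated for several networks, the Where-Object approach could be sketched like this. The sample data and the $Prefixes list are assumptions for illustration; the point is that every prefix costs one full pass over the array.

```powershell
# Hypothetical sample data with the same IP and Hostname columns as $AllMyIPs.
$AllMyIPs = @(
    [pscustomobject]@{ IP = '10.10.1.5';  Hostname = 'web01.example.com' }
    [pscustomobject]@{ IP = '10.10.2.9';  Hostname = 'web02.example.com' }
    [pscustomobject]@{ IP = '10.20.3.7';  Hostname = 'db01.example.com' }
    [pscustomobject]@{ IP = '172.16.0.4'; Hostname = 'app01.example.com' }
)

# Assumed list of prefixes; in practice there would be one per network.
$Prefixes = '10.10.', '10.20.', '172.16.'

$Chunks = @{}
foreach ($Prefix in $Prefixes) {
    # Each iteration filters the complete array again.
    $Chunks[$Prefix] = @($AllMyIPs | Where-Object { $_.IP -like "${Prefix}*" })
}

$Chunks['10.10.']
```

With a few seconds per pass on the full array, the total time grows with the number of prefixes.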

So we are getting there… But we could do a lot better!

Using the Group-Object cmdlet With Expressions
The solution was simple, but I never thought of it before this. The Group-Object cmdlet can group things based on expressions!

To group the same array as above ($AllMyIPs, with the columns IP and Hostname) by “class B networks”, or rather by the first two octets, you simply write:

$AllMyNetworks = $AllMyIPs | Group-Object { (($_.IP.Split(".")[0,1] -join ".") + ".") }

The above command takes ~3.5 seconds to execute, but now all the IPs are grouped by their first two octets, and you only pay that cost once. You can then easily loop through the groups and send them off with the “Start-Job” cmdlet to verify them.
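A minimal sketch of that loop, assuming small sample data and a placeholder script block that only counts the records where the real verification would go. The parameter names $Name and $Records and the output properties are made up for this example. Keep in mind that Start-Job runs each job in a separate process, so the records are serialized on the way in and out.

```powershell
# Hypothetical sample data with the same IP and Hostname columns as $AllMyIPs.
$AllMyIPs = @(
    [pscustomobject]@{ IP = '10.10.1.5';  Hostname = 'web01.example.com' }
    [pscustomobject]@{ IP = '10.10.2.9';  Hostname = 'web02.example.com' }
    [pscustomobject]@{ IP = '10.20.3.7';  Hostname = 'db01.example.com' }
    [pscustomobject]@{ IP = '172.16.0.4'; Hostname = 'app01.example.com' }
)

# One pass over the array; each group has a Name ("10.10.") and a Group (its records).
$AllMyNetworks = $AllMyIPs | Group-Object { ($_.IP.Split('.')[0,1] -join '.') + '.' }

# Start one background job per network.
$Jobs = foreach ($Network in $AllMyNetworks) {
    Start-Job -ArgumentList $Network.Name, $Network.Group -ScriptBlock {
        param($Name, $Records)

        # Placeholder: the real reverse-record verification would go here.
        [pscustomobject]@{
            Network     = $Name
            RecordCount = @($Records).Count
        }
    }
}

# Wait for all jobs, collect their output, then clean up.
$Results = $Jobs | Wait-Job | Receive-Job
$Jobs | Remove-Job

$Results | Select-Object Network, RecordCount
```

With thousands of groups you would probably want to limit how many jobs run at the same time, since each one starts its own PowerShell process.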

How to access them? You could write something like:

$MyNetwork = $AllMyNetworks | Where-Object Name -like "10.10.*" | Select-Object -ExpandProperty Group

That takes 5-10 milliseconds to execute, depending on the network size.
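If you need to pull out many groups by name, Group-Object can also return a hashtable through its -AsHashTable and -AsString parameters, so each group becomes a direct key lookup. This is an alternative to the Where-Object lookup, not what the timings in this post were measured with, and the sample data is made up for illustration.

```powershell
# Hypothetical sample data with the same IP and Hostname columns as $AllMyIPs.
$AllMyIPs = @(
    [pscustomobject]@{ IP = '10.10.1.5';  Hostname = 'web01.example.com' }
    [pscustomobject]@{ IP = '10.10.2.9';  Hostname = 'web02.example.com' }
    [pscustomobject]@{ IP = '10.20.3.7';  Hostname = 'db01.example.com' }
)

# -AsString makes the keys plain strings, so indexing with '10.10.' works as expected.
$NetworkTable = $AllMyIPs |
    Group-Object { ($_.IP.Split('.')[0,1] -join '.') + '.' } -AsHashTable -AsString

# Direct lookup by the two-octet prefix.
$MyNetwork = $NetworkTable['10.10.']
$MyNetwork
```

The hashtable values are the grouped records themselves, so there is no Group property to expand.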

Conclusion
That will save you a lot of time in the end! 🙂 I can now verify the 50000 records in about 15 minutes instead of many hours.

This is very useful for many other applications as well, and it’s something I’ve started to use a lot when working with huge arrays.

Another example is grouping e-mail addresses by mail domain. It’s as simple as:

$AlotOfMailAddresses | Group-Object { ($_ -split "@")[1] }

I hope someone finds this useful!