Monday, 29 March 2010

Finding duplicates in Excel.

I've been asked how to do this several times now, so it seems like it isn't obvious to all, but is easy.

Step 1, sort the data on the column you want to find the duplicates in.
Step 2, create a new column. In the first cell of the new column (assuming row 2, column B is the first cell of the column containing the possible duplicates), enter =B2=B3
Step 3, copy the new cell down into the other rows. The cell will evaluate to TRUE if a cell is the same as the next cell down.
Step 4, put an auto filter on the data and set it to just display TRUE entries for the new column
Step 5, you are now looking at just the duplicate data.

Friday, 2 October 2009

Another Ruby and Excel Tip

Producing a list of all the items that are in one column and not another is a bit of a bear in Excel. You'd either have to use vlookup and sort for N/A or write a Macro. Thanks to Ruby's excellent Array handling facilities it is really easy in Ruby. Most of the script below is concerned with getting the data into Ruby for processing.

require "win32ole"
excel = WIN32OLE.connect("Excel.Application")

# change the row and column numbers of source and destinations to suit your Excel Spreadsheet.
# Could bring these in as parameters if preferred
ws = excel.worksheets("Sheet1")

first_col = excel_column_to_a(2, 1, ws)
second_col = excel_column_to_a(2, 2, ws)
a_to_excel_column(first_col - second_col, 2, 3, ws)
a_to_excel_column(second_col - first_col, 3, 3, ws)


def a_to_excel_column(arr, row_number, col_number, sheet)
  arr.each do |cur_val|
    sheet.cells(row_number, col_number).value = cur_val
    row_number += 1
  end
end

def excel_column_to_a(row_number, col_number, sheet)
  a = []
  # Will stop when .value of cell is nil
  while cur_val = sheet.cells(row_number, col_number).value
    a << cur_val     row_number += 1   end   a end


Monday, 21 September 2009

Designing a Ruby Based Identity Management System

This is an itch I've wanted to scratch for a long time. I've worked with a variety of technologies that all to a greater or lesser extent make use of user identities. In fact, keeping track of all these identities is a major pain for most large companies and there are several identity management products out there.

These products generally have one thing in common. They are expensive. Very expensive. Novell Identity Manager is $25/person. Sun Identity Manager Suite is $75/person. If your customer has 10,000 workers, that is a lot of money, even before you the time to configure the product is factored in.

I think that it is possible to build a product in Ruby on Rails that has enough functionality for most requirements and do it in substantially less than 250 days (at $1,000 per day)

The generic, basic pattern of an identity manager solution is shown here.

Essentially there is a large database of people. The reason why it has it's own database in a modern outsourced company, people who have accounts on a company's computers could be employees of several different organizations. The goal is to get anybody who is allowed access to the organization's resources into that database. A major is to automate as much of this as possible.

The second point of the database is to ensure that the appropriate level of authorization, or sanction is supplied before user accounts are created on the target systems. Behind the scenes our people database needs to know which people can authorize which other people to have what type of accounts on what systems.

Another point about the people database is that it can be used to collect information from a number of different data sources. SAP HR may be the authority for a person's name and staff number, but the telephone system sets the user's phone number.

With this in mind, it is time to refine our diagram slightly. Systems, according to a set of rules may be either sources of data or sinks of of data.



Now the data sources are peers of the target systems and rules define the fact that they are the sources of data. We also have tables that contain replicas of the target data on each system. This is to allow us to query the information easily. We'll action user requests like "create a new account" immediately and also conduct batch syncs each night.

I'm going to cover this all in much more detail in later blog posts. I'll also put up some sample code on Git so you can play around and customize the code.

Friday, 4 September 2009

Using Ruby with LDAP

Well, I'm now looking for work again. I really would like to work with Ruby on Rails, so if you are hiring, please let me know..

Now that is out the way, I wanted to write about LDAP. LDAP stands for Lightweight Directory Access Protocol. Although the most common use of LDAP and directories is for authentication, it is also somewhat like a database in that data in a directory can be manipulated. However there are major differences compared to SQL. The first and most major is that the data is located in a tree structure. The second is that some objects can have multiple attributes of the same type, meaning that when you look at the object you may get an array, nil, or a single value.

Another thing that is different is the query language. A native LDAP query would look something like this:

(&(|(givenName=Joe)(givenName=Joseph))(sn=Blogg*))

This means find all the objects where the Surname begins with Blogg and the givenName is Joe or Joseph. The query is also scoped to the current object, one level of the tree or all the subtrees. Luckily for most queries you can use ActiveLDAP and this generates the LDAP query in much the same way that ActiveRecord generates a SQL query when a find command is executed.

There are several "official" routes that Ruby on Rails applications can access LDAP and an unofficial cheeky one that can only be used on a Windows host.

Ruby/LDAP

This is absolutely the best foundation LDAP set. It works well, is fast, but can be a pain to compile under windows. See this forum post for details. It does work, I've tried it.

Net::LDAP

This is a pure Ruby LDAP framework. Unfortunately the last release 0.0.4 was in August 2006 and there have been some bug fixes since in the edge version that turn it from almost unusable to just a bit buggy. On the plus side, on Windows, you don't have muck around with compilers.

ActiveLDAP

This is the rather cool front end for the preceding two LDAP libraries maintained by Kouhei Sutou. It adds an ActiveRecord like veneer to the LDAP classes, which is an amazing achievement given how different SQL and LDAP are.

The nice thing about ActiveLDAP is that it can be used as a drop in replacement for AciveRecord in one or more models of your rails solution. It also allows a degree of one to many and many to one associations.


Use IronRuby with the Microsoft Tools

I've never done this, but I'll lay money that it can be made to work.


Use the Microsoft OLE LDAP Interface (Windows only)

On the Windows platform, Ruby has access to the Microsoft OLE Interface to LDAP. Now the search side of this isn't anywhere near as easy to use as ActiveRecord and to be honest I would not bother with it for that reason, however it does make sense to use the record manipulation side of things if you are using Net::LDAP and you run up against one of the many bugs/omissions there.

For example, if you want to move a user, you cannot do this with ActiveLDAP running on Net::LDAP as Net::LDAP doesn't implement the LDAP function that does this.

my_user = User.find(:first,:attribute=>"cn",:value=>"Joe Bloggs")
obj_ou = WIN32OLE.connect("LDAP://#{User.host}/OU=NewOU,DC=Org,DC=com")
my_moved_user = obj_ou.moveHere( "LDAP://#{my_user.dn}" , "CN=#{my_user.cn}" )

The #{User.host} in the second line forces the LDAP to bind to the same server that ActiveLDAP is using. Otherwise the two servers could well be looking at different replicas of the data and this is a bad thing.

Summary

I hope that the post above has provided a useful pointer. I'll be covering some LDAP specifics in greater detail in subsequent posts. LDAP is a fascinating sub branch of Ruby. While I'm looking for work, I've started work on an open source Identity Lifecycle Management Engine that will be able to populate accounts on target systems and will accept data from upstream HR systems. Hopefully it will stop companies having to shell out megabucks for proprietry IDM solutions, but it will be while before any code is ready for publications.

Wednesday, 12 August 2009

Ruby on Windows

I'm a contractor Windows/Novell/Linux system administrator. Mainly Windows, nowadays.

Whenever I start a new contract (hint - I'm looking now...) I like to install Ruby on my Windows workstation. Almost invariably I'll be asked to produce an Excel sheet with some information gathered from Windows. For me, the easiest way is simply to fire up a Ruby console, get it talking to Excel and type the commands in manually to prepare the report. For more complicated work, I often just prepare the commands in notepad and paste them into the Ruby interpreter.

Here's how easy it is to extract some user information from a system and put it into Excel.

require 'win32ole'
excel = WIN32OLE.connect("Excel.Application")
worksheet = excel.worksheets("Sheet1")
row = 2
server = WIN32OLE.connect("WinNT://replaceWithYourServerName")
server.filter = [ 'user' ]
server.each do |u|
   wb.cells(row,1).value = u.name
   wb.cells(row,2).value = u.fullName
   wb.cells(row,3).value = u.description
   row += 1
end


As soon as you type end and press return, your excel worksheet populates with the user information from the system. You would need an authenticated connection with admin rights (being logged into the same domain as the server with admin rights is enough).

Hope this was useful to somebody. I haven't seen many posts about using Ruby on Windows and talking to the Windows APIs.