Friday, June 29, 2007

Using Ruby & ADO to Work with Excel Worksheets

In an earlier article, I discussed using ActiveX Data Objects (ADO) to access a Microsoft Access database. A reader commented that ADO can also be used to access data in an Excel worksheet. Here's a brief demonstration...

As usual, we'll use the win32ole library:


require 'win32ole'

Create a new ADODB.Connection object:

connection = WIN32OLE.new('ADODB.Connection')

To open a connection to your Excel workbook, we'll call the Connection object's Open method and pass it a connection string. You can use same the Microsoft Jet driver used for accessing an MS Access database, but we need to append an "Extended Property" to specify that this is an Excel woorkbook:

conn_string = 'Provider=Microsoft.Jet.OLEDB.4.0;'
conn_string << 'Data Source=c:\my_folder\my_workbook.xls;'
conn_string << 'Extended Properties=Excel 8.0;'
connection.Open(conn_string)

Now, we'll create an ADO recordset object:

recordset = WIN32OLE.new('ADODB.Recordset')

When calling the RecordSet object's Open method, pass it your SQL statement and the open connection object. When working with an Excel worksheet as your table, append '$' to the worksheet table name and wrap it in brackets:

recordset.Open("select * from [Sheet1$];", connection)

The Recordset object's GetRows method returns an array of columns (not rows, as you might expect), so we'll use the Ruby array's transpose method to convert it to an array of rows:

data = recordset.GetRows.transpose

Close the Connection object by calling its Close method:

connection.close

There you have it! My thanks to reader Khaoz for the suggestion of using ADO with Excel.

Other articles about working with ADO can be found under the ado label to the right.

As always, feel free to post a comment here or email me with questions, comments, or suggestions.

Thanks for stopping by!


Digg my article

6 comments:

Anonymous said...

If your Sheet has no column headers yo have to use "HDR=No" in the Extended Properties - otherwise the first row will be interpreted as header and not imported.

Don't forget the quotation marks:

'Extended Properties="Excel 8.0;HDR=No"'

Eric.

Siva Phaneendra krapa said...

Hi David,

I am getting null values while reading excel cells using ADODB, if excel table column has different sets of data.(Ex: Numeric in one cell, string in another cell). How to handle this situation? I tried with IMEX=1 property, but no fruits.

Thanks in advance
Siva.

Sdevikar said...

Hi David,
Thanks a lot for sharing this with us. I have been trying to automate a few things that involves querying the excel sheet, just as your article demos. However, I observed that, in some cases, specially in number comparisons, the query won't work because it expects that column to be formatted as a number. But when I format that as a number, the query finds nil values in those cells. It occasionally works fine. Is there a work around for this?

David Mullet said...

@Swapnil-

If you would like to send me samples of your code and data, I could review and offer suggestions.

-David (david dot mullet at gmail dot com)

Unknown said...

Hi David!
I'm a big fan of your site!
I'm trying to automate some test case reporting from multiple Excel 2007 spreadsheets, but they're saved in .xlsx format. When I try to open the connection with the example above, I get an error message saying "External table is not in the expected format". Do I need to change the Extended Properties=Excel 8.0 setting?

Thanks in advance for your help!

-Tiffany

David Mullet said...

@Tiffany:

Thanks for the kind words!

For Excel 2007+ XLSX files, you'll need to make two changes to the connection string.

Change...

"Provider=Microsoft.Jet.OLEDB.4.0"

to

"Provider=Microsoft.ACE.OLEDB.12.0"

Change

"Excel 8.0"

to

"Excel 12.0 Xml"

-David