Sunday, January 27, 2008

RubyGarden Archives: Scripting Excel

Editor's Note: Once upon a time, there was a website named RubyGarden.org, which contained many helpful links and articles. The website has recently dropped off the face of the earth. The following "Scripting Excel" article was salvaged from the Google cache and is provided here in its entirety.



Although the Pickaxe book has some examples of scripting Excel, I still had quite a lot of digging to do for some of the scripts I needed. I got some by looking at the M$ docs (poor as they are**), and some by looking at Perl examples. So this page is for collecting examples of code that others might adapt (and add to).

See also: ScriptingOutlook, ScriptingAccess

o You can download the Office XP Excel Object Reference help file as part of an expanded help bundle here [1]. It's large [19MB], but it includes a bunch of VBA help files that I couldn't track down otherwise. The file you need for Excel items is VBAXL10.CHM. It appears to have complete docs on all the Excel objects. -- ChrisMorris

o A great resource for general Excel and programming related questions (well with VBA) are Chip Pearsons [Excel Pages]. I learned a lot from his article [Cell References In A Range] for example. -- BernhardLeicher

First of all:

require 'win32ole'

Opening spreadsheets, accessing workbooks and worksheets

excel = WIN32OLE::new('excel.Application')
workbook = excel.Workbooks.Open('c:\examples\spreadsheet.xls')
worksheet = workbook.Worksheets(1) #get hold of the first worksheet
worksheet.Select #bring it to the front -need sometimes to run macros,
# not for working with a worksheet from ruby
excel['Visible'] = true #make visible, set to false to make invisible
# again. Don't need it to be visible for script to work

reading data from spreadsheet

worksheet.Range('a12')['Value'] #get value of single cell
data = worksheet.Range('a1:c12')['Value'] #read into 2D array

finding the first empty row (using empty column A)

line = '1'
while worksheet.Range("a#{line}")['Value']
line.succ!
end #line now holds row number of first empty row

or to read as you go

line = '1'
data = []
while worksheet.Range("a#{line}")['Value']
data << worksheet.Range("a#{line}:d#{line}")['Value']
line.succ!
end

writing data into spreadsheet, example

worksheet.Range('e2')['Value'] = Time.now.strftime
'%d/%m/%Y' #single value
worksheet.Range('a5:c5')['Value'] = ['Test', '25', 'result']

loading all Excel constants into a class

class ExcelConst
end
WIN32OLE.const_load(excel, ExcelConst)

Now the constant xlDown is accessible as

ExcelConst::XlDown

To find out what constants to use you can use this script. You run it by passing in a string which is matched against the constant names.

require 'win32ole'

module ExcelConsts
end

excel = WIN32OLE.new("Excel.Application")
WIN32OLE.const_load(excel, ExcelConsts)
excel.quit()

puts 'Matches for: ' + ARGV[0]
ExcelConsts.constants.each {|const|
match = const.match(/#{ARGV[0]}/)
value = eval("ExcelConsts::#{const}")
puts ' '*4 + const + ' => ' + value.to_s unless match.nil?
}

An example would be looking for the constant to center text. I ran

ruby search_excel_consts.rb Center

and the following results came up:

XlCenterAcrossSelection => 7
XlVAlignCenter => -4108
XlCenter => -4108
XlLabelPositionCenter => -4108
XlPhoneticAlignCenter => 2
XlHAlignCetner => -4108
XlHAlignCenterAcrossSelection => 7

calling macros

excel.Run('SortByNumber')

Setting background colour

worksheet.Range('a3:f5').Interior['ColorIndex'] = 36 #pale yellow
# Set background color back to uncoloured (rnicz)
worksheet.Range('a3:f5').Interior['ColorIndex'] = -4142 # XlColorIndexNone constant
# or use Excel constant to set background color back to uncoloured
worksheet.Range('a3:f5').Interior['ColorIndex'] = ExcelConst::XlColorIndexNone

Adding Formulae

emptyRow = 15
worksheet.Range("t#{emptyRow}")['Formula'] = "=(Q#{emptyRow}+L#{emptyRow}+I#{emptyRow}+S#{emptyRow})"

saving changes

workbook.Close(1)
# or
workbook.SaveAs 'myfile.xls'
# default path is the system defined My Documents folder

ending session

excel.Quit

If you're experimenting from within irb and are having problems with processes hanging around after you've called excel.Quit - try deleting the reference to excel and invoking the garbage collector.

excel.Quit
excel = nil
GC.start

Hopefully this is of some use. Please add anything else you have discovered.

Some further stuff that I learned so far...
It partly overlaps with what ChrisMorris already wrote, maybe we can merge it later on. -- BernhardLeicher

Start Excel, create new workbook and save it:

require 'win32ole'
excel = WIN32OLE.new("excel.application")
excel.visible = true # in case you want to see what happens
workbook = excel.workbooks.add
workbook.saveas('c:\examples\spreadsheet1.xls')
workbook.close

Or, suppose Excel is already started and a few Excel files are opened (=workbooks in Excel jargon): Connect to the running instance of Excel, activate one of the workbooks and write something.

This also shows that Excel collections can be iterated very handy with "each", and that collections can sometimes be indexed by number or by name:

excel = WIN32OLE.connect("excel.application")
excel.workbooks.each{|wb|puts wb.name} # loop through workbooks and display names
excel.workbooks(1).activate # activate by number
excel.workbooks("Mappe1").activate # or by name
excel.range("b5").value="soso" # write something to cell B5

Connecting to Excel is particularly good fun when done interactively from irb: You instantly see what happens!

irb(main):001:0> require 'win32ole'
true
irb(main):002:0> excel=WIN32OLE.connect('excel.application')
#
irb(main):003:0> excel.workbooks.each{|wb|puts wb.name}
PERSONL.XLS
Mappe1
Mappe2
Mappe3
nil
irb(main):004:0> excel.workbooks(1).name
"PERSONL.XLS"
irb(main):005:0> excel.workbooks("Mappe1").activate
true
irb(main):006:0> excel.range("b5").value="soso"
nil

Excel => workbook => worksheet => range(cell)

What always bugged me when browsing through examples were the various ways of referring to a particular cell on a particular worksheet in a particular workbook. When Excel is started and "Mappe1" is the currently active workbook and "Tabelle1" is the currently active worksheet, all following statements do the same thing:

excel.workbooks("Mappe1").worksheets("Tabelle1").range("a1").value
excel.worksheets("Tabelle1").range("a1").value
excel.activeworkbook.activesheet.range("a1").value
excel.activesheet.range("a1").value
excel.range("a1").value

My confusion was probably caused by the fact that a lot of properties/methods can be called on "excel" directly and then default to the currently active workbook/worksheet. It's more a matter of taste to specify "activesheet" and "activeworkbook" or not.

And regarding the hierarchy, it seems to be as simple as: When Excel is up and running, it contains 0 (no file opened) or more workbooks (Workbook? Just Excel jargon for an Excel file!), with each workbook/file containing 1 or more worksheets.

Various methods for addressing a cell or range of cells

In the Excel object model there isn't something like a cell object, it's all covered by the "Range" object. A range can represent only one cell or a whole bunch of them (a column, a row, a rectangular block of cells, ....).

Let's assume for the following examples, that "sheet" contains a reference to an Excel worksheet, obtained e.g. by:

require 'win32ole'
excel = WIN32OLE.connect('excel.application') # connect to running instance of Excel
sheet = excel.activesheet

sheet.range(cellname/cell[, cellname/cell])

A range can be obtained by using the worksheet's Range property. A range with only one cell:

sheet.range("a1")

Or a rectangular block of cells (A1 to C3):

sheet.range("a1", "c3")

Same with one argument:

sheet.range("a1:c3")

Whole Column A:

sheet.range("a:a")

Whole Row 3:

sheet.range("3:3")

A range itself has a range property, thus allowing to write:

sheet.range("c3").range("a1").address # >> "$C$3"

Doesn't make much sense, one might note that the second range's address becomes relative to the first range.

sheet.cells(rowindex, columnindex)

This is all wonderful, but shouldn't there be a way of addressing cells by row and column number? The worksheet's Cells property does that: It gets you a range with one cell by specifying the row and column number. The indices are counted from 1, so Cells(1,1) gives you cell A1:

sheet.cells(3,1).address # >> "$C$3"

Combined with range to get range A1:C3:

sheet.range(sheet.cells(1,1), sheet.cells(3,3))

And when applied to another range, row and column index become relative to the first range:

sheet.range("b2").cells(2,2).address # >> "$C$3"

The index can be negative:

sheet.range("c3").cells(-1,-1).address # >> "$A$1"
range.offset(rowoffset, columnoffset)

If you have a range, this can be used to return another range that is y rows and x columns away (or offset) from this one. This time the offsets count from 0:

sheet.range("a1").offset(0,0).address # >> "$A$1"
sheet.range("b5").offset(1,1).address # >> "$C$6"

While offset somehow reminds of the cells function, this might make a difference: If range contains a block of cells, an offset block of cells is returned too:

sheet.range("a1:c3").offset(0,2).address # >> "$C$1:$E$3"

Negative offsets can be specified too:

sheet.range("b2").offset(-1,-1).address # >> "$A$1"

Getting cell values

There isn't only one method for obtaining a cell's value, but at least three of them: text, value, value2. So which should one use, what is the difference between them?

Sidenote:
An Excel cell's content is somewhat relative, what you see isn't necessarily what is actually inside the cell, because a cell's content is displayed according to a specified format. A cell might contain "0.12345", but is displayed as "0.12" or "0.12 DM" or whatever. It might be good to know, that internally a cell's content is either a text or a floating point number. That's it, nothing else.

Just for curiosity:
Dates are represented internally as floating point values too (more details at Chip Pearson's site: HTTP://www.cpearson.com/excel/datetime.htm):³59³ "Excel stores dates and times as a number representing the number of days since 1900-Jan-0, plus a fractional portion of a 24 hour day: ddddd.tttttt . This is called a serial date, or serial date-time."
So if the content is 37936.0 and its format is "date", it's displayed as "11.11.03" or "Nov 2003".

For the following examples let's assume a content of:

A1 => 10.12345 => formatted with 2 decimal digits => 10.12
B1 => 10.12345 => formatted as currency => 10,00 DM
C1 => 11.11.03 => date => 11.11.03

range.text

Text property retrieves the value as it is displayed, as a string. It's readonly, so only for getting values. Because my country settings are "German", floats are displayed with a comma.

sheet.range("a1").text # >> "10,12"
sheet.range("b1").text # >> "10,12 DM"
sheet.range("c1").text # >> "11.11.03"

range.value

This is the result when retrieved with value:

sheet.range("a1").value # >> 10.12345
sheet.range("b1").value # >> "10,1235"
sheet.range("c1").value # >> "2003/11/11 00:00:00"

Retrieves the "internal" value (A1 becomes a float), whereby currency and date are still returned as strings albeit somewhat different than before.

range.value2

According to the Excel documentation value2 behaves just like value, but additionally retrieves currency and dates as doubles (the actual internal content):

sheet.range("a1").value2 # >> 10.12345
sheet.range("b1").value2 # >> 10.12345
sheet.range("c1").value2 # >> 37936.0

Yes, seems to work as advertised.

Setting values

Seems that only "value" is useful here. An integer or float arrives as expected as number in Excel:

sheet.range("a1").value = 1.2345
sheet.range("a1").value = 2

For strings Excel does the same processing that it does when something is interactively entered. It thus depends on how Excel interprets the string:

sheet.range("a1").value = "10.11.2003" # becomes a date
sheet.range("a1").value = "1,2345" # becomes a number (at least with German country settings)

Iterating ranges...

...with each

Ranges can be iterated with each:

sheet.range("a1:a10").each{|cell|puts cell.value}

If the range is a block of cells the iteration goes from left to right, then down one line, and so on:

sheet.range("a1:b5").each{|cell|puts cell.value}

Iterating block of cells by row and output the first cell of each row:

sheet.range("b3:c7").rows.each{|r|puts r.cells(1,1).value}

and by column:

sheet.range("b3:c7").columns.each{|c|puts c.cells(1,1).value}

Result of row.value is an array within an array:

sheet.range("b3:c7").rows.each{|r|puts r.value.inspect} # >> [1.0, 10.0]?

...with activecell

Like moving around an Excel sheet with the cursor. Moving down one cell:

sheet.activecell.offset(1,0).activate

Walking down from the active cell until an empty cell is encountered:

sheet.activecell.offset(1,0).activate while excel.activecell.value

...with an index

range = sheet.range("b3:c7")
noofrows = range.rows.count
(1..noofrows).each{|i|puts range.cells(i,1).value}

Named Ranges

Named ranges make Excel spreadsheets more usable for the end user. To create a named range "myRange":

sheet.names.Add( { 'Name' => 'myRange', 'RefersTo' => 'A2:A216' } )

One problem! This doesn't work. Use a Range object for RefersTo?, not a String:

myRange = sheet.Range( 'A2:A216' )
sheet.names.Add( { 'Name' => 'myRange', 'RefersTo' => myRange } )

How do you use named ranges in ruby? Named ranges are kept in the worksheet as well as the workbook. You may need to check both locations.

Something like the following works for named ranges manually defined by the user:

rangeString = workbook.names( 'Sheet1!myRange' ).Value

1. Remove "=" prefix (e.g. "=Sheet1!$A$2:$A$4")

rangeString = rangeString.slice( 1, rangeString.length - 1 ) if ( rangeString =~ /^=/ ) worksheet.range( rangeString ).value = 'testing...'

Finding Data Regions...

Data don't always start in row 1, column A, the number of columns might not be fixed, and the number of rows is most often variable. There are two handy methods that can help to find that "block of data".

Let's assume, that B3:C7 contain data.

...with CurrentRegion

Given any cell inside the "data block", CurrentRegion?() finds the surrounding contiguous data region and returns its range:

sheet.range("b5").currentregion.address # >> "$B$3:$C$7"

...by "jumping around"

There's a shortcut key in Excel "+", that allows you to jump to the end/start of regions with content. With our example and the cursor in B3, pressing + would jump the cursor to cell B7, pressing that shortcut again would move the cursor to the last line of the spreadsheet to B65536. There's an equivalent method: "End()".

Finding the last row with data:

sheet.range("b3").end(-4121).address # >> "$B$7"

The parameter indicates the direction, the Excel constants are:

xlDown = -4121
xlToLeft = -4159
xlToRight = -4161
xlUp = -4162

Saving to CSV (or other formats)

Note that Excel can show quite a lot of warnings / confirm request. To supress these:

excel.DisplayAlerts = false

Then:

workbook.SaveAs 'myfile.csv', xlCSV

where xlCSV = 6.

Here are some common file formats:

xlCSV=6
xlCSVMac=22
xlCSVMSDOS=24
xlCSVWindows=23
xlCurrentPlatformText=-4158
xlExcel9795=43
xlTextMSDOS=21
xlTextPrinter=36
xlTextWindows=20

See also: ScriptingOutlook, ScriptingAccess

- How about OpenOffice and Ruby scripting? Anything in this area? The examples all seem to rely on

require 'win32ole'

and I believe this will only work on Windows OSes.


2 comments:

Anonymous said...

Hi David - Thanks SO much for the Ruby on Windows blog. I used your helpful articles as the base for a script that helps automate a really boring task at my girlfriend's job.

She has to go through a spreadsheet to validate tons of URLS and I used your recommendations of wxRuby and win32ole to automate it for her. I wrote it up on my blog as a short story and the post will go live in a few hours.

Johan Steunenberg said...

Hi David,

thanks a lot for this article. It helped me a lot.

I have to add one thing: using Ruby 1.9, a colleague had difficulties with the expression excel['Visible']
Replacing this with excel.Visible solved the problem. In the same way, we had to replace the worksheet.Range('a12')['Value'] with worksheet.Range('a12').Value

Hope this helps someone