Sunday, January 27, 2008

RubyGarden Archives: Scripting Excel

Editor's Note: Once upon a time, there was a website named RubyGarden.org, which contained many helpful links and articles. The website has recently dropped off the face of the earth. The following "Scripting Excel" article was salvaged from the Google cache and is provided here in its entirety.



Although the Pickaxe book has some examples of scripting Excel, I still had quite a lot of digging to do for some of the scripts I needed. I got some by looking at the M$ docs (poor as they are**), and some by looking at Perl examples. So this page is for collecting examples of code that others might adapt (and add to).

See also: ScriptingOutlook, ScriptingAccess

o You can download the Office XP Excel Object Reference help file as part of an expanded help bundle here [1]. It's large [19MB], but it includes a bunch of VBA help files that I couldn't track down otherwise. The file you need for Excel items is VBAXL10.CHM. It appears to have complete docs on all the Excel objects. -- ChrisMorris

o A great resource for general Excel and programming related questions (well with VBA) are Chip Pearsons [Excel Pages]. I learned a lot from his article [Cell References In A Range] for example. -- BernhardLeicher

First of all:

require 'win32ole'

Opening spreadsheets, accessing workbooks and worksheets

excel = WIN32OLE::new('excel.Application')
workbook = excel.Workbooks.Open('c:\examples\spreadsheet.xls')
worksheet = workbook.Worksheets(1) #get hold of the first worksheet
worksheet.Select #bring it to the front -need sometimes to run macros,
# not for working with a worksheet from ruby
excel['Visible'] = true #make visible, set to false to make invisible
# again. Don't need it to be visible for script to work

reading data from spreadsheet

worksheet.Range('a12')['Value'] #get value of single cell
data = worksheet.Range('a1:c12')['Value'] #read into 2D array

finding the first empty row (using empty column A)

line = '1'
while worksheet.Range("a#{line}")['Value']
line.succ!
end #line now holds row number of first empty row

or to read as you go

line = '1'
data = []
while worksheet.Range("a#{line}")['Value']
data << worksheet.Range("a#{line}:d#{line}")['Value']
line.succ!
end

writing data into spreadsheet, example

worksheet.Range('e2')['Value'] = Time.now.strftime
'%d/%m/%Y' #single value
worksheet.Range('a5:c5')['Value'] = ['Test', '25', 'result']

loading all Excel constants into a class

class ExcelConst
end
WIN32OLE.const_load(excel, ExcelConst)

Now the constant xlDown is accessible as

ExcelConst::XlDown

To find out what constants to use you can use this script. You run it by passing in a string which is matched against the constant names.

require 'win32ole'

module ExcelConsts
end

excel = WIN32OLE.new("Excel.Application")
WIN32OLE.const_load(excel, ExcelConsts)
excel.quit()

puts 'Matches for: ' + ARGV[0]
ExcelConsts.constants.each {|const|
match = const.match(/#{ARGV[0]}/)
value = eval("ExcelConsts::#{const}")
puts ' '*4 + const + ' => ' + value.to_s unless match.nil?
}

An example would be looking for the constant to center text. I ran

ruby search_excel_consts.rb Center

and the following results came up:

XlCenterAcrossSelection => 7
XlVAlignCenter => -4108
XlCenter => -4108
XlLabelPositionCenter => -4108
XlPhoneticAlignCenter => 2
XlHAlignCetner => -4108
XlHAlignCenterAcrossSelection => 7

calling macros

excel.Run('SortByNumber')

Setting background colour

worksheet.Range('a3:f5').Interior['ColorIndex'] = 36 #pale yellow
# Set background color back to uncoloured (rnicz)
worksheet.Range('a3:f5').Interior['ColorIndex'] = -4142 # XlColorIndexNone constant
# or use Excel constant to set background color back to uncoloured
worksheet.Range('a3:f5').Interior['ColorIndex'] = ExcelConst::XlColorIndexNone

Adding Formulae

emptyRow = 15
worksheet.Range("t#{emptyRow}")['Formula'] = "=(Q#{emptyRow}+L#{emptyRow}+I#{emptyRow}+S#{emptyRow})"

saving changes

workbook.Close(1)
# or
workbook.SaveAs 'myfile.xls'
# default path is the system defined My Documents folder

ending session

excel.Quit

If you're experimenting from within irb and are having problems with processes hanging around after you've called excel.Quit - try deleting the reference to excel and invoking the garbage collector.

excel.Quit
excel = nil
GC.start

Hopefully this is of some use. Please add anything else you have discovered.

Some further stuff that I learned so far...
It partly overlaps with what ChrisMorris already wrote, maybe we can merge it later on. -- BernhardLeicher

Start Excel, create new workbook and save it:

require 'win32ole'
excel = WIN32OLE.new("excel.application")
excel.visible = true # in case you want to see what happens
workbook = excel.workbooks.add
workbook.saveas('c:\examples\spreadsheet1.xls')
workbook.close

Or, suppose Excel is already started and a few Excel files are opened (=workbooks in Excel jargon): Connect to the running instance of Excel, activate one of the workbooks and write something.

This also shows that Excel collections can be iterated very handy with "each", and that collections can sometimes be indexed by number or by name:

excel = WIN32OLE.connect("excel.application")
excel.workbooks.each{|wb|puts wb.name} # loop through workbooks and display names
excel.workbooks(1).activate # activate by number
excel.workbooks("Mappe1").activate # or by name
excel.range("b5").value="soso" # write something to cell B5

Connecting to Excel is particularly good fun when done interactively from irb: You instantly see what happens!

irb(main):001:0> require 'win32ole'
true
irb(main):002:0> excel=WIN32OLE.connect('excel.application')
#
irb(main):003:0> excel.workbooks.each{|wb|puts wb.name}
PERSONL.XLS
Mappe1
Mappe2
Mappe3
nil
irb(main):004:0> excel.workbooks(1).name
"PERSONL.XLS"
irb(main):005:0> excel.workbooks("Mappe1").activate
true
irb(main):006:0> excel.range("b5").value="soso"
nil

Excel => workbook => worksheet => range(cell)

What always bugged me when browsing through examples were the various ways of referring to a particular cell on a particular worksheet in a particular workbook. When Excel is started and "Mappe1" is the currently active workbook and "Tabelle1" is the currently active worksheet, all following statements do the same thing:

excel.workbooks("Mappe1").worksheets("Tabelle1").range("a1").value
excel.worksheets("Tabelle1").range("a1").value
excel.activeworkbook.activesheet.range("a1").value
excel.activesheet.range("a1").value
excel.range("a1").value

My confusion was probably caused by the fact that a lot of properties/methods can be called on "excel" directly and then default to the currently active workbook/worksheet. It's more a matter of taste to specify "activesheet" and "activeworkbook" or not.

And regarding the hierarchy, it seems to be as simple as: When Excel is up and running, it contains 0 (no file opened) or more workbooks (Workbook? Just Excel jargon for an Excel file!), with each workbook/file containing 1 or more worksheets.

Various methods for addressing a cell or range of cells

In the Excel object model there isn't something like a cell object, it's all covered by the "Range" object. A range can represent only one cell or a whole bunch of them (a column, a row, a rectangular block of cells, ....).

Let's assume for the following examples, that "sheet" contains a reference to an Excel worksheet, obtained e.g. by:

require 'win32ole'
excel = WIN32OLE.connect('excel.application') # connect to running instance of Excel
sheet = excel.activesheet

sheet.range(cellname/cell[, cellname/cell])

A range can be obtained by using the worksheet's Range property. A range with only one cell:

sheet.range("a1")

Or a rectangular block of cells (A1 to C3):

sheet.range("a1", "c3")

Same with one argument:

sheet.range("a1:c3")

Whole Column A:

sheet.range("a:a")

Whole Row 3:

sheet.range("3:3")

A range itself has a range property, thus allowing to write:

sheet.range("c3").range("a1").address # >> "$C$3"

Doesn't make much sense, one might note that the second range's address becomes relative to the first range.

sheet.cells(rowindex, columnindex)

This is all wonderful, but shouldn't there be a way of addressing cells by row and column number? The worksheet's Cells property does that: It gets you a range with one cell by specifying the row and column number. The indices are counted from 1, so Cells(1,1) gives you cell A1:

sheet.cells(3,1).address # >> "$C$3"

Combined with range to get range A1:C3:

sheet.range(sheet.cells(1,1), sheet.cells(3,3))

And when applied to another range, row and column index become relative to the first range:

sheet.range("b2").cells(2,2).address # >> "$C$3"

The index can be negative:

sheet.range("c3").cells(-1,-1).address # >> "$A$1"
range.offset(rowoffset, columnoffset)

If you have a range, this can be used to return another range that is y rows and x columns away (or offset) from this one. This time the offsets count from 0:

sheet.range("a1").offset(0,0).address # >> "$A$1"
sheet.range("b5").offset(1,1).address # >> "$C$6"

While offset somehow reminds of the cells function, this might make a difference: If range contains a block of cells, an offset block of cells is returned too:

sheet.range("a1:c3").offset(0,2).address # >> "$C$1:$E$3"

Negative offsets can be specified too:

sheet.range("b2").offset(-1,-1).address # >> "$A$1"

Getting cell values

There isn't only one method for obtaining a cell's value, but at least three of them: text, value, value2. So which should one use, what is the difference between them?

Sidenote:
An Excel cell's content is somewhat relative, what you see isn't necessarily what is actually inside the cell, because a cell's content is displayed according to a specified format. A cell might contain "0.12345", but is displayed as "0.12" or "0.12 DM" or whatever. It might be good to know, that internally a cell's content is either a text or a floating point number. That's it, nothing else.

Just for curiosity:
Dates are represented internally as floating point values too (more details at Chip Pearson's site: HTTP://www.cpearson.com/excel/datetime.htm):³59³ "Excel stores dates and times as a number representing the number of days since 1900-Jan-0, plus a fractional portion of a 24 hour day: ddddd.tttttt . This is called a serial date, or serial date-time."
So if the content is 37936.0 and its format is "date", it's displayed as "11.11.03" or "Nov 2003".

For the following examples let's assume a content of:

A1 => 10.12345 => formatted with 2 decimal digits => 10.12
B1 => 10.12345 => formatted as currency => 10,00 DM
C1 => 11.11.03 => date => 11.11.03

range.text

Text property retrieves the value as it is displayed, as a string. It's readonly, so only for getting values. Because my country settings are "German", floats are displayed with a comma.

sheet.range("a1").text # >> "10,12"
sheet.range("b1").text # >> "10,12 DM"
sheet.range("c1").text # >> "11.11.03"

range.value

This is the result when retrieved with value:

sheet.range("a1").value # >> 10.12345
sheet.range("b1").value # >> "10,1235"
sheet.range("c1").value # >> "2003/11/11 00:00:00"

Retrieves the "internal" value (A1 becomes a float), whereby currency and date are still returned as strings albeit somewhat different than before.

range.value2

According to the Excel documentation value2 behaves just like value, but additionally retrieves currency and dates as doubles (the actual internal content):

sheet.range("a1").value2 # >> 10.12345
sheet.range("b1").value2 # >> 10.12345
sheet.range("c1").value2 # >> 37936.0

Yes, seems to work as advertised.

Setting values

Seems that only "value" is useful here. An integer or float arrives as expected as number in Excel:

sheet.range("a1").value = 1.2345
sheet.range("a1").value = 2

For strings Excel does the same processing that it does when something is interactively entered. It thus depends on how Excel interprets the string:

sheet.range("a1").value = "10.11.2003" # becomes a date
sheet.range("a1").value = "1,2345" # becomes a number (at least with German country settings)

Iterating ranges...

...with each

Ranges can be iterated with each:

sheet.range("a1:a10").each{|cell|puts cell.value}

If the range is a block of cells the iteration goes from left to right, then down one line, and so on:

sheet.range("a1:b5").each{|cell|puts cell.value}

Iterating block of cells by row and output the first cell of each row:

sheet.range("b3:c7").rows.each{|r|puts r.cells(1,1).value}

and by column:

sheet.range("b3:c7").columns.each{|c|puts c.cells(1,1).value}

Result of row.value is an array within an array:

sheet.range("b3:c7").rows.each{|r|puts r.value.inspect} # >> [1.0, 10.0]?

...with activecell

Like moving around an Excel sheet with the cursor. Moving down one cell:

sheet.activecell.offset(1,0).activate

Walking down from the active cell until an empty cell is encountered:

sheet.activecell.offset(1,0).activate while excel.activecell.value

...with an index

range = sheet.range("b3:c7")
noofrows = range.rows.count
(1..noofrows).each{|i|puts range.cells(i,1).value}

Named Ranges

Named ranges make Excel spreadsheets more usable for the end user. To create a named range "myRange":

sheet.names.Add( { 'Name' => 'myRange', 'RefersTo' => 'A2:A216' } )

One problem! This doesn't work. Use a Range object for RefersTo?, not a String:

myRange = sheet.Range( 'A2:A216' )
sheet.names.Add( { 'Name' => 'myRange', 'RefersTo' => myRange } )

How do you use named ranges in ruby? Named ranges are kept in the worksheet as well as the workbook. You may need to check both locations.

Something like the following works for named ranges manually defined by the user:

rangeString = workbook.names( 'Sheet1!myRange' ).Value

1. Remove "=" prefix (e.g. "=Sheet1!$A$2:$A$4")

rangeString = rangeString.slice( 1, rangeString.length - 1 ) if ( rangeString =~ /^=/ ) worksheet.range( rangeString ).value = 'testing...'

Finding Data Regions...

Data don't always start in row 1, column A, the number of columns might not be fixed, and the number of rows is most often variable. There are two handy methods that can help to find that "block of data".

Let's assume, that B3:C7 contain data.

...with CurrentRegion

Given any cell inside the "data block", CurrentRegion?() finds the surrounding contiguous data region and returns its range:

sheet.range("b5").currentregion.address # >> "$B$3:$C$7"

...by "jumping around"

There's a shortcut key in Excel "+", that allows you to jump to the end/start of regions with content. With our example and the cursor in B3, pressing + would jump the cursor to cell B7, pressing that shortcut again would move the cursor to the last line of the spreadsheet to B65536. There's an equivalent method: "End()".

Finding the last row with data:

sheet.range("b3").end(-4121).address # >> "$B$7"

The parameter indicates the direction, the Excel constants are:

xlDown = -4121
xlToLeft = -4159
xlToRight = -4161
xlUp = -4162

Saving to CSV (or other formats)

Note that Excel can show quite a lot of warnings / confirm request. To supress these:

excel.DisplayAlerts = false

Then:

workbook.SaveAs 'myfile.csv', xlCSV

where xlCSV = 6.

Here are some common file formats:

xlCSV=6
xlCSVMac=22
xlCSVMSDOS=24
xlCSVWindows=23
xlCurrentPlatformText=-4158
xlExcel9795=43
xlTextMSDOS=21
xlTextPrinter=36
xlTextWindows=20

See also: ScriptingOutlook, ScriptingAccess

- How about OpenOffice and Ruby scripting? Anything in this area? The examples all seem to rely on

require 'win32ole'

and I believe this will only work on Windows OSes.


Sunday, January 20, 2008

RubyGarden Archives: Scripting Access

Editor's Note: Once upon a time, there was a website named RubyGarden.org, which contained many helpful links and articles. That website has recently disappeared. The following "Scripting Access" article was salvaged from the Google cache and is provided here in its entirety.



Here is a quick example on getting data from a Microsoft Access database.

Recordset with a connection, using JET. This works, but for some reason the memory requirements were large. Using an ODBC source for the same data seems to use almost no memory!

require "win32ole"

conn = WIN32OLE.new("ADODB.Connection")
conn["Provider"] = "Microsoft.Jet.OLEDB.4.0"
conn.Open('c:\ruby\dev\calldata.mdb')

rs = conn.Execute("select date, dialednumber, extension, cost from
callrecords where call = 2 and date >=#01-jan-2005#")
rs.getrows.each do |row|
puts row
end

conn.Close

Recordset without a connection, using JET. This works, but for some reason the memory requirements were large. Using an ODBC source for the same data seems to use almost no memory!

require "win32ole"

rs = WIN32OLE.new("ADODB.recordset")

qry = "select date, dialednumber, extension, cost from callrecords where
call = 2 and date >=#01-jan-2005#"
constr = 'Provider=Microsoft.Jet.OLEDB.4.0;Data
Source=c:\data\calldata.mdb'

rs.open(qry,constr)

rs.getrows.each do |row|
puts row
end

Recordset without connection using an ODBC source. My program took about 28MB mem vs. 39MB for the JET version. The difference is massive when working with a lot of data Requires that you create the ODBC data source!

require "win32ole"

rs = WIN32OLE.new("ADODB.recordset")
qry = "select date, dialednumber, extension, cost from callrecords where
call = 2 and date >=#01-jan-2005#"
rs.open(qry,"DSN=calldata;")

rs.getrows.each do |row|
puts row
end

Here's a function you can drop at the top of your program that alters the definition of Object. It will work for loading access data into any object class that supports "<<" and "transpose" - such as Array. Transpose is required since the recordset comes back with the rows "rotated".

public
def readMDB(qry,dsn)
rs = WIN32OLE.new("ADODB.recordset")
rs.open(qry,"DSN=#{dsn};")
rs.getrows.each do |row|
self << row
end
self.transpose
end

Usage example

qry = "select date, dialednumber, extension, cost from callrecords"
dsn = "DSN=calldata;"

a = []
a.readMDB(qry, dsn)
p a

Question: Is there a way to use paramters on the query like the ones from DBI? E.g.: (note the '?')

select * from callrecords where call = ?



Sunday, January 13, 2008

RubyGarden Archives: Scripting Outlook

Editor's Note: Once upon a time, there was a website named RubyGarden.org, which contained many helpful links and articles. The website has recently dropped off the face of the earth. The following "Scripting Outlook" article was salvaged from the Google cache and is provided here in its entirety.



With some help of the the ScriptingExcel page and the Office helpfiles mentioned there a short script to list your Outlook messages. The script should be pretty easy to expand. One method that was really helpfull in figuring out what is possible is:

OLEObject.ole_methods

MarkJanssen

require 'win32ole'

myApp = WIN32OLE::new("outlook.Application")

# load Outlook OLE constants

class OutlookConst
end

WIN32OLE.const_load(myApp, OutlookConst)

p "OlMailItem = #{OutlookConst::OlMailItem}"

ns = myApp.GetNameSpace("MAPI")
#ns.Logon # uncomment for online usage
folders = ns.Folders

new_messages = 0

folders.each {
| folder |
puts "+" + folder.Name

begin
folder.Folders.each {
| folder |
# puts " " + folder.Name
if ["Inbox","Sent Items"].member? folder.Name
folder.Items.each {
| msg |
if msg['UnRead']
new_messages += 1
end
puts " From: " + msg['SenderName']
puts " Subject: " + msg['Subject']
}
end
}
rescue

puts " Unable to open"
end
}
puts "You have #{new_messages} new message(s)"

Glauber 2003-10-09:

* Here's a quick tip that would have saved me some time: When iterating over a list of Outlook items (e.g.: MailItems) in order to move or delete some of them, you must do so backwards (GetLast... GetPrevious... GetPrevious...). In other words, don't attempt to access an item that's after the one you deleted or moved.
* Here's another tip: there's no good way to find the From Internet email address of a message. Most times, creating a reply and extracting the first recipient for it will work, but sometimes not (it may give you an Exchange X.400 address instead of what you're looking for). Don't send the reply, discard it.


Thursday, January 3, 2008

IronRuby, Ruby.NET, and the Differences

Following on the heels of Pat Eyler's interview of M. David Peterson, Mr. Peterson has posted an article entitled Ruby.NET vs. IronRuby: What's The Difference?.

Microsoft's John Lam has posted a follow-up on his blog.

It's good to see increased public discussion of these two important projects.

Digg my article

Wednesday, January 2, 2008

Parsing Spreadsheets with the Roo Gem

I've talked at length about using the win32ole library to automate Microsoft Excel. But there are alternatives for accessing data in Excel spreadsheets -- some of which don't even require Excel to be installed. One of these is the roo gem, which allows you to extract data from Excel, OpenOffice and Google spreadsheets. Roo provides read-only access to Excel and OpenOffice spreadsheets, but both read and write access to Google spreadsheets.

To install the roo gem, including its dependencies, open a console window and enter:


gem install roo -y

Require the roo library in your script:

require 'roo'

To parse an Excel worksheet, we first create an instance of the Excel workbook object by calling the Excel.new method and passing it the path and filename:

xl = Excel.new('C:\my_workbook.xls')

The next step is to define which worksheet in the workbook we will be working with. We do this by setting the Excel.default_sheet value to one of the worksheets in the Excel.sheets array:

xl.default_sheet = xl.sheets[0]

To extract the value from a particular cell, call the cell method, passing it the row number and either the column number or column letter. Examples:

val = xl.cell(3, 5)
val = xl.cell(3, 'E')

The row method returns an array of values from the specified row number, so...

values = xl.row(3)

...returns the values from the third row.

Similarly, the column method returns an array of values from a column:

values = xl.column(5)

Get the full details from the roo homepage, and the roo Rdoc page.

That's all for now. As always, feel free to post a comment here or email me with questions, comments, or suggestions.

Thanks for stopping by!

Digg my article