Saturday, March 3, 2007

Ruby, Excel, and SQLite (instead of MS Access)

I used to rely on Microsoft Access for my desktop database needs, but have recently converted to SQLite and its associated Ruby library. SQLite is fast, light, and powerful. Combining SQLite with Ruby gives me the ability to quickly handle most data processing tasks.

Example: Let's say you've been given an Excel workbook with multiple worksheets of data, and you need to cross-reference data between worksheets. In many such cases, you'll want to import the data into a real database and use SQL.

Here's a brief, unpolished snippet of code that reads data from an open Excel workbook and creates an SQLite database with a table for each worksheet in the Excel workbook:


require 'win32ole'
require 'sqlite3'

# Connect to a running instance of Excel
xl = WIN32OLE.connect('Excel.Application')
# Get the active workbook
wb = xl.ActiveWorkbook
# Create the SQLite3 database
db = SQLite3::Database.new('excel.db')
# Create a database table for each worksheet
# in the workbook
wb.Worksheets.each do |ws|
# Grab all values from worksheet into a
# 2-dimensional array
data = ws.UsedRange.Value
# Grab first row of data to use as field names
field_names = data.shift
# Create database table using worksheet name and
# field names
db.execute("CREATE TABLE [#{ws.Name}] \
( #{field_names.join(',')} );")
# For each row of data...
data.each do |row|
# ...single-quote all field values...
row.collect! { |f| f = "'" + f.to_s + "'" }
# ...and insert a new record into the
# database table
db.execute("INSERT INTO [#{ws.Name}] VALUES \
( #{row.join(',')} );")
end
end

Now, you've got your data loaded into a real database, quickly and easily, and can get to work on it... via Ruby or the SQLite command line interface.

I would be happy to provide more info on using SQLite3 with Ruby, but why the lucky stiff gives a pretty good jump-start here.

5 comments:

Walker said...

I appreciate your blog and look forward to future posts. Office applications should be scriptable and it should be easy to do. I am new to this, so I look forward to learning.

Anonymous said...

Hello David,

Thanks for this example; its proved very useful to me. (The rest of your stuff is very interesting as well!)

I tried to use the example code to import a 10k row Excel file but performance was very poor. After reading up on SQLite performance, I put a db.transaction before the 'data.each do' statementokphll and a db.commit after the 'end' of that block. Things sped up dramatically.

Jamis Buck observed the need to use transactions for good performance in this post:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/125840

David Mullet said...

@anonymous:

You make a very good point! When dealing with more than a few hundred records, wrapping your database updates into a transaction can save a tremendous amount of time. I use transactions frequently, yet neglected to mention it here.

Also, if you have no need for permanent disk storage of your database, you can create an in-memory database:

db = SQLite3::Database.new(':memory:')

This is also a big time-saver.

Eric said...

Good stuff here... just getting going on a DB based ap in Ruby, and really love all the good tips available on Blogs like yours. One comment on this (old blog) is that if you are looking at bringing at 10,000+ lines from Excel, you would be far better off getting excel to save the file into csv format, and then using SQLites .import facility, which you can script up in Ruby like so:

system("SQLite3 database \".import '#{filename}' table_name\"")

Chad Lancour said...

dude.

perfect example - wish I could find more complete examples like this for rubyish.

Here is an example for a csv using fastercsv:
db = SQLite3::Database.new(':memory:')

fcsv = FasterCSV.open("file.csv", 'r', :headers => :first_row, :row_sep => :auto, :col_sep => ",")

db.execute("CREATE TABLE [data] ( #{fcsv.shift.headers.join(',')} );")

fcsv.each do |row|
qef = []
row.fields.each { |f| qef << "'" + f.to_s.gsub(/\\/, '\&\&').gsub(/'/, "''") + "'" }
db.execute("INSERT INTO [data] VALUES ( #{qef.join(',')} );")
end