Wednesday, May 30, 2007

Automating Applications with Ruby & The Windows Script Host

We've talked at length about automating Windows applications through COM/OLE, using the win32ole library. But not all applications expose themselves (so to speak) to such automation. The Windows Script Host can automate the activation of windows, and the sending of keystrokes. This may sometimes be all that you need to get the job done.

The Windows Script Host (WSH) has been part of the Windows operating system since Windows 98. You can use WSH's Shell object (via COM/OLE) to send keystrokes to windows.

First, require the win32ole library:

require 'win32ole'

Now we'll create an instance of the Wscript Shell object:

wsh = WIN32OLE.new('Wscript.Shell')

To send keystrokes to a window, you must first activate the window, bringing it to the forefront. This can be done with the Wscript Shell's AppActivate method, which returns true if the window was successfully activated, and false otherwise. The AppActivate method takes the window title text as its argument:

wsh.AppActivate('Notepad')

The string passed to the AppActivate method can be a partial title, but it must match the start or the end of the window title. The method is not case sensitive, and does not accept regular expressions. To quote Microsoft: "In determining which application to activate, the specified title is compared to the title string of each running application. If no exact match exists, any application whose title string begins with title is activated. If an application still cannot be found, any application whose title string ends with title is activated. If more than one instance of the application named by title exists, one instance is arbitrarily activated."
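To make that lookup order concrete, here is a small pure-Ruby sketch that mimics the three passes described in the quote (exact match, then "begins with", then "ends with", all ignoring case). The pick_window helper and the sample titles are my own illustration, not part of WSH, and this is only a model of the documented behavior, not what Windows runs internally.

```ruby
# Hypothetical helper: model the AppActivate lookup order on a list of titles.
def pick_window(titles, query)
  q = query.downcase
  titles.find { |t| t.downcase == q } ||               # 1. exact match
    titles.find { |t| t.downcase.start_with?(q) } ||   # 2. title begins with query
    titles.find { |t| t.downcase.end_with?(q) }        # 3. title ends with query
end

titles = ['Untitled - Notepad', 'Notepad++ Help', 'Save As']

puts pick_window(titles, 'notepad')   # Notepad++ Help (a "begins with" match wins)
puts pick_window(titles, 'save as')   # Save As (exact match, ignoring case)
p pick_window(titles, 'titled')       # nil (matches only the middle of a title)
```

Note how 'notepad' picks the window whose title begins with it, even though another title ends with it. That is worth keeping in mind when several similar windows are open.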

Once you have the window activated, you may use the Wscript Shell's SendKeys method to send keystrokes to the window. The SendKeys method takes a string in quotes. Special keys (e.g., ENTER, TAB, PGDN, PGUP, Function Keys) may be embedded in the string, if surrounded by braces:

wsh.SendKeys('Ruby{TAB}on{TAB}Windows{ENTER}')

The SHIFT key is represented by '+', the ALT key is represented by '%', and the CTRL key is represented by '^', so to quit an application by sending ALT-F4:

wsh.SendKeys('%{F4}')

Further details on the syntax of the SendKeys method can be found in Microsoft's Windows Script Host documentation.
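Because '+', '^', '%', '~', parentheses, brackets and braces all have special meaning to SendKeys (or are reserved by it), literal text containing them has to be wrapped in braces. Here is a rough sketch of two helpers for building key strings. The names literal_keys and chord are my own inventions for illustration; only the string syntax comes from SendKeys.

```ruby
# Characters SendKeys treats specially; wrap each in braces to type it literally.
SENDKEYS_SPECIAL = /[+^%~(){}\[\]]/

# Hypothetical helper: escape literal text so SendKeys types it as-is.
def literal_keys(text)
  text.gsub(SENDKEYS_SPECIAL) { |ch| "{#{ch}}" }
end

MODIFIERS = { shift: '+', ctrl: '^', alt: '%' }.freeze

# Hypothetical helper: prefix a key with one or more modifier codes.
def chord(key, *modifiers)
  modifiers.map { |m| MODIFIERS.fetch(m) }.join + key
end

puts literal_keys('50% (approx)')   # 50{%} {(}approx{)}
puts chord('{F4}', :alt)            # %{F4}
puts chord('s', :ctrl, :shift)      # ^+s
```

The resulting strings can be passed straight to wsh.SendKeys.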

Timing matters when using these methods, so you may need to insert a sleep call here and there to get reliable results. For example, wait a second (or less) between activating a window and sending keystrokes, or vice versa.
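Rather than guessing at a fixed sleep, one option is to retry AppActivate until it succeeds or a time limit passes. This is just a sketch: wait_for_window is a hypothetical helper, and the timeout and interval defaults are arbitrary values I picked.

```ruby
# Hypothetical helper: retry AppActivate until it returns true or time runs out.
# wsh is any object that responds to AppActivate (normally the Wscript.Shell).
def wait_for_window(wsh, title, timeout: 5, interval: 0.25)
  deadline = Time.now + timeout
  loop do
    return true if wsh.AppActivate(title)
    return false if Time.now >= deadline
    sleep(interval)
  end
end

# Only attempt the real thing on Windows, where win32ole is available.
if RUBY_PLATFORM =~ /mswin|mingw/
  require 'win32ole'
  wsh = WIN32OLE.new('Wscript.Shell')
  if wait_for_window(wsh, 'Notepad')
    wsh.SendKeys('Hello from Ruby{ENTER}')
  end
end
```

You would still want a short pause after the window activates, since the helper only tells you the window came to the front, not that it is ready for input.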

So, putting it all together, here's a brief example that activates Notepad (which must be running first), inserts text, saves the file to a specific name, and quits Notepad:

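# Require the win32ole library:
require 'win32ole'
# Create an instance of the Wscript Shell:
wsh = WIN32OLE.new('Wscript.Shell')
# Try to activate the Notepad window:
if wsh.AppActivate('Notepad')
  sleep(1)
  # Enter text into Notepad (placeholder text):
  wsh.SendKeys('Ruby{TAB}on{TAB}Windows{ENTER}')
  # ALT-F to pull down File menu, then A to select Save As...:
  wsh.SendKeys('%F')
  wsh.SendKeys('A')
  sleep(1)
  if wsh.AppActivate('Save As')
    # Type a file name (hypothetical path) and press ENTER:
    wsh.SendKeys('c:\temp\filename.txt{ENTER}')
    sleep(1)
    # If prompted to overwrite existing file:
    if wsh.AppActivate('Save As')
      # Enter 'Y':
      wsh.SendKeys('Y')
    end
  end
  # Quit Notepad with ALT-F4:
  wsh.SendKeys('%{F4}')
end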

The above code snippet can be improved upon, and I encourage you to do so. But it, hopefully, demonstrates what can be done.

Mimicking keystrokes is certainly not the ultimate in program automation, but it may sometimes be all that you need to get the job done. For example, back in the days before pop-up blockers, I had written a script that would simply run in the background, look for pop-up ads (based on a list of title strings), and close them. Simple, yet effective.

I should probably also mention AutoIt, "a freeware Windows automation language. It can be used to script most simple Windows-based tasks." I've not used it myself, but I believe that the Watir library leverages it.

That's all for now. As always, let me know if you have questions, comments, or requests for future topics.

Thanks for stopping by!


Thursday, May 24, 2007

Launching Apps and Printing Docs with the Windows Shell

A reader recently asked how to launch an application from within a Ruby script. A quick answer is to use the system or exec methods. But you can also leverage the Windows Shell to launch applications, and have control over the window state. You can also use the shell to print documents. Let's get right down to it, shall we?...

Require the win32ole library...

require 'win32ole'

Create an instance of the Windows Shell object...

shell = WIN32OLE.new('Shell.Application')

The shell object's ShellExecute method performs a specified operation on a specified file. The syntax is...

shell.ShellExecute(FILE, ARGUMENTS, DIRECTORY, OPERATION, SHOW)

FILE: Required. String that contains the name of the file on which ShellExecute will perform the action specified by OPERATION.

ARGUMENTS: Optional. The parameter values for the operation.

DIRECTORY: Optional. The fully qualified path of the directory that contains the file specified by FILE. If this parameter is not specified, the current working directory is used.

OPERATION: Specifies the operation to be performed. It should be set to one of the verb strings that is supported by the file (Examples: 'open', 'edit', or 'print'). If this parameter is not specified, the default operation is performed.

SHOW: Recommends how the window that belongs to the application that performs the operation should be displayed initially (0 = hidden, 1 = normal, 2 = minimized, 3 = maximized). The application can ignore this recommendation. If this parameter is not specified, the application uses its default value.

So, to launch Excel in a maximized window...

shell.ShellExecute('excel.exe', '', '', 'open', 3)

I suppose you could also launch your rails app with something like this...

shell.ShellExecute('ruby.exe', 'c:\my_rails_app\script\server', '', 'open', 1)

To print a document, hiding the application window...

shell.ShellExecute('C:\MyFolder\Document.txt', '', '', 'print', 0)
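Those positional arguments and magic numbers are easy to mix up, so here is one way you might wrap the call with keyword arguments and named window states. The shell_execute helper and the SHOW_STATES names are hypothetical; the numeric values are the ones listed above.

```ruby
# Window state codes from the SHOW parameter description above.
SHOW_STATES = { hidden: 0, normal: 1, minimized: 2, maximized: 3 }.freeze

# Hypothetical wrapper: shell is any object that responds to ShellExecute
# (normally the Shell.Application object).
def shell_execute(shell, file, arguments: '', directory: '', operation: 'open', show: :normal)
  shell.ShellExecute(file, arguments, directory, operation, SHOW_STATES.fetch(show))
end

# Only attempt the real thing on Windows, where win32ole is available.
if RUBY_PLATFORM =~ /mswin|mingw/
  require 'win32ole'
  shell = WIN32OLE.new('Shell.Application')
  shell_execute(shell, 'notepad.exe', show: :maximized)
end
```

Using fetch means a misspelled state such as :maximised raises a KeyError instead of silently passing nil to the Shell.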

That's about it. As always, post a comment here or send me email if you have questions, comments, or would like to request a topic for discussion.

Thanks for stopping by!

Tuesday, May 22, 2007

The Shell Windows Collection of Internet Explorer Objects

As mentioned previously, you cannot use the WIN32OLE.connect method to connect to a running instance of Internet Explorer, as you would do, for example, with Excel or Word. I'll explain now how to do this via the Windows Shell.

First, here's the code snippet, which grabs an instance of IE that has this blog displayed...

for window in WIN32OLE.new('Shell.Application').Windows
  if window.Document.Title =~ /Ruby on Windows/
    ie = window
  end
end

The Windows Shell object includes a Windows method which returns a collection of all of the open windows that belong to the Shell...

shell = WIN32OLE.new('Shell.Application')
windows = shell.Windows

This is actually a collection of Internet Explorer objects, though some of these may be Internet Explorer web browser windows and others may be Windows Explorer windows.

To get a count of the number of windows, call the Count method...

puts windows.Count

To reference a member of this collection by index, call the Item method, passing it the (zero-based) index...

first_window = windows.Item(0)

Internet Explorer windows will normally have a Document object, which will have a Title property, so to find the IE window that you want to work with, iterate over the Windows collection and check the Document.Title value...

for window in windows
  if window.Document.Title =~ /Ruby on Windows/
    ie = window
  end
end

...and now you have your IE application object to work with as previously discussed.

Note that not all Shell Window objects will have a Document object, so (in the example above) wrapping the code within a begin... rescue... end block would handle the error that occurs with non-IE windows.
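Here is a sketch of what that begin... rescue... end wrapping might look like, pulled into a method. The find_ie_window name is my own, and rescuing StandardError is a deliberately broad choice so that any failure while reading Document.Title simply skips that window.

```ruby
# Hypothetical helper: return the first window whose document title matches,
# skipping windows that raise an error when asked for Document.Title.
def find_ie_window(windows, title_pattern)
  windows.each do |window|
    begin
      return window if window.Document.Title =~ title_pattern
    rescue StandardError
      next # not an IE window with a document; move on
    end
  end
  nil
end

# Only attempt the real thing on Windows, where win32ole is available.
if RUBY_PLATFORM =~ /mswin|mingw/
  require 'win32ole'
  windows = WIN32OLE.new('Shell.Application').Windows
  ie = find_ie_window(windows, /Ruby on Windows/)
  puts(ie ? ie.LocationURL : 'No matching window found')
end
```

The method returns nil when nothing matches, so check the result before calling anything on it.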

Make sense? Let me know if you have questions or comments, and thanks for stopping by!

Sunday, May 20, 2007

Automating Internet Explorer with Ruby: Without Watir

In a previous post, I discussed how to use the watir library to automate Microsoft Internet Explorer. I should mention that you can also do this directly, using the win32ole library. Let's take a look at how to do this...

First, require the win32ole library...

require 'win32ole'

Create a new instance of Internet Explorer...

ie = WIN32OLE.new('InternetExplorer.Application')

(NOTE: You cannot use the WIN32OLE.connect method to connect to a running instance of Internet Explorer, as you would do, for example, with Excel or Word. I'll explain later how to do this via the Windows Shell. Stay tuned...)

You can show or hide the IE window by setting the Visible property...

ie.Visible = true

Navigate to a URL using the Navigate method...

ie.Navigate('http://www.google.com')

Wait until IE has completed loading the page, by checking the ReadyState property...

sleep(1) until ie.ReadyState == 4
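One caveat with the one-liner above is that it loops forever if the page never finishes loading. A possible variation adds a time limit. This is a sketch only: wait_until_ready is a hypothetical helper, and the 30-second default is an arbitrary choice of mine.

```ruby
READYSTATE_COMPLETE = 4 # the ReadyState value checked in the text above

# Hypothetical helper: poll ReadyState, giving up after `timeout` seconds.
# ie is any object that responds to ReadyState (normally the IE application object).
def wait_until_ready(ie, timeout: 30, interval: 0.5)
  deadline = Time.now + timeout
  until ie.ReadyState == READYSTATE_COMPLETE
    raise "page not loaded after #{timeout} seconds" if Time.now >= deadline
    sleep(interval)
  end
  true
end

# Usage, given the ie object created earlier:
#   ie.Navigate('http://www.google.com')
#   wait_until_ready(ie)
```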

When a web page is loaded into IE, the contents of the page are represented by the Document object. The document object's All method returns a collection of all elements within the document, such as input textboxes, links, buttons, etc. You may reference a document element by name, which you will find in the HTML code for that page. For example, viewing the source for the Google top page shows us, among other things, that we have a textbox named 'q' and a button named 'btnG'...

<input maxlength=2048 name=q size=55 title="Google Search" value="">
<input name=btnG type=submit value="Google Search">

Let's enter a value into the textbox named 'q' by setting its Value property...

ie.Document.All.q.Value = 'ruby on windows'

...and then click the button named 'btnG', by calling its Click method...

ie.Document.All.btnG.Click

...and wait until IE has completed loading the page...

sleep(1) until ie.ReadyState == 4

The Document.All.Tags method returns a collection of all elements with the specified HTML tag, such as 'a', 'tr', or 'input'. So, to get a collection of all links, you could do this...

links = ie.Document.All.Tags('a')

You could then iterate over this collection...

for link in links do
  puts link.InnerText    # print the link's text
  puts link.href         # print the link's URL
end
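If you only care about some of the links, you might filter them as you go. A small sketch, assuming the same Document.All.Tags('a') call shown above; links_matching is a hypothetical helper of mine, not part of any library.

```ruby
# Hypothetical helper: collect [text, url] pairs for links whose text matches.
# document is any object offering All.Tags('a') (normally ie.Document).
def links_matching(document, pattern)
  matches = []
  document.All.Tags('a').each do |link|
    text = link.InnerText.to_s
    matches << [text, link.href] if text =~ pattern
  end
  matches
end

# Usage, given the ie object from above:
#   links_matching(ie.Document, /ruby/i).each do |text, url|
#     puts "#{text} -> #{url}"
#   end
```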

To quit IE, call the application object's Quit method...

ie.Quit

There is much more to automating IE than what I have presented here, but this will hopefully get you started. As always, feel free to post a comment here or email me if there is a specific task or topic that you would like to see discussed here.

This is all about helping you make better use of Ruby on Windows.

Thanks for stopping by!


Wednesday, May 16, 2007

Adding Sound to Your Ruby Apps

Have you ever thought about including sounds in your Ruby application? Used sparingly, sound may enhance your applications by adding audio cues or a custom touch. You could, for example, play a beep or chime that announces the completion of a lengthy process. Or perhaps a humorous sound to accompany an error message.

The win32-sound library makes using sounds really simple.

If you installed Ruby with the One-Click Ruby Installer, then you probably already have the win32-sound library installed, along with other win32 utilities. Otherwise, you can install it in seconds via gem. Open up a console window and enter...

gem install win32-sound

To use the win32-sound library, add these require and include statements to the top of your script...

require 'win32/sound'
include Win32

Then, to play a sound file on the PC, call the Sound.play method, passing it the name of the file...

Sound.play('chimes.wav')
Sound.play('c:\sounds\hal9000.wav')

In the example above, you don't have to include the path to the file 'chimes.wav', because 'chimes.wav' is usually installed in the Windows folder. But in most cases, you'll want to include the full path to the sound file.

To generate a simple beep, call the Sound.beep method, passing it the tone frequency (in Hertz, between 37 and 32767) and the duration in milliseconds. For example, to play a low tone for half a second...

Sound.beep(100, 500)

...or to play an annoyingly high-pitched tone for 3 full seconds...

Sound.beep(5000, 3000)
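Stringing a few beeps together gives you a simple audio cue. Here is a sketch that plays a list of tones and checks each frequency against the 37 to 32767 Hz range mentioned above. The play_tones helper is hypothetical, and the three-tone sequence is just something I made up.

```ruby
BEEP_RANGE = (37..32_767) # valid frequencies in Hertz, per the text above

# Hypothetical helper: player is any object with a beep(frequency, duration_ms)
# method (normally Win32::Sound); tones is a list of [frequency, duration_ms] pairs.
def play_tones(player, tones)
  tones.each do |frequency, duration_ms|
    unless BEEP_RANGE.cover?(frequency)
      raise ArgumentError, "frequency #{frequency} Hz is outside #{BEEP_RANGE}"
    end
    player.beep(frequency, duration_ms)
  end
end

# Only attempt the real thing on Windows, and only if the gem is installed.
if RUBY_PLATFORM =~ /mswin|mingw/
  begin
    require 'win32/sound'
    play_tones(Win32::Sound, [[440, 200], [660, 200], [880, 400]])
  rescue LoadError
    warn 'win32-sound is not installed; run: gem install win32-sound'
  end
end
```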

The complete docs for this library can be found here.

Distributing your sound files with your application is also simple. You can embed your sound files in an executable created with RubyScript2Exe, or include them in an install package produced with Inno Setup.

Be careful not to overdo it, though, if your program is to be used by others besides yourself. It's a fine line between clever and annoying.

Thanks for stopping by!

Tuesday, May 15, 2007

FAQ: "script.rb" versus "script.rbw"

What's the difference between ".rb" and ".rbw" files?

As you probably know, Windows associates certain filename extensions with certain programs. For example, double-click on a file with the ".xls" filename extension, and the file will be opened in Microsoft Excel (if it is installed). Use the One-Click Ruby Installer, and your ".rb" and ".rbw" files will be associated with the Ruby interpreter, so that double-clicking on a Ruby script automatically runs the script in the Ruby interpreter.

But there are two Ruby interpreters installed on Windows, ruby.exe and rubyw.exe. ruby.exe is the standard interpreter, which runs your script in a command/console window. rubyw.exe is essentially the same as ruby.exe, but without the console window, so any output to the console (e.g., 'puts' statements or error messages) will not be seen.

Files with a ".rb" filename extension are associated with ruby.exe, while files with a ".rbw" filename extension are associated with rubyw.exe. Double-click on a ".rb" file in Windows Explorer and the script will open up a console window -- while it is running. The console window will automatically close when the script stops running. Double-click on a ".rbw" file in Windows Explorer and the script will run, but with no console window (and therefore no error messages).

To ensure you see the necessary console output from your script, run it from a command prompt, or from an editor/IDE that includes an output window.
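If a script has to run as a ".rbw" file, another option is to send its output to a log file so that messages and errors are not lost. Here is a minimal sketch of that idea. The with_output_logged helper and the log file name are my own assumptions, not something the Ruby installer provides.

```ruby
require 'tmpdir'

# Hypothetical helper: while the block runs, send $stdout and $stderr to a
# log file, record any error, then restore the original streams.
def with_output_logged(path)
  original_out, original_err = $stdout, $stderr
  File.open(path, 'a') do |log|
    log.sync = true
    $stdout = $stderr = log
    begin
      yield
    rescue StandardError => e
      log.puts "#{e.class}: #{e.message}"
      raise
    end
  end
ensure
  $stdout, $stderr = original_out, original_err
end

# Example: the log file location here is an arbitrary choice.
log_path = File.join(Dir.tmpdir, 'my_script.log')
with_output_logged(log_path) do
  puts "Script started at #{Time.now}"
end
```

The error is re-raised after being logged, so the script still stops as it normally would.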

For more answers to Frequently-Asked Questions, check out posts with the faq tag.

Thanks for stopping by!

Monday, May 14, 2007

FAQ: But don't I need an IDE?

No. A good lightweight text editor is all you'll need to start developing serious applications. I recommend that you start out with a lightweight editor, so that you can first get comfortable with the language (which won't take long). But as you become more comfortable using Ruby and your projects grow in complexity, you may wish to investigate a full-blown Integrated Development Environment (IDE).

When you do go looking for an IDE, you'll now have several to choose from...

Ruby in Steel allows you to develop Ruby in Microsoft's Visual Studio IDE, while Aptana integrates the former RadRails for developing Ruby on the Eclipse platform. ActiveState's Komodo IDE is the big brother of the Komodo Editor. Borland's CodeGear division (the Delphi people) just announced plans to jump into the Ruby IDE market later this year.

As usual, Google for further details and opinions on this topic.


Thanks for stopping by!

Sunday, May 13, 2007

Ruby on Windows FAQs

This series of posts is an attempt to consolidate and address some of the most frequently asked questions (FAQs) regarding using Ruby on Windows. You'll find these FAQ posts located under the faq tag.

There are no doubt more questions to be asked and answers to be provided. Please post a comment here or send me email.

Thanks for stopping by!

FAQ: What text editor should I use?

Any text editor -- even Windows Notepad -- will do, so long as you can save your documents as straight text. You'll eventually want something a bit more robust than Notepad, such as the SciTE code editor that is installed by the One-Click Ruby Installer. More fully-featured code editors include the e Text Editor and ActiveState's Komodo Editor. Google for further details and recommendations.


FAQ: How do I install new code libraries?

The One-Click Ruby Installer includes RubyGems. RubyGems is a standard format for distributing Ruby programs and libraries, and provides an easy-to-use tool for managing the installation of gem packages. To install a new library such as watir via gem, open a command window and enter...

gem install watir

To uninstall, enter (you probably guessed this by now)...

gem uninstall watir

For complete details, check out the easy-to-read RubyGems User Guide.

FAQ: How do I get to a command prompt?

Click your Windows Start button, then select Programs, then Accessories, then Command Prompt. Or click the Start button, select Run, type in 'cmd' and click OK. If you're not very familiar with the Windows command prompt, I recommend this brief but informative overview.

FAQ: How do I install Ruby on Windows?

The One-Click Ruby Installer should meet your needs for a painless Ruby installation that includes most of the Ruby libraries you'll need starting out. It includes the SciTE Text Editor (my code editor of choice) and the WIN32OLE library, essential for COM automation.


Sunday, May 6, 2007

Automating the Windows Shell with Ruby

The Microsoft Windows Shell provides a set of objects and methods that allow you to automate the Windows Shell with Ruby. You can use these objects and methods to access many of the Shell's functions.

Let's start with an example involving accessing your CD-ROM drive...

As usual, we'll start by requiring the win32ole library...

require 'win32ole'

Next, we'll create an instance of the Windows Shell object...

shell = WIN32OLE.new("Shell.Application")

Now, we'll call the shell object's NameSpace method to obtain a reference to the "My Computer" folder...

my_computer = shell.NameSpace(17)

The value passed to the NameSpace method represents a special folder ("My Computer" = 17).
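Other special folders have their own numbers in the Shell's special folder constants. Here is a sketch with a handful of them given friendlier names. The SPECIAL_FOLDERS hash and the special_folder helper are my own naming; the numeric values are the Shell's ssfDESKTOP, ssfPERSONAL, ssfBITBUCKET, ssfDRIVES and ssfFONTS constants.

```ruby
# A few of the Shell's special folder constants, under hypothetical names.
SPECIAL_FOLDERS = {
  desktop:      0,   # ssfDESKTOP
  my_documents: 5,   # ssfPERSONAL
  recycle_bin:  10,  # ssfBITBUCKET
  my_computer:  17,  # ssfDRIVES
  fonts:        20   # ssfFONTS
}.freeze

# Hypothetical helper: shell is any object that responds to NameSpace
# (normally the Shell.Application object).
def special_folder(shell, name)
  shell.NameSpace(SPECIAL_FOLDERS.fetch(name))
end

# Only attempt the real thing on Windows, where win32ole is available.
if RUBY_PLATFORM =~ /mswin|mingw/
  require 'win32ole'
  shell = WIN32OLE.new('Shell.Application')
  puts special_folder(shell, :my_computer).Title
end
```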

To obtain a reference to the drive object, we'll call the NameSpace object's ParseName method, passing it the drive letter string for the CD-ROM drive...

cdrom = my_computer.ParseName("E:\\")

Shell objects such as drives and folders have a collection of Verbs that can be called upon. We can see the list of verbs available for a drive by iterating over the drive's Verbs collection and printing out the Name value...

cdrom.Verbs.each do |verb|
  puts verb.Name
end

The list of verbs may vary depending on the type of disc in the drive, but you may see something like this...

&Use with DLA
S&haring and Security...
Scan with &AVG Free
Create &Shortcut

Note the ampersand (&) in these verb names, which marks the context menu shortcut key.

To perform an action represented by a Verb, locate the verb by Name, then call that Verb's doIt method. So, to eject your CD-ROM drive, you can do this...

cdrom.Verbs.each do |verb|
  verb.doIt if verb.Name == "E&ject"
end
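Having to remember where the ampersand falls in each verb name is a bit fiddly. One possible workaround is to compare names with the ampersands stripped out and case ignored. This is only a sketch, and find_verb is a hypothetical helper of my own.

```ruby
# Hypothetical helper: find a verb by its visible label, ignoring the
# shortcut-key ampersand and letter case. item is any object with a Verbs
# collection (such as the cdrom object above). Returns nil if not found.
def find_verb(item, label)
  wanted = label.delete('&').downcase
  item.Verbs.each do |verb|
    return verb if verb.Name.delete('&').downcase == wanted
  end
  nil
end

# Usage, given the cdrom object from above:
#   verb = find_verb(cdrom, 'Eject')
#   verb.doIt if verb
```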

Putting it all together, we could whip up a little CdRom class that encapsulates such functionality...

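class CdRom

  attr_accessor :drive, :drive_letter, :verbs

  def initialize(drive_letter)
    my_computer = 17
    @drive_letter = drive_letter
    sh = WIN32OLE.new("Shell.Application")
    @drive = sh.NameSpace(my_computer).ParseName("#{@drive_letter}")
    @verbs = []
    @drive.Verbs.each do |verb|
      @verbs << verb.Name if verb.Name != ''
    end
  end

  def invoke_verb(verb_name)
    @drive.Verbs.each do |verb|
      verb.doIt if verb.Name == verb_name
    end
  end

  # The verb names below are assumed from an English Windows XP context
  # menu; run "puts cd.verbs" to see the names on your system.
  def eject
    invoke_verb('E&ject')
  end

  def open
    invoke_verb('&Open')
  end

  def explore
    invoke_verb('E&xplore')
  end

  def play
    invoke_verb('&Play')
  end

end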

...which could be used like this...

cd = CdRom.new('d:\\')
puts cd.verbs

A tip of the hat goes to Masaki Suketa, who informed me (via the comp.lang.ruby group) that the standard InvokeVerb method does not currently work in the win32ole library, and to use the verb.doIt method instead.

That's all for now. As always, feel free to comment here or email me if you have special requests.

Thanks for stopping by!

Thursday, May 3, 2007

Automating Internet Explorer with Ruby: Watir

A reader has asked about automating Internet Explorer (IE) with Ruby. While you can do this using the win32ole library, I strongly recommend using the watir library instead.

WATIR (pronounced "water") stands for "Web Application Testing in Ruby". Watir is "a free, open-source functional testing tool for automating browser-based tests of web applications." Though it's designed for driving IE to test web applications, Watir is also very handy for automating IE in non-test scenarios. It leverages the win32ole library under the hood, but provides objects and methods that simplify the process.

To use Watir, first install the watir gem. Open up a Command Prompt and type the following...

gem install watir

...then, in your script, require the watir library...

require 'watir'

To create an instance of IE and navigate to a website...

ie = Watir::IE.start('http://www.example.com')  # placeholder URL

To click on a link, identified by its text or by its URL...

ie.link(:text, "win32ole").click
ie.link(:url, '').click  # pass the link's URL here

To input text into a text field...

ie.text_field(:name, 'login').set('MyLogin')

To select an option from a drop-down list...

ie.select_list( :name , "HallOfFame").select("Dave Concepcion")

To click a button...

ie.button(:value, 'Submit').click

To close the browser, call the close method...

ie.close

That's the basics. For further details and examples, you'll definitely want to check out the Watir User Guide.

For the complete Watir API reference, check out the rdocs here.

Various extensions for Watir have popped up, including WatirRecorder, WatirMaker, and WET Web Tester. Google for further details and updated links.

As always, post a comment here or send me an email with further questions or to request a topic for discussion.

Thanks for stopping by!
