We've talked at length about automating Windows applications through COM/OLE, using the win32ole library. But not all applications expose themselves (so to speak) to such automation. The Windows Script Host can automate the activation of windows, and the sending of keystrokes. This may sometimes be all that you need to get the job done.
The Windows Script Host (WSH) has been part of the Windows operating system since Windows 98. You can use WSH's Shell object (via COM/OLE) to send keystrokes to windows.
First, require the win32ole library:
require 'win32ole'
Now we'll create an instance of the Wscript Shell object:
wsh = WIN32OLE.new('Wscript.Shell')
To send keystrokes to a window, you must first activate the window, bringing it to the foreground. This can be done with the Wscript Shell's AppActivate method, which returns true if the window was successfully activated, and false otherwise. The AppActivate method takes the window title text as its argument:
wsh.AppActivate('Title')
The string passed to the AppActivate method can be a partial title, but it must match either the start or the end of the window title. The method is not case sensitive, and does not accept regular expressions. To quote Microsoft: "In determining which application to activate, the specified title is compared to the title string of each running application. If no exact match exists, any application whose title string begins with title is activated. If an application still cannot be found, any application whose title string ends with title is activated. If more than one instance of the application named by title exists, one instance is arbitrarily activated."
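To make that matching order concrete, here is a small pure-Ruby sketch that mimics it on an ordinary array of title strings. It does not call Windows at all; pick_window_title is a hypothetical helper name, and taking the first match in each pass is an assumption (Microsoft only says the choice is arbitrary):

```ruby
# Hypothetical helper: imitates the documented AppActivate matching order
# (exact match, then "begins with", then "ends with") on an array of strings.
# Assumption: the first candidate found in each pass wins.
def pick_window_title(titles, wanted)
  needle = wanted.downcase
  titles.find { |t| t.downcase == needle } ||
    titles.find { |t| t.downcase.start_with?(needle) } ||
    titles.find { |t| t.downcase.end_with?(needle) }
end

open_titles = ['Untitled - Notepad', 'Notepad++ - notes.txt', 'Calculator']

pick_window_title(open_titles, 'calc')     # => "Calculator"
pick_window_title(open_titles, 'notepad')  # => "Notepad++ - notes.txt"
pick_window_title(open_titles, 'titled')   # => nil (matches only the middle)
```

Note the second result: a title that begins with the string is preferred over one that ends with it, so a short string such as 'Notepad' may activate a different window than the one you had in mind.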
Once you have the window activated, you may use the Wscript Shell's SendKeys method to send keystrokes to the window. The SendKeys method takes a string. Special keys (e.g., ENTER, TAB, PGDN, PGUP, function keys) may be embedded in the string if surrounded by braces:
wsh.SendKeys('Ruby{TAB}on{TAB}Windows{ENTER}')
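Because braces and a handful of other characters carry special meaning, literal text that contains them needs escaping before it is sent. As I understand Microsoft's SendKeys documentation, the characters + ^ % ~ ( ) { } [ ] are each typed literally by wrapping them in braces. A minimal sketch, where escape_send_keys is a hypothetical helper name and not part of WSH:

```ruby
# Hypothetical helper: wraps each character that SendKeys treats specially
# in braces, so that it is typed literally rather than interpreted.
def escape_send_keys(text)
  special = /[+^%~(){}\[\]]/
  text.gsub(special) { |ch| "{#{ch}}" }
end

escape_send_keys('50% off (today)')  # => "50{%} off {(}today{)}"
escape_send_keys('a+b')              # => "a{+}b"

# On Windows you might then combine literal text with special keys:
#   wsh.SendKeys(escape_send_keys('100% done') + '{ENTER}')
```

Only the literal text should be passed through the helper; key names such as {ENTER} are appended afterwards, otherwise their braces would be escaped too.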
The SHIFT key is represented by '+', the ALT key is represented by '%', and the CTRL key is represented by '^', so to quit an application by sending ALT-F4:
wsh.SendKeys('%{F4}')
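If the one-character modifier codes are hard to remember, a tiny wrapper can build the strings for you. This is only a sketch; chord and its symbol names are made up for illustration, and it simply concatenates the codes described above:

```ruby
# Hypothetical helper: builds a SendKeys string from modifier names and a key.
# The modifier codes are the ones SendKeys uses: + (SHIFT), % (ALT), ^ (CTRL).
def chord(*modifiers, key)
  codes = { shift: '+', alt: '%', ctrl: '^' }
  prefix = modifiers.map { |name| codes.fetch(name) }.join
  "#{prefix}#{key}"
end

chord(:alt, '{F4}')            # => "%{F4}"
chord(:ctrl, 's')              # => "^s"
chord(:ctrl, :shift, '{ESC}')  # => "^+{ESC}"

# On Windows: wsh.SendKeys(chord(:alt, '{F4}'))
```

Using fetch means an unknown modifier name raises a KeyError instead of silently producing a wrong keystroke string.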
Further details on the syntax of the SendKeys method can be found in Microsoft's Windows Script Host documentation for SendKeys.
Timing is important when using these methods, because keystrokes sent before a window is ready to receive them are simply lost. You may need to insert a call to sleep here and there to make things reliable: for example, a wait of a second or less between activating a window and sending keystrokes, or vice versa.
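A fixed sleep is either too long or, on a slow day, too short. One alternative is to poll until a condition holds or a timeout expires. The following is a minimal sketch; wait_until and its timeout and interval parameters are hypothetical names, not part of WSH or win32ole:

```ruby
# Hypothetical helper: calls the block repeatedly until it returns a truthy
# value (which is then returned) or until the timeout in seconds runs out
# (in which case false is returned).
def wait_until(timeout: 5.0, interval: 0.25)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

# On Windows, wait up to three seconds for a dialog to appear:
#   if wait_until(timeout: 3) { wsh.AppActivate('Save As') }
#     wsh.SendKeys('{ENTER}')
#   end
```

The block is always tried at least once, so a window that is already present is activated without any delay.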
So, putting it all together, here's a brief example that activates Notepad (which must be running first), inserts text, saves the file to a specific name, and quits Notepad:
# Require the win32ole library:
require 'win32ole'
# Create an instance of the Wscript Shell:
wsh = WIN32OLE.new('Wscript.Shell')
# Try to activate the Notepad window:
if wsh.AppActivate('Notepad')
  sleep(1)
  # Enter text into Notepad:
  wsh.SendKeys('Ruby{TAB}on{TAB}Windows{ENTER}')
  # ALT-F to pull down File menu, then A to select Save As...:
  wsh.SendKeys('%F')
  wsh.SendKeys('A')
  sleep(1)
  if wsh.AppActivate('Save As')
    wsh.SendKeys('c:\temp\filename.txt{ENTER}')
    sleep(1)
    # If prompted to overwrite existing file:
    if wsh.AppActivate('Save As')
      # Enter 'Y':
      wsh.SendKeys('Y')
    end
  end
  # Quit Notepad with ALT-F4:
  wsh.SendKeys('%{F4}')
end
The above code snippet can be improved upon, and I encourage you to do so. Hopefully, though, it demonstrates what can be done.
Mimicking keystrokes is certainly not the ultimate in program automation, but it may sometimes be all that you need to get the job done. For example, back in the days before pop-up blockers, I had written a script that would simply run in the background, look for pop-up ads (based on a list of title strings), and close them. Simple, yet effective.
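The original pop-up script isn't shown here, but the idea can be sketched in a few lines. This is a guess at the general shape, not the actual code: close_popups, its pause parameter, and the sample titles are all hypothetical. The shell is passed in as an argument, so anything that responds to AppActivate and SendKeys will do:

```ruby
# Hypothetical sketch: tries to activate each title in turn and sends ALT-F4
# to every window that could be activated. Returns the titles that were closed.
def close_popups(shell, titles, pause: 0.5)
  titles.select do |title|
    next false unless shell.AppActivate(title)
    sleep(pause)              # give the window a moment to come forward
    shell.SendKeys('%{F4}')   # ALT-F4 closes the active window
    true
  end
end

# On Windows, checking every few seconds in the background:
#   require 'win32ole'
#   wsh = WIN32OLE.new('Wscript.Shell')
#   loop do
#     close_popups(wsh, ['You are a winner', 'Special offer'])
#     sleep(5)
#   end
```

Keep in mind that AppActivate matches on the start or end of a title, so short strings in the list could close windows you wanted to keep.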
I should probably also mention AutoIt, "a freeware Windows automation language. It can be used to script most simple Windows-based tasks." I've not used it myself, but I believe that the Watir library leverages it.
That's all for now. As always, let me know if you have questions, comments, or requests for future topics.
Thanks for stopping by!