
Searching with POE and IRC

Watch IRC channels with Perl and the POE module.

This month’s “Perl of Wisdom” is
my 237th article for various magazines, including 85 monthly
articles for this magazine alone.

Over the years, I’ve avoided overlap as best I can.
Because of this diversity, it’s hard to find a
beginner or intermediate Perl topic that I
haven’t already spoken at least a bit about. (And this
clearly makes me the leader for in-print authorship for Perl
writings, at somewhere around 20 million
bylines total.)

And thanks to the generous publishers I’ve had over the
years, all 237 (and counting) magazine articles are online on my
Web site, where you can examine them for free. The hard part is
getting the word out about this resource. Oh, sure, I have a
“search this site” box on my site, hoping that people
take advantage of relevant keywords. But IRC
bots seem all the rage right now, and I thought a bot would be an
effective and novel way to find my material.

Yahoo! for Search

I recently stumbled across the Yahoo::Search module, which performs programmatic web
searches using the decently-sized and speedy Yahoo search engine.
Now, I was previously familiar with Net::Google for Google searches, but I’ve switched
to using Yahoo search for a few reasons.

*One advantage of Yahoo search
for my columns is that I can actually search for module names, such
as Class::DBI. Google’s search for
words containing colons has been broken for about a year now,
making it hard to look for Perl modules referenced in my columns
(or anywhere, for that matter).

*Another big advantage of
programmatic Yahoo search is that I don’t need to go through
a formal process to get a (very) private Google API key. Instead, I
can just make something up! Thank you, Yahoo, for making it easier
for us!

*And finally, to a reasonable
degree, the Yahoo APIs have recently had
the “no commercial use” clause removed from their
acceptable use policy. This means that I can legally use the
information in the pursuit of fortune as well as fame. For example,
I have a friend who has a real estate site and finds the closest
branch office to an address using Yahoo’s geolocation service
— for free. Nice. Wake up, Google! Yahoo is slipping past you
here.

So, I thought I’d show off Yahoo search, promote my
columns, demonstrate the latest changes to POE::Component::IRC for bot building, and illustrate
POE::Session::Attribute for easy session
authorship, all in one little “mash-up,” as the kids
like to say these days. And with that flourish, allow me to present
my search_merlyn_text bot, shown in
Listing One.

LISTING ONE: The POE-based IRC bot
001 #!/usr/bin/perl
002 use strict;
003
004 ## CONFIGURATION
005
006 my $SERVER = "irc.perl.org";
007 my $NICK = "search_merlyn_text";
008 my @CHANNELS = qw(#search_merlyn_text #search_merlyn_text2);
009
010 ## END CONFIGURATION
011
012 use POE;
013
014 BEGIN {
015   package MyBot;
016
017   use POE qw(Component::IRC);
018   use base POE::Session::Attribute::;
019
020   use Yahoo::Search AppId => 'YahooDemo';
021
022   sub full_to_nick { (shift =~ /(.*?)!/)[0] }
023
024   sub _start : Package {
025     my ($heap) = @_[HEAP];
026     my $irc = $heap->{irc} = POE::Component::IRC->spawn;
027   };
028
029   sub irc_registered : Package { # client is ready to connect
030     my ($sender) = @_[SENDER];
031     my $irc = $sender->get_heap;
032     ## warn "trying to connect";
033     $irc->yield(connect => {server => $SERVER, nick => $NICK});
034   };
035
036   sub irc_255 : Package {       # server is done blabbering
037     my ($sender) = @_[SENDER];
038     my $irc = $sender->get_heap;
039     ## warn "trying to join @CHANNELS";
040     $irc->yield(join => $_) for @CHANNELS;
041   };
042
043   sub irc_join : Package {      # server says we joined
044     my ($sender) = @_[SENDER];
045     my $irc = $sender->get_heap;
046
047     my ($who, $channel) = @_[ARG0..ARG1];
048     $who = full_to_nick($who);
049
050     ## warn "$who joined $channel";
051     if ($who eq $NICK) {
052       $irc->yield(privmsg => $channel => "Hello! I search merlyn's columns at http://www.stonehenge.com/merlyn/columns.html!");
053     }
054   };
055
056   sub irc_public : Package {    # public message in a channel
057     my ($sender) = @_[SENDER];
058     my $irc = $sender->get_heap;
059
060     my ($who, $channels, $message) = @_[ARG0..ARG2];
061     $who = full_to_nick($who);
062     ## warn "$who said $message in @$channels";
063
064     if ($message =~ /^\Q$NICK\E(?:,|:)\s*(.*)/) {
065       my $search = $1;
066       $_[KERNEL]->yield(search => $irc, $channels, $search, "$who: ");
067     }
068   };
069
070   sub irc_msg : Package {       # private message to me
071     my ($sender) = @_[SENDER];
072     my $irc = $sender->get_heap;
073
074     my ($who, $me, $message) = @_[ARG0..ARG2];
075     $who = full_to_nick($who);
076     ## warn "$who said $message to me";
077     $_[KERNEL]->yield(search => $irc, [$who], $message);
078   };
079
080   sub irc_ctcp_action : Package {       # public emote
081     my ($sender) = @_[SENDER];
082     my $irc = $sender->get_heap;
083
084     my ($who, $channels, $message) = @_[ARG0..ARG2];
085     $who = full_to_nick($who);
086     ## warn "$who *$message* in @$channels";
087   };
088
089   sub search : Package {        # time for us to search
090     my ($irc, $channels, $search, $prefix) = @_[ARG0..ARG3];
091     ## warn "@$channels wants to see $prefix [results of $search]";
092
093     my @results;
094
095     for my $result (Yahoo::Search->Results
096                     (Doc => "site:stonehenge.com $search", Count => 20)) {
097       next unless $result->Url =~ /col\d\w*\.html$/;
098
099       ## return results as mediawiki-like links
100       push @results, sprintf "[%s %s]", $result->Url, $result->Title;
101       last if @results >= 3;
102     }
103     @results = "Nothing.  Please let Randal know of your great column idea!"
104       unless @results;
105
106     $prefix ||= "";
107     $irc->yield(privmsg => $channels => "${prefix}$search: @results");
108   };
109
110   sub _default : Package {
111     our $IGNORE_THESE ||= { map { $_ => 1 }
112                             qw(_child
113                                irc_plugin_add irc_isupport
114                                irc_snotice irc_connected
115                                irc_mode irc_ping irc_part
116                                irc_001 irc_002 irc_003 irc_004 irc_005
117                                irc_250 irc_251 irc_252 irc_254 irc_255
118                                irc_265 irc_266
119                                irc_353
120                                irc_366
121                                irc_372 irc_375 irc_376
122                               ) };
123     return if $IGNORE_THESE->{$_[ARG0]};
124     printf "%s: session %s caught an unhandled %s event.\n",
125       scalar localtime(), $_[SESSION]->ID, $_[ARG0];
126     print "The $_[ARG0] event was given these parameters: ",
127       join(" ", map({"ARRAY" eq ref $_ ? "[@$_]" : "$_"} @{$_[ARG1]})), "\n";
128     0;                        # false for signals
129   };
130 }
131
132 MyBot->spawn;
133
134 $poe_kernel->run;

Talking About Bot

Line 1 points the script at the right
Perl program. Line 2 enables the usual
compiler restrictions.

Lines 6 through 8
give a bit of configuration information. It’s probably not
flexible enough for general use, but close enough for my initial
testing. Line 6 names the IRC server that
the bot connects to. Line 7 is the
IRC nick for the bot to use. Line 8 is a list of channels that the bot attempts to
join to answer public queries.

Line 12 pulls in the POE module (found in the CPAN).
The only symbol I’m using from this import is $poe_kernel.

Lines 14 to 130
create my bot, as if it were a separate file that I brought in with
use MyBot. By placing the package inside a
block, I limit any lexical variables (which would have been
file-lexical variables) to that package definition. Furthermore, by
making it a BEGIN block, any runtime
statements are processed before the rest of the file is processed.
This is useful if you have any fake exports of the form…

*main::some_exported = \&some_exported;

… which exports some_exported to
the main package, allowing it to be called
and not be treated as a bareword.
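
To see why the BEGIN block matters for such a fake export, here is a minimal stand-alone sketch, separate from the bot. The package name MyHelper and the $greeting variable are made up for illustration; only the name some_exported comes from the text above.

```perl
#!/usr/bin/perl
use strict;

BEGIN {
  package MyHelper;          # hypothetical package, for illustration only

  my $greeting = "hello";    # lexical: visible only inside this block

  sub some_exported { return "$greeting from MyHelper" }

  # The "fake export": alias the sub into main while the file is
  # still being compiled.
  *main::some_exported = \&some_exported;
}

# The BEGIN block has already run by the time this line is compiled,
# so some_exported is known to be a subroutine and needs no parentheses.
my $text = some_exported;
print "$text\n";
```

If BEGIN were replaced by a bare block, the assignment would only happen at run time, after the last statement had already been compiled, and use strict would reject the bareword.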

Line 17 again imports POE, this time into the MyBot package, which brings in the state handler offsets
like HEAP and ARG0. It also loads POE::Component::IRC.

Line 18 makes this a POE::Session subclass, but also scans each subroutine
header for an attribute that can tag the subroutine automatically
as a state for the session. POE::Session::Attribute has
been invaluable on recent projects, in the mindset of
“don’t repeat yourself.” (It always seemed silly
to me that you have to say the name of everything twice;
POE::Session::Attribute puts it back into
“once is enough” mode.)
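
The mechanism underneath is Perl's subroutine attributes. The following sketch is not how POE::Session::Attribute is implemented; it is a small stand-alone illustration, using only core Perl, of how a base class can notice which subs carry a tag. The names TinyStates, Demo, State, and state_names are all invented for the example.

```perl
#!/usr/bin/perl
use strict;

{
  package TinyStates;    # hypothetical base class, not part of POE

  our @TAGGED;           # code refs that carried the :State attribute

  # Perl calls this at compile time for every sub declared with
  # attributes in a package that inherits from TinyStates.
  sub MODIFY_CODE_ATTRIBUTES {
    my ($package, $code, @attrs) = @_;
    push @TAGGED, $code if grep { $_ eq 'State' } @attrs;
    return grep { $_ ne 'State' } @attrs;   # leftovers are reported as errors
  }

  # Walk the class's symbol table and return the names of the tagged subs.
  sub state_names {
    my ($class) = @_;
    my %tagged = map { $_ => 1 } @TAGGED;   # keyed by stringified code ref
    my @names;
    no strict 'refs';
    for my $name (sort keys %{"${class}::"}) {
      my $code = *{"${class}::$name"}{CODE} or next;
      push @names, $name if $tagged{$code};
    }
    return @names;
  }
}

{
  package Demo;
  BEGIN { our @ISA = ('TinyStates') }   # inheritance must exist at compile time

  sub _start : State { "starting up" }
  sub search : State { "searching" }
  sub helper         { "plain helper, not a state" }
}

my @states = Demo->state_names;
print "states: @states\n";
```

Each sub is named exactly once, and the tag alone decides whether it counts as a state, which is the "once is enough" idea described above.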

Line 20 pulls in Yahoo::Search, using a generic application identifier of
YahooDemo. If you use this module, read the
restrictions at http://developer.yahoo.com, including how to
select and register your own application identifier. But really,
you could just make up a unique name and the Yahoo API accepts it.
Yahoo is very unpicky, and therefore very friendly.

Line 22 defines a utility subroutine to
convert the nick!user@host.example.com return value from
some of the operations to just nick. Note
that this is not a session state handler,
because there are no attributes assigned.
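
As a quick check of what that helper does, here it is on its own. The hostmask and server name below are made-up sample values.

```perl
#!/usr/bin/perl
use strict;

# Same helper as in Listing One: keep everything before the first "!"
sub full_to_nick { (shift =~ /(.*?)!/)[0] }

# A made-up hostmask in the nick!user@host form
my $nick = full_to_nick('merlyn!randal@host.example.com');
print "$nick\n";

# A string without "!" does not match, so the result is undef
my $none = full_to_nick('irc.example.org');
print defined $none ? $none : "(no nick found)", "\n";
```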

Lines 24 to 27
define the handler for the _start state,
automatically generated by POE when the
session begins. This is where I create the POE::Component::IRC object, and stuff it into the heap
so that it stays alive as long as this session does. Creating the
object also causes a register ‘all’ request to be sent, which alerts the session to
every possible IRC event.

Once the IRC object is operational, the irc_registered event is triggered, transferring flow to
the subroutine in Lines 29 through 34. As a matter of formula, for every event
triggered by the IRC object, I grab the sender (Line 30) and get the IRC object back (Line 31). By doing so, I can use this same code with
multiple IRC objects in one program and it does the right thing,
sending the response back to the right object. Line 33
tells the IRC object to try to connect to the requested
server with the preferred nick.

Presuming the connection worked, a “255” event will
arrive at some point in the near future, which basically says that
the server is done saying all of its initial login stuff, and
you’re now ready to go. Event “255” is trapped in
Lines 36 to 41.
Line 40 has the bot join all of the
interesting channels.

As the bot joins each channel, you get an irc_join event, which is handled in Lines 43 to 54. You also get this
event if someone else joins the channel, but let’s ignore
those events right now. However, when the bot joins a channel, it
announces its intentions in Line 52.

From this point on, everything is merely a reaction to being
spoken to. For example, if the bot gets a public message in a
channel, an irc_public event occurs, which
is handled in Lines 56 to 68. If the bot gets a private message, an irc_msg event fires instead, handled in Lines 70 to 78. In either case,
the code has to check if it’s time to do a search. If so, the
bot performs the search and returns the results.

For the public event, Line 60 extracts
the speaker, the channels, and the text of the message. If the
message starts with the bot’s name followed by a colon or
comma (Line 64), the code presumes the
speaker is talking directly to the bot, so it performs a search,
returning the response in the same public channel. This is done by
yielding to the search state in the
bot’s own session, including a prefix
that identifies the speaker who triggered the search, in case
multiple searches are fired off in a short period of time.
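
The addressing test from Line 64 can be tried in isolation. This is a small sketch using the same pattern; the helper name addressed_search and the sample messages are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;

my $NICK = "search_merlyn_text";

# Return the search text if the message is addressed to the bot,
# or undef otherwise. (Hypothetical helper, same regex as Line 64.)
sub addressed_search {
  my ($message) = @_;
  return $message =~ /^\Q$NICK\E(?:,|:)\s*(.*)/ ? $1 : undef;
}

for my $message ("search_merlyn_text: Class::DBI",
                 "search_merlyn_text, POE sessions",
                 "has anyone tried search_merlyn_text?") {
  my $search = addressed_search($message);
  printf "%-40s => %s\n", $message,
    defined $search ? "search for '$search'" : "ignored";
}
```

The \Q...\E quoting keeps any regex metacharacters in the nick from being interpreted, and anchoring at the start means a mere mention of the bot's name later in a message is ignored.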

A private message is always treated as a search, so matching can be skipped. Also,
since the bot is to address the person directly, you can skip the
prefix. Thus, the yield in Line 77 is a bit
shorter. The nick is placed inside an array ref constructor so the
same interface can be used for both calls.

I’ll get to the search handling in a minute, but as an
aside, I also decided to decode the emotes,
such as /me is stepping out for a moment in
Lines 80 to 87.
I’m not actually doing anything with them at the moment, but
maybe a future version of the bot will pay attention to emotes as
well.

Finally, the meat of the program: the search is performed in
Lines 89 to 108.
Line 90 just grabs the passed-in parameters.
Line 93 creates an array to hold the
responses (three at most). Lines 95
and 96 perform the Yahoo search,
using a site-restricted search for stonehenge.com, plus whatever text the user specified.
The code asks for twenty results. Why twenty results when the bot
only provides three? Because you can’t directly ask Yahoo for
“just the magazine columns”, some additional filtering
is necessary, based on the returned URL. That filtering is
performed in Line 97.

Arriving at Line 100 implies a good URL.
The title and URL are wrapped in brackets and are placed into
@results. At three results, the loop aborts
in Line 101.
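
The filter-and-stop logic of Lines 97 to 101 can be sketched without calling Yahoo at all. The URL and title pairs below are made-up sample data standing in for search results, not real output.

```perl
#!/usr/bin/perl
use strict;

# Made-up stand-ins for search hits: [url, title] pairs
my @hits = (
  [ 'http://www.stonehenge.com/merlyn/',                           'Home page' ],
  [ 'http://www.stonehenge.com/merlyn/LinuxMag/col01.html',        'Column A'  ],
  [ 'http://www.stonehenge.com/merlyn/LinuxMag/col02.listing.txt', 'Listing'   ],
  [ 'http://www.stonehenge.com/merlyn/UnixReview/col10.html',      'Column B'  ],
  [ 'http://www.stonehenge.com/merlyn/WebTechniques/col20.html',   'Column C'  ],
  [ 'http://www.stonehenge.com/merlyn/LinuxMag/col03.html',        'Column D'  ],
);

my @results;
for my $hit (@hits) {
  my ($url, $title) = @$hit;
  next unless $url =~ /col\d\w*\.html$/;           # keep column pages only
  push @results, sprintf "[%s %s]", $url, $title;  # mediawiki-like link
  last if @results >= 3;                           # three is enough
}
@results = "Nothing." unless @results;

print "$_\n" for @results;
```

With this sample data, the home page and the listing file are skipped, and the loop stops before it ever looks at the fourth column page.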

If, for some reason, no results are found, Lines 103
to 104 change the result to
suggest to the user that they inform me of their search, so I can
write an article about that. (I’m always looking for new
ideas!) (Please.) (I’m not kidding.)

Line 107 sends the results back to the
user in response to a private message, or to the channel(s) for a
public message, along with a prefix if needed.

And that’s the bot. Lines 110 to
129 establish a _default handler for debugging purposes. Because the bot
has registered to receive all IRC events,
most events will likely be irrelevant. However, it’s nice to
know what the events are so I can add some hook into them if I
want. The code in Lines 124 to 127 dumps out the event, including expanding any
arrayrefs to array values for easy parsing.

Initially, the exception list in Lines 113
to 121 was empty. As I ran the
program repeatedly, though, I saw events being triggered that were
either interesting or ignorable. For interesting events, I
immediately created a handler. For ignorable events, I added the
event type to the block list. And thus, my program grew a bit at a
time, helping me understand what I needed to do.
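
The two ideas in that handler, an ignore list kept as a hash and the flattening of arrayref parameters for display, can be tried outside POE. This is a minimal sketch; the function name describe_event, the shortened ignore list, and the sample event are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;

# A shortened ignore list, built the same way as in Listing One
my %IGNORE = map { $_ => 1 } qw(irc_ping irc_mode irc_372);

# Return a printable description of an event, or nothing if it is ignored.
# (Hypothetical helper; the bot does this inline in _default.)
sub describe_event {
  my ($event, $args) = @_;
  return if $IGNORE{$event};
  return "$event: "
    . join(" ", map { ref($_) eq 'ARRAY' ? "[@$_]" : "$_" } @$args);
}

# A made-up event with a nested array of channel names
my $line = describe_event("irc_topic",
                          [ 'nick!user@host', [ '#a', '#b' ], 'hi' ]);
print "$line\n";

my $ignored = describe_event("irc_ping", []);
print defined $ignored ? $ignored : "(irc_ping ignored)", "\n";
```

Growing the ignore list is then just a matter of adding one more word to the qw() list each time an uninteresting event shows up in the output.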

All that’s left to do is create a bot instance and run it.
Lines 132 and 134 do
exactly that. Because there’s no trigger to provoke the bot
to stop, this program runs until terminated from the
command-line.

A Bot More

After I submit this column, I’ll probably tinker with this
bot a bit more to get it to handle nick collisions, being kicked
from a channel, being invited to a channel, and so on. But for now,
in about a hundred lines of code, I’ve got a working bot that helps
promote one of my many public contributions to the Perl community.
Hope you find what you’re looking for… until next
time, enjoy!

Randal Schwartz is Chief Perl guru at Stonehenge
Consulting. You can reach Randal at merlyn@stonehenge.com.
