<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Of Spiders and Scrapers: Decomposing Web Pages 101</title>
	<atom:link href="http://www.linux-mag.com/id/7448/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/7448/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Fri, 10 May 2013 08:56:11 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: tdbtdb</title>
		<link>http://www.linux-mag.com/id/7448/#comment-6785</link>
		<dc:creator>tdbtdb</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7448/#comment-6785</guid>
		<description>&lt;p&gt;The wget example does not work for me. The first grep seems a bit strange:&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
grep &#039;&lt;br /&gt;
`&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;What would that do that would be good?
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>The wget example does not work for me. The first grep seems a bit strange:<br />
<code><br />
grep '<br />
`</code></p>
<p>What would that do that would be good?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: vijayekm</title>
		<link>http://www.linux-mag.com/id/7448/#comment-6786</link>
		<dc:creator>vijayekm</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7448/#comment-6786</guid>
		<description>&lt;p&gt;Mechanize is good for most cases.&lt;/p&gt;
&lt;p&gt;However, for interacting with pages that use JavaScript, I mostly depend on IE and Watir.&lt;/p&gt;
&lt;p&gt;Not sure if there are any better choices out there.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Mechanize is good for most cases.</p>
<p>However, for interacting with pages that use JavaScript, I mostly depend on IE and Watir.</p>
<p>Not sure if there are any better choices out there.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: perljunkie</title>
		<link>http://www.linux-mag.com/id/7448/#comment-6787</link>
		<dc:creator>perljunkie</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7448/#comment-6787</guid>
		<description>&lt;p&gt;When I&#039;m scraping, it&#039;s very seldom that I find myself needing something other than some text on a page.  For that purpose, I&#039;ve almost always found the dumped ASCII output of Lynx very convenient and less of a hassle than XPathing and reading node structures.  I have a small Perl module called WebPage.pm that handles all this and can recursively spider down through pages (using the references list in the Lynx output) if desired.  Most of the time, it works great.&lt;/p&gt;
&lt;p&gt;-pj
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>When I&#8217;m scraping, it&#8217;s very seldom that I find myself needing something other than some text on a page.  For that purpose, I&#8217;ve almost always found the dumped ASCII output of Lynx very convenient and less of a hassle than XPathing and reading node structures.  I have a small Perl module called WebPage.pm that handles all this and can recursively spider down through pages (using the references list in the Lynx output) if desired.  Most of the time, it works great.</p>
<p>-pj</p>
]]></content:encoded>
	</item>
</channel>
</rss>