The Shadow File: Handling HTTP Redirection in Ruby

I have a Ruby project where I'm dumping a bunch of bookmarks from delicious.com, then fetching each bookmarked page for analysis.

One of the problems I ran into early on is that some of the bookmarked pages redirect to another location, so simply checking for HTTP response code 200 was insufficient. I needed to check for redirection as well.

A quick Google search for "ruby follow http redirect" turns up lots of results. Unfortunately, they're all very similar, and not quite right. In general, the examples you come across (even the one in the official Ruby documentation) don't handle the case where the redirected location is relative to the original location. You end up issuing a GET for a URL that looks like "../../redirected/location/index.html", which clearly won't work.
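To see why the naive approach fails, here is a small check of what Ruby's standard URI library makes of the relative location from the paragraph above. It parses without complaint, but the result has no host, so there is nothing sensible to hand to Net::HTTP.new.

```ruby
require 'uri'

# The relative Location value from the example above.
location = URI.parse("../../redirected/location/index.html")

p location.relative?  # true: no scheme, so it is a relative reference
p location.host       # nil: there is no server to connect to
p location.path       # the "../.." segments are kept as-is, unresolved
```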

It turns out that detecting relative redirection is fairly straightforward:


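 require 'net/http'
 require 'openssl'
 require 'uri'

 MAX_ATTEMPTS = 10                          # placeholder limit; pick your own
 AGENT = 'Mozilla/5.0 (bookmark fetcher)'   # placeholder User-Agent string

 # Hypothetical wrapper: fetch url (a URI), following redirects.
 # Returns the last response received.
 def fetch(url)
   attempts = 0
   found = false
   resp = nil
   until found || attempts >= MAX_ATTEMPTS
     attempts += 1
     http = Net::HTTP.new(url.host, url.port)
     http.open_timeout = 10
     http.read_timeout = 10
     if url.instance_of? URI::HTTPS
       http.use_ssl = true
       http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # skips certificate checks
     end

     # request_uri keeps any query string and yields "/" for an empty path
     req = Net::HTTP::Get.new(url.request_uri, 'User-Agent' => AGENT)
     resp = http.request(req)
     break if resp.code == "200"

     if resp.header['location']
       newurl = URI.parse(resp.header['location'])
       if newurl.relative?
         puts "url was relative"
         newurl = url + resp.header['location']  # resolve against current URL
       end
       url = newurl
     else
       found = true  # no Location header: 404, 500, etc.
     end
   end
   resp
 end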
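The loop above talks to the network directly, which makes it awkward to test. A hypothetical variant (the name follow_redirects and the block-based design are illustrative, not part of Net::HTTP) takes the request step as a block, and uses Net::HTTPRedirection, the class Net::HTTP uses as the parent of its 3xx responses, instead of comparing code strings. This is a sketch under those assumptions, not a drop-in replacement.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow redirects starting at +url+ (a URI).
# The block performs one GET: it receives a URI and must return a
# Net::HTTPResponse. Returns [final_response, final_url].
def follow_redirects(url, max_attempts = 10)
  max_attempts.times do
    resp = yield url
    # Anything that is not a 3xx response carrying a Location header is final
    # (this also covers 304 Not Modified, which has no Location).
    return [resp, url] unless resp.is_a?(Net::HTTPRedirection) && resp['location']
    # URI#+ resolves a relative Location against the current URL.
    url = url + resp['location']
  end
  raise "too many redirects"
end
```

For real requests, the block can do the network work, for example `follow_redirects(URI.parse(link)) { |u| Net::HTTP.get_response(u) }`, where `link` stands in for a bookmarked URL; recent Ruby versions turn on SSL in get_response for https URIs.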

The trick here is to ask the redirected URL object whether it is relative. If it is, resolve it against the old URL object. The URI class overloads the '+' operator (what is this, C++?) so that you can combine the new path with the old URL, with ".." segments handled properly, by doing:
   newurl = url + resp.header['location']
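Here is what that '+' does with a few typical Location values. The base URL is a made-up example standing in for a bookmarked page; URI#+ is an alias of URI#merge, which applies the standard relative-reference resolution rules.

```ruby
require 'uri'

# Hypothetical page URL standing in for the original bookmark.
base = URI.parse("http://example.com/a/b/c/page.html")

# Relative path: ".." segments climb out of /a/b/c/
puts base + "../../redirected/location/index.html"
# prints http://example.com/a/redirected/location/index.html

# Root-relative path: replaces the whole path, keeps scheme and host
puts base + "/login"
# prints http://example.com/login

# Absolute URL: returned unchanged
puts base + "https://other.example/x"
# prints https://other.example/x
```

Because '+' hands back absolute references unchanged, `url + location` also works when the Location header holds a full URL; the relative? check mainly serves to flag the relative case.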

Sunday, March 15, 2009
