screeley.com

Django and robots.txt

Jan.19

In the spirit of SEO, Django, and this site, I took the next step and added a robots.txt file. Nothing too exciting here, but I saw some postings on how to do it the wrong way, and they made me sad. One example is serving the robots.txt file via Django. Why add the overhead of Django/mod_python to serve a static file?

Wrong Way:

Creating a view and having Django render the template:

from django.shortcuts import render_to_response

def robots(request):
    # Every request for robots.txt goes through Django's template rendering.
    return render_to_response('robots.txt', mimetype='text/plain')

The Right Way:

Have Apache serve the file directly. This site is hosted on WebFaction, so you would have to load mod_alias and point it at the path to your robots.txt:

LoadModule alias_module modules/mod_alias.so
Alias /robots.txt /full/path/to/robots.txt
<Location "/robots.txt">
    SetHandler None
</Location>

My robots.txt file has only two lines: the first says the rules apply to all robots, and the second tells them to avoid the comments path. By default, Django comments needs to be in your root URLconf to work correctly, but it shouldn't be indexed.

User-agent: *
Disallow: /comments/
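One way to sanity-check the two rules above without deploying anything is Python's standard-library urllib.robotparser. This is only a minimal sketch: the helper name is_allowed, the bot name "ExampleBot", and the sample paths are made up for illustration and are not part of this site's setup.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt file above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical crawler name.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "/comments/1/"))  # False
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "/blog/"))        # True
```

Anything under /comments/ is refused for every user agent, and every other path stays crawlable.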

Comments

Or you could use a comfortable app with admin interface and let caching handle the overhead.

http://code.google.com/p/django-robots/ has a ROBOTS_CACHE_TIMEOUT setting.

@mb0. 'comfortable'? If you have a complex set of rules that you have to maintain with restricted access to the Apache config, then yes, the django-robots project would be beneficial. I personally don't see the need for all that code to create 2 lines of text. Just me?

Horses for courses!

Besides, some people enjoy leaving lots of fat, low-hanging fruit for their successor to trim away :)

I'm loving the project I'm working on - the previous developer would get a list of objects out of the database by using Model.objects.all(), then iterating over the list to pick the ones they wanted. One web page took something like 4000 queries to assemble!

Some people will be quite comfortable managing everything from the point-and-click safety of the admin interface. Other people are prepared to fire up Terminal, SSH in to the web server, and edit the config file of the Apache server.

In some environments, modifications to the web server configuration require fighting your way through two days' worth of red tape.

In other environments, the entire web site sits behind an accelerator/reverse-proxy - so it doesn't matter if the robots.txt is served as static text or generated by a Django template - it's going to be fetched from a web server once, then served from the cache until the TTL expires.

I agree with your opinion though - serving a relatively static page through Django templates doesn't really make sense when the file is two lines of text that won't change.
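The reverse-proxy behaviour described in this comment (fetch from the origin once, then serve from the cache until the TTL expires) can be sketched in a few lines of Python. This is a toy model, not how any particular proxy is implemented: the class name TTLCache, the 60-second TTL, and the fake clock are all assumptions made for the illustration.

```python
import time


class TTLCache:
    """Toy stand-in for a reverse proxy caching a single response."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that hits the origin server
        self._ttl = ttl_seconds    # how long a cached copy stays fresh
        self._clock = clock        # injectable so the demo is deterministic
        self._body = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Go back to the origin only when nothing is cached or the copy expired.
        if self._expires_at is None or now >= self._expires_at:
            self._body = self._fetch()
            self._expires_at = now + self._ttl
        return self._body


if __name__ == "__main__":
    origin_hits = []
    fake_now = [0.0]

    def fetch_robots_from_origin():
        # Could equally be a static file or a rendered Django template.
        origin_hits.append(fake_now[0])
        return "User-agent: *\nDisallow: /comments/\n"

    cache = TTLCache(fetch_robots_from_origin, ttl_seconds=60,
                     clock=lambda: fake_now[0])
    cache.get()
    cache.get()               # served from the cache, no origin hit
    fake_now[0] = 61.0        # pretend the TTL has passed
    cache.get()               # cache expired, origin is hit again
    print(len(origin_hits))   # 2
```

In this model the cost of producing robots.txt is paid once per TTL window, regardless of how the origin generates it.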

Agreed, nothing stops you from just using a static robots.txt file. I think it's even the better way for personal sites like yours.

The reason I originally wrote django-robots though was the ability to add entries from the admin interface -- something which is incredibly useful in environments where editors and content managers don't have access to the file system of the server (and also shouldn't :). The sysadmin was quite happy about not getting requests like "please put that line in the robots.txt file" anymore :)

Cheers, jezdez

Quite inspiring,

looks pretty easy as well, as you have laid it out in such a way. Great work, keep it up.

Thanks for bringing this up

Maybe you are right. But how often is robots.txt actually accessed, and how much overhead is there?

I'm curious - quantitatively - how big of a deal is this issue?

Thanks for this. Unbelievable: our developer has a robots nofollow tag on our site, so no wonder it wasn't being found by the search engines!

I'm a developer out of San Francisco CA working at a startup.

This space will deal with the work I've participated in using the Django framework to build applications for enterprise clients.
