A couple of weeks ago David Ziegler posted an article on how to extract excerpts from articles using Python and BeautifulSoup. It works well, but I would like to suggest some improvements by using lxml instead. It's a fairly simple problem: get the title and the description out of the head, and if there is no description, try to pull some content out of the body. The first two are easy and the last one sucks, but Python has tools that make our lives easier. BeautifulSoup is the go-to for web scraping in Python, but it suffers when it comes to performance. lxml is definitely faster, in this case about 3 times faster.
I have a wheelhouse, and it's integrating Django with Java open source projects. Today I get to announce the next one, Django Alfresco. We combined Alfresco's document management capabilities with Django's web tier components. I get mixed reactions when I tell people about this project, anywhere from "Why did you go and mess up a good thing?" to "This is amazing." The former more than the latter, but I'm going to try to convince you that using this project is a really good idea. Jeff Potts, who is the ECM lead at Optaros and got the project to a place where it could be released, has a post on it and a screencast.
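The three steps described above (title, meta description, fall back to body content) can be sketched roughly as follows. This is not the lxml code from the article: it swaps in the standard library's `html.parser` so it runs without extra dependencies, and the names `ExcerptParser` and `extract_excerpt` are made up for illustration. It also assumes the "content out of the body" fallback means the first non-empty paragraph.

```python
from html.parser import HTMLParser


class ExcerptParser(HTMLParser):
    """Collect the <title>, the meta description, and the first non-empty <p>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.first_paragraph = ""
        self._in_title = False
        self._in_paragraph = False
        self._paragraph_done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "p" and not self._paragraph_done:
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            # Stop collecting only once a paragraph with real text was seen.
            if self.first_paragraph.strip():
                self._paragraph_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            self.first_paragraph += data


def extract_excerpt(html):
    """Return (title, description), falling back to the first paragraph."""
    parser = ExcerptParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    # Normalize whitespace in the fallback text.
    fallback = " ".join(parser.first_paragraph.split())
    return title, parser.description or fallback
```

Calling `extract_excerpt(page_source)` on a page without a description meta tag returns the title plus the text of its first paragraph, which is the "last one" case the post is about.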
More >
Recently I started to move Cubby Scott away from a cron job and towards a queue. It's hard to be real time when you wake up a cron job once every 3 minutes. Lame. I'm also in the process of adding screenshots and content retrieval. Both take a good amount of time to process. The queue part was easy after reading Rabbits and Warrens and Working with Python and RabbitMQ. The problem came when I started working on the consumer. No one ever talks about the consumer. Well, I'm going to give the consumer some love.
More >
Is there a 3rd party twitter app that builds a link page based on my follows? If not, someone should build it. It would be my start page.
Fred Wilson posted this tweet a few days ago. It's a pretty simple requirement: get all the users that Fred is following, parse their tweets, pull out the links and display them for Fred's viewing pleasure. Personally I really like this idea. The problem with an asymmetrical relationship is that you really only follow a person for the interesting links they post. I follow mostly tech people and honestly, their personal comments don't really do much for me. It would be great if I could get all those links into one feed and filter out all the noise.
So in the last 4 days I put together an application to do this. Personal web developer to Fred Wilson and hopefully a few others out there.
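The core of that requirement, pulling the links out of the tweets of the people you follow, might look something like the sketch below. It is not the application's actual code: fetching the tweets from Twitter is left out, the input is assumed to be a plain dict of user name to tweet texts, and `collect_links` and `URL_PATTERN` are hypothetical names.

```python
import re

# Deliberately simple URL matcher; real tweets may need something stricter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def collect_links(tweets_by_user):
    """Return (user, url) pairs in first-seen order, skipping repeated URLs."""
    seen = set()
    links = []
    for user, tweets in tweets_by_user.items():
        for text in tweets:
            for url in URL_PATTERN.findall(text):
                # Trim punctuation that trails the link in running text.
                url = url.rstrip(".,;:!?)")
                if url and url not in seen:
                    seen.add(url)
                    links.append((user, url))
    return links
```

Tweets without links contribute nothing to the result, which is the "filter out all the noise" part.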
More >
One of the major pain points of using reusable apps is modifying the view logic. No matter how many options you can pass into the view function, someone is going to want to do something different with it. I've been using Pinax for a little while now and modifying views is the only thing I don't love about it. Enter class-based views. There is currently a ticket out there to make them part of Django 1.2 and there is a great example out on djangosnippets.org. Instead of trying to deal with everything in the URLconf, you get a class with methods that you can override. The simple user will not know the difference between the two, while the advanced user can create custom views easily, without repeating logic. Who likes DRY?
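To make the idea concrete, here is a framework-free sketch of the pattern. It is not Django's API or the djangosnippets example: `ListView`, `PublishedListView`, and the dict standing in for a request are all invented for illustration. The point is only that each step of the view is a method, so a subclass overrides one step and inherits the rest.

```python
class ListView:
    """Hypothetical base view: every step is a method a subclass can override."""

    template = "{count} items: {items}"

    def get_items(self, request):
        return request.get("items", [])

    def get_context(self, request):
        items = self.get_items(request)
        return {"count": len(items), "items": ", ".join(items)}

    def render(self, context):
        return self.template.format(**context)

    def __call__(self, request):
        # Instances are callable, so they can be used wherever a view function is.
        return self.render(self.get_context(request))


class PublishedListView(ListView):
    """Customize a single step; context building and rendering are inherited."""

    def get_items(self, request):
        items = super().get_items(request)
        return [item for item in items if not item.startswith("draft:")]
```

A caller uses `ListView()(request)` and `PublishedListView()(request)` the same way, which is why the simple user never notices the difference.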
More >
This is a "This is how I solved a problem, I know there is a better way, so someone please tell me" post. Denormalization is something that has gotten a bit of press in the Django community. David Cramer has a great post on a model to handle this, and there was some discussion on adding a DenormalisationField to Django core to handle it. I ended up needing to use a denormalized field on a project, but still wanted to use managers to handle the related field.
More >
I went through the process of installing Review Board last night and was happily surprised and deeply angered at the same time. I'm pleased to report that most of my anger was taken out on CentOS. Review Board is a Django-powered tool for code reviews developed by VMware and subsequently open sourced. If you ever have some free time, browse around their repository. It's pretty impressive. I'll talk about my experience of installing it, but mostly I'll stick to the processes behind the code reviews.
More >
After writing about a thousand lines of documentation, I'm spent. I'd like to write something witty or clever or fun, but I got nothing.
With some Optaros blessing I released a Django Solr module out into the wild. It was written by Jay Dolan for a top 20 newspaper site about 9 months ago, but it had to be rewritten for Django 1.0, and I got sidetracked with clients and Alfresco.
More >
In the spirit of SEO, Django and this site, I took the next step and added a robots.txt file. Nothing too exciting here, but I saw some postings on how to do it the wrong way that made me sad, like serving the robots.txt file via Django. Why are you adding the overhead of Django/mod_python to serve a static file?
More >
In an effort to play around with the sitemaps contrib package I decided to add a sitemap to this website. It's pretty intuitive and easy, so there is really no excuse not to put a sitemap on your site. It's about 15 minutes' worth of effort and 2 changes. The extra effort comes when you don't want to ignore every search engine other than Google. Let's start by adding a sitemap, then we'll show how to ping Yahoo and Ask as well.
More >
Enterprises like Java. It's safe. No one has ever been fired for suggesting Java. More and more I think the industry as a whole is realizing that Java doesn't work for rapid development. (Groovy on Grails excluded)
So how do you convince a (insert large Java shop here) to take a flyer on one of them there newfangled web frameworks? Well, tell them we can run it in a JVM, of course.
More >
First off, a little comparison.
Taglines:
Drupal: "Drupal is a free software package that allows an individual or a community of users to easily publish, manage and organize a wide variety of content on a website."
Django: "Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design."
Clean and Rapid > Easy: Django +1
More >
I'm a developer out of Boston, MA and I work for a consulting firm specializing in open source technologies.
This space will deal with the work I've participated in using the Django framework to build applications for enterprise clients.
Finally, I hate the word blog and Drupal.
"I wonder whether the author himself reads the comments on this post. Or are we just writing here for ourselves? :)"
at 4:58a.m. March 9, 2010 | permalink
"Sorry for the off-topic. Do you sell site-wide links from the site? If so, please get in touch with me!"
at 8:06p.m. March 8, 2010 | permalink
"One of my LiveJournal friends has already written about this :("
at 10:29a.m. March 8, 2010 | permalink
"Your blog takes a long time to load; apparently the hosting is rather poor"
at 9:41p.m. March 6, 2010 | permalink
"I like the logo :)"
at 8:47a.m. March 4, 2010 | permalink
"uh.. strange .."
at 11:54p.m. March 3, 2010 | permalink
"Hi guys, I know this might be a bit off topic but seeing that a bunch of you own websites, where would the best place ..."
at 11:12p.m. March 3, 2010 | permalink
"Thanks for this, unbelievable our developer has a robots no follow tag on our site, no wonder it wasn't being found by the search engines ..."
at 7:40a.m. March 2, 2010 | permalink
"Is it not possible to get the full text of posts in your RSS feed?"
at 9:37p.m. March 1, 2010 | permalink