Tag Archives: wordpress

Solr for WordPress 0.2.0 Released

I just released Solr for WordPress 0.2.0. This release completely replaces the default WordPress search without any special setup. I have also added i18n support so people can translate it into different languages, integrated it into the default WordPress theme, and added support to enable or disable specific facets. This release should make it much easier for people to get setup and working correctly. As usual, please let me know of any bugs you might find by opening a report at https://bugs.launchpad.net/solr4wordpress.

Download here.

Solr for WordPress

Solr for WordPress
Solr for WordPress is a WordPress plugin that interacts with an instance of the Solr search engine. With this plugin you can:

  • Index pages and posts
  • Perform advanced queries
  • Enable faceting on fields such as tags, categories, and author
  • Treat the category facet as a taxonomy
  • Add special template tags so you can create your own custom result pages to match your theme
  • Configuration options allow you to select pages to ignore, features to enable/disable, and what type of result information you want output.
  • Hit highlighting
  • Dynamic result teasers

Solr for WordPress requires WordPress 2.7 or greater and an instance of Solr 1.3 or greater. Installation is simple, just extract the plugin in your WordPress plugins folder, activate it, then point it at your Solr instance via the configuration page. From there, you can index all your pages and/or posts and you are ready to perform searches against your WordPress data.

This plugin assumes your Solr schema contains the following fields: id, permalink, title, content, numcomments, categories, categoriessrch, tags, tagssrch, author, type, and text. The facet fields (categories, tags, author, and type) should be string fields. You can make tagssrch and categoriessrch any type you want as they are used for general searching. The plugin is distributed with a Solr schema you can use. I will eventually package up a version of Solr configured specifically for this plugin. Until then, the provided schema will have to do.

Integrating Solr for WordPress into your theme is quite simple as well. The plugin provides two template tags, one for a search box and another for search results. For the search box, use the s4w_search_form() tag. For the search results use the s4w_search_results() tag. These template tags output valid xhtml that you can style with css.

This version of the plugin requires you to create your own search page template then create a search page called “Search” using this template. It also requires you to manually update any search forms to search against the search page you just create (“/search/”) and putting the query parameters in the “qry” parameter. In future versions it will completely replace the standard WordPress search functionality.

By default, facting is enabled for the category, tags, author, and post type. Faceting allows your user to drill down into the search results filtering on values of the particular facet. The category facet can be treated as a taxonomy as well.

UPDATE:
Released Solr for WordPress 0.2.0

Plugin Home: Solr for WordPress
Download: Solr for WordPress 0.1.0
WordPress Hosted Plugin Page: Solr for WordPress

New Design

I have finally decided to update my site. I have been pretty busy with work the last couple years and have been slacking on the site. Well, I got an itch to start working on it again and decided to kick off my renewed interest with a completely new design. The design is a heavily modified version of the Elixir theme by Michael Whalen.

Along with the new design I have integrated Solr search. Solr is an amazing search engine. I wrote a plugin called Solr for WordPress that handles all the integration between Solr and WordPress. I will be writing a post about the plugin soon.

Comment Spam

Within a week of switching to WordPress for my blogging software, I started receiving a lot of comment spam. I found this amazing because I have had a blog for a few years now without any problems. I have had the occasional spam comment, but lately I have been receiving 3-7 of them a day. I know this is very little compared to high-volume sites, but seems like a lot for a small site like mine. For the most part, the Akismet spam plugin WordPress ships with does an amazing job. It has let a few slip by, but that is no big deal.

This whole comment spam problem reminded me of a research paper I read a year or so ago. It was called Defending Against an Internet-based Attack on the Physical World. It was about the threat of using api’s such as Google’s SOAP API to automate filling out request forms for catalogues and other material on thousands of sites to some victim. This would cause the victim’s physical mail to become overloaded and very hard to manage. Imagine 100′s or 1,000′s of pieces of mail being delivered to your house every day. The point of this being that I figure spammers are using a technique similar to this to find WordPress blogs, then spam them automatically.

I decided to see how easy it was. First I went to see if I could sign up for Google’s SOAP API, but I found out that they no longer offer this service. Without this service, it is going to be a lot harder to get this done. Ignoring the whole api problem, I decided to find a search string to find comment pages on WordPress blogs. I was amazed at how easy this was. I just went to a blog using the default WordPress theme and looked for keywords that would always be there. After about a second I came up with this search string:

"Leave a Reply" Name Mail Website "proudly powered by WordPress"

Typing this into google found over 1,000,000 pages! Clicking a few of these verified that they were infact WordPress comment pages. Now I needed to write a program to automate parsing these links. Without the search api, I was stuck doing it manually. After about an hour I came up with this python script. This script will submit the search string I generated above to google, parse the first 100 results from the page, then submit a search for the next 100 and so on. While testing this script I noticed google started blocking my search, which is a good thing. I found a way around this by using different User-Agent strings and adding some timeouts. Because of this, the script defaults to saving the first 100 links. I have left out the code to fill out the comment forms becuase I feel that piece of code would do more harm than good.

Anyways, I think there is a huge problem with comment spam that needs to be fixed. The fact that so many pages can be found in a single search is amazing. Google blocking querys when it detects a bot is definitely a step in the right direction. The fact that I was able to get around this so easily is not.

Files:
http://www.mattweber.org/files/wp-link-finder.py

Old Content

So it looks like I will be unable to get the contents of my old site back, so I will most likely get what I can from Google’s cache and forget about the rest. Sorry.

I have started reading up on css, xhtml, and ajax so I can throw together a new design. I think the default WordPress theme is good enough until I can get a new one ready. Any suggestions for the design would be great.

PyBlosxom Plugin: googlestats.py

This plugin keeps statistics on googlebot visits to you blog entry’s. It was inspired by the WP-GoogleStats plugin for wordpress. When enabled this plugin checks if the visitor is the googlebot, and if it is, updates the number of visits, last visit date, and last visit time template variables. You can use these template variables in the body of your posts to create custom messages based on visit statistics, or use the built-in $googlestats template variable to display a default message.

Download:

http://www.mattweber.org/files/googlestats.py