Let's talk Apache Solr Clean URLs

In this third, fast-paced post in the blog series I'll be talking about Apache Solr clean URLs.

Some history

Rolling back a year or two, I was working for a company on a search-critical Solr + Drupal website, and Drupal was praised for its ability to have dynamic clean URLs. We needed search pages with clean URLs, but the facets needed to be clean as well. I remember that one of the team members put in a lot of effort to make this work. And believe me, he really almost died trying! It required a lot of code and basically a lot of hook_menu_alter() implementations to take over the existing functionality.

So, let's fast forward two years. In Drupal 7 (and also Drupal 6, because the whole module has been backported) you can use search pages with dynamic URLs, and these search pages can have facets if you use the Facet API module. Facet API is basically a module that takes care of the whole display and URL logic when you provide it with some facet data. Thanks to Acquia, I was able to spend a lot of time understanding the whole structure and getting up to speed with all the search-related modules that were in development. So thank you for that opportunity!

Goal

What we want is to convert

http://nickveenhof.be/search/site/screencast?f[0]=im_taxonomy_vocabulary_1%3A101

to

http://nickveenhof.be/search/site/screencast/im_taxonomy_vocabulary_1/101
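
To make the target concrete, here is a minimal sketch of that conversion in plain PHP. The function name example_pretty_facet_url() is hypothetical and this is not how Facet API Pretty Paths does it internally; it only assumes that every f[] value has the form field:value.

```php
<?php
/**
 * Hypothetical helper: turns ?f[0]=field:value filters into /field/value
 * path segments. Illustration only, not code from any module.
 */
function example_pretty_facet_url($url) {
  $parts = parse_url($url);

  // parse_str() URL-decodes the values and expands f[0], f[1], ... into an array.
  $query = array();
  if (isset($parts['query'])) {
    parse_str($parts['query'], $query);
  }
  $filters = (isset($query['f']) && is_array($query['f'])) ? $query['f'] : array();

  $segments = array();
  foreach ($filters as $filter) {
    // Skip anything that does not look like field:value.
    if (strpos($filter, ':') === FALSE) {
      continue;
    }
    // Split on the first colon only, the value itself may contain colons.
    list($field, $value) = explode(':', $filter, 2);
    $segments[] = rawurlencode($field);
    $segments[] = rawurlencode($value);
  }

  $pretty = $parts['scheme'] . '://' . $parts['host'] . $parts['path'];
  if ($segments) {
    $pretty .= '/' . implode('/', $segments);
  }
  return $pretty;
}

$pretty = example_pretty_facet_url('http://nickveenhof.be/search/site/screencast?f[0]=im_taxonomy_vocabulary_1%3A101');
// $pretty is 'http://nickveenhof.be/search/site/screencast/im_taxonomy_vocabulary_1/101'
```

Building the clean URL is the easy half. The hard half is going the other way: recognising which path segments belong to the search query and which belong to the facets.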

Roadbumps

There were a few hurdles to jump over before clean URL faceting could become reality. First, we needed a better understanding of how the search menu implementations work, and before we can understand those we should analyze the basics. Let's scope to the core Search module. I've modified the code snippet a bit for better readability.

  $items["search/node/%menu_tail"] = array(
    'title' => $search_info['title'],
    // Hand the full path map and the wildcard's position to the loader.
    'load arguments' => array('%map', '%index'),
    'page callback' => 'search_view',
    // 2 is the position of %menu_tail in the path.
    'page arguments' => array($module, 2),
    ...
  );

You can see that the %menu_tail wildcard is used, which "Loads path as one string from the argument we are currently at".

This may sound a bit incomprehensible but in essence it means that everything after the 'search/node' path is seen as the argument/query. Let's take a look at some examples.

  • fill in "test and test2" in the search box
  • value is url encoded using drupal_encode_path
  • url becomes search/node/test%20and%20test2

Another scenario

  • fill in "test/test/test/test"
  • value is url encoded using drupal_encode_path
  • url becomes search/node/test/test/test/test

Wait, what? Why isn't the slash encoded? Take a look at the drupal_encode_path function below: for aesthetic reasons, slashes are not escaped. That leaves us with a problem: once we start adding custom paths (such as facet paths) before or after the search query, we no longer know exactly what the search query is.

  function drupal_encode_path($path) {
    // For aesthetic reasons slashes are not escaped.
    return str_replace('%2F', '/', rawurlencode($path));
  }
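
The two scenarios and the resulting ambiguity can be reproduced outside Drupal with a small sketch. example_encode_path() is a renamed copy of the function above so that it runs standalone, and example_menu_tail() is a hypothetical stand-in that only approximates what %menu_tail does (joining the remaining path segments back into one string); neither is actual Drupal core code.

```php
<?php
// Renamed standalone copy of drupal_encode_path() as quoted above.
function example_encode_path($path) {
  // For aesthetic reasons slashes are not escaped.
  return str_replace('%2F', '/', rawurlencode($path));
}

// Hypothetical approximation of %menu_tail: everything from position
// $index onwards is glued back together into a single argument.
function example_menu_tail(array $segments, $index) {
  return implode('/', array_slice($segments, $index));
}

// Scenario 1: spaces are encoded.
$path_a = 'search/node/' . example_encode_path('test and test2');
// $path_a is 'search/node/test%20and%20test2'

// Scenario 2: slashes are left alone.
$path_b = 'search/node/' . example_encode_path('test/test/test/test');
// $path_b is 'search/node/test/test/test/test'

// A search for "screencast" followed by a facet path...
$with_facet = 'search/node/screencast/im_taxonomy_vocabulary_1/101';
// ...is indistinguishable from a literal search for the whole string.
$literal = 'search/node/' . example_encode_path('screencast/im_taxonomy_vocabulary_1/101');

$tail = example_menu_tail(explode('/', $with_facet), 2);
// $tail is 'screencast/im_taxonomy_vocabulary_1/101', query and facet mixed together.
```

Because $with_facet and $literal are the same string, nothing in the path alone tells us where the search query stops and the facet starts.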

Facet API Clean URLs

Let's jump back to Apache Solr Search Integration. This module utilises Facet API for all its faceting needs, so if we want clean URLs for search facets, we need to fix this %menu_tail problem.

So, I sat down during the Drupal Dev Days a couple of weeks ago and spoke to dasjo. He is the original author of Facet API Pretty Paths, which was already working in combination with Search API because Search API does not follow the core search paths, nor does it even depend on core Search.

I've twisted and spun my head around this difficult problem and, after trying too many hacky regular expressions, a tip from sun helped us find a solution for fetching the search query from the URL, avoiding drupal_encode_path and allowing the facets to work on search pages. Because of the difficulties discussed above, the main Apache Solr module had to be patched and some adjustments had to be made to the Facet API Pretty Paths module, but we now have a working combo.

The module is generic enough not to contain any Search API or Apache Solr Search Integration specific logic. It works with any Facet API implementation. A live demo can be found at absolventen.at (Search API), or you can try it out yourself. For the sake of a demo, I enabled it on my blog (Apache Solr) so you can test it out here as well. I warn you though, it is an alpha! URLs might still look messy.

Future work

The Facet API Pretty Paths module is still in alpha stage and needs your help to turn the Facet API URLs into human-readable snippets. The trickiest part is, for example, the date range, where a number of possibilities are valid (Want to see which ones?). For example, we need to make the following URL clean dynamically.

http://nickveenhof.be/search/site/drupalcon?f[0]=ds_created%3A%5B2008-01-01T00%3A00%3A00Z%20TO%202009-01-01T00%3A00%3A00Z%5D

But the result is not very pretty yet...

http://nickveenhof.be/search/site/drupalcon/ds_created/%5B2008-01-01T00%3A00%3A00Z%20TO%202009-01-01T00%3A00%3A00Z%5D
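
One direction this could take is recognising the Solr range syntax and emitting a shorter segment. The sketch below is purely illustrative: the function name example_pretty_date_range() and the START_to_END output format are assumptions made for this example, not something the module does today, and it only handles ranges whose bounds fall exactly on midnight UTC.

```php
<?php
/**
 * Hypothetical helper: shortens a Solr date range such as
 * "[2008-01-01T00:00:00Z TO 2009-01-01T00:00:00Z]" to "2008-01-01_to_2009-01-01".
 * Returns NULL for anything it does not recognise, so a caller could fall
 * back to the plain encoded value.
 */
function example_pretty_date_range($value) {
  // Only accept bounds at midnight, otherwise dropping the time would lose information.
  $date = '(\d{4}-\d{2}-\d{2})T00:00:00Z';
  if (preg_match('/^\[' . $date . ' TO ' . $date . '\]$/', $value, $matches)) {
    return $matches[1] . '_to_' . $matches[2];
  }
  return NULL;
}

// The encoded segment from the URL above, decoded back to the raw Solr value.
$raw = rawurldecode('%5B2008-01-01T00%3A00%3A00Z%20TO%202009-01-01T00%3A00%3A00Z%5D');
$segment = example_pretty_date_range($raw);
// $segment is '2008-01-01_to_2009-01-01'
```

Open-ended or relative ranges would need their own rules, which is exactly why this part is tricky.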

Also, the Facet API Slider is not yet fully compatible with Facet API Pretty Paths, so help is requested here. And it might be nice to use tokens to replace URLs and whatnot. Not enough time, argh! :-)

We do need your help. There are some beta blockers pending and even if you just report an issue, it would help. Please join the issue queue!