URL rewriting and search engine optimization

Many search engines have trouble indexing dynamic pages because of their complex URL structure.

For example, a dynamic address such as

page.php?cat=news&cod=5

is much more difficult to index than a static one such as

page.html

This is not so much a strict rule as a source of confusion: spiders tend to struggle when they find characters such as ‘&’, ‘?’ or ‘=’ in a page address.

As a result, pages with a normal address tend to be indexed more readily and more often than dynamic ones. To get round this, there is a technique called URL rewriting: the server accepts static-looking addresses and internally translates them into the dynamic ones that actually generate the pages.

The way this technique is put into practice varies with the server technology in use. It is not easy to accomplish: it takes time, care and broad experience, and on large portals it may call for well-defined project planning.

In the examples we will use Apache/PHP technology. To carry out the task we need the Apache module “mod_rewrite”, which replaces the page addresses requested by users with new ones, computed with a syntax based on regular expressions (regex). Let’s skip the details of regex syntax, which is rather tricky and thorny (should you want to study it in detail, please refer to http://www.evolt.org/article/rating/20/22700/). Keeping our examples as simple as possible, suppose we want to turn

http://www.ikaro.net/articoli/articolo.php?file_name=url_rewriting
into

http://www.ikaro.net/articoli/cnt/url_rewriting.html
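The two addresses carry the same information, only arranged differently. As a minimal Python sketch of that correspondence, the hypothetical helper to_static below derives the static form from the dynamic one. It assumes that file_name is the only query parameter that identifies the page and that page names contain no slashes; neither assumption comes from the article.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def to_static(dynamic_url: str) -> str:
    """Map .../articolo.php?file_name=NAME to .../cnt/NAME.html (hypothetical helper)."""
    parts = urlsplit(dynamic_url)
    # Assumption: file_name is the only parameter that identifies the page.
    name = parse_qs(parts.query)["file_name"][0]
    if "/" in name:
        # A slash would not be matched by a ([^/]+) style rewrite pattern.
        raise ValueError("page names must not contain '/'")
    # "/articoli/articolo.php" -> "/articoli"
    directory = parts.path.rsplit("/", 1)[0]
    static_path = f"{directory}/cnt/{name}.html"
    # Rebuild the URL with the new path and no query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, static_path, "", ""))


print(to_static("http://www.ikaro.net/articoli/articolo.php?file_name=url_rewriting"))
# prints http://www.ikaro.net/articoli/cnt/url_rewriting.html
```

A site could use something along these lines when printing its own internal links, so that spiders only ever encounter the static form.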

The following mod_rewrite configuration does the job:

RewriteEngine on
RewriteRule ^/?articoli/cnt/([^/]+)\.html$ /articoli/articolo.php?file_name=$1 [L]

The first line activates the rewriting engine. The second one maps any request matching the static-looking pattern on the left onto the dynamic address on the right, passing the captured page name ($1) as the value of the “file_name” parameter; the .html page itself does not actually exist.
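To see what the rule does with a request, here is a minimal Python simulation of a single RewriteRule: it is not how Apache works internally, only a sketch of the matching and $1 substitution. It assumes the anchored form of the pattern with the dot escaped (^/?articoli/cnt/([^/]+)\.html$), which is how such a rule is normally written; the names PATTERN, SUBSTITUTION and rewrite are illustrative.

```python
import re

# The rule's pattern and substitution in Python syntax.
# Python's re writes the first capture group as \1 where mod_rewrite uses $1.
PATTERN = re.compile(r"^/?articoli/cnt/([^/]+)\.html$")
SUBSTITUTION = r"/articoli/articolo.php?file_name=\1"


def rewrite(path: str) -> str:
    """Return the internal target for a requested path, mimicking one RewriteRule.

    If the pattern matches, the captured page name replaces \\1 in the
    substitution; otherwise the path is returned unchanged, i.e. the rule
    simply does not apply.
    """
    match = PATTERN.match(path)
    if match is None:
        return path
    return match.expand(SUBSTITUTION)


print(rewrite("/articoli/cnt/url_rewriting.html"))
# prints /articoli/articolo.php?file_name=url_rewriting
print(rewrite("/articoli/index.html"))
# prints /articoli/index.html (no match, left alone)
```

The ([^/]+) group stops at slashes, so /articoli/cnt/a/b.html is not rewritten, and the escaped dot means only a literal “.html” suffix matches.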

The two lines can be inserted:

  • In a .htaccess file in the relevant directory (the one in which rewriting should take effect)
  • In the httpd.conf configuration file (in this case super-user privileges are needed)

Please note that the static page does not exist; it is simply an alias recognized by the server, so that search engines see a “normal” page instead of a dynamic one and index it as usual.

This entry was posted in Problogging, Tech.
