URL transformation rules
When a site administrator enters a value for a new virtual URL, the system cleans up the input using so-called URL transformation rules. This avoids problems with certain characters and ensures that the alias conforms to the standards and to the other URLs of the site. If an entered alias is modified, the user will be notified.
Note that in eZ Publish 3.10, the transformation of entered/generated aliases has changed.
Unicode support
In versions prior to 3.10, URL transformation rules were more restrictive and only supported some ASCII characters (lowercase Latin letters from "a" to "z", digits and underscores). This caused problems for many non-western languages that use different alphabets, some of which are difficult to transliterate.
From eZ Publish 3.10, it is possible to enable Unicode support for the URLs, so no transliteration needs to be performed since most characters are allowed. The following characters are not allowed: ampersand, semicolon, forward slash, colon, equals sign, question mark, square brackets, parentheses and the plus sign. Note that spaces are only allowed as word separators. These characters are excluded in order to avoid miscellaneous problems related to the HTTP protocol.
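To see which characters in a candidate alias fall under the restriction above, a small check can help. The following is only a sketch: the function name "findDisallowedCharacters" is hypothetical and is not part of eZ Publish, and the character list is simply copied from the paragraph above.

```php
<?php
// Hypothetical helper, not part of eZ Publish: reports which of the
// disallowed characters listed above occur in a candidate alias.
function findDisallowedCharacters( $alias )
{
    $disallowed = array( '&', ';', '/', ':', '=', '?', '[', ']', '(', ')', '+' );
    $found = array();
    foreach ( $disallowed as $char )
    {
        // strpos() returns false only when the character is absent
        if ( strpos( $alias, $char ) !== false )
        {
            $found[] = $char;
        }
    }
    return $found;
}

// Reports the ampersand and both parentheses
print_r( findDisallowedCharacters( 'Q&A (2010)' ) );
```

How the system actually replaces or removes these characters is not shown here; the sketch only detects them.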
The Unicode characters are encoded using the IRI standard. The text is encoded using UTF-8 before further encoding is performed. The resulting URL will contain characters that are compatible with the HTTP protocol and which will work in all existing browsers/clients. Note that modern browsers will decode the URL and display the characters using Unicode.
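The general principle (UTF-8 bytes first, then percent-encoding) can be illustrated with PHP's built-in "rawurlencode" function. This is a sketch of the idea only, not the code eZ Publish itself uses; the function name "encodeAliasSegment" is hypothetical.

```php
<?php
// Illustration only: percent-encode the UTF-8 bytes of one alias segment.
function encodeAliasSegment( $segment )
{
    // rawurlencode() leaves ASCII letters, digits, "-", "_" and "." untouched
    // and encodes every other byte as %XX.
    return rawurlencode( $segment );
}

// "\xC3\x9C" is the UTF-8 byte sequence for the letter U with diaeresis
echo encodeAliasSegment( "\xC3\x9Cber-uns" ), "\n"; // prints %C3%9Cber-uns
```

A modern browser would typically show such a URL with the original Unicode letter rather than the "%C3%9C" sequence.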
Dash/underscore/space
In versions prior to 3.10, only underscores were allowed as word separators. From 3.10, it is possible to choose which word separator should be used. This can be done by changing the value of the "WordSeparator" configuration directive located in the [URLTranslator] section of an override for "site.ini". It can be set to "dash", "underscore" or "space". Note that this setting will be ignored when the "urlalias_compat" transformation method is used (since it only supports underscores as separators).
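As an illustration, an override for "site.ini" that selects dashes as the word separator could contain the following fragment. The value "dash" is only one of the three options named above.

```ini
[URLTranslator]
WordSeparator=dash
```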
Case sensitivity
When the "urlalias" or "urlalias_iri" transformation method is used, the URLs can consist of mixed case (uppercase and lowercase characters). This is different from the traditional/old behavior where every letter was converted to lowercase. Instead, the system will preserve the case and store the URL aliases accordingly. However, the URLs themselves will not be case sensitive. For example, the URL alias for a node called "About Us" will be "About-Us" (assuming that the word separator is a dash). The "About Us" node will be accessible regardless of how the URL is specified when it comes to lowercase and uppercase letters. In other words, the node will be accessible through all of the following URLs: "www.example.com/about-us", "www.example.com/About-us", "www.example.com/ABOUT-US"; and so on.
Note that if there are two nodes with (almost) identical names within the same location (for example "My article" and "My Article" inside a folder called "News"), the system will generate unique URL aliases for newly introduced conflicting nodes by attaching numbers to their URL aliases. For example, if a node called "My article" already exists and "My Article" is created at the same location, the URL alias of the second ("My Article") node will be "My-Article2". If a third "MY Article" node is introduced, its URL alias will be "MY-Article3"; and so on.
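The combination of case-preserving storage and case-insensitive lookup can be sketched as follows. This is an assumption-laden illustration: the in-memory array, the node ID 42 and the function name "findNodeByAlias" are all made up, and the real system resolves aliases through its own storage.

```php
<?php
// Hypothetical in-memory alias table: stored alias => node ID (made up).
$storedAliases = array( 'About-Us' => 42 );

function findNodeByAlias( array $aliases, $requestedPath )
{
    foreach ( $aliases as $alias => $nodeID )
    {
        // Compare without regard to case; the stored alias keeps its case.
        // strcasecmp() only folds ASCII letters, which is enough for this example.
        if ( strcasecmp( $alias, $requestedPath ) === 0 )
        {
            return $nodeID;
        }
    }
    return null;
}

var_dump( findNodeByAlias( $storedAliases, 'about-us' ) ); // int(42)
var_dump( findNodeByAlias( $storedAliases, 'ABOUT-US' ) ); // int(42)
```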
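The numbering behavior described above can be reproduced with a short sketch. It assumes that conflicts are detected without regard to case and that numbering starts at 2, which is what the example in the text suggests; the function name "makeUniqueAlias" is hypothetical and the actual implementation may differ.

```php
<?php
// Hypothetical helper: appends the lowest free number (starting at 2)
// when the candidate collides, ignoring case, with an existing alias.
function makeUniqueAlias( array $existingAliases, $candidate )
{
    $taken = array_map( 'strtolower', $existingAliases );
    if ( !in_array( strtolower( $candidate ), $taken, true ) )
    {
        return $candidate;
    }
    $suffix = 2;
    while ( in_array( strtolower( $candidate . $suffix ), $taken, true ) )
    {
        $suffix++;
    }
    return $candidate . $suffix;
}

$aliases = array( 'My-article' );
$aliases[] = makeUniqueAlias( $aliases, 'My-Article' ); // My-Article2
$aliases[] = makeUniqueAlias( $aliases, 'MY-Article' ); // MY-Article3
print_r( $aliases );
```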
Alias text filtering
Support for filtering was implemented in order to make the generation of aliases more flexible. The system applies the filters to the URL text before the result is transformed into a valid alias. Filters can be created as extensions. The following text explains how to create a new filter.
Open the "site.ini" override and add a new extension (for example "myfilters") under the [URLTranslator] section.
Extensions[]
Extensions[]=myfilters
Add a new filter to the "Filters[]" array (for example "StripWords") under the [URLTranslator] section.
Extensions[]
Extensions[]=myfilters
Filters[]
Filters[]=StripWords
The system will search for the "stripwords.php" file containing the "StripWords" filter class.
Create a file called "stripwords.php" located in the "extension/myfilters/urlfilters" directory. Note that all filters must be placed inside the "urlfilters" directory located within an extension's directory. Make sure that the newly created file contains the following lines:
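<?php
class StripWords
{
    // Receives the alias text and returns the filtered version
    function process( $text, $languageObject, $caller )
    {
        // Remove every occurrence of "hell" (case sensitive)
        return str_replace( "hell", "", $text );
    }
}
?>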
The filter class "StripWords" implements a method called "process" which has three parameters: the text to filter, the language object (eZContentLanguage) and the object which called the filter process. The method returns a filtered version of the text. In this example, all occurrences of the word "hell" are removed (replaced with nothing). In other words, after this filter is introduced, newly created URLs will not contain the word "hell".
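The filter can be tried outside the system by calling "process" directly. In the sketch below the class is repeated so that the snippet is self-contained, and "null" is passed for the language object and the caller purely as an assumption for illustration, since this particular filter ignores both.

```php
<?php
class StripWords
{
    function process( $text, $languageObject, $caller )
    {
        return str_replace( "hell", "", $text );
    }
}

$filter = new StripWords();
echo $filter->process( "What-the-hell", null, null ), "\n"; // What-the-
echo $filter->process( "Seashells", null, null ), "\n";     // Seass
echo $filter->process( "Hell-Week", null, null ), "\n";     // Hell-Week
```

Note that "str_replace" matches substrings and is case sensitive, so the filter also alters words that merely contain "hell" and leaves "Hell" untouched. A production filter would probably need word-boundary matching.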
Refer to the "[URLTranslator]" section of the "site.ini" for more information about the "Filters" setting.
Julia Shymova (14/09/2010 12:21 pm)
Geir Arne Waaler (28/09/2010 6:44 pm)
Comments
Filters[] is replaced in 4.3
Tuesday 20 July 2010 11:33:36 am
STEVO
hope this saves someone else an hour or 2.