
ez publish / technical manual / 4.4 / features / clustering


Caution: This documentation is for eZ Publish legacy, from version 3.x to 5.x.

Clustering

The clustering feature makes it possible to run an eZ Publish site on several web servers. A site that is running on a cluster of servers will have better performance and will be able to handle more traffic.

Before eZ Publish clustering was implemented, the only way to support multiple servers was to store all cache files and images locally on separate file systems (one for each web server) and use "rsync" or NFS to synchronize caches and binary files. This approach was far from perfect and imposed many limitations. With clustering, you can instead configure the system to store all content-related caches, images and binary files in the database. This ensures that all the cluster nodes use the same cache files and have access to the same images and binary files. In other words, when content is updated, the changes are automatically and instantly available to every web server in the cluster.

Supported database types

The clustering code is optimized for MySQL databases and requires the InnoDB storage engine. This storage engine will be used when creating the database tables needed for clustering. Contact your database administrator if you are unsure about whether InnoDB is available on your server.

Version 1.8 of the eZ Publish Extension for Oracle® Database makes it possible to use Oracle as a database for eZ Publish version 4.0 and later and also includes support for the clustering functionality. Note that the clustering functionality provided by this extension may differ slightly from the generic implementation included in a standard eZ Publish distribution.

eZ Publish does not currently support clustering with PostgreSQL databases. Keep in mind also that the supported databases depend on the cluster file handler in use: MySQL is always supported, whereas Oracle is only supported with the eZ DB file handler. Support for more databases may be added as eZ Publish development moves forward.

How it works

Data that must be synchronized between the servers is stored in the database. Custom templates and design items, however, are not. The following lists show which data is stored where:

Data stored using the database includes:

  •  Binary files
  •  Image and image alias files
  •  Caches related to content:
     
    •  Content view cache
    •  Template block cache
    •  Expiry cache
    •  URL alias cache
    •  RSS cache
    •  User info cache
    •  Class identifier cache
    •  Sort key cache

Other files are stored using the file system, including (but not limited to):

  •  INI files
  •  Template files
  •  Compiled templates
  •  PHP files
  •  Log files
  •  Caches that are not related to content:
     
    •  Global INI cache
    •  INI cache
    •  Codepage cache
    •  Character transformation cache
    •  Template cache
    •  Template override cache

Content view cache

When eZ Publish displays a page (a content node), it executes the "view" view of the "content" module and includes the output in the page layout. If the output is cached, the cache file(s) are read and served. If not, the system fetches the content from the eZ Publish object database, renders the necessary templates, generates the web page and stores the resulting XHTML in a cache file before serving it. In a cluster, these cache files are stored in the database, so the files (and any changes to them) are immediately available to all servers in the cluster.
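The read-through behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not eZ Publish code: the dictionary `shared_store` is a hypothetical stand-in for the cluster database, the cache key format is invented for illustration, and `render_node` represents fetching content and rendering templates.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the cluster database table that holds cache files.
shared_store: Dict[str, str] = {}


def view_node(node_id: int, render_node: Callable[[int], str]) -> str:
    """Serve the cached view of a node, generating and storing it on a miss."""
    key = f"content/view/full/{node_id}"  # illustrative key, not eZ Publish's real naming
    cached = shared_store.get(key)
    if cached is not None:
        return cached  # cache hit: every server in the cluster sees the same entry
    html = render_node(node_id)  # cache miss: fetch content and render templates
    shared_store[key] = html  # store once so other servers can reuse it
    return html
```

Because every server reads and writes the same store, a page rendered by one server is a cache hit on all the others.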

Images and image aliases

The same approach is used for images and image aliases (image variations). However, the solution is a bit more complicated because images are usually served directly by the web server (for instance Apache). Since the web server cannot read files from the database, the images must be served by a PHP script called "index_image.php". This applies to all content images, but not to images that are part of the design.

Note that you'll need to add new rewrite rules in order to instruct Apache to use "index_image.php" when serving images. This is explained in the chapters "Setting it up for an eZ DB FileHandler" and "Setting it up for an eZ DFS FileHandler".
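The split between content images and design images can be illustrated with a small routing function. The URL pattern is an assumption based on the usual var/storage/images layout (with an optional var sub-directory and an "images-versioned" variant); the rewrite rules you actually need are in the setup chapters mentioned above.

```python
import re

# Assumed URL layout for content images: /var/storage/images/... or
# /var/<var dir>/storage/images/..., plus the "images-versioned" variant.
# Verify against the rewrite rules for your own installation.
CONTENT_IMAGE = re.compile(r"^/var/([^/]+/)?storage/images(-versioned)?/")


def image_handler(path: str) -> str:
    """Decide how an image URL would be served in this illustrative model."""
    if CONTENT_IMAGE.match(path):
        return "index_image.php"  # content image: read from the cluster database by PHP
    return "web server"  # design image: served directly from the local file system
```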

Notes about clearing the caches

Since eZ Publish 3.10, clearing the caches no longer physically removes the cache files when a database-based handler is used, because this operation can be quite time-consuming. Instead, the system marks the cache files as invalid. This is done either by marking each individual cache file as expired or by setting the global expiry; the latter typically happens when many files are affected at once, for example when all the caches of a specific type are cleared. The global expiry is a time-stamp used as an expiry value for all the caches in the system: if it is set to a certain date, all cache files older than that date are no longer used. Note that the system overwrites old or expired cache file entries when the caches are re-created.
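The two ways of invalidating cache files, per-file expiry and the global expiry time-stamp, can be modelled with a single validity check. This is a sketch of the rule as described above, not kernel code; `CacheEntry` and `is_usable` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CacheEntry:
    name: str
    mtime: datetime        # when the cache file was (re)generated
    expired: bool = False  # set when this particular file is marked expired


def is_usable(entry: CacheEntry, global_expiry: Optional[datetime]) -> bool:
    """A cache file is usable unless it is marked expired or predates the global expiry."""
    if entry.expired:
        return False
    if global_expiry is not None and entry.mtime < global_expiry:
        return False  # older than the global expiry: ignored, but not deleted
    return True
```

Because nothing is deleted, invalidating a whole cache type is a single time-stamp update; the stale entries stay in storage until they are regenerated or purged.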

In order to physically remove the cache files from the database, the "ezcache.php" script needs to be run with the "--purge" option. The following example shows how to remove the content caches that are more than two days old:

$php bin/php/ezcache.php --clear-id=content --purge --expiry='-2 days'

Note that "$php" should be replaced by the path to your php executable.
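The selection made by `--expiry='-2 days'` can be pictured as a cutoff two days before the current time: entries last modified before the cutoff are purged. The sketch below models that reading of the option; it is not the script's code, and the entry names and `select_for_purge` helper are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def select_for_purge(entries: List[Tuple[str, datetime]], now: datetime,
                     max_age: timedelta = timedelta(days=2)) -> List[str]:
    """Return the names of cache entries last modified more than max_age before now."""
    cutoff = now - max_age  # corresponds to --expiry='-2 days'
    return [name for name, mtime in entries if mtime < cutoff]


now = datetime(2011, 2, 28, 15, 13)
entries = [
    ("content/view/old.cache", now - timedelta(days=3)),     # hypothetical entry names
    ("content/view/fresh.cache", now - timedelta(hours=5)),
]
print(select_for_purge(entries, now))  # ['content/view/old.cache']
```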

Extra connections in MySQL

The clustering code available since eZ Publish 3.10 opens an extra database connection when writing content to the database. This connection is used to check whether the file has been modified since the write lock was acquired; if it has, the write is no longer needed. Because of this, the maximum number of database connections in MySQL must be increased by 30-50%. If persistent connections are enabled, the cluster code no longer shares connections with normal database calls, so the previous maximum number of connections must be doubled.
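As a worked example of this sizing rule: if MySQL's `max_connections` server variable was previously 200, it should be raised to roughly 260-300 without persistent connections, or to 400 with them. The helper below only encodes that arithmetic; 200 is an arbitrary example value, not a recommendation.

```python
from typing import Tuple


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def cluster_max_connections(previous: int, persistent: bool) -> Tuple[int, int]:
    """Suggested (low, high) range for max_connections once clustering is enabled.

    Non-persistent connections: previous limit plus 30% to 50%.
    Persistent connections: the previous limit doubled.
    """
    if persistent:
        return (previous * 2, previous * 2)
    return (_ceil_div(previous * 13, 10), _ceil_div(previous * 3, 2))


print(cluster_max_connections(200, persistent=False))  # (260, 300)
print(cluster_max_connections(200, persistent=True))   # (400, 400)
```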

Oracle-specific differences

If you use the clustering functionality provided by the eZ Publish Extension for Oracle® Database, note that the system may behave differently from what is described above. If all content related caches are stored in an Oracle database, clearing the caches will always lead to their physical removal; the "ezcache.php" script will also physically remove the cache entries from the database, even when executed without the "--purge" option.

Cluster file handlers

The cluster file handler mechanism makes it possible to store, retrieve, rename and delete files (among other operations) using the database. The following file handlers are known to the system by default:

  1.  eZFS (located in the "kernel/classes/clusterfilehandlers/ezfsfilehandler.php" file of the eZ Publish installation)
  2.  eZFS2 (located in the "kernel/private/classes/clusterfilehandlers/ezfs2filehandler.php" file of the eZ Publish installation)
  3.  eZDB (located in the "kernel/classes/clusterfilehandlers/ezdbfilehandler.php" file of the eZ Publish installation)
  4.  eZDFS (located in the "kernel/private/classes/clusterfilehandlers/ezdfsfilehandler.php" file of the eZ Publish installation)

Note that the eZFS and eZFS2 file handlers do not support actual clustering, that is, running eZ Publish on multiple servers. Use eZDB or eZDFS for cluster file handling.
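The handler list, together with the notes under "Supported database types", can be condensed into a lookup table, for instance for a pre-deployment sanity check. This is a hypothetical helper that only restates this page; the handler names are the ones used here, not necessarily the values expected in file.ini.

```python
from typing import Dict, NamedTuple, Tuple


class HandlerInfo(NamedTuple):
    path: str                   # location relative to the eZ Publish root
    multi_server: bool          # usable for a cluster of several web servers?
    databases: Tuple[str, ...]  # databases supported for clustering, per this page


# Restates the handler list on this page; not read from an actual installation.
HANDLERS: Dict[str, HandlerInfo] = {
    "eZFS": HandlerInfo("kernel/classes/clusterfilehandlers/ezfsfilehandler.php", False, ()),
    "eZFS2": HandlerInfo("kernel/private/classes/clusterfilehandlers/ezfs2filehandler.php", False, ()),
    # Oracle support requires the eZ Publish Extension for Oracle Database.
    "eZDB": HandlerInfo("kernel/classes/clusterfilehandlers/ezdbfilehandler.php", True, ("MySQL", "Oracle")),
    "eZDFS": HandlerInfo("kernel/private/classes/clusterfilehandlers/ezdfsfilehandler.php", True, ("MySQL",)),
}


def check_cluster_setup(handler: str, database: str) -> None:
    """Raise ValueError if the handler/database pair cannot run a multi-server cluster."""
    info = HANDLERS.get(handler)
    if info is None:
        raise ValueError(f"unknown cluster file handler: {handler}")
    if not info.multi_server:
        raise ValueError(f"{handler} cannot be used on multiple servers; use eZDB or eZDFS")
    if database not in info.databases:
        raise ValueError(f"{handler} does not support clustering on {database}")
```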

Additional HTTP header

Since eZ Publish 3.9 an additional HTTP header called "Served-by" is supported. This feature was added for the purpose of testing and debugging. It is typically useful when you need to check, from the client side, which server handled the request. The following example shows a part of a server response that contains this header:

...
Last-Modified: Fri, 29 Jun 2007 09:35:54 GMT
Served-by: 62.70.12.230
Content-Language: en-GB
...
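To check from the client side which server handled a request, you can read this header from a raw response dump such as the excerpt above. A minimal sketch using only the Python standard library; the sample text is the excerpt shown above.

```python
from typing import Optional


def served_by(raw_headers: str) -> Optional[str]:
    """Return the value of the Served-by header, or None if it is absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # HTTP header names are case-insensitive.
        if sep and name.strip().lower() == "served-by":
            return value.strip()
    return None


sample = """Last-Modified: Fri, 29 Jun 2007 09:35:54 GMT
Served-by: 62.70.12.230
Content-Language: en-GB"""
print(served_by(sample))  # 62.70.12.230
```

Against a live site, `urllib.request.urlopen(url).headers.get("Served-by")` returns the same value when the server sends the header.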

Limitation on some file systems when storing a large number of content files

eZ Publish stores all disk-based content (e.g. images and PDF files) in var/storage, in a structure that mirrors the content tree, creating one folder for each object. Most file systems used under Linux (especially ext2 and ext3) have a hard limit of 32000 directories per folder, so it is not possible to store more than 31999 objects under one folder.
To get around this limitation without changing the file system, you can split your content tree so that no single folder holds more than 32000 content files (for example, images).
Examples of file systems that support more entries per directory:
  •  ReiserFS: roughly 1.2 million per directory
  •  ZFS: 2^48 (281474976710656)
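A common way to stay below such a limit is to spread entries over numbered sub-folders. The sketch below illustrates that general technique with the 31999 figure quoted above; it is not how eZ Publish organises var/storage, and both helpers are hypothetical.

```python
MAX_ENTRIES_PER_DIR = 31999  # ext2/ext3 figure quoted above


def folders_needed(n_objects: int, per_dir: int = MAX_ENTRIES_PER_DIR) -> int:
    """Minimum number of sibling folders so that none holds more than per_dir entries."""
    return max(1, -(-n_objects // per_dir))  # integer ceiling division


def bucket_path(object_id: int, per_dir: int = MAX_ENTRIES_PER_DIR) -> str:
    """Place an object's folder in bucket k, which holds ids k*per_dir to (k+1)*per_dir - 1."""
    return f"{object_id // per_dir}/{object_id}"
```

For 100000 objects, `folders_needed(100000)` gives 4, since three folders of 31999 entries hold only 95997.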

Svitlana Shatokhina (14/09/2010 12:35 pm)

Geir Arne Waaler (28/02/2011 3:13 pm)

Svitlana Shatokhina, Gaetano Giunta, Ester Heylen, Geir Arne Waaler


Comments

  • changed format for config parameters

    Since eZ Publish 4.1, the format of the handler parameter has changed. Look up the new syntax (which also applies to a couple of other cluster settings) in the original file.ini configuration file.