Caution: This documentation is for eZ Publish legacy, from version 3.x to 5.x.

  Clustering

    Clustering makes it possible to run an eZ Publish site on multiple web servers, in order to scale better and increase availability.

    This feature solves the severe issues that occur when network file synchronization (rsync) or network file systems (NFS) are used on their own with eZ Publish. The system relies heavily on file modification times and on file metadata being instantly available across all servers. Neither method guarantees this, so neither can be relied upon for clustering.

    Supported database types

    Clustering is available for MySQL and Oracle (using the eZ Publish Extension for Oracle® Database, starting from version 1.8) databases.

    Two clustering modes are available: eZDFS and eZDB. Note that eZDB has been removed starting from eZ Publish 5.1.

    eZDB stores the files as BLOBs in the database and is the easiest to set up. It is suitable for small to medium-sized websites, since the database may grow very large and become harder to maintain. eZDB is deprecated as of 5.1 in favor of eZDFS.

    It is available for:

    eZDFS uses both the database and a network file system such as NFS. It stores the file metadata in the database and relies on transactions to ensure that all operations are atomic. The actual files are stored and synchronized using NFS.
    It is available for all supported databases.
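    To make the eZDFS split concrete, here is a minimal Python sketch, assuming a local directory stands in for the NFS mount and an in-memory SQLite database stands in for the cluster metadata tables. The class `DFSStoreSketch`, its table `file_meta` and its methods are hypothetical; eZ Publish itself implements this in PHP with its own classes and schema.

```python
import os
import sqlite3
import tempfile
import time
from typing import Optional


class DFSStoreSketch:
    """Hypothetical eZDFS-style store: metadata in a database, bytes on a shared mount."""

    def __init__(self, db: sqlite3.Connection, mount_point: str) -> None:
        self.db = db
        self.mount_point = mount_point  # stands in for the NFS mount shared by all web servers
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS file_meta ("
            "name TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
        )

    def store(self, name: str, data: bytes) -> None:
        path = os.path.join(self.mount_point, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temporary file, then rename it into place,
        # so readers never see a half-written file.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, path)
        # The metadata update runs in a transaction: the connection's context
        # manager commits on success and rolls back if an exception is raised.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO file_meta (name, size, mtime) VALUES (?, ?, ?)",
                (name, len(data), time.time()),
            )

    def fetch(self, name: str) -> Optional[bytes]:
        # The database is the source of truth for which files exist.
        row = self.db.execute(
            "SELECT size FROM file_meta WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            return None
        with open(os.path.join(self.mount_point, name), "rb") as fh:
            return fh.read()


def demo() -> Optional[bytes]:
    """Store one file and read it back through the sketch."""
    with tempfile.TemporaryDirectory() as mount:
        store = DFSStoreSketch(sqlite3.connect(":memory:"), mount)
        store.store("cache/view/42.html", b"<p>cached</p>")
        return store.fetch("cache/view/42.html")
```

    In a real cluster every web server would share the same mount point and metadata database, so a file stored by one server is immediately visible to the others.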

    How it works

    Data that must be synchronized between the different servers is clusterized. Templates and design items are not stored in the database. The following overview shows which data is stored where:

    Clusterized data includes:

    •  Binary files
    •  Image and image alias files
    •  Caches related to content:
      •  Content view cache
      •  Template block cache
      •  Expiry cache
      •  URL alias cache
      •  RSS cache
      •  User info cache
      •  Class identifier cache
      •  Sort key cache

    Other files are stored using the file system, including (but not limited to):

    •  INI files
    •  Template files
    •  Compiled templates
    •  PHP files
    •  Log files
    •  Caches that are not related to content:
      •  Global INI cache
      •  INI cache 
      •  Codepage cache
      •  Character transformation cache
      •  Template cache
      •  Template override cache
      •  Translation cache

    Content view cache

    When eZ Publish displays a page (a content node), it executes the "view" view of the "content" module and includes the output in the page layout. If the output is cached, the cache file(s) are read and served. If not, the system fetches the content stored in the eZ Publish object database, renders the necessary templates, generates a web page and stores the resulting XHTML on the file system before serving it. As mentioned above, these files can be clusterized, which makes them immediately available to all servers in the cluster.
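    The cache-or-render flow described above can be sketched as a read-through cache. This is a hypothetical Python illustration, not eZ Publish code: `ViewCacheSketch`, the `render` callback and the in-memory dictionary (standing in for the clusterized cache storage) are assumptions.

```python
from typing import Callable, Dict


class ViewCacheSketch:
    """Hypothetical read-through cache for rendered content views."""

    def __init__(self, render: Callable[[int], str]) -> None:
        self.render = render  # stands in for fetching the object and rendering templates
        self.store: Dict[int, str] = {}  # stands in for the clusterized cache storage
        self.render_count = 0  # how many times a page had to be generated

    def view(self, node_id: int) -> str:
        cached = self.store.get(node_id)
        if cached is not None:
            return cached  # cache hit: serve the stored output as-is
        self.render_count += 1
        html = self.render(node_id)  # cache miss: generate the page
        self.store[node_id] = html  # store it so later requests can reuse it
        return html
```

    Because the store is shared in a cluster, a page rendered once by any server is a cache hit for every other server.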

    Images and image aliases

    The approach described above is also used when it comes to images and image aliases (image variations). However, the solution is a bit more complicated because images are usually served directly by the web server (for instance Apache). Since the web server isn't able to communicate with the cluster, the images need to be served using a PHP script called "index_cluster.php". This is true for all content images, but not for images that are related to design.

    Note that you will need to add new rewrite rules to instruct Apache to use a specific index file, "index_cluster.php", when serving images. This is explained in the chapters "Setting it up for an eZ DB FileHandler" and "Setting it up for an eZ DFS FileHandler".
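    The decision these rewrite rules encode can be sketched in Python. The regular expression below is an assumption modelled on typical eZ Publish cluster rewrite rules (content images under a storage/images directory below var/); refer to the setup chapters for the actual rules for your file handler.

```python
import re

# Assumed pattern: content images live under var/<optional dir>/storage/images
# (or storage/images-versioned). Design images are elsewhere and do not match.
CLUSTER_IMAGE_RE = re.compile(r"^/var/([^/]+/)?storage/images(-versioned)?/")


def route(request_path: str) -> str:
    """Return which handler should serve an image request (hypothetical)."""
    if CLUSTER_IMAGE_RE.match(request_path):
        return "index_cluster.php"  # content image: must be read from the cluster
    return "direct"  # design image: served directly by the web server
```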

    Notes about clearing the caches

    Since eZ Publish 3.10, clearing the caches no longer physically removes cache files when the DB or DFS based handlers are used, because this operation can be quite time-consuming. Instead, the system marks the cache files as invalid. This is done either by marking each individual cache file as expired or by setting the global expiry; the latter typically happens when a large number of files is affected, for example when clearing all caches of a specific type. The global expiry is a time-stamp that applies to all caches in the system: if it is set to a certain date, all cache files older than that date are no longer used. Note that the system overwrites old or expired cache file entries when the caches are regenerated.
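    The two invalidation mechanisms can be summarized as a single validity check. This is a hypothetical Python sketch: `CacheEntry` and `is_usable` are illustrative names, and treating a file written exactly at the global expiry as stale is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    mtime: float  # time the cache file was (re)generated, as a Unix timestamp
    expired: bool = False  # True once this particular file has been marked expired


def is_usable(entry: CacheEntry, global_expiry: float) -> bool:
    """Hypothetical check: a cache file is used only if it has not been marked
    expired individually and was written after the global expiry time-stamp."""
    if entry.expired:
        return False
    # Files written at or before the global expiry are treated as stale.
    return entry.mtime > global_expiry
```

    Moving the global expiry forward invalidates every older file at once without touching any of them, which is why it is used for large cache clears.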

    In order to physically remove the cache files from the database, the "ezcache.php" script needs to be run with the "--purge" option. The following example shows how to remove the content caches that are more than two days old:

    $php bin/php/ezcache.php --clear-id=content --purge --expiry='-2 days'
    

    Note that "$php" should be replaced with the path to your PHP executable.
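    As a rough illustration of what purging means, the following hypothetical sketch physically deletes entries older than a cutoff computed the way `--expiry='-2 days'` suggests. The function `purge` and its dictionary mapping names to modification times are assumptions; the actual selection rules of ezcache.php may differ.

```python
import time
from typing import Dict, List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def purge(entries: Dict[str, float], max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Hypothetical purge: delete cache entries (name -> mtime) older than
    `max_age_days` and return the removed names in sorted order."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = sorted(name for name, mtime in entries.items() if mtime < cutoff)
    for name in removed:
        del entries[name]  # physical removal, unlike marking as expired
    return removed
```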

    Extra database connections

    The new clustering code available since eZ Publish 3.10 opens an extra database connection when writing content. This connection checks whether the file has been modified since the write lock was acquired; if it has, there is no longer any need to write it. Because of this, the maximum number of database connections in MySQL must be increased by 30-50%. If persistent connections are enabled, the cluster code no longer shares connections with normal database calls, so the maximum number of connections previously used has to be doubled.
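    The sizing rule above is simple arithmetic. A small Python helper, assuming `current` is the MySQL `max_connections` value you use today, might look like this; the function name is hypothetical.

```python
from typing import Tuple


def recommended_max_connections(current: int, persistent: bool) -> Tuple[int, int]:
    """Return a (low, high) range for MySQL max_connections:
    +30-50% normally, doubled when persistent connections are enabled."""
    if persistent:
        doubled = current * 2
        return doubled, doubled
    # Integer ceiling division avoids floating-point rounding surprises.
    low = -(-current * 13 // 10)
    high = -(-current * 15 // 10)
    return low, high
```

    For example, a server currently allowing 200 connections would need 260 to 300 without persistent connections, or 400 with them.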

    Oracle-specific differences

    If you use the clustering functionality provided by the eZ Publish Extension for Oracle® Database, note that the system may behave differently from what is described above. If all content related caches are stored in an Oracle database, clearing the caches will always lead to their physical removal; the "ezcache.php" script will also physically remove the cache entries from the database, even when executed without the "--purge" option.

    Cluster file handlers

    The cluster file handler mechanism makes it possible to store, retrieve, rename and delete files (among other operations) using the database. The following file handlers are known to the system by default:

    1. eZFS2, deprecated as of 5.1
    2. eZDB, deprecated as of 5.1

    Note that eZFS and eZFS2 file handlers do not allow actual eZ Publish clustering by using multiple servers. Use eZDB and eZDFS for cluster file handling.
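    The idea behind the handler mechanism is a common interface with interchangeable backends. The Python sketch below is hypothetical: the class and method names are illustrative only and do not match the actual eZ Publish PHP classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ClusterFileHandlerSketch(ABC):
    """Hypothetical interface: every handler offers the same file operations,
    so the rest of the application does not care where files are kept."""

    @abstractmethod
    def store(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def fetch(self, path: str) -> Optional[bytes]: ...

    @abstractmethod
    def rename(self, old_path: str, new_path: str) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class InMemoryHandler(ClusterFileHandlerSketch):
    """Toy backend; a database-backed handler would implement the same methods."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def store(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def fetch(self, path: str) -> Optional[bytes]:
        return self.files.get(path)

    def rename(self, old_path: str, new_path: str) -> None:
        self.files[new_path] = self.files.pop(old_path)

    def delete(self, path: str) -> None:
        self.files.pop(path, None)  # deleting a missing file is a no-op here
```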

    Additional HTTP header

    Since eZ Publish 3.9 an additional HTTP header called "Served-by" is supported. This feature was added for the purpose of testing and debugging. It is typically useful when you need to check, from the client side, which server handled the request. The following example shows a part of a server response that contains this header:

    ...
    Last-Modified: Fri, 29 Jun 2007 09:35:54 GMT
    Served-by: 62.70.12.230
    Content-Language: en-GB
    ...
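    To check the header from the client side, you can capture the response headers (for example with `curl -I`) and look for `Served-by`. A minimal Python sketch that extracts it from raw header text, using the sample above; the function `served_by` is hypothetical.

```python
from typing import Optional


def served_by(raw_headers: str) -> Optional[str]:
    """Return the value of the Served-by header from raw response headers, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive in HTTP.
        if sep and name.strip().lower() == "served-by":
            return value.strip()
    return None


SAMPLE = """Last-Modified: Fri, 29 Jun 2007 09:35:54 GMT
Served-by: 62.70.12.230
Content-Language: en-GB"""
```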
    

    Limitation on some file systems when storing a large number of content files

    eZ Publish stores all disk-based content (e.g. images, PDFs) in var/storage, using a folder structure that mirrors the content tree and creating one folder for each object. Most file systems used under Linux (especially ext2 and ext3) have a hard limit of 32,000 directories per folder, so a single folder cannot hold more than roughly 32,000 objects.

    To work around this limitation without changing the file system, you can split your content tree so that no single folder holds more than 32,000 content files (for example, images).
    Examples of file systems that support more entries per folder:

    • ReiserFS: roughly 1.2 million per directory
    • ZFS: 2^48 (a really big number: 281474976710656)!

    Please refer to the requirements page regarding supported setups.
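    To find folders approaching the limit before it becomes a problem, a small Python script can count subdirectories per folder under var/storage. The helper `crowded_dirs` is hypothetical, and the default threshold of 90% of 32,000 is an arbitrary assumption.

```python
import os
import tempfile
from typing import List, Tuple

EXT3_SUBDIR_LIMIT = 32000  # approximate per-folder directory limit quoted above


def crowded_dirs(root: str, threshold: int = EXT3_SUBDIR_LIMIT * 9 // 10) -> List[Tuple[str, int]]:
    """Walk `root` and return (directory, subdirectory count) pairs whose count
    is at or above `threshold`, largest first."""
    report = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if len(dirnames) >= threshold:
            report.append((dirpath, len(dirnames)))
    report.sort(key=lambda item: item[1], reverse=True)
    return report


def demo() -> List[Tuple[str, int]]:
    """Build a tiny fake storage tree and scan it with a low threshold."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(5):
            os.makedirs(os.path.join(root, "images", f"obj{i}"))
        os.makedirs(os.path.join(root, "original", "obj0"))
        return [(os.path.relpath(path, root), count) for path, count in crowded_dirs(root, threshold=3)]
```

    In practice you would call `crowded_dirs` on your actual var/storage directory and keep the default threshold.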

    Note about database based cluster setups

    For performance reasons, we require that in production a separate database and schema be used for the cluster tables, where applicable. This ensures that transactions on the content database do not create needless contention on the cluster database; such contention could lead to failures when storing data.

    For this reason, eZ Systems does not support sharing one database between the content and cluster tables in production, even though such a setup technically works fine for development or testing purposes.

    Svitlana Shatokhina (14/09/2010 12:35 pm)

    Ricardo Correia (15/11/2013 8:18 am)

    Geir Arne Waaler, Andrea Melo, Bertrand Dunogier, Ricardo Correia

