Understanding Day's CQ & Underlying frameworks - Part 1

Recently I got the opportunity to work on a great CMS tool from www.day.com (CQ). CQ is an abstraction on top of several great Java frameworks/tools (JCR, Sling, OSGi and Day’s own component-based framework) and fits well for almost all enterprise applications. When I started working on it I thought it was a proprietary tool with very limited scope for innovation and experiments but, after taking a deep dive into the underlying technology/frameworks, I realized that it is a great combination of various great frameworks. CQ is based on the following technologies/frameworks (completely Java centric):

1)       Sling (http://sling.apache.org/site/index.html): A REST-based web framework for accessing resources stored in a JCR (Java Content Repository).
2)       Felix (http://felix.apache.org/site/index.html, an OSGi specification implementation): A lightweight container that handles class loading very differently from a plain JVM (each bundle gets its own class loader) and provides a class-level SOA platform.
3)       CRX/Jackrabbit (http://jackrabbit.apache.org, a JCR specification implementation): JCR is a specification that describes how we can manage our data (images, text files, strings, longs and everything else) as structured nodes.

For those who are not well versed in CQ’s underlying frameworks, I’ll try to cover them in other posts in the coming days. In this post my main focus is to explain the CQ architecture and best practices (just an overview). I’ll also cover the best practices for various design and development concepts (creating templates, pages, components, JCR repository manager, writing custom JCR nodes, JCR queries and authenticators) in individual posts (later).

OK, so CQ is not a new framework and you don’t need to learn a new programming language. If you are a developer from a Java/JSP background with decent experience of JavaScript, AJAX, XML, JSON and CSS, you can do magic with CQ. CQ follows a template, page and component based development methodology.

·         Template (cq:Template): Every page that we build for our website should extend from some template. A template itself does not have any rendering logic; it is just a logical unit in the CQ environment that groups pages sharing common features (functional or non-functional). For example, we have a group of pages that users can access without logging in (these are static/public pages); these pages have a common functional feature (they are public) and share common headers and footers (a non-functional/rendering feature). Since the template itself has no rendering logic, you might ask “how do the pages get rendered?” Well, we need to define a resource/page (cq:Page) that will render the template.

·         Page (cq:Page): To create a page on our web site we need a template, and to render a template we need a page. A page is a combination of one or more resources (Java classes, JSPs etc.), and the primary goal of a page is to create the page structure (e.g. two columns with a header, or one column with a header and footer) in which components can be placed. So a page renders a blank container and we need to place components in it; this is the real power of CQ. We can add and remove components on a page, change the position of components, and even extend a page and add/remove components in the extended pages.

·         Component (cq:Component): A component is a reusable entity that we can place on any number of pages. Just as pages can be extended to add/remove functionality, a component can also be extended. Components are the smallest building blocks of a page, and a component is usually composed of various resources (Java classes, JSPs, JS).

Let’s see how Sling, JCR and Felix contribute to the CQ framework and what role each plays as a building block.

1)       Sling - Request Resolution to a Resource/Script/Servlet (JCR Node/Script): When a request comes to CQ, the first thing that happens is request wrapping and resource/page/script resolution. This is where Sling comes into the picture: Sling takes the incoming request (HttpServletRequest) and wraps it in a SlingHttpServletRequest. The wrapper provides additional information that the Sling framework needs to resolve a particular Resource/Servlet/Script on the server (in the JCR repository). Once the request is wrapped, Sling parses the incoming request URL and, with the help of that additional information, breaks it down into the following pieces:

NOTE: Scripts and servlets are resources in Sling and thus have a resource path; this is their location in the JCR repository (sling:resourceType). Scripts and servlets can be extended using the sling:resourceSuperType property (I’ll cover this in another post “Component and Page inheritance”).

a)       Servlet/Script (sling:resourceType): The incoming request is parsed and a servlet/script/resource name is extracted from it. A script can be a JSP file, Java class or ActionScript (Flex/Flash) file; the type of script that will be executed depends on the extension and selectors (see below). Internally Sling calls [request.getResource().getResourceType()] to get the sling:resourceType. The types of supported scripts are configurable; to see which script engines are supported in your environment, navigate to http://localhost:4502/system/console/scriptengines
b)       Selector: Based on the URL, Sling decides which script to execute; internally Sling calls [request.getRequestPathInfo().getSelectorString()] to extract the selector(s). Let’s say we have a requirement to send the response in three different formats (XML, JSON, TXT) for the same URL; this can be achieved with the help of selectors.
c)       Extension: The incoming request is parsed and an extension is extracted from it for the script file; internally Sling calls [request.getRequestPathInfo().getExtension()]. It is possible to have multiple script files with different extensions, and based on the selector(s) and extension provided in the incoming URL the appropriate script will be executed.
d)       Request Method: The request method is used for script resolution when the request is not GET or HEAD.

Let’s try to tie all 4 pieces together. The resourceType is used as a (relative) parent path to the Servlet/Script in the JCR repository, while the Extension or Request Method is used as the Servlet/Script (base) name. The Servlet is retrieved from the Resource tree (Repository) by calling the [ResourceResolver.getResource(String)] method, which handles absolute and relative paths correctly by searching relative paths in the configured search path [ResourceResolver.getSearchPath()], using the sling:resourceType (and sling:resourceSuperType) of the requested resource. To see and configure the paths where Sling looks for resources, navigate to the JCR resource resolver tab on the Felix console (http://localhost:4502/system/console/jcrresolver); if required, we can map additional paths with regular expressions.
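
To make the "resource type as parent path" idea concrete, here is a minimal stand-alone sketch. It is not Sling code: the class and method names are hypothetical, the resource type "myapp/components/report" is made up, and the search path ("/apps/", "/libs/") is an assumption about a typical setup. It only shows how a relative resource type could be expanded under each search path entry and combined with a script base name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical names, not the Sling implementation. */
final class ScriptLocationSketch {

    /** Turns a resource type into the folders where its scripts may live. */
    static List<String> candidateFolders(String resourceType, List<String> searchPath) {
        List<String> folders = new ArrayList<>();
        if (resourceType.startsWith("/")) {
            // An absolute resource type is used as is.
            folders.add(resourceType);
            return folders;
        }
        // A relative resource type is tried under each search path entry, in order.
        for (String root : searchPath) {
            String prefix = root.endsWith("/") ? root : root + "/";
            folders.add(prefix + resourceType);
        }
        return folders;
    }

    /** Appends the script base name (request extension or method) and the script extension. */
    static List<String> candidateScripts(String resourceType, List<String> searchPath,
                                         String baseName, String scriptExtension) {
        List<String> scripts = new ArrayList<>();
        for (String folder : candidateFolders(resourceType, searchPath)) {
            scripts.add(folder + "/" + baseName + "." + scriptExtension);
        }
        return scripts;
    }

    public static void main(String[] args) {
        // Assumed search path and a made-up resource type.
        List<String> searchPath = Arrays.asList("/apps/", "/libs/");
        for (String script : candidateScripts("myapp/components/report", searchPath, "html", "jsp")) {
            System.out.println(script);
        }
    }
}
```

The order of the returned list mirrors the order of the search path, so a script under the first entry would be considered before one under the second.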

Here is an example URL and its decomposition. Let’s say the URL http://suryakand-shinde.blogspot.com/reports/june/expense.format.pdf.html is used to get the expense report in PDF format for the month of June (it is stored in the JCR repository under /reports/june/expense/):

·         Server: suryakand-shinde.blogspot.com
·         Script/Servlet (resourceTypeLabel): /reports/june/expense (The last path segment of the path created from the resource type)
·         Selectors: format.pdf (treated as the relative path format/pdf during script resolution; we can have JSON and TXT selectors if we want to get the same report in various formats)
·         Extension (requestExtension): html
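
The decomposition above can be imitated with a few lines of plain Java. This is a simplified sketch and not how Sling itself is implemented: the class name is hypothetical, any suffix after the extension is ignored, and it assumes that the resource path ends at the first dot of the last path segment (Sling instead checks which resource actually exists in the repository).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified illustration of splitting a request path; hypothetical class, not Sling API. */
final class RequestPathSketch {
    final String resourcePath;
    final List<String> selectors;
    final String extension; // null when the path has no extension

    private RequestPathSketch(String resourcePath, List<String> selectors, String extension) {
        this.resourcePath = resourcePath;
        this.selectors = selectors;
        this.extension = extension;
    }

    /** Assumption: the resource path ends at the first dot of the last path segment. */
    static RequestPathSketch parse(String path) {
        int lastSlash = path.lastIndexOf('/');
        int firstDot = path.indexOf('.', lastSlash + 1);
        if (firstDot < 0) {
            // No dot at all: no selectors and no extension.
            return new RequestPathSketch(path, Collections.<String>emptyList(), null);
        }
        String resourcePath = path.substring(0, firstDot);
        String[] parts = path.substring(firstDot + 1).split("\\.");
        // The last dot-separated part is the extension; everything before it is a selector.
        List<String> selectors =
                new ArrayList<>(Arrays.asList(parts).subList(0, parts.length - 1));
        return new RequestPathSketch(resourcePath, selectors, parts[parts.length - 1]);
    }

    /** Selectors joined with dots, or null when there are none. */
    String selectorString() {
        return selectors.isEmpty() ? null : String.join(".", selectors);
    }

    public static void main(String[] args) {
        RequestPathSketch p = parse("/reports/june/expense.format.pdf.html");
        System.out.println(p.resourcePath + " | " + p.selectorString() + " | " + p.extension);
    }
}
```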

If we have multiple selectors and extensions in the request URL, the following rules are applied to resolve a script:

·         The number of matched selectors is given first preference: the more request selectors a script matches, the better.
·         A script that includes the request extension is preferred over one without it.
·         A script found earlier matches better than a script found later in the processing order. This means that a script closer to the original resource type in the resource type hierarchy is considered earlier.
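
The three rules can be read as a sort order. The sketch below expresses them as a comparator over hypothetical candidate scripts; the class, the field names and the script paths are all made up for illustration, and real Sling weighs candidates internally in its own way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of a candidate script; not a Sling class. */
final class ScriptCandidateSketch {
    final String path;
    final int selectorsMatched;     // rule 1: more matched selectors is better
    final boolean matchesExtension; // rule 2: including the request extension is better
    final int discoveryOrder;       // rule 3: found earlier is better

    ScriptCandidateSketch(String path, int selectorsMatched,
                          boolean matchesExtension, int discoveryOrder) {
        this.path = path;
        this.selectorsMatched = selectorsMatched;
        this.matchesExtension = matchesExtension;
        this.discoveryOrder = discoveryOrder;
    }

    /** Orders candidates so that the best match comes first. */
    static final Comparator<ScriptCandidateSketch> BEST_FIRST =
            new Comparator<ScriptCandidateSketch>() {
                @Override
                public int compare(ScriptCandidateSketch a, ScriptCandidateSketch b) {
                    if (a.selectorsMatched != b.selectorsMatched) {
                        return Integer.compare(b.selectorsMatched, a.selectorsMatched);
                    }
                    if (a.matchesExtension != b.matchesExtension) {
                        return a.matchesExtension ? -1 : 1;
                    }
                    return Integer.compare(a.discoveryOrder, b.discoveryOrder);
                }
            };

    static ScriptCandidateSketch best(List<ScriptCandidateSketch> candidates) {
        List<ScriptCandidateSketch> sorted = new ArrayList<>(candidates);
        Collections.sort(sorted, BEST_FIRST);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<ScriptCandidateSketch> candidates = new ArrayList<>();
        candidates.add(new ScriptCandidateSketch("/apps/reports/expense/expense.jsp", 0, false, 0));
        candidates.add(new ScriptCandidateSketch("/apps/reports/expense/html.jsp", 0, true, 1));
        candidates.add(new ScriptCandidateSketch("/apps/reports/expense/format/pdf.html.jsp", 2, true, 2));
        System.out.println(best(candidates).path);
    }
}
```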

For more information on servlet/script resolution please see: http://sling.apache.org/site/servlet-resolution.html

NOTE: Sling treats request methods (GET, PUT, POST, HEAD) differently, so it’s really important to understand and choose the right request method while designing applications. Only for GET and HEAD requests will the request selectors and extension be considered for script selection. For other requests the servlet or script name (without the script extension) must exactly match the request method. Here is a quick example of how Sling extracts the Servlet/Script:
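
A minimal sketch of the method rule described in this note, in plain Java. The class and method names are hypothetical, and falling back to the method name when a GET request has no extension is a simplifying assumption of mine, not a statement about Sling's full behaviour.

```java
import java.util.Locale;

/** Hypothetical helper showing how the request method influences the script name. */
final class ScriptNameSketch {

    /** Only GET and HEAD requests take selectors and extension into account. */
    static boolean usesSelectorsAndExtension(String method) {
        return "GET".equalsIgnoreCase(method) || "HEAD".equalsIgnoreCase(method);
    }

    /**
     * Returns the script base name (without script extension such as ".jsp").
     * Assumption: a GET/HEAD request without an extension falls back to the method name.
     */
    static String scriptBaseName(String method, String extension) {
        if (usesSelectorsAndExtension(method) && extension != null) {
            return extension; // e.g. "html" -> html.jsp
        }
        return method.toUpperCase(Locale.ROOT); // e.g. "POST" -> POST.jsp
    }

    public static void main(String[] args) {
        System.out.println(scriptBaseName("GET", "html"));
        System.out.println(scriptBaseName("POST", "html"));
    }
}
```
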
2)       JCR – The data/resource storage: In any application we need a database to store data (user information, text data, images etc.); in the case of CQ, JCR (CRX) plays the role of the database. Data in a JCR (Java Content Repository) is structured as nodes; a node can be a folder, a file or a representation of any real-world entity. Let’s try to correlate a traditional database (like MySQL) with JCR. In a traditional database we store data in tables; each table has multiple columns (a few are mandatory, a few have data constraints and a few are optional) and multiple rows. In JCR we store data in nodes of a particular node type (treat this as our table); each node type has multiple properties (treat these as table columns), of which a few are mandatory, a few have constraints (e.g. the property value must be a string, long etc.) and a few are optional. We can have multiple nodes (treat these as our table rows) of a particular type in our JCR repository. To fetch the required data from database tables we write SQL queries; similarly, JCR supports SQL (Query.JCR_SQL2) for querying nodes in the repository. JCR also supports XPath queries (Query.XPATH) to find nodes based on their path.

Let’s say we have multiple portals and we want to store portal configurations (e.g. a unique id for the portal, portal name, home page URL etc.) in a database table. We’ll create a table called Portal with columns (portal_id, portal_name, portal_home_page etc.), and each portal will have a row in the database with its own configuration. How do we do this in JCR? We’ll define a node type config:Portal (registered under a namespace so that it does not conflict with other node types that have the same name) with properties (portalId, portalName, portalHomePage etc.), and each portal will have a separate node in JCR with its own configuration. Here is a diagrammatic mapping between a traditional database and JCR:

Figure: Traditional Database V/S JCR Node comparison
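
To connect the portal example with the two query languages mentioned earlier, here is a small sketch that only builds the query statements as strings. The class and method names are hypothetical, and the value "portal-1" is made up. In a real repository these statements would be handed to QueryManager.createQuery(statement, Query.JCR_SQL2) or QueryManager.createQuery(statement, Query.XPATH); that part is left out so that the example runs without a repository.

```java
/** Hypothetical helper that builds query statements for config:Portal nodes. */
final class PortalQuerySketch {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** JCR-SQL2 statement selecting portal nodes by their portalId property. */
    static String sql2ByPortalId(String portalId) {
        return "SELECT * FROM [config:Portal] AS p WHERE p.[portalId] = " + quote(portalId);
    }

    /** XPath statement selecting the same nodes. */
    static String xpathByPortalId(String portalId) {
        return "//element(*, config:Portal)[@portalId = " + quote(portalId) + "]";
    }

    public static void main(String[] args) {
        System.out.println(sql2ByPortalId("portal-1"));
        System.out.println(xpathByPortalId("portal-1"));
    }
}
```

Compared with "SELECT * FROM Portal WHERE portal_id = 'portal-1'" in a relational database, the node type takes the place of the table name and the property takes the place of the column.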

What extra do we get from JCR?

·         A traditional database supports only SQL, whereas JCR supports both SQL (the query format is a little different) and XPath.
·         The structure of database tables is predefined and we cannot add or remove columns for an individual row (all rows have the same columns); in JCR, with the help of nt:unstructured and mixin node types, we can add and remove properties on individual nodes.
·         In a traditional database, files/images and large text are represented as BLOB/CLOB with some limitations, but in JCR they are stored as nodes of dedicated node types, and search and retrieval are easy.
·         JCR has its own access control mechanism (ACL) and user management framework.
·         XML Import & Export
·         Provides fast text search (using Lucene).
·         Locking, versioning and notifications.
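
The point about adding and removing properties on individual nodes can be pictured with a tiny in-memory model. This is only an illustration in plain Java, not the JCR API: the NodeSketch class is hypothetical, and the "theme" property is made up to show one node carrying a property that its sibling does not have.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a node with free-form properties. */
final class NodeSketch {
    final String name;
    final String primaryType;
    final Map<String, Object> properties = new LinkedHashMap<>();

    NodeSketch(String name, String primaryType) {
        this.name = name;
        this.primaryType = primaryType;
    }

    /** Adds or overwrites a property on this node only. */
    NodeSketch set(String property, Object value) {
        properties.put(property, value);
        return this;
    }

    /** Removes a property from this node only. */
    NodeSketch remove(String property) {
        properties.remove(property);
        return this;
    }

    public static void main(String[] args) {
        NodeSketch portalA = new NodeSketch("portalA", "nt:unstructured")
                .set("portalId", "a").set("portalName", "Portal A");
        // Unlike rows of one table, a sibling node may carry an extra property.
        NodeSketch portalB = new NodeSketch("portalB", "nt:unstructured")
                .set("portalId", "b").set("portalName", "Portal B").set("theme", "dark");
        System.out.println(portalA.properties.keySet());
        System.out.println(portalB.properties.keySet());
    }
}
```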

3)        Felix – managing class dependencies and services: Felix is an OSGi specification implementation that is embedded in CQ for managing service components and their dependencies. The main benefit of using OSGi as the underlying technology for managing service/component dependencies is that it allows us to start/stop services (components) at runtime and to host multiple versions of the same service. A service or a component can be configured via the Felix web console and Configuration Admin. Let’s take a simple example: I have an application that interacts with an underlying MySQL database, and after a few months I find that the MySQL team has fixed a major bug in a new release of the mysql-connector library. To incorporate this new library in a traditional application I have to stop my application and re-package it (or just replace the older library). With OSGi we don’t need to stop the whole application, because everything is exposed either as a component or as a service; we just need to install the new component/service in the OSGi container. When services/components are updated in the OSGi container, event listeners propagate the update event to the service/component consumers, and the consumers adapt themselves to use the new version of the service (on the consumer side we need to listen for these events so that consumers can decide whether or not to respond to the change).
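
The listener idea from the connector example can be sketched without any OSGi dependency. The registry below is a hypothetical, heavily simplified stand-in (a real OSGi container offers this through its service registry, service events and helpers such as ServiceTracker); the service name "db.connector" and the version strings are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mini registry showing how consumers learn about a replaced service. */
final class ServiceRegistrySketch {

    /** Callback invoked whenever a service is registered or replaced. */
    interface Listener {
        void serviceChanged(String name, Object service);
    }

    private final Map<String, Object> services = new HashMap<>();
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Registers a service or replaces an existing one, then notifies all listeners. */
    void register(String name, Object service) {
        services.put(name, service);
        for (Listener listener : listeners) {
            listener.serviceChanged(name, service);
        }
    }

    Object lookup(String name) {
        return services.get(name);
    }

    public static void main(String[] args) {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        final String[] connectorInUse = new String[1];
        // The consumer decides for itself whether a change is relevant.
        registry.addListener(new Listener() {
            @Override
            public void serviceChanged(String name, Object service) {
                if ("db.connector".equals(name)) {
                    connectorInUse[0] = (String) service;
                }
            }
        });
        registry.register("db.connector", "connector 1.0");
        registry.register("db.connector", "connector 2.0"); // replaced while running
        System.out.println(connectorInUse[0]);
    }
}
```

The consumer keeps running while the service behind the name is swapped; it only has to react to the change event.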

No framework provides everything that we need built in; we need to understand the platform/framework that we have chosen for development and think about how we can utilize it better. So, to use CQ to its full capacity it’s really important to understand the concepts behind Templates, Pages, Components and JCR data modeling, and how services/components can be designed and utilized. Each underlying technology (Sling, JCR and OSGi) is itself vast and I am just a new learner, so please feel free to comment and share your ideas.

Resources that you can refer for further reading:
-- Ideas can change everything


Munny said…
Nice n clear description about Day architecture
Anonymous said…
"Here is quick example of how sling extracts Servlet/Script," at the end of paragraph 1. But there is no example below :(
Denis Lutz said…

Great post, thank you so much. I got a better understanding of things immidiatelly. Can you recommend any other blogs like yours about CQ?

Akash said…
Can you please tell me how to assign access Privileges to users from code in DAY CQ.Please post a blog on workflow and users access Privileges. How ther are manages in day.
Anonymous said…
Awesome summary of underlying technologies of CQ! Sometimes, even a whole book or a series of technical articles do not communicate the flavor of the topic, like this did. Good work.
Anonymous said…

My question:

I have a selector called json.js.jsp and my page is displayed correctly in the json format (http://../a.json.js)

I created a new selector xsl.jsp and the page is rendered as http://../a.xsl.html

It should be http://../a.xsl

any idea??

m.s said…
Hi buddy need your help in cq
my images are not getting cached in dam foler under document root .i have already activated the names mangling but it still doesnt create folder jcr:content to _jcr_content/ . what might be the problem . i am using cq5.6 could you help ?
Suryakand said…
I need more information in order to provide more accurate solution for your problem.

Here are few things that you may want to check:
1) check the /var/dam folder and make sure that it does not contain any junk.
2) Try deleting older files from both locations /var/dam and /content/dam and re-upload them (using the DAM uploader or digital asset management console in CQ 5.6)

If this does not work for you then please provide more information.

-- Surya
CMreddy said…
clearly explained...thanks surya

i recently moved to cq5 project.Now i am learning cq5. Please update more data on this technology. it's really helping me to understand the things in better way.please explain with examples..

Thanks to surya once again
