Saturday, September 15, 2012

Hazelcast Overview

This is just an evaluation summary: a shortened version of the Hazelcast documentation that points out the facts I consider most important.

Hazelcast distributes application data across cluster nodes. It is a peer-to-peer solution, so there is no single point of failure.

  • Data in the cluster is almost evenly distributed (partitioned) across all nodes, so each node carries ~ (1/n * total-data) + backups, n being the number of nodes in the cluster.
    • Although a map may also be set up with a Near Cache, which keeps frequently read entries cached locally on the reading node
    • How do you work with partitioned data ? Hazelcast lets you run distributed queries on a distributed map.
    • Instead of fetching the entire entry set and iterating over it locally, you ask only for the entries you are interested in :
    • map.values(new SqlPredicate("active AND age < 30"));
      • OR
    • map.values(e.is("active").and(e.get("age").lessThan(30)));
    • queries run on each member in parallel and only the matching results are returned to the caller (see the sketch after the listener list below)
  • Distributed Events
    • Hazelcast allows you to register for entry events to get notified when entries are added, updated or removed

    • MembershipListener for cluster membership events
    • InstanceListener for distributed instance creation and destroy events
    • MigrationListener for partition migration start and complete events
    • LifecycleListener for HazelcastInstance lifecycle events
    • EntryListener for IMap and MultiMap entry events
    • ItemListener for IQueue, ISet and IList item events
    • MessageListener for ITopic message events
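
To make the query and entry-event bullets above concrete, here is a minimal sketch, assuming the Hazelcast 2.x API that was current at the time of writing; the Employee class and the map name are made up for illustration.

import java.io.Serializable;
import java.util.Collection;

import com.hazelcast.core.EntryEvent;
import com.hazelcast.core.EntryListener;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.query.EntryObject;
import com.hazelcast.query.Predicate;
import com.hazelcast.query.PredicateBuilder;
import com.hazelcast.query.SqlPredicate;

public class QueryAndListenerSketch {

    /** Values (and keys) travel across the cluster, so they have to be Serializable. */
    public static class Employee implements Serializable {
        private final boolean active;
        private final int age;

        public Employee(boolean active, int age) {
            this.active = active;
            this.age = age;
        }

        public boolean isActive() { return active; }
        public int getAge() { return age; }
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(null); // null means default config
        IMap<String, Employee> map = hz.getMap("employees");

        // get notified about entry events; 'true' asks for the value to be included in the event
        map.addEntryListener(new EntryListener<String, Employee>() {
            public void entryAdded(EntryEvent<String, Employee> event)   { System.out.println("added " + event.getKey()); }
            public void entryUpdated(EntryEvent<String, Employee> event) { System.out.println("updated " + event.getKey()); }
            public void entryRemoved(EntryEvent<String, Employee> event) { System.out.println("removed " + event.getKey()); }
            public void entryEvicted(EntryEvent<String, Employee> event) { System.out.println("evicted " + event.getKey()); }
        }, true);

        map.put("dilbert", new Employee(true, 29)); // fires entryAdded on registered listeners

        // distributed query, SQL-like flavour; it runs on every member in parallel
        Collection<Employee> youngAndActive = map.values(new SqlPredicate("active AND age < 30"));

        // the same query expressed through the PredicateBuilder API
        EntryObject e = new PredicateBuilder().getEntryObject();
        Predicate predicate = e.is("active").and(e.get("age").lessThan(30));
        Collection<Employee> sameResult = map.values(predicate);

        System.out.println(youngAndActive.size() + " = " + sameResult.size());

        Hazelcast.shutdownAll();
    }
}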

  • Hazelcast allows you to load and store distributed map entries from/to a persistent datastore such as a relational database. Just supply it with a class that implements MapLoader and MapStore (a sketch follows below).
    • There are existing implementations for the JPA spec and MongoDB
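
A sketch of what such a class might look like, assuming the MapStore/MapLoader interfaces of that Hazelcast generation; the key/value types and the persistence calls are placeholders.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import com.hazelcast.core.MapStore;

/** Persists map entries to a relational table; wired to a map via the map-store configuration. */
public class EmployeeMapStore implements MapStore<Long, String> {

    public void store(Long key, String value) {
        // INSERT/UPDATE the row identified by key (JDBC/JPA code omitted)
    }

    public void storeAll(Map<Long, String> entries) {
        for (Map.Entry<Long, String> e : entries.entrySet()) {
            store(e.getKey(), e.getValue());
        }
    }

    public void delete(Long key) {
        // DELETE the row identified by key
    }

    public void deleteAll(Collection<Long> keys) {
        for (Long key : keys) {
            delete(key);
        }
    }

    public String load(Long key) {
        return null; // SELECT the value for the given key
    }

    public Map<Long, String> loadAll(Collection<Long> keys) {
        Map<Long, String> result = new HashMap<Long, String>();
        for (Long key : keys) {
            result.put(key, load(key));
        }
        return result;
    }

    public Set<Long> loadAllKeys() {
        return null; // returning null means "do not pre-load anything"
    }
}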

  • Integration with Spring
    • overall configuration
    • registering distributed collections and maps and having them injected into beans
    • caching via com.hazelcast.spring.cache.HazelcastCacheManager (see the sketch below)
    • spring-data : JPA & MongoDB

  • Transaction context :  txn.begin(); txn.commit() / txn.rollback(); (see the sketch right after this group)
  • Hazelcast WM allows you to cluster user HTTP sessions automatically
  • Distributed Execution - running tasks in parallel within the cluster (see the executor sketch at the end of this overview)
  • Clients
    • enable you to perform all Hazelcast operations without being a member of the cluster. A client connects to one of the cluster members and delegates all cluster-wide operations to it
    • Java Client
    • REST Client
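
A minimal sketch of the transaction and client bullets, assuming the Hazelcast 2.x API of that era, where a thread-bound Transaction is obtained from the instance and the Java client is configured via ClientConfig; the address and map name are illustrative, and the exact client configuration calls varied slightly between releases.

import java.util.Map;

import com.hazelcast.client.ClientConfig;
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Transaction;

public class TransactionAndClientSketch {

    public static void main(String[] args) {
        // a regular cluster member
        HazelcastInstance member = Hazelcast.newHazelcastInstance(null);

        // thread-bound transaction: either everything in the block is applied or nothing is
        Transaction txn = member.getTransaction();
        txn.begin();
        try {
            Map<String, String> orders = member.getMap("orders");
            orders.put("order-1", "NEW");
            orders.put("order-2", "NEW");
            txn.commit();
        } catch (Throwable t) {
            txn.rollback();
        }

        // a lightweight client: not a data-carrying member, it delegates operations to the cluster
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.addAddress("127.0.0.1:5701");
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        System.out.println(client.getMap("orders").size());

        Hazelcast.shutdownAll();
    }
}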

  • All distributed objects, such as your key and value objects, have to be Serializable!
  • Hazelcast has a very nice web based Management Center
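
And for the Distributed Execution bullet, a sketch assuming the Hazelcast 2.x distributed executor; the task is made up and, like every distributed object, it has to be Serializable.

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class DistributedExecutionSketch {

    /** The task travels over the wire, so it has to be Serializable. */
    public static class HostnameTask implements Callable<String>, Serializable {
        public String call() throws Exception {
            return java.net.InetAddress.getLocalHost().getHostName();
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(null);

        // the distributed executor picks a member to run the task on
        ExecutorService executor = hz.getExecutorService();
        Future<String> result = executor.submit(new HostnameTask());

        System.out.println("task ran on: " + result.get());
        Hazelcast.shutdownAll();
    }
}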

Guava Live View of collections explained

The collection returned by the filter or transform methods is a so-called Live View of the underlying collection. It doesn't support removing elements through its iterator, only adding new ones. The most useful thing about Live Views is that they let you work only with the elements that satisfy the Predicate. This is handy especially when the elements are mutable objects: as their state changes, an object may start or stop passing the condition and so appear in or disappear from the filtered view.

In the following test case you can see them in action :

  

    @Test
    public void testLiveViews() {

        /** adding 9 arrays with 0,1,2,3,4,5,6,7,8 */
        List<Integer[]> list = new ArrayList<Integer[]>(9);
        for (int i = 0; i < 9; i++) {
            list.add(new Integer[]{i});
        }

        /** filtering arrays to even values 0,2,4,6,8 */
        Collection<Integer[]> filteredList = Collections2.filter(list, new Predicate<Integer[]>() {
            public boolean apply(Integer[] input) {
                return input[0] % 2 == 0;
            }
        });
        Assert.assertEquals(9, list.size());
        Assert.assertEquals(5, filteredList.size());

        /** the filtered live view reflects changes to the underlying collection */
        Assert.assertEquals((int) list.get(2)[0], 2);
        for (Integer[] s : filteredList) {
            s[0] = s[0] + 1;
        }
        Assert.assertEquals((int) list.get(2)[0], 3);

        /** elements changed so that they don't satisfy filtering condition hence size is 0 */
        Assert.assertEquals(0, filteredList.size());
        Assert.assertEquals(9, list.size());

        /** adding an even value that satisfies the condition to the filtered list */
        filteredList.add(new Integer[]{666});
        Assert.assertEquals(1, filteredList.size());
        Assert.assertEquals(10, list.size());

        /** adding an even value that satisfies the condition to the underlying list */
        list.add(new Integer[]{888});
        Assert.assertEquals(2, filteredList.size());
        Assert.assertEquals(11, list.size());

        /** adding odd value that doesn't satisfy condition to filtered list */
        try {
            filteredList.add(new Integer[]{667});
            Assert.fail();
        } catch (Exception e){
            Assert.assertEquals(2, filteredList.size());
        }

        /** adding odd value that doesn't satisfy condition to underlying list */
        list.add(new Integer[]{887});
        Assert.assertEquals(2, filteredList.size());
        Assert.assertEquals(12, list.size());
    }

Saturday, November 19, 2011

10 things a programmer should know about Liferay Kaleo Workflow Engine

Liferay developers decided to create their own workflow engine, called Kaleo, two years ago. It's a flexible, lightweight workflow engine similar to other open-source engines out there. It is a plugin built on Liferay's ServiceBuilder and it needs to be deployed before you can use it (it is not part of the portal).

The principle is to define a workflow for resources as an XML definition containing states, tasks, conditions, forks & joins and timers. Apart from timers, all of them contain transition nodes (where to go next). And apart from conditions and forks & joins, all of them may contain actions (what to do), and thereby orchestrate a sequence of events and actions. It is especially useful for content reviewing, validation, approval and quality evaluation.


It is quite easy to design and use workflows directly in XML. But there are definitely use cases that make you investigate its internal workings and modify it or add something that you need.

The following summary lists a few things you should be aware of as a developer :
  • Liferay has its own simple messaging implementation that is meant only for internal usage; it doesn't allow remote messaging. Kaleo uses it for two reasons :
    • Because it is an external plugin, the portal context needs to communicate with it via messaging
    • Because messaging is a suitable choice for a workflow engine implementation
  • A workflow definition is always associated with a corresponding resource. There are 2 key entities. For instance, in the case of the document library : KaleoDefinition (the definition itself) and WorkflowDefinitionLink (the association between the definition and the folder and file types that will be "workflow aware").
  • You cannot undeploy a definition while such an association exists. So you first delete the link (this corresponds to removing the workflow from a resource in administration) and then you can deactivate and undeploy it.
  • When you deploy a workflow definition, it is parsed by XMLWorkflowModelParser into an object model, and for each node type (mentioned in the second paragraph) there is a NodeExecutor type with methods enter, execute and exit
  • Every time you add a resource that has a workflow definition link, a workflow instance is created and an execution context is associated with it.
  • Consequently, KaleoSignaler (the class that sends messages about entering, executing and exiting) either calls the corresponding NodeExecutor directly in the case of a timer node, or sends a message to the liferay/kaleo_graph_walker destination about entering or leaving a node (transitions). The destination is associated with DefaultGraphWalker, which steps into the target nodes and calls the corresponding NodeExecutors. The methods are implemented differently based on the type of the node.
  • The execution context contains a so-called Instance Token that carries information about the current position within the workflow node model and its state. It also holds references to the workflow and service contexts.
  • Task nodes contain an assignment element that determines recipients (content reviewers, for instance) based on a Role, a User or just an email address. You can even determine this dynamically at runtime in a scripted-assignment, which is a matter of setting the user or role that you want in the scripting context. I wrote a blog post about improving the scripting experience.
  • Actions are either notifications or whatever else you need to be done. For a Java developer the obvious choice would be writing Groovy scripts.
  • The KaleoLog entity contains information like duration, user comments etc.

I hope this post helped you get a grasp of the Liferay Kaleo Engine's internal workings. Cheers

Wednesday, November 9, 2011

Liferay Workflow Kaleo Engine tuning

It's been a pleasure to work with the Liferay Workflow Kaleo Engine. I'm not going to introduce it here; you can read about it in the reference documentation. I just want to mention one thing that I needed to improve.
    When I was designing complex workflow definitions containing a lot of Groovy scripts, I just didn't want to write them directly in the workflow XML definition. Instead, I created many Groovy script files and put ${references} into the workflow definitions; during deployment I simply expand them.
    With this setup I can profit from IDE support and I don't have to maintain hundreds of lines of code in a workflow definition. I keep the scripts basic, up to 15-20 lines each. Imagine all those Java import declarations; you wouldn't even know what you have on the classpath.

This code takes care of it. I didn't use the Liferay DOM API because it is missing a few key methods from classes that extend org.dom4j.Node and I didn't want to work around that. Just change those IllegalStateExceptions to your own exception types if you like.

  
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentFactory;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.jaxen.JaxenException;
import org.jaxen.SimpleNamespaceContext;
import org.jaxen.XPath;
import org.jaxen.dom4j.Dom4jXPath;

public class WhatEver {

 public byte[] expandScripts(String workFlowPath) {
  ClassLoader cl = Thread.currentThread().getContextClassLoader();

  Document document;
  try {
   document = new SAXReader().read(cl.getResource(workFlowPath));

   List<Element> scripts = findAllElements("script", document.getRootElement());

   expandScripts(cl, scripts);
  } catch (DocumentException e) {
   throw new IllegalStateException("Reading workflow definition: '" + workFlowPath + "' failed", e);
  }

  return toByteArray(document);
 }

 private List<Element> findAllElements(String name, Element rootElement) {
  String ns = rootElement.getQName().getNamespaceURI();

  Map<String, String> map = new HashMap<String, String>();
  map.put("x", ns);

  List<Element> scripts;
  try {
   XPath xpath = new Dom4jXPath("//x:" + name);
   xpath.setNamespaceContext(new SimpleNamespaceContext(map));

   scripts = xpath.selectNodes(rootElement);
  } catch (JaxenException e) {
   throw new IllegalStateException("Xpath search for: '" + name + "' failed", e);
  }

  return scripts;
 }

 private void expandScripts(ClassLoader cl, List<Element> scripts) {
  DocumentFactory docFactory = DocumentFactory.getInstance();

  for (Element oldScriptElement : scripts) {
   String text = oldScriptElement.getText();

   if (!isCDATA(oldScriptElement) && text != null && text.startsWith("$")) {
    String scriptPath = text.substring(2, text.length() - 1);

    Element result = docFactory.createElement("script", oldScriptElement.getParent().getQName().getNamespaceURI());

    String script;

    try {
     script = getString(cl.getResourceAsStream(scriptPath));
    } catch (IOException e) {
     throw new IllegalStateException("Reading script: '" + scriptPath + "' failed", e);
    }
    result.addCDATA(script);

    replaceNode(oldScriptElement, result);
   }
  }
 }

 private byte[] toByteArray(Document document) {
  ByteArrayOutputStream baos = new ByteArrayOutputStream();

  XMLWriter writer;
  try {
   writer = new XMLWriter(baos, OutputFormat.createPrettyPrint());

   writer.write(document);
   writer.close();
  } catch (IOException e) {
   throw new IllegalStateException("Writing dom4j document failed", e);
  }

  return baos.toByteArray();
 }

 private boolean isCDATA(Element node) {
  for (Node n : (List<Node>) node.content()) {
   if (Node.CDATA_SECTION_NODE == n.getNodeType()) {
    return true;
   }
  }
  return false;
 }

 private void replaceNode(Element oldNode, Element newNode) {
  List parentContent = oldNode.getParent().content();

  int index = parentContent.indexOf(oldNode);

  // replace in place: detaching first would shift the remaining content to the left
  parentContent.remove(index);
  parentContent.add(index, newNode);
 }

 private String getString(InputStream is) throws IOException {
  final char[] buffer = new char[0x10000];
  StringBuilder out = new StringBuilder();
  Reader in = new InputStreamReader(is, "UTF-8");
  try {
   int read;
   do {
    read = in.read(buffer, 0, buffer.length);
    if (read > 0) {
     out.append(buffer, 0, read);
    }
   } while (read >= 0);
  } finally {
   in.close();
  }

  return out.toString();
 }
}
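
For completeness, a hypothetical usage sketch; the classpath location of the definition is made up, and the returned bytes would be handed to whatever deploys the definition in your own code.

// expands ${scripts/approve.groovy}-style references inside script elements before deployment
byte[] expandedDefinition = new WhatEver().expandScripts("workflows/review-definition.xml");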

Gist

Tuesday, November 8, 2011

How to use Liferay third-party libraries as Maven dependencies ?

Almost a full day's worth of work, but I finally finished DependencyInstallerMojo. Simply put, you provide it with the location of Liferay's lib/versions.xml and inclusion regexp patterns that determine which libraries are to be installed into either the local Maven repository or a custom location. It installs them as Maven artifacts and generates a pom definition containing the corresponding dependencies, so you can use that pom itself as a dependency and thus transitively pull all those libraries onto your classpath.

  
 <plugin>
    <groupId>com.liferay.maven.plugins</groupId>
    <artifactId>liferay-maven-plugin</artifactId>
    <version>6.1.0-SNAPSHOT</version>
    <configuration>
      <generatedPomLocation>${project.basedir}/target/generated-pom.xml</generatedPomLocation>
      <generatedPomVersion>${project.version}</generatedPomVersion>
      <generatedPomName>Liferay dependencies</generatedPomName>
      <localRepositoryId>liferay-third-party-deps</localRepositoryId>
      <projArtifactId>${project.artifactId}</projArtifactId>
      <projGroupId>${project.groupId}</projGroupId>
      <dependencyScope>test</dependencyScope>
      <libDirPath>/opt/liferay/portal/lib</libDirPath>
      <localRepositoryPath>/home/old-lisak/fake-repo</localRepositoryPath>
      <include>
        <development>jsf-.*,derby,catalina,ant-.*,jalopy</development>
        <global>portlet.jar</global>
        <portal>chemistry-.*,commons-.*,jackrabbit-.*,spring-.*</portal>
      </include>
    </configuration>
 </plugin>

Most of those properties have sensible defaults; you actually need to set at least :
 libDirPath - the directory containing the libraries and the source xml file (versions.xml in Liferay's case)
 localRepositoryPath - a custom local Maven repository, unless you want to use the default one
 include - this element is a map of sub-directories of libDirPath that contain the jar packages

Names of the sub-directories become part of the groupId (com.example.development), and the inclusion patterns are java.util.regex regular expressions separated by commas.


Why might Liferay developers need this ?


Because if you want your code to be covered by infrastructure tests, you would have to mock Liferay services quite frequently, unless you have the Liferay third-party libraries on the classpath and can boot up the Spring-based infrastructure and hundreds of its services.
   Mocking might seem to be a suitable alternative until you are refactoring or upgrading the Liferay version and all those tests turn into a maintenance hell. I personally prefer infrastructure testing over tons of "functional" unit tests, so I'm trying to deal with this situation.

A year ago I manually listed the Maven dependencies corresponding to those in lib/versions.xml, just the ones I needed for using particular Liferay services, but it is a really hideous thing to do because of all the version conflicts and transitive dependencies.

I wrote this Maven Mojo after I got really repelled by the Maven Surefire Plugin's way of dealing with additional classpath resources: classloading hell.

The last possible option is the systemPath property of a Maven dependency with system scope, but it is deprecated and will probably not be supported in future releases.

If you want to try that out, follow these instructions :

$git clone git@github.com:l15k4/liferay-maven-support.git
$cd liferay-maven-support
$mvn install

Then go into an existing maven project and add this plugin into its build section.

  
 <plugin>
   <groupId>com.liferay.maven.plugins</groupId>
   <artifactId>liferay-maven-plugin</artifactId>
   <version>6.1.0-SNAPSHOT</version>
   <configuration>
     <libDirPath>/path/to/portal/lib</libDirPath>
     <localRepositoryPath>/home/user/test-maven-repository</localRepositoryPath>
     <include>
       <development>jsf-.*,derby,catalina,ant-.*,jalopy</development>
       <global>portlet.jar</global>
       <portal>chemistry-.*,commons-.*,jackrabbit-.*,spring-.*</portal>
     </include>
   </configuration>
 </plugin>

$mkdir /home/user/test-maven-repository
$mvn com.liferay.maven.plugins:liferay-maven-plugin:6.1.0-SNAPSHOT:install-dependencies


If everything goes well, the repository will contain the artifacts and there will be a target/generated-pom.xml file in your project.

If you find this useful and want to see it as part of liferay-maven-plugin, you can vote it up here: LSP-22947

Monday, November 7, 2011

Does Maven honor principles it is based on in its own architecture ?

  I like Maven a lot; it makes Java programming more fun and spares a lot of time. I also understand the principles of preserving backward compatibility, but it is supposed to be a comprehension tool, and despite that it still uses Plexus and Javadoc tags that are quite hard to comprehend and imho should have been deprecated in version 2 and removed in version 3. Unfortunately I see almost no signs of that.

People know and like Guice, not Plexus, which is deprecated anyway... What about the promised replacement of Plexus with Google Guice in Maven?
   First off, I got quite angry because I couldn't find any leads that would reveal this mysterious dependency injection framework swap. After the hype almost a year ago that Maven was going to migrate from Plexus to Google Guice, I grepped through the Maven trunk source base: it is full of org.codehaus.plexus references and the only place where Guice is used is maven/app-engine.

Then I hit MNG-4749 and realized that the compatibility shim is the work of a Sonatype developer, and I found these two posts : From Plexus to Guice (#2) and From Plexus to Guice (#3), which were quite hidden to me and which revealed the Sisu project. Basically, they created a bean injection layer on top of the raw custom injection API provided by Guice. These images tell a little more. The left one expresses what the consequences of preserving backward compatibility or avoiding radical refactoring might look like :-)


You can find the source code in the Sonatype Sisu repository. Btw, seeing software move to git is always nice. Do you remember all those sayings about how lightweight Maven is from a couple of years back :-) ? Anyway, I appreciate the effort of the Sonatype developers. It must have been tough work, and I will certainly try it out and update this article accordingly.

Another point of interest is the migration to Java annotations. As Maven comes from the JDK 1.4 era, it uses Javadoc tags. They are not represented at the bytecode level, which seems to be quite a drawback for the Plugin and Mojo API, and they are tough to understand for anyone used to annotation conventions. Developers know annotations, but most of them have never worked with Javadoc tags extensively. In case you need to inherit from a Mojo class in a different artifact and the Maven property doesn't get initialized, what then ? F*** around on Jira and study how QDox and all that work :-) ? Or duplicate all the code you would otherwise inherit ? Extending Mojos is really hard now.
    There are some rumors that it is going to be changed to annotations, and one can see some activity around it : Java 5 Annotations for plugins and MPLUGIN-189, but I'm afraid that it might end up with some sort of compatibility layer, as in the case of the Sisu project.
It is still so hard: Maven is a comprehension tool for the user, but these things make it tough to work with for plugin developers. It is sad, but it is true.

I believe that many more developers would create a Maven Mojo or plugin instead of a shell script, even for simple things, if these two critical issues were solved.

Sunday, November 6, 2011

Liferay database load balancing and sharding

If you are considering optimizations at the database level, you should be aware of 2 things :

Sharding

A technique that scales your database along with portal instances : Database Sharding. Not that I have used it yet; I haven't found a use case for it so far, so I'm setting up liferay-maven-plugin to call ServiceBuilder with these properties so that it doesn't generate the shard-data-source-spring.xml file :

    
 <configuration>
   <springDynamicDataSourceFileName>
     null
   </springDynamicDataSourceFileName>
   <springShardDataSourceFileName>
     null
   </springShardDataSourceFileName>
 </configuration>

   I just wanted to mention it for those who are wondering what shard-data-source-spring.xml is about, so that they don't read it as a "shared" data source: a database shard is a well-known technique of horizontal partitioning in database design.

   I am also turning off the generation of dynamic-data-source-spring.xml, because the solutions I've been developing don't expect any massive load, so the read-write split technique described below is not needed either.

Read-write Split

Read-write split targets database replication. Not in terms of a bulk file transfer, but rather a proxy mechanism that sends update/delete statements to multiple database servers; that is to say, a database update against one server is reflected on the other server(s) in near real time. Databases use the so-called master-slave model, where the master keeps a log of all transactions it has performed and the slave connects to it and performs the updates as it reads the log entries. The slave updates its position in the master's transaction log and waits for the next entry.
    For load balancing, the best setup would be two servers that are slaved to each other, which results in a slave-slave or master-master configuration. Transactions are reflected both ways and there is no difference in the roles of the servers.
    For instance, a MySQL database can be configured for this scenario quite easily, and Liferay is ready for various requirements. Dynamic data sourcing, for instance, lets you have two data sources, one for READs and one for WRITEs.
 
I haven't needed this yet, as I said, so I'm turning it off as well in liferay-maven-plugin. The fewer files in my project, the better.


Anyway, some years of Liferay experience always come in handy when you need to deal with stuff like this :-)