Disabling TBO (BOF v2) in Documentum 5.3+

Sometimes during development it is useful to disable a Type-Based Object (TBO). Before Documentum 5.3, when the Business Object Framework (BOF) was at version 1, this was easy: all that had to be done was to comment out the TBO definition line in $DOCUMENTUM/config/dbor.properties. From then on the TBO was disabled in that particular DFC instance.

In BOF version 2 things have become more complicated because dbor.properties is no longer used (or at least no longer recommended); instead, all BOF objects are defined in the repository as dmc_module object instances. There is an easy way to disable a TBO globally, by renaming the dmc_module object in /System/Modules/TBO, but that disables the TBO for all users. Sometimes, for example when troubleshooting a production system, it is desirable to disable the TBO on one client only.

To disable a TBO on a single client machine we have to use a trick. Each DFC instance has its own BOF cache, which is usually in $DOCUMENTUM/cache. We have to locate the dmc_jar implementing the TBO and replace it with a dummy one that contains no business logic. How do we locate the right dmc_jar? It requires some investigation: you can either use DQL to look up the r_object_id of the dmc_jar, or just check the files in the cache folder and look for the JAR that contains the class implementing the given TBO.
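The cache search can be scripted. Here is a minimal sketch, assuming the cached files are ordinary ZIP/JAR archives somewhere under the cache folder; the function name is my own and not part of DFC, and the class name in the comment is just the example class used in this post.

```python
import os
import zipfile


def find_jars_with_class(cache_dir, class_name):
    """Return paths of archives under cache_dir that contain class_name.

    class_name is a fully qualified Java class name, for example
    "com.documentum.cvp.common.CvpDocument".
    """
    # Inside a JAR a class is stored as a slash-separated path ending in .class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # Do not rely on the file extension; test the content instead.
            if not zipfile.is_zipfile(path):
                continue
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    matches.append(path)
    return sorted(matches)
```

Calling find_jars_with_class with the path of your cache folder and the TBO class name returns the candidate files to replace. If more than one file matches, compare modification times or look the object up with DQL before touching anything.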

The next step is to prepare a dummy TBO. A dummy TBO is almost the same in all cases; what changes is the class name and the package name. Different TBOs sometimes implement different interfaces (for example, IDfDynamicInheritance is not a required one), so in the worst case you can look it up in the sources (or decompile the original TBO JAR file) and see how the class was declared.

Typical Java code looks like this:


package com.documentum.cvp.common;

import com.documentum.fc.client.DfDocument;
import com.documentum.fc.client.IDfBusinessObject;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.IDfDynamicInheritance;

public class CvpDocument extends DfDocument
    implements IDfBusinessObject, IDfDynamicInheritance
{

    public CvpDocument()
    {
    }

    public String getVersion()
    {
        return "1.0";
    }

    public String getVendorString()
    {
        return "Documentum";
    }

    public boolean isCompatible(String version)
    {
        return version.compareTo("1.0") == 0;
    }

    public boolean supportsFeature(String feature)
    {
        return false;
    }
}

When dealing with DfFolder just change DfDocument to DfFolder and it should be fine.

After compiling the Java class and replacing the old JAR in the cache, the client application has to be restarted. It is also worth mentioning that the cache is periodically refreshed, so it is a good idea to check that our dummy TBO implementation has not been overwritten by the real one. In my experience this happens very rarely.
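Replacing the JAR and later checking whether the cache refresh has put the real one back can both be automated. This is only a sketch under my own assumptions: the function names are hypothetical, and the backup is kept outside the cache folder so that DFC never sees an extra file there.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_dummy(cached_jar, dummy_jar, backup_dir):
    """Keep a copy of the real JAR in backup_dir, then overwrite it with the dummy."""
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    backup = os.path.join(backup_dir, os.path.basename(cached_jar))
    # Never overwrite an existing backup: a second run would back up the dummy.
    if not os.path.exists(backup):
        shutil.copy2(cached_jar, backup)
    shutil.copyfile(dummy_jar, cached_jar)
    return backup


def dummy_still_in_place(cached_jar, dummy_jar):
    """True while the cached file is byte-for-byte identical to the dummy JAR."""
    return sha256_of(cached_jar) == sha256_of(dummy_jar)
```

Run install_dummy once before restarting the client, and call dummy_still_in_place whenever the TBO logic seems to be active again.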

I have tested this approach and can confirm that it works; it saved me a lot of trouble during data migration between systems using FirstDoc.

Writing Mavenized JUnit tests in Alfresco.

Writing JUnit tests in Alfresco projects using Maven is not very straightforward. The main problem is the lack of a “parent POM” which would gather all Alfresco dependencies into one convenient package. I will show you how to create such a parent POM as well as how to create a simple JUnit test project.

Prerequisites are:

  • Alfresco (I have used 3.3g)
  • Python
  • Maven (of course :))

We will use the dependencies from the Alfresco WAR file. Perhaps the right way is to get them from the Alfresco SDK, but I haven’t experienced any problems with using the JARs from alfresco.war, and it is also easier to build a pom.xml from alfresco.war, so this is what I recommend.

So, to create our parent pom.xml with all Alfresco dependencies we will need a small script:


import os
import sys

# Folder with the Alfresco JARs (WEB-INF/lib of the unpacked alfresco.war)
lib_dir = sys.argv[1]
# Only JAR files become dependencies; anything else in the folder is ignored
files = [f for f in os.listdir(lib_dir) if f.endswith(".jar")]

groupId = "alfresco-sdk"
version = "3.3g"
classifier = "community"

print("""
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.metasys</groupId>
<artifactId>alfresco-sdk-parent</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>alfresco-sdk-parent</name>
<dependencies>
""")

for name in files:
    # Strip the ".jar" extension to get the artifactId
    artifactId = name[:-4]
    c = "mvn install:install-file -Dfile=" + os.path.join(lib_dir, name) + \
        " -DgroupId=" + groupId + \
        " -DartifactId=" + artifactId + \
        " -Dversion=" + version + \
        " -Dpackaging=jar -Dclassifier=" + classifier + \
        " -DgeneratePom=true -DcreateChecksum=true"
    s = "\t<dependency>\n\t<scope>test</scope>\n\t<groupId>" + groupId + \
        "</groupId>\n\t<artifactId>" + artifactId + \
        "</artifactId>\n\t<version>" + version + \
        "</version>\n\t<classifier>" + classifier + \
        "</classifier>\n\t</dependency>\n"
    print(s)
    # Install the JAR in the local Maven repository only when asked to
    if len(sys.argv) > 2 and sys.argv[2] == 'install':
        os.system(c)

print("""
</dependencies>
</project>
""")

There are some hardcoded values in the script. Change them if you want; for example, if you’re working with Alfresco 3.4, change the version variable.
Save the script as alfresco-2-maven.py; we will invoke it like this:


mkdir alfresco-parent-pom
cd alfresco-parent-pom
python alfresco-2-maven.py <Path to Alfresco's WEB-INF/lib folder> >pom.xml

Now, in the folder that contains the generated pom.xml, install it using:


mvn install

At this point we have a Maven project ready to use, but we still don’t have the dependencies in the Maven repository. To install them, run the script again, this time with the ‘install’ parameter:


python alfresco-2-maven.py <Path to Alfresco's WEB-INF/lib folder> install

This will take some time to finish but eventually you will end up with a local repository with all Alfresco dependencies.
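If you want to verify that everything really landed in the local repository, the check can be sketched as below. It assumes the default repository location (~/.m2/repository) and the standard Maven layout of groupId/artifactId/version/artifactId-version-classifier.jar; the function names are my own, and a custom localRepository from settings.xml is not taken into account.

```python
import os


def local_repo_path(group_id, artifact_id, version, classifier=None,
                    packaging="jar", repo=None):
    """Return the path where the default Maven layout stores an artifact."""
    if repo is None:
        # Default location only; settings.xml is not read here.
        repo = os.path.join(os.path.expanduser("~"), ".m2", "repository")
    file_name = artifact_id + "-" + version
    if classifier:
        file_name += "-" + classifier
    file_name += "." + packaging
    # Dots in the groupId become directory separators.
    parts = group_id.split(".") + [artifact_id, version, file_name]
    return os.path.join(repo, *parts)


def missing_artifacts(lib_dir, group_id="alfresco-sdk", version="3.3g",
                      classifier="community", repo=None):
    """List JARs from lib_dir that are not yet in the local repository."""
    missing = []
    for name in sorted(os.listdir(lib_dir)):
        if name.endswith(".jar"):
            target = local_repo_path(group_id, name[:-4], version,
                                     classifier, repo=repo)
            if not os.path.exists(target):
                missing.append(name)
    return missing
```

An empty list from missing_artifacts, called with the same WEB-INF/lib path as the script above, means every JAR was installed with the group, version and classifier used in this post.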

Now, let’s move on to a real project with some JUnit tests. Our test will have to start an Alfresco repository, and each Alfresco repository, apart from a database and a filestore, needs configuration. You can find the Alfresco configuration in the alfresco/WEB-INF/classes/alfresco folder. To start the tests we will need either that folder in our CLASSPATH or a JAR file with the configuration files. I prefer the second solution. To prepare the configuration artifact follow these steps:


cd Alfresco/tomcat/webapps/alfresco/WEB-INF/classes
zip -r /tmp/config.jar .
mvn install:install-file -Dfile=/tmp/config.jar -DgroupId=alfresco-sdk -DartifactId=config -Dversion=3.3g -Dpackaging=jar -Dclassifier=community -DgeneratePom=true -DcreateChecksum=true

You can clean up the config.jar before installing it in the local Maven repository. If you want to have multiple configurations, just modify the classifier parameter or simply use a different version.

Now, finally, we can move on to our test class. We start with the pom.xml.


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.metasys</groupId>
    <artifactId>test-suite</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>test-suite</name>

    <parent>
        <groupId>com.metasys</groupId>
        <artifactId>alfresco-sdk-parent</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <build>
        <resources>
            <resource>
                <filtering>false</filtering>
                <directory>src/test/resources</directory>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.5</source>
                    <target>1.5</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <scope>test</scope>
            <groupId>alfresco-sdk</groupId>
            <artifactId>config</artifactId>
            <version>3.3g</version>
            <classifier>community</classifier>
        </dependency>
        <dependency>
            <groupId>javax</groupId>
            <artifactId>servlet</artifactId>
            <version>1.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.6</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

As you can see, apart from referencing the parent POM I have also added the MySQL JDBC driver and servlet-api as dependencies. These two JARs are present in the Alfresco SDK but not in alfresco.war, and that’s why they were not picked up by the Python script. Of course there is also the Alfresco configuration dependency we created before.

Now that we have the pom.xml, let’s move on to creating the Java class. We will put it in a package named com.metasys.tests, so let’s create the folder structure first:


mkdir -p JUnitTest/src/test/java/com/metasys/tests
mkdir -p JUnitTest/src/test/resources/alfresco/extension
mkdir -p JUnitTest/src/test/resources/alfresco/desktop

Change folder to JUnitTest/src/test/java/com/metasys/tests and create a new Java class:


package com.metasys.tests;

import org.alfresco.model.ContentModel;
import org.alfresco.repo.security.authentication.AuthenticationComponent;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.action.ActionService;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;
import org.alfresco.util.BaseAlfrescoTestCase;
import org.junit.Test;

public class SimpleTest extends BaseAlfrescoTestCase {

    protected NodeRef companyHomeRef;
    protected NodeRef rootFolderTestRef;

    @Override
    protected void setUp() throws Exception {
        setUpContext();

        this.serviceRegistry = (ServiceRegistry) ctx.getBean(ServiceRegistry.SERVICE_REGISTRY);
        this.nodeService = serviceRegistry.getNodeService();
        this.contentService = serviceRegistry.getContentService();
        this.authenticationComponent = (AuthenticationComponent) ctx.getBean("authenticationComponent");
        this.actionService = (ActionService) ctx.getBean("actionService");
        this.transactionService = serviceRegistry.getTransactionService();

        authenticationComponent.setCurrentUser("admin");
        SearchService searchService = this.serviceRegistry.getSearchService();
        ResultSet rs = searchService.query(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, SearchService.LANGUAGE_XPATH,
                "/app:company_home");
        if (rs.length() != 1) {
            fail("Could not find company home");
        }
        companyHomeRef = rs.getNodeRef(0);
    }

    @Test
    public void testSimpleWatermarking() throws Throwable {
        rootFolderTestRef = serviceRegistry.getNodeService().createNode(
                companyHomeRef, ContentModel.ASSOC_CONTAINS,
                ContentModel.TYPE_FOLDER, ContentModel.TYPE_FOLDER).getChildRef();
        // JUnit assertions always run; the Java assert keyword only fires with -ea
        assertNotNull(rootFolderTestRef);
        nodeService.setProperty(rootFolderTestRef, ContentModel.PROP_NAME, "TestObject");
        assertEquals("TestObject", nodeService.getProperty(rootFolderTestRef, ContentModel.PROP_NAME));
    }
}

We will also need the bootstrap configuration. Normally these files are somewhere in the Tomcat folder tree, but since we’re not using Tomcat to start the repository we have to find a place for them.

The first file is dev-context.xml; it has to be saved in src/test/resources/alfresco/extension:


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
    <bean id="global-properties" class="org.alfresco.config.JndiPropertiesFactoryBean">
        <property name="locations">
            <list>
                <value>classpath:alfresco/repository.properties</value>
                <value>classpath:alfresco/domain/transaction.properties</value>
                <!-- <value>classpath:alfresco/jndi.properties</value> -->
                <!--  Overrides supplied by modules -->
                <value>classpath*:alfresco/module/*/alfresco-global.properties</value>
                <!--  Installer or user-provided defaults -->
                <value>classpath*:alfresco-global.properties</value>
                <value>classpath:alfresco/extension/dev.properties</value>
            </list>
        </property>
        <property name="systemPropertiesModeName">
            <value>SYSTEM_PROPERTIES_MODE_OVERRIDE</value>
        </property>
        <!-- Extra properties that have no defaults that we allow to be defined through JNDI or System properties -->
        <property name="systemProperties">
            <list>
                <value>hibernate.dialect</value>
                <value>hibernate.query.substitutions</value>
                <value>hibernate.jdbc.use_get_generated_keys</value>
                <value>hibernate.default_schema</value>
            </list>
        </property>
    </bean>
</beans>

The next one is dev.properties; save it to the same folder (src/test/resources/alfresco/extension):


dir.root=/home/kbryd/AlfrescoT2/alf_data
index.recovery.mode=AUTO
integrity.failOnError=true
db.name=alfrescoT2
db.username=alfrescoT2
db.password=alfrescoT2
db.host=localhost
db.port=3306
db.driver=org.gjt.mm.mysql.Driver
db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}
hibernate.dialect=org.hibernate.dialect.MySQLInnoDBDialect

Obviously, change the database name, username, password and the path to the Alfresco filestore. The last (really!) missing bit is a package of Alfresco desktop files required during startup; just copy them from alfresco/WEB-INF/classes/alfresco/desktop to your project’s src/test/resources/alfresco/desktop folder.
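After editing dev.properties it is easy to end up with a db.url that does not expand to what you expect. The sketch below is a quick sanity check and nothing more: it is not Spring’s property resolver, it only understands plain key=value lines, and the function names are my own.

```python
import re


def load_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(props, key, _depth=0):
    """Expand ${name} references in props[key] using other entries of props."""
    if _depth > 10:
        raise ValueError("placeholder nesting too deep for " + key)

    def substitute(match):
        name = match.group(1)
        # Unknown names are left untouched rather than guessed.
        if name in props:
            return resolve(props, name, _depth + 1)
        return match.group(0)

    return re.sub(r"\$\{([^}]+)\}", substitute, props[key])


sample = """
db.name=alfrescoT2
db.host=localhost
db.port=3306
db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}
"""
print(resolve(load_properties(sample), "db.url"))
# prints jdbc:mysql://localhost:3306/alfrescoT2
```

Pass the content of your own dev.properties to load_properties instead of the sample to see the JDBC URL the tests will try to connect to.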

Now, go to the root folder of the test project and use Maven to start it:


mvn test

It will start up the repository and then execute the tests. You should see something like this:


Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.947 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /tmp/test-project/target/test-suite-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 44 seconds
[INFO] Finished at: Sat Feb 26 14:45:26 CET 2011
[INFO] Final Memory: 30M/258M
[INFO] ------------------------------------------------------------------------

One closing note. You may wonder why I am not calling super.setUp() in the Java class. The only reason is that the default implementation of setUp() creates a separate Store for each test. I don’t like this default behavior, and that’s why I call only setUpContext() and then look up all the required beans in my test class.

I hope that this is useful, please email me or leave comments if you have any questions!

Stamper is ready!

Stamper is ready. Finally 🙂 I would like to announce that the first version of an extension for Alfresco for watermarking and securing PDF documents is available!

Stamper is quite an extensive tool for Alfresco which simplifies watermarking of PDF documents. Basically any form of watermark is available: not only static watermarks (images such as JPG, PNG etc.) but also dynamic watermarks (called layers). Layers are SVG files and can change each time you view a document, because they contain control sequences (variables) which are replaced by their real values upon viewing.

Stamper is not only about watermarking. It can also sign PDF documents and generate electronic signature pages (even with scanned signatures!).

And of course Stamper is quite nicely integrated with Alfresco. It means that you can call it from the Alfresco Java API or the JavaScript API, or simply use it from the Alfresco Explorer GUI.

CMIS support for Stamper as well as a Nuxeo version of Stamper are on their way!

You can find out more about it on Stamper’s website: stamper.metasys.pl

Speeding up PDF indexing in Alfresco 3.3

I work a lot with PDF files and have noticed that Alfresco is really slow at indexing them. It’s not Alfresco’s fault per se but rather the fault of the underlying library (PDFBox), which extracts the text from PDF documents that is then indexed by Lucene. PDF is a format which sometimes makes it really hard to extract text from a document correctly: not only is the content often compressed, but PDF is also based on a subset of the PostScript language, and like any programming language PostScript can generate text in an order that is neither organized nor logical (for example, a PDF can sometimes output text at the bottom of a page and then at the top). That’s why the whole problem is not trivial.

PDFBox used by Alfresco is written in Java, which slows down the whole thing even more. Fortunately there are solutions which can speed it up noticeably.

The idea of speeding up the indexer is not mine; the original idea was described on the Think Alfresco blog, but unfortunately the example configuration code found there does not work with more recent Alfresco versions. I have updated it and now it works correctly with Alfresco 3.3 and newer.

The solution is quite easy: all we have to do is define a new transformer which uses xpdf’s pdftotext executable to extract the text. While this may sound “hacky” it is not; there are other format transformers in Alfresco which work in a very similar way (for example, ImageMagick is used for some image-related transformations). And it is important to stress that pdftotext is REALLY much faster than PDFBox.

For example, my “reference” large PDF document, which is 70 MB and has 13,700 pages, is processed in 30 seconds by pdftotext, while PDFBox needs 20 minutes.
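If you want to time pdftotext on your own documents before changing any Alfresco configuration, a small wrapper is enough. This is a sketch that assumes pdftotext is on your PATH (pass a full path as executable otherwise); it uses the same -enc UTF-8 option as the transformer configuration, and the function names are my own.

```python
import subprocess
import time


def pdftotext_command(source, target, executable="pdftotext"):
    """Build the argument list: pdftotext -enc UTF-8 <source> <target>."""
    return [executable, "-enc", "UTF-8", source, target]


def timed_extract(source, target, executable="pdftotext"):
    """Run pdftotext and return the elapsed wall-clock time in seconds."""
    started = time.time()
    # check_call raises CalledProcessError if pdftotext exits with an error.
    subprocess.check_call(pdftotext_command(source, target, executable))
    return time.time() - started
```

Calling timed_extract with a PDF path and an output text file path gives you a number to compare with the indexing time you see in Alfresco for the same document.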

OK, so how do we do it? We undefine the existing PDFBox transformer Spring bean and define a new one which invokes the pdftotext executable. In practice, take the Spring XML file shown below, copy it to $ALFRESCO_HOME/tomcat/shared/classes/alfresco/extension/pdf-indexer-extract-content-context.xml and restart the Tomcat server.

[cc lang="xml"]
${catalina.base}/webapps/alfresco/WEB-INF/bin/pdftotext -enc UTF-8 ${source} ${target}
${catalina.base}/webapps/alfresco/WEB-INF/bin/pdftotext.exe -enc UTF-8 ${source} ${target}
chmod 775 ${catalina.base}/webapps/alfresco/WEB-INF/bin/pdftotext-linux
cmd.exe /C dir
application/pdf text/plain
transformer.worker.PdfToTextTool org.alfresco.repo.content.transform.ContentTransformerWorker
application/pdf
[/cc]

Good luck! Let me know if this was useful for you!

Log4j madness.

Some things will never change, like log4j configuration madness. I have lost so much time trying to fix log4j configuration in not so trivial setups (involving Documentum’s JBoss server or Alfresco running in Tomcat) that it is almost impossible to believe I had never stumbled on this extremely useful piece of advice. So guys, remember this: if you ever have a problem with log4j, use this:

-Dlog4j.debug=true

This will tell you, first of all, where log4j.properties is located and what the appenders, categories and so on are. Now working with log4j is a piece of cake 😉