Auditing added content in Alfresco repository

Alfresco Activity Console in Elasticsearch

One of the most important questions for many customers, once they are working with a document management system (now content services), is how they are actually using it. They want basic usage metrics about their business processes and custom content types, to understand quantitatively how critical they are. Sometimes even simpler questions (audit or statistics related) are not easy to answer, such as how many documents we have in the repository, which content types are used the most, or which users and applications ingest the most content.

For this purpose we have used the following in the past, each with some drawbacks (IMHO):

  • SQL queries: Yes, we used SQL directly on the Alfresco relational database to find out how many documents of a given type exist. Later, SOLR and the Alfresco REST API simplified this a little.
  • Auditing module in Alfresco: Probably too heavy for simple purposes, and sometimes thrown out because of performance issues. It saves data in the relational database, providing a rough approximation of the audited activity. It is not the easiest to query (even with the admin dashlet), but we can handle the audit data via the REST API to develop a custom report.
  • AAAR / Alflytics: Pentaho-based reports on top of the auditing module and ETL jobs. This approach is interesting, although it is not simple. We need another BI system, custom reports and custom ETL processes. Probably the steepest learning curve for a simple report.
  • AuditShare: One of my favorite addons, because it makes it simple to visualize some statistics of the repo, sites and users in Alfresco Share. It lacks customization, but it provides the essentials for end users and site admins. I used this addon in several Alfresco 4 instances to get general information about the repo.
  • Share OOTB dashlets + activity feed: By default, Alfresco Share provides an activity feed, which is not enough for auditing purposes, and two dashlets for getting insights into sites (Site Contribution Breakdown and Site File Type Breakdown), but these are not enough for a general view of the repository.

Now, the Alfresco Enterprise subscription provides an Insight Engine based on SOLR SQL features and Apache Zeppelin notebook-based reports, with no ETL as the main benefit.

But most of the time we do not have this kind of enterprise subscription, for example when we are using Alfresco Community, or we do not have (or want) a complementary BI system such as Pentaho. Still, information about the custom content types ingested by applications plays an important role for an organization (and its processes), answering questions such as which applications are the most critical.

For this task, it is useful to inspect the daily use of an Alfresco instance for some given custom types. For this post we did a proof of concept based on an Alfresco custom behaviour that writes a content creation log. The main idea is to write a simple log event when a document of a given type is created (on the create node and create version policies), with information useful for filtering in an external system (such as uuid, size, mimetype, path, content type, site, version). This simple approach allows us to index the event log data in Elasticsearch and to get the following dashboard in Kibana. In the past, we did something similar for Alfresco log information.
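To make the idea of a content creation log event more concrete, here is a minimal sketch of how one event could be serialized as a single JSON line, which is a convenient shape for shipping to an external index. It is not the code of the proof of concept: the class name ContentCreationLog, its methods and the JSON field names are hypothetical, and the wiring into the Alfresco policy callbacks (where the values would be read from the created node) is left out so that the example runs on plain Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not from the original post): one JSON line per created document. */
final class ContentCreationLog {

    private ContentCreationLog() {}

    /** Collects the fields mentioned in the post; the key names are assumptions. */
    static Map<String, Object> event(String uuid, long size, String mimetype, String path,
                                     String contentType, String site, String version) {
        Map<String, Object> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("uuid", uuid);
        fields.put("size", size);
        fields.put("mimetype", mimetype);
        fields.put("path", path);
        fields.put("contentType", contentType);
        fields.put("site", site);
        fields.put("version", version);
        return fields;
    }

    /** Serializes the fields as one compact JSON object on a single line. */
    static String toJsonLine(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(entry.getKey())).append("\":");
            Object value = entry.getValue();
            if (value == null) {
                sb.append("null");
            } else if (value instanceof Number || value instanceof Boolean) {
                sb.append(value); // numbers and booleans are written unquoted
            } else {
                sb.append('"').append(escape(value.toString())).append('"');
            }
        }
        return sb.append('}').toString();
    }

    /** Escapes quotes, backslashes and control characters inside a JSON string. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        System.out.println(toJsonLine(event("abc-123", 2048L, "application/pdf",
                "/Sites/demo/documentLibrary/invoice.pdf", "my:invoice", "demo", "1.0")));
    }
}
```

In a real behaviour, a line like this would be written through the logging framework to a dedicated log file, so each event also gets a timestamp and can be filtered by any of its fields once indexed.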

Below is an illustrative image of the resulting Kibana dashboard, with filter controls and metrics for the added content:

In upcoming posts, we will enhance this example.
