
Using Zeppelin for Alfresco Data Analysis

Cesar Capillas

Whole Lotta Apache

Apache Zeppelin is an open-source, web-based notebook for interactive data analytics, covering everything from data ingestion and exploration to reporting, visualization, sharing and collaboration.

Interactive browser-based notebooks make users more productive by letting them develop, organize, execute and share code and visualize the results without going to the command line. Zeppelin supports languages such as Python, Scala, Hive, SparkSQL, shell and Markdown through its interpreters, and there are also JDBC, Elasticsearch and Solr interpreters.

At the last DevCon 2018, I saw Apache Zeppelin used as a report builder in Michael Suzuki’s talk about Alfresco Reporting and Analytics (it also appeared in Harry Peek’s analytics roadmap), where Alfresco reporting information was retrieved from Solr via SQL. Apache Zeppelin has a Solr interpreter that can run both SQL queries and Solr API requests against Solr 6. A preview of these reporting features was shown at the conference.

While these new reporting features make their way into Alfresco during 2018, let’s apply the same idea to some Alfresco maintenance tasks, using the interpreters already available in Apache Zeppelin:

1. An example with the shell interpreter, grepping aggregate results from the Alfresco logs.
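
In Zeppelin this would typically be a %sh paragraph running a pipeline along the lines of grep, sort and uniq -c over alfresco.log. The sketch below performs the same kind of aggregation in Python, counting FATAL, ERROR and WARN lines per logger. It assumes a log4j-style layout in which the level is followed by the logger name in square brackets (for example "2018-02-01 10:00:00,123 ERROR [repo.content.transform] [thread] message"); that layout and the helper names are assumptions, so adjust the regular expression to the pattern in your own log4j configuration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log4j-style layout: "<date> <time> <LEVEL> [<logger>] [<thread>] <message>".
# Only the "<LEVEL> [<logger>]" part is matched, so extra columns before it are tolerated.
LEVEL_AND_LOGGER = re.compile(r"\b(FATAL|ERROR|WARN)\s+\[([^\]]+)\]")


def aggregate_log_lines(lines: Iterable[str], top: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Count FATAL/ERROR/WARN lines per (level, logger), most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_AND_LOGGER.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts.most_common(top)


def aggregate_log_file(path: str, top: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Stream a log file line by line instead of loading it into memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return aggregate_log_lines(handle, top)
```

In a %python paragraph, print(aggregate_log_file("alfresco.log")) would list the noisiest loggers first; the path depends on where your installation writes its logs.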

2. An example with the Python interpreter, using cmislib from Apache Zeppelin to check node properties or run CMIS queries.
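
A minimal sketch of such a Python paragraph, assuming the cmislib package is installed in the interpreter's Python environment and that Alfresco exposes the CMIS 1.1 AtomPub binding at the URL below (an assumption; adjust host, port and path for your installation). The query builder is plain Python; the two functions that talk to Alfresco import cmislib lazily and use its CmisClient, defaultRepository, query and getObjectByPath calls. All helper names are hypothetical.

```python
from typing import Any, Dict, List

# Assumed AtomPub endpoint of a local Alfresco repository (hypothetical host and port).
ALFRESCO_ATOM_URL = "http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom"


def cmis_string_literal(value: str) -> str:
    """Quote a value as a CMIS QL string literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def modified_since_query(iso_timestamp: str, name_like: str = "%") -> str:
    """Build a CMIS QL query for documents modified on or after a timestamp."""
    return (
        "SELECT cmis:objectId, cmis:name, cmis:lastModificationDate "
        "FROM cmis:document "
        f"WHERE cmis:lastModificationDate >= TIMESTAMP {cmis_string_literal(iso_timestamp)} "
        f"AND cmis:name LIKE {cmis_string_literal(name_like)}"
    )


def run_query(statement: str, user: str, password: str,
              url: str = ALFRESCO_ATOM_URL, max_items: int = 100) -> List[Dict[str, Any]]:
    """Run a CMIS query and return the properties of each result."""
    from cmislib import CmisClient  # lazy import: the builders above work without cmislib
    repo = CmisClient(url, user, password).defaultRepository
    return [result.properties for result in repo.query(statement, maxItems=max_items)]


def node_properties(path: str, user: str, password: str,
                    url: str = ALFRESCO_ATOM_URL) -> Dict[str, Any]:
    """Fetch the CMIS properties of the node at a repository path such as /Sites."""
    from cmislib import CmisClient
    repo = CmisClient(url, user, password).defaultRepository
    return repo.getObjectByPath(path).properties
```

For example, run_query(modified_since_query("2018-01-01T00:00:00.000Z", "report%"), user, password) would return one property dictionary per matching document. Escaping literals in one helper keeps user-supplied values from breaking the query.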

3. An example of SQL queries against the Alfresco database.
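
In Zeppelin this would go through the JDBC interpreter configured for the Alfresco database (PostgreSQL or MySQL). The sketch below shows one common kind of query, counting nodes per content type by joining alf_node with alf_qname, and runs it against a tiny in-memory SQLite stand-in so it is self-contained. The demo tables are a heavily simplified, hypothetical subset of the real schema (only the columns the query needs), so check the column names against your Alfresco version, and keep direct database queries read-only.

```python
import sqlite3
from typing import List, Tuple

# Nodes per content type: alf_node.type_qname_id points at alf_qname.id.
NODES_BY_TYPE_SQL = """
SELECT q.local_name AS type_name, COUNT(*) AS node_count
FROM alf_node n
JOIN alf_qname q ON n.type_qname_id = q.id
GROUP BY q.local_name
ORDER BY node_count DESC, type_name
"""

# Hypothetical, heavily simplified stand-in for two Alfresco tables.
DEMO_SCHEMA = """
CREATE TABLE alf_qname (id INTEGER PRIMARY KEY, ns_id INTEGER, local_name TEXT);
CREATE TABLE alf_node (id INTEGER PRIMARY KEY, type_qname_id INTEGER REFERENCES alf_qname(id));
INSERT INTO alf_qname (id, ns_id, local_name) VALUES (1, 1, 'content'), (2, 1, 'folder');
INSERT INTO alf_node (type_qname_id) VALUES (1), (1), (1), (2);
"""


def nodes_by_type(conn) -> List[Tuple[str, int]]:
    """Run the query through any DB-API 2.0 connection (cursor/execute/fetchall)."""
    cursor = conn.cursor()
    try:
        cursor.execute(NODES_BY_TYPE_SQL)
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()


def nodes_by_type_demo() -> List[Tuple[str, int]]:
    """Build the toy schema in memory and run the query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(DEMO_SCHEMA)
        return nodes_by_type(conn)
    finally:
        conn.close()
```

Because nodes_by_type only uses the DB-API cursor interface, the same statement can be run through a PostgreSQL or MySQL driver connection instead of the SQLite demo.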

4. An example with the Solr interpreter, using SQL for a monitoring use case in which Alfresco logs and metrics are collected in a Solr collection.
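
In Solr 6, SQL is served by the Parallel SQL interface, exposed through a collection's /sql handler. The sketch below issues that kind of request directly, without Zeppelin: it POSTs the statement as the stmt parameter and drops the trailing EOF tuple from the JSON result-set. The base URL, the alfresco_logs collection and its level field are hypothetical, and depending on the Solr version the fields used in SQL may need docValues.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sql_request(solr_base: str, collection: str, stmt: str) -> Request:
    """Build a POST request for a collection's /sql handler (Solr 6+ Parallel SQL)."""
    url = f"{solr_base.rstrip('/')}/{collection}/sql"
    body = urlencode({"stmt": stmt}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)


def parse_result_set(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the result tuples, stopping at the EOF marker.

    Solr reports errors as a tuple carrying an EXCEPTION key, so raise on it.
    """
    rows = []
    for doc in payload["result-set"]["docs"]:
        if "EXCEPTION" in doc:
            raise RuntimeError(doc["EXCEPTION"])
        if doc.get("EOF"):
            break
        rows.append(doc)
    return rows


def run_sql(solr_base: str, collection: str, stmt: str, timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Send a SQL statement to Solr and return the rows as dictionaries."""
    with urlopen(build_sql_request(solr_base, collection, stmt), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return parse_result_set(payload)
```

For example, run_sql("http://localhost:8983/solr", "alfresco_logs", "SELECT level, count(*) FROM alfresco_logs GROUP BY level ORDER BY count(*) desc LIMIT 10") would return one dictionary per log level, which is the kind of aggregate the Zeppelin paragraph then charts.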
