How to deactivate metadata extraction in Alfresco

This is a simple tip for deactivating metadata extracters in Alfresco 4 and 5 (Alfresco 4.1.4 and above).

When do you need this?
  • In a bulk filesystem import, the metadata extracters can slow down data injection.
  • When you have a huge repository with millions of documents and you want to apply indexing policies to prevent uncontrolled index growth.

In alfresco-global.properties, add the following lines and then restart the service:

# pdfs
extracter.PDFBox.enabled=false
# office types
extracter.Office.doc.enabled=false
extracter.Office.xls.enabled=false
extracter.Office.ppt.enabled=false
# images (no exif aspect)
extracter.TikaAuto.enabled=false
# docx,xlsx,pptx
extracter.Poi.enabled=false
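
Before restarting, it can be handy to confirm that all of these flags really ended up in the file. The following is only a minimal sketch in Python, not part of Alfresco: the function names (`parse_properties`, `still_enabled`, `check_file`) are hypothetical, the parser only handles plain `key=value` lines, and the list of flags is simply the one given above.

```python
from pathlib import Path

# Flags copied from the snippet above; extend the list if you disable more extracters.
EXPECTED_DISABLED = [
    "extracter.PDFBox.enabled",
    "extracter.Office.doc.enabled",
    "extracter.Office.xls.enabled",
    "extracter.Office.ppt.enabled",
    "extracter.TikaAuto.enabled",
    "extracter.Poi.enabled",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def still_enabled(text, expected=EXPECTED_DISABLED):
    """Return the flags from `expected` that are missing or not set to false."""
    props = parse_properties(text)
    return [key for key in expected if props.get(key, "").lower() != "false"]


def check_file(path):
    """Read a properties file (path supplied by the caller) and run the check."""
    return still_enabled(Path(path).read_text(encoding="utf-8"))
```

Calling `check_file` with the path to your own alfresco-global.properties returns the flags that are still missing or not set to false; an empty list means every extracter listed above is disabled in the file.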
 
Other mimetypes are handled by further extracter beans. Each of them has to be deactivated one by one.
 

To check that metadata is no longer being extracted, activate this logger in custom-log4j.properties (repository):

log4j.logger.org.alfresco.repo.content.metadata.MetadataExtracterRegistry=debug

