This is a simple tip for deactivating metadata extractors in Alfresco 4
and 5 (from Alfresco 4.1.4 onwards).
When do you need this?
- In a bulk filesystem import, where metadata extractors can slow down
data injection.
- When you have a huge repository with millions of documents and you
want to apply indexing policies to prevent uncontrolled index growth.
In alfresco-global.properties just type (and then
restart the service):
# pdfs
extracter.PDFBox.enabled=false
# office types
extracter.Office.doc.enabled=false
extracter.Office.xls.enabled=false
extracter.Office.ppt.enabled=false
# images (no exif aspect)
extracter.TikaAuto.enabled=false
# docx,xlsx,pptx
extracter.Poi.enabled=false
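Before restarting, it can be handy to double-check which extracters the file actually switches off. The following is only a minimal sketch in Python, not part of Alfresco: the function name disabled_extracters is hypothetical, and it assumes the keys follow the extracter.<name>.enabled shape shown above.

```python
import re

# Matches keys such as "extracter.PDFBox.enabled=false". The key shape is an
# assumption taken from the properties listed in this tip.
EXTRACTER_LINE = re.compile(
    r"^extracter\.([A-Za-z0-9_.]+)\.enabled\s*[=:]\s*(\S+)$"
)


def disabled_extracters(properties_text):
    """Return the sorted extracter names whose 'enabled' flag is false."""
    state = {}
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and properties-file comments.
        if not stripped or stripped.startswith(("#", "!")):
            continue
        match = EXTRACTER_LINE.match(stripped)
        if match:
            name, value = match.groups()
            state[name] = value.lower()  # a later line overrides an earlier one
    return sorted(name for name, value in state.items() if value == "false")


# Illustrative input only; in practice you would pass the contents of
# your own alfresco-global.properties.
sample = """
# pdfs
extracter.PDFBox.enabled=false
extracter.Office.doc.enabled=false
extracter.Poi.enabled=true
"""
print(disabled_extracters(sample))
```

Commented-out lines are ignored, so an extracter disabled only inside a comment will not show up in the result.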
There are more extracter beans involved for other mimetypes. In any
case, they have to be deactivated one by one.
You can check that metadata is no longer being extracted by enabling
debug logging for the extracter registry in custom-log4j.properties
(repository):
log4j.logger.org.alfresco.repo.content.metadata.MetadataExtracterRegistry=debug
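With debug logging enabled, the repository log can get noisy. As a rough sketch (again in Python, and not an Alfresco tool), the hypothetical helper below keeps only the lines that mention the registry class. It assumes your log pattern prints the logger name on each line; the sample lines are made up for illustration and are not real Alfresco output.

```python
def registry_log_lines(log_lines):
    """Return the log lines that mention MetadataExtracterRegistry."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if "MetadataExtracterRegistry" in line
    ]


# Hypothetical sample lines, not real Alfresco output.
sample = [
    "DEBUG [content.metadata.MetadataExtracterRegistry] sample message\n",
    "INFO  [some.other.Logger] unrelated message\n",
]
for line in registry_log_lines(sample):
    print(line)
```

To use it on a real log, pass an open file object for your repository log instead of the sample list.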