Listing thousands of noderefs with Alfresco REST API

A useful script for listing nodes with Alfresco REST API

Many situations in Alfresco support involve minor modifications to thousands of documents and/or folders. Let's say we need to clean up some metadata property, add or remove some aspect, or rename the files of a given bunch of nodes, where a bunch here means several thousand nodes. In these cases, we usually first obtain the corresponding list of noderefs, and later we run batch actions over that set of nodes.

For example:

  • A customer did large bulk imports and now we need to delete them, for example because the disks are nearly full.
  • End users have publicly shared too many documents by mistake, and you want to unshare all of them (or at least keep track of them).
  • Developers want to add a new aspect to all the documents of a custom organizational type. For example, they want to apply the index control aspect so that the full content of some custom types is not indexed.
  • Marketing wants to add a new custom metadata property to all documents of a site.
  • Or you simply need a report listing the nodes in any of the previous cases.


There are several ways of scripting in Alfresco, and I can't explain them better than Jeff Potts, so please check his highly recommended blog post if you haven't read it yet.

Among my preferred tools for these tasks, I would mention the JavaScript Console for small scripts, normally involving fewer than a thousand nodes and harmless operations; then Python cmislib or Groovy scripts, when I want a bit more control over the changes and do not need very fast processing; and finally Alfresco REST API batch requests combined with custom JS-based Alfresco webscripts. In most of these situations, I first need a file with the list of Alfresco noderefs.

For example, imagine that I have to perform some minor action on:

  • A list of noderefs for all documents of a given site
  • A list of noderefs for all documents of a given content type
  • A list of noderefs for all documents with a given structured name

By the way, Angel Borroy wrote a very interesting blog post related to this task:

In the past I sometimes used Postman, but nowadays I use a shell script, normally run on one of the Alfresco server nodes with an internal admin user. The script uses curl to call the Alfresco REST API and jq to parse the JSON responses, usually issuing several paginated requests of 1000 results each:

  • First I obtain an Alfresco ticket, which I use for the subsequent requests.
  • Then I get the total number of results.
  • Finally I make several requests, 1000 results at a time.
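
The three steps above can be sketched in plain sh with curl and jq. This is not the author's alf-rest-search.sh. It is only a minimal sketch under several assumptions:
  • The server URL and admin credentials (ALF_URL, ALF_USER, ALF_PASS) are hypothetical defaults.
  • The ticket is passed as the alf_ticket query parameter.
  • Each page is a POST to the public search API with an AFTS query and paging.maxItems set to 1000.

```sh
#!/bin/sh
# Minimal sketch: get a ticket, read the total, then fetch pages of results.
# ALF_URL, ALF_USER and ALF_PASS are hypothetical settings, not from the original script.
ALF_URL="${ALF_URL:-http://localhost:8080}"
ALF_USER="${ALF_USER:-admin}"
ALF_PASS="${ALF_PASS:-admin}"
PAGE_SIZE=1000
API="$ALF_URL/alfresco/api/-default-/public"

# Step 1: create a ticket through the authentication API and print its id.
get_ticket() {
  jq -nc --arg u "$ALF_USER" --arg p "$ALF_PASS" '{userId: $u, password: $p}' |
    curl -s -H 'Content-Type: application/json' -d @- \
      "$API/authentication/versions/1/tickets" |
    jq -r '.entry.id'
}

# POST one page of an AFTS search: $1 = ticket, $2 = query, $3 = skipCount.
search_page() {
  jq -nc --arg q "$2" --argjson skip "$3" --argjson max "$PAGE_SIZE" \
    '{query: {query: $q, language: "afts"}, paging: {maxItems: $max, skipCount: $skip}}' |
    curl -s -H 'Content-Type: application/json' -d @- \
      "$API/search/versions/1/search?alf_ticket=$1"
}

# Print the skipCount of each page needed to cover $1 results in pages of $2.
page_offsets() {
  skip=0
  while [ "$skip" -lt "$1" ]; do
    echo "$skip"
    skip=$((skip + $2))
  done
}

# Steps 2 and 3: print "# total", then "id nodeType name" for every result.
list_nodes() {
  ticket=$(get_ticket)
  total=$(search_page "$ticket" "$1" 0 | jq -r '.list.pagination.totalItems')
  case "$total" in
    ''|*[!0-9]*) echo "search failed: totalItems='$total'" >&2; return 1 ;;
  esac
  echo "# $total"
  for skip in $(page_offsets "$total" "$PAGE_SIZE"); do
    search_page "$ticket" "$1" "$skip" |
      jq -r '.list.entries[].entry | "\(.id) \(.nodeType) \(.name)"'
  done
}

# Only contact the server when an AFTS query is given, e.g. 'SITE:"swsdp"'.
if [ "$#" -gt 0 ]; then
  list_nodes "$1"
fi
```

Saved as a file (the name list-nodes.sh is hypothetical), it would be run as `sh list-nodes.sh 'SITE:"swsdp"'`. The first page is fetched twice, once for the total and once for the listing. That keeps the three steps separate at the cost of one extra request.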

So I can do things like this:

$ alf-rest-search.sh SITE swsdp
$ alf-rest-search.sh TYPE cm:content
$ alf-rest-search.sh TYPE cm:folder
$ alf-rest-search.sh ASPECT qshare:shared
$ alf-rest-search.sh ASPECT cm:indexControl
$ alf-rest-search.sh name 'Project*'
$ alf-rest-search.sh cm:title 'Project'
$ alf-rest-search.sh cm:creator 'System'
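
At some point, the keywords in these calls have to become a search query. The original script's mapping is not shown here, so the following is a hypothetical sketch assuming AFTS syntax. SITE, TYPE and ASPECT become keyword queries with a quoted value. Any other first argument is treated as a property name, and its value is passed through unquoted.

```sh
#!/bin/sh
# Hypothetical mapping from the two script arguments to an AFTS query string;
# the real alf-rest-search.sh may build its queries differently.
to_afts() {
  case "$1" in
    SITE|TYPE|ASPECT)
      # Keyword queries get a quoted value, e.g. TYPE:"cm:content".
      printf '%s:"%s"\n' "$1" "$2" ;;
    *)
      # Property queries (name, cm:title, cm:creator...) keep the value
      # unquoted so that wildcards such as Project* still apply.
      printf '%s:%s\n' "$1" "$2" ;;
  esac
}
```

With this mapping, `to_afts TYPE cm:content` prints `TYPE:"cm:content"` and `to_afts name 'Project*'` prints `name:Project*`. Leaving property values unquoted keeps wildcards working, but values containing spaces would need extra quoting.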



For example, to get the uuids of the contents of the default site in Alfresco:

$ ./alf-rest-search.sh SITE swsdp
# 102
7bb9c846-fcc5-43b5-a893-39e46ebe94d4 cm:content coins.JPG
1f4ce811-1c61-4553-ac23-63b68bf1d121 cm:content plugs.jpg
38db832f-8279-460f-99b8-fed560c8da8e cm:thumbnail doclib
0f672fb8-bbdb-41bb-84f3-7b9bb1c39b30 cm:content wires.JPG
bf581ca9-e270-413d-9796-635544674781 cm:thumbnail doclib
72948f84-4bf1-4ec5-8378-1bed0951600a cm:content low consumption bulb.png
14e2200e-9f1c-4274-8b6b-95dc9d59d204 cm:content wind turbine.JPG
79a03a3e-a027-4b91-9f14-02b62723591e cm:content GE Logo.png
3deb5413-2c1d-4015-b9c9-2be9648446bc cm:content logo.png
43485b48-2ca7-4077-a00c-9bfe810f9fa1 cm:content sample 1.png
.
.
.
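
The listing prints bare uuids and also includes cm:thumbnail renditions, while batch tools usually expect full noderefs. A small post-processing step covers both. This is a minimal sketch, assuming the nodes live in the default workspace://SpacesStore store and the "uuid nodeType name" line format shown above. The to_noderefs name is a hypothetical helper.

```sh
#!/bin/sh
# Turn listing lines "uuid nodeType name" into full noderefs.
# Assumes the nodes live in the default workspace://SpacesStore store.
# $1 is the node type to keep (cm:content by default), which drops the
# cm:thumbnail entries; the "# N" header is skipped because its second
# field is a number, not a node type.
to_noderefs() {
  wanted=${1:-cm:content}
  while read -r uuid type name; do
    if [ "$type" = "$wanted" ]; then
      printf 'workspace://SpacesStore/%s\n' "$uuid"
    fi
  done
}
```

After loading the function into the current shell, `./alf-rest-search.sh SITE swsdp | to_noderefs > noderefs.txt` would write one noderef per line, ready for a batch job.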

Or, right after publicly sharing some files of the site, we can get:

$ ./alf-rest-search.sh ASPECT qshare:shared
# 5
1a0b110f-1e09-4ca2-b367-fe25e4964a4e cm:content Project Contract.pdf
723a0cff-3fce-495d-baa3-a3cd245ea5dc cm:content inv I200-109.png
f3bb5d08-9fd1-46da-a94a-97f20f1ef208 cm:content Meeting Notes 2011-01-27.doc
5fa74ad3-9b5b-461b-9df5-de407f1f4fe7 cm:content budget.xls
5515d3e1-bb2a-42ed-833c-52802a367033 cm:content Project Objectives.ppt
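
A list like this is the input for the batch actions mentioned at the beginning, for instance unsharing these documents. Such actions are usually applied to groups of nodes rather than one node at a time. The sketch below groups the uuid column into lines of at most N uuids each. The chunk_uuids name and the default group size of 100 are hypothetical choices, not values required by Alfresco.

```sh
#!/bin/sh
# Group the uuids (first column) of a listing into lines of at most $1 uuids,
# e.g. to hand each line to one batch job. The default of 100 is an arbitrary,
# hypothetical choice, not a limit of the Alfresco REST API.
chunk_uuids() {
  size=${1:-100}
  chunk=""
  n=0
  while read -r uuid rest; do
    # Skip the "# N" header and empty lines.
    case "$uuid" in ''|'#') continue ;; esac
    chunk="${chunk:+$chunk }$uuid"
    n=$((n + 1))
    if [ "$n" -eq "$size" ]; then
      echo "$chunk"
      chunk=""
      n=0
    fi
  done
  # Emit the last, possibly shorter, group.
  if [ -n "$chunk" ]; then
    echo "$chunk"
  fi
}
```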


Looking for some .doc files in the Alfresco default installation:

$ ./alf-rest-search.sh name '*doc'
# 4
ba003576-1bc5-4fca-8bc4-e9987dcf1937 cm:content doc_info.ftl
f3bb5d08-9fd1-46da-a94a-97f20f1ef208 cm:content Meeting Notes 2011-01-27.doc
150398b3-7f82-4cf6-af63-c450ef6c5eb8 cm:content Meeting Notes 2011-02-03.doc
a8290263-4178-48f5-a0b0-be155a424828 cm:content Meeting Notes 2011-02-10.doc
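
The first result, doc_info.ftl, does not end in .doc. Presumably the wildcard is matched against the tokens of the indexed name (here "doc") rather than against the whole filename. When an exact match matters, a client-side post-filter on the listing is a cheap safeguard. This is a minimal sketch, assuming the "uuid nodeType name" line format shown above. The filter_name_suffix name is hypothetical.

```sh
#!/bin/sh
# Keep only listing lines whose name ends with $1 (e.g. ".doc"), as an exact
# client-side check on top of the wildcard search; the "# N" header is dropped.
filter_name_suffix() {
  while read -r uuid type name; do
    case "$uuid" in ''|'#') continue ;; esac
    case "$name" in
      *"$1") printf '%s %s %s\n' "$uuid" "$type" "$name" ;;
    esac
  done
}
```

For example, `./alf-rest-search.sh name '*doc' | filter_name_suffix .doc` would keep the three meeting notes and drop doc_info.ftl.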