Over the last few days I've been involved in an enterprise search project based on Apache Nutch and Apache SOLR. As you probably know, SOLR is a powerful enterprise search and indexing engine with a REST API that exposes operations such as query, index, delete, commit and optimize, and it also includes a very useful admin interface. Web applications written in any programming language (such as Java, .NET, Python, or Ruby) can easily call this REST API, and SOLR can return its responses in several formats, such as JSON or XML, selected with the wt parameter.
For the project, I created a small collection of shell utilities to make the usual sysadmin tasks more comfortable and to reuse some SOLR configuration parameters. These tools rely mainly on curl and jq (or jshon). For example, the following commands show information about the SOLR collections (their configuration names and their names, respectively):
$ curl -s "http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json" | jq ".cluster.collections" | jq '.[].configName'
$ curl -s "http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json" | jq ".cluster.collections" | jshon -k
Among these utilities, here are two examples:
First, do-search.sh is a simple shell script for running simple searches against SOLR from the command line, using the REST API. It sends SOLR queries and accepts fl (field list) and fq (filter query) parameters. It is useful for testing a quick search in SOLR (or if you are a terminal fan).
do-search.sh <search-terms> [<fl-params=url,h1,title,score> <fq-params=lang:es> <qt-params=select|elevate> <collection-name=> <solr-server=localhost> <port=8983>]
The first argument is the search term (which the script URL-encodes). The second is fl, the list of fields to show in the output; these must be fields defined in your SOLR schema and indexed for the collection. The third is fq, a filter query that restricts the results (for example, lang:es). Then qt selects the query type in SOLR (for example, select or elevate), and the remaining arguments let you specify the collection name, the SOLR server host and the port.
$ ./do-search.sh "alfresco solr" id,title,score,lang "lang:en"
which executes the following curl command:
$ curl -s "http://localhost:8983/solr/zylk/select?fq=lang:en&fl=id,title,score,lang&indent=on&q=alfresco%20solr&rows=10&start=0&wt=json"
and returns the following results:
"title": "- Configuring contentstore and SOLR indices in Alfresco 5 - zylk",
"title": "- How to track SOLR indexation process in Alfresco - zylk",
"title": "- Performing a full reindex with Solr for Alfresco ECM - zylk",
"title": "- SOLR web plugin for Liferay 5.2.3 SE and SOLR server 1.4 - zylk",
"title": "Using an Apache stack for indexing your logs and metrics - More on monitoring dashboards for Alfresco using SOLR, Banana and Apache Zeppelin - zylk",
As you can see, I got the most relevant articles on the "alfresco solr" topic on this site. Check the links for more info about Alfresco and SOLR.
Another useful shell script monitors SOLR live nodes:
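To make the mapping from arguments to URL concrete, here is a minimal sketch of how a script like do-search.sh could build its request. It is not the actual script: the function names urlencode and build_search_url are hypothetical, the zylk default collection is taken from the example URL above (the usage line leaves it empty), and the encoder assumes ASCII search terms. The other defaults come from the usage line.

```shell
#!/bin/sh
# Hypothetical sketch, not the original do-search.sh.

# Percent-encode a string for use as a query parameter value.
# Assumption: ASCII input only; multi-byte characters are not handled.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# Build the search URL from positional arguments, in the same order as
# the usage line: terms, fl, fq, qt, collection, server, port.
# Only the search terms are encoded, to match the example URL; fl and fq
# are passed through as given.
build_search_url() {
  terms=$1
  fl=${2:-url,h1,title,score}
  fq=${3:-lang:es}
  qt=${4:-select}
  collection=${5:-zylk}   # assumption: default taken from the example URL
  host=${6:-localhost}
  port=${7:-8983}
  printf 'http://%s:%s/solr/%s/%s?fq=%s&fl=%s&indent=on&q=%s&rows=10&start=0&wt=json\n' \
    "$host" "$port" "$collection" "$qt" "$fq" "$fl" "$(urlencode "$terms")"
}
```

With these definitions, something like curl -s "$(build_search_url "alfresco solr" id,title,score,lang "lang:en")" | jq '.response.docs' would issue the same request as the example above and print the matching documents.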
check_solr.sh <SOLRSERVER=localhost> [<PORT=8983> <NUMSERVERS=1>]
$ ./check_solr.sh solr6.zylk.net 8983 1
INFO: SOLR (1 live nodes) = [ "solr6.zylk.net:8983_solr"]
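A check like this can be sketched with the same clusterstatus call used earlier, reading the live_nodes array from the response. The following is only an illustration under assumptions, not the actual check_solr.sh: the function names, the CRITICAL message and the exit codes (0 for OK, 2 for failure, in the style of Nagios plugins) are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not the original check_solr.sh.

# Fetch the live_nodes array from the clusterstatus response as compact JSON.
fetch_live_nodes() {
  curl -s "http://$1:$2/solr/admin/collections?action=clusterstatus&wt=json" |
    jq -c '.cluster.live_nodes'
}

# Compare the number of live nodes with the expected number.
# Arguments: expected count, actual count, node list (JSON text).
# Assumption: exit code 0 means OK and 2 means too few nodes.
report_live_nodes() {
  expected=$1
  count=$2
  nodes=$3
  if [ "$count" -ge "$expected" ]; then
    printf 'INFO: SOLR (%s live nodes) = %s\n' "$count" "$nodes"
    return 0
  fi
  printf 'CRITICAL: SOLR (%s live nodes, expected %s) = %s\n' \
    "$count" "$expected" "$nodes"
  return 2
}

# Entry point, with the same argument order as the usage line.
check_solr() {
  host=${1:-localhost}
  port=${2:-8983}
  wanted=${3:-1}
  nodes=$(fetch_live_nodes "$host" "$port")
  nodes=${nodes:-[]}                       # no answer counts as no nodes
  count=$(printf '%s' "$nodes" | jq 'length')
  report_live_nodes "$wanted" "${count:-0}" "$nodes"
}
```

Calling check_solr solr6.zylk.net 8983 1 would then print an INFO line similar to the one above when at least one node is live, and return a non-zero status otherwise, which makes it easy to plug into cron or a monitoring system.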
You can find the scripts in my gist: