Using a Python API for SOLR
For SOLR, you can find client APIs for your favourite programming language, such as Java, Python, Ruby, Perl or JavaScript. Basically, client apps reach Solr by creating HTTP requests and parsing the corresponding HTTP responses. Client APIs encapsulate much of the work of sending requests and parsing responses, which makes it easier to write client applications. If you are a Java programmer you will be comfortable with SolrJ, while you can use the pysolr or rsolr libraries for Python and Ruby. Rsolr was used previously in the output connector for sending log data from Logstash to SOLR, in a post about monitoring metrics with SOLR, Banana and Apache Zeppelin. By the way, the wt parameter in the query lets you choose an appropriate output format for the chosen API, for example JSON, XML, binary (used by SolrJ), Python, PHP, Ruby, XSLT or CSV.
The following script illustrates how to send a query to SOLR in Python using HTTP requests (urllib) and parse the JSON response (simplejson), without using a client API. The script is written for Python 2. You need to install simplejson with pip if you don't have it; urllib ships with Python.
    #! /usr/bin/python
    import urllib
    import simplejson
    import pprint

    host = "localhost"
    port = "8983"
    collection = "techproducts"
    qt = "select"
    url = 'http://' + host + ':' + port + '/solr/' + collection + '/' + qt + '?'

    # Query parameters, each already in key=value form
    q = "q=ipod"
    fl = "fl=id,name"
    fq = "fq="
    rows = "rows=10"
    wt = "wt=json"
    #wt = "wt=python"
    params = [ q, fl, fq, wt, rows ]
    p = "&".join(params)

    connection = urllib.urlopen(url+p)

    if wt == "wt=json":
        response = simplejson.load(connection)
    else:
        # wt=python returns a Python literal that has to be evaluated
        response = eval(connection.read())

    print "Number of hits: " + str(response['response']['numFound'])
    pprint.pprint(response['response']['docs'])
Note that we can use wt=python or wt=json, but with wt=python we need to call eval on the response afterwards, which is potentially insecure because eval executes whatever code the response contains, while JSON is a more robust response format. Running the script results in:
    $ ./do-search.py
    Number of hits: 3
    [{'id': 'IW-02', 'name': ['iPod & iPod Mini USB 2.0 Cable']},
     {'id': 'F8V7067-APL-KIT', 'name': ['Belkin Mobile Power Cord for iPod w/ Dock']},
     {'id': 'MA147LL/A', 'name': ['Apple 60 GB iPod with Video Playback Black']}]
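The script above targets Python 2 (print statement, urllib.urlopen). As a rough sketch only, the same query could be written for Python 3 with the standard library alone: urllib.request and json replace urllib and simplejson, and ast.literal_eval stands in for eval when wt=python is requested. The helper names build_url, parse_response, search and main are made up for this example, the empty fq parameter is left out, and the host, port and collection are assumed to match the script above.

```python
#!/usr/bin/env python3
"""Python 3 sketch of the urllib-based Solr query (no client API)."""
import ast
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(host, port, collection, handler, params):
    """Return the full query URL; urlencode escapes spaces, commas, '&', etc."""
    base = "http://{0}:{1}/solr/{2}/{3}".format(host, port, collection, handler)
    return base + "?" + urlencode(params)


def parse_response(text, wt):
    """Decode a Solr response body requested with wt=json or wt=python."""
    if wt == "json":
        return json.loads(text)
    # ast.literal_eval accepts only Python literals (dicts, lists, strings,
    # numbers, True/False/None), so it cannot run code the way eval() can.
    return ast.literal_eval(text)


def search(host, port, collection, query, wt="json"):
    """Send the query to Solr and return the decoded response."""
    params = {"q": query, "fl": "id,name", "rows": 10, "wt": wt}
    url = build_url(host, port, collection, "select", params)
    with urlopen(url, timeout=5) as connection:
        text = connection.read().decode("utf-8")
    return parse_response(text, wt)


def main():
    """Assumes a local Solr with the techproducts example collection."""
    response = search("localhost", "8983", "techproducts", "ipod")
    print("Number of hits: {0}".format(response["response"]["numFound"]))
    for doc in response["response"]["docs"]:
        print(doc)
```

Calling main() runs the query against a live server. Passing wt="python" to search exercises the literal_eval branch, which raises ValueError instead of executing anything if the body is not a plain literal.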
On the other hand, we can use the pysolr client API, a lightweight Python wrapper for SOLR.
    #! /usr/bin/python
    import pprint
    import pysolr

    host = "localhost"
    port = "8983"
    collection = "techproducts"
    q = "ipod"
    fl = "id,name"
    qt = "select"
    fq = ""
    rows = "10"
    url = 'http://' + host + ':' + port + '/solr/' + collection

    solr = pysolr.Solr(url, search_handler="/"+qt, timeout=5)
    results = solr.search(q, **{ 'fl': fl, 'fq': fq, 'rows': rows })

    # results.hits is the total number of matches (numFound);
    # len(results) only counts the documents returned in this page
    print("Number of hits: {0}".format(results.hits))
    for i in results:
        pprint.pprint(i)
    $ ./do-search-pysolr.py
    Number of hits: 3
    {u'id': u'IW-02', u'name': [u'iPod & iPod Mini USB 2.0 Cable']}
    {u'id': u'F8V7067-APL-KIT', u'name': [u'Belkin Mobile Power Cord for iPod w/ Dock']}
    {u'id': u'MA147LL/A', u'name': [u'Apple 60 GB iPod with Video Playback Black']}
This second form is more appropriate if you have a cluster with Apache ZooKeeper, or if you need other methods such as add, delete or update.
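For the add and delete methods mentioned above, a minimal sketch might look like the following. It relies on pysolr's add and delete calls with an explicit commit=True; the helper name index_and_remove, the document ids and the field values are hypothetical, and the URL assumes the same local techproducts collection as the search script.

```python
def index_and_remove(solr):
    """Add two documents, then delete one by id and one by query.

    `solr` is expected to be a pysolr.Solr (or pysolr.SolrCloud) instance.
    The ids and names below are made up for illustration.
    """
    docs = [
        {"id": "demo-1", "name": "Hypothetical demo document 1"},
        {"id": "demo-2", "name": "Hypothetical demo document 2"},
    ]
    # commit=True asks Solr to commit so the change is visible to searches.
    # Re-adding a document with an existing id replaces the stored one.
    solr.add(docs, commit=True)
    # Delete by unique id ...
    solr.delete(id="demo-1", commit=True)
    # ... or delete everything matching a query.
    solr.delete(q="id:demo-2", commit=True)
    return [doc["id"] for doc in docs]


def main():
    """Assumes pysolr is installed and Solr runs on localhost:8983."""
    import pysolr  # imported here so the helper above has no hard dependency

    solr = pysolr.Solr("http://localhost:8983/solr/techproducts", timeout=5)
    index_and_remove(solr)
```

Calling main() runs the sequence against a live server and leaves the collection as it was, since both demo documents are deleted again.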
    # For SolrCloud mode
    zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
    solr = pysolr.SolrCloud(zookeeper, collection)
This was tested on SOLR 6.6 with SolrCloud.
Links:
- https://lucene.apache.org/solr/guide/6_6/client-api-lineup.html
- https://lucene.apache.org/solr/guide/6_6/using-python.html#UsingPython-SimplePython
- https://pypi.python.org/pypi/pysolr/3.6.0
- https://www.zylk.net/es/web-2-0/blog/-/blogs/more-on-monitoring-dashboards-for-alfresco-using-solr-banana-and-apache-zeppelin