
SOLR client APIs

Using the Python API for SOLR

SOLR offers client APIs for many programming languages, such as Java, Python, Ruby, Perl and JavaScript. Client applications reach SOLR by creating HTTP requests and parsing the corresponding HTTP responses; the client APIs encapsulate much of that work, which makes client applications easier to write. If you are a Java programmer you will be comfortable with SolrJ, while for Python and Ruby you can use the pysolr and rsolr libraries. Rsolr was used previously in the output connector that sends log data from Logstash to SOLR, in a post about monitoring metrics with SOLR, Banana and Apache Zeppelin. By the way, the wt parameter in the query lets you choose an appropriate output format for the chosen API, for example JSON, XML, binary (used by SolrJ), Python, PHP, Ruby, XSLT or CSV.


The following script illustrates how to send a query to SOLR from Python using plain HTTP requests (urllib) and how to parse the JSON response (simplejson), without using a client API. urllib ships with Python; install simplejson with pip if you don't have it.

#! /usr/bin/python
import urllib
import simplejson
import pprint
import sys

host       = "localhost"
port       = "8983"
collection = "techproducts"
qt         = "select"
url        = 'http://' + host + ':' + port + '/solr/' + collection + '/' + qt + '?'

q          = "q=ipod"
fl         = "fl=id,name"
fq         = "fq="
rows       = "rows=10"
wt         = "wt=json"
#wt        = "wt=python"
params     = [ q, fl, fq, wt, rows ] 
p          = "&".join(params)

connection = urllib.urlopen(url+p)

if wt == "wt=json":
  response   = simplejson.load(connection) 
else:
  response   = eval(connection.read())

print "Number of hits: " + str(response['response']['numFound'])
pprint.pprint(response['response']['docs'])

Note that we can use either wt=python or wt=json, but with wt=python we then have to call eval on the response, which is potentially insecure because it executes whatever text the server returns, while JSON is a more robust response format. This results in:

$ ./do-search.py

Number of hits: 3
[{'id': 'IW-02', 
  'name': ['iPod & iPod Mini USB 2.0 Cable']},
 {'id': 'F8V7067-APL-KIT',
  'name': ['Belkin Mobile Power Cord for iPod w/ Dock']},
 {'id': 'MA147LL/A', 
  'name': ['Apple 60 GB iPod with Video Playback Black']}]
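
The script above targets Python 2. As a rough sketch, the same query could look like this on Python 3, assuming the same local techproducts collection; the helper names build_url, parse_response and search are made up for this example. It uses urllib.parse.urlencode to escape the parameters, the standard json module instead of simplejson, and ast.literal_eval instead of eval for wt=python. literal_eval only accepts Python literals, so it should cover typical responses without executing any code.

```python
import ast
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(host, port, collection, params, handler="select"):
    """Build a SOLR request URL; urlencode escapes the parameter values."""
    base = "http://{0}:{1}/solr/{2}/{3}".format(host, port, collection, handler)
    return base + "?" + urlencode(params)


def parse_response(body, wt):
    """Parse a response body that was requested with wt=json or wt=python."""
    if wt == "json":
        return json.loads(body)
    # literal_eval only accepts Python literals, so it cannot run code
    # the way eval can.
    return ast.literal_eval(body)


def search(host, port, collection, params):
    """Send the query over HTTP and return the parsed response."""
    url = build_url(host, port, collection, params)
    with urlopen(url, timeout=5) as connection:
        body = connection.read().decode("utf-8")
    return parse_response(body, params["wt"])
```

Calling search("localhost", "8983", "techproducts", {"q": "ipod", "fl": "id,name", "rows": 10, "wt": "json"}) against a running instance should return the same structure as before, so response['response']['numFound'] and response['response']['docs'] are read in the same way.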

Alternatively, we can use the pysolr client API, a lightweight Python wrapper for SOLR.

#! /usr/bin/python
import pprint
import pysolr
import sys

host       = "localhost"
port       = "8983"
collection = "techproducts"
q          = "ipod"
fl         = "id,name"
qt         = "select"
fq         = ""
rows       = "10"
url        = 'http://' + host + ':' + port + '/solr/' + collection 

solr       = pysolr.Solr(url, search_handler="/"+qt, timeout=5)
results    = solr.search(q, **{
    'fl': fl,
    'fq': fq,
    'rows': rows
})

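# results.hits is the total number of matches (numFound);
# len(results) only counts the documents returned in this page.
print("Number of hits: {0}".format(results.hits))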
for i in results:
  pprint.pprint(i)

This results in:

$ ./do-search-pysolr.py

Number of hits: 3
{u'id': u'IW-02', 
 u'name': [u'iPod & iPod Mini USB 2.0 Cable']}
{u'id': u'F8V7067-APL-KIT',
 u'name': [u'Belkin Mobile Power Cord for iPod w/ Dock']}
{u'id': u'MA147LL/A', 
 u'name': [u'Apple 60 GB iPod with Video Playback Black']}

This second form is more appropriate if you have a cluster managed by Apache ZooKeeper, or if you need other operations such as add, delete or update.

# For SolrCloud mode (pysolr needs the kazoo package for ZooKeeper support)
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
solr      = pysolr.SolrCloud(zookeeper, collection)

This was tested on SOLR 6.6 in SolrCloud mode.
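
For the add, delete and update operations mentioned above, a minimal pysolr sketch might look like the following. It was not part of the tested scripts; the document ids and names are invented for illustration, and the helper reindex_and_remove is hypothetical. It relies on pysolr's add and delete methods with commit=True, and on the fact that adding a document whose id already exists overwrites the old one, which is how a full-document update is done here.

```python
def reindex_and_remove(solr, docs, obsolete_id):
    """Add (or overwrite) docs, then delete one document by id.

    `solr` is expected to be a pysolr.Solr or pysolr.SolrCloud instance.
    """
    # Documents whose id already exists in the index are replaced.
    solr.add(docs, commit=True)
    # Delete a single document by its unique key.
    solr.delete(id=obsolete_id, commit=True)


# Hypothetical example documents; the field names follow the
# techproducts examples used earlier.
docs = [
    {"id": "DEMO-01", "name": "Demo iPod cable"},
    {"id": "DEMO-02", "name": "Demo iPod dock"},
]

# Usage against a running instance (not executed here):
#   import pysolr
#   solr = pysolr.Solr("http://localhost:8983/solr/techproducts", timeout=5)
#   reindex_and_remove(solr, docs, "DEMO-02")
```

Passing the client in as an argument keeps the helper usable with either the plain Solr object or the SolrCloud one created above.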
