Using schemaless mode and the post command in SOLR

Cesar Capillas

Schemaless config set in SOLR

In SOLR 6.6, the data_driven_schema_configs configset implements the so-called schemaless mode: a set of features that lets users build an effective schema simply by indexing sample data, without editing the schema by hand. The following examples use a SOLR Cloud 6.6 setup and the Collections API.

$ cd /opt/solr6/solr-6.6.0/

$ ./bin/solr create -c gettingstarted -shards 1 -replicationFactor 2 -p 8983 -d server/solr/configsets/data_driven_schema_configs

We can check the default fields of the gettingstarted collection using curl, with jq to parse the JSON response.

$ curl -s http://localhost:8983/solr/gettingstarted/schema/fields | jq '.fields[].name'

"_root_"
"_text_"
"_version_"
"id"

If I add some sample CSV data to the gettingstarted collection, for example:

$ curl -s "http://localhost:8983/solr/gettingstarted/update?commit=true" -H "Content-type:application/csv" -d '
id,Artist,Album,Released,Rating,FromDistributor,Sold
44C,Old Shews,Mead for Walking,1988-08-13,0.01,14,0'

I can then check the new fields added to the schema:

$ curl -s http://localhost:8983/solr/gettingstarted/schema/fields | jq '.fields[].name'

"Album"
"Artist"
"FromDistributor"
"Rating"
"Released"
"Sold"
"_root_"
"_text_"
"_version_"
"id"

which means that the CSV data has been indexed and the new fields have been added. Let’s search for the document:

$ curl -s "http://localhost:8983/solr/gettingstarted/select?q=id:44C&wt=json" | jq ".response.docs"

[
  {
    "id": "44C",
    "Artist": [
      "Old Shews"
    ],
    "Album": [
      "Mead for Walking"
    ],
    "Released": [
      "1988-08-13T00:00:00Z"
    ],
    "Rating": [
      0.01
    ],
    "FromDistributor": [
      14
    ],
    "Sold": [
      0
    ],
    "_version_": 1595939281803673600
  }
]
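
The types in this response (strings, a date, a float and integers) were guessed from the values in the CSV row. The sketch below mimics that kind of guessing in plain shell. Note that guess_type is a hypothetical function and its patterns are a simplified assumption, not the actual rules SOLR applies through the update processors of the configset.

```shell
#!/bin/sh
# Hypothetical, simplified approximation of schemaless type guessing.
# It only illustrates the idea: try the strictest type first, fall back to string.
guess_type() {
  value=$1
  if printf '%s\n' "$value" | grep -Eq '^(true|false)$'; then
    echo boolean
  elif printf '%s\n' "$value" | grep -Eq '^-?[0-9]+$'; then
    echo long
  elif printf '%s\n' "$value" | grep -Eq '^-?([0-9]+\.[0-9]*|\.[0-9]+)$'; then
    echo double
  elif printf '%s\n' "$value" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9:.]+Z)?$'; then
    echo date
  else
    echo string
  fi
}

# Values taken from the CSV row indexed above
for v in "Old Shews" 1988-08-13 0.01 14; do
  printf '%s -> %s\n' "$v" "$(guess_type "$v")"
done
```

Under these simplified rules, a text value that happens to look numeric (a hypothetical title such as .45) would be classified as double, which hints at how guessing from sample values can go wrong.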

However, automatic schema detection does not always work so easily. SOLR guesses each field type from the first values it indexes, so those values determine the schema. Let’s take the films collection from the SOLR examples and define two of its fields explicitly before indexing:

$ cd /opt/solr6/solr-6.6.0/

$ ./bin/solr create -c films -shards 1 -replicationFactor 2 -p 8983 -d server/solr/configsets/data_driven_schema_configs

$ curl -s http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field" : {
        "name":"name",
        "type":"text_general",
        "multiValued":false,
        "stored":true
    },
    "add-field" : {
        "name":"initial_release_date",
        "type":"pdate",
        "stored":true
    }
}'

$ ./bin/post -c films -p 8983 example/films/films.json

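In this case, if we had not added these fields to the schema beforehand, it would not be possible to index the films.json data, because the field types guessed from the first documents would not fit the values found in later ones.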
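
The schema request above repeats the "add-field" key inside one JSON object, which generic JSON tools tend to collapse into a single entry. A minimal sketch of an alternative is to generate one payload per field. Here add_field_json is a hypothetical helper, and setting multiValued to false for the date field is an assumption, since the original command leaves it unset.

```shell
#!/bin/sh
# Hypothetical helper: print a Schema API "add-field" command as JSON.
# Arguments: field name, field type, multiValued (true|false).
add_field_json() {
  printf '{"add-field":{"name":"%s","type":"%s","multiValued":%s,"stored":true}}\n' "$1" "$2" "$3"
}

# One payload per field, so each request carries a single command.
add_field_json name text_general false
add_field_json initial_release_date pdate false

# To send a payload, pipe it to curl (assumes SOLR listens on localhost:8983):
#   add_field_json name text_general false |
#     curl -s -X POST -H 'Content-type:application/json' \
#          --data-binary @- http://localhost:8983/solr/films/schema
```

Printing the payload first makes it easy to inspect before anything is sent to the server.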
Let’s populate another collection with books data via JSON:

$ cd /opt/solr6/solr-6.6.0/

$ ./bin/solr create -c books -shards 1 -replicationFactor 2 -p 8983 -d server/solr/configsets/data_driven_schema_configs

$ ./bin/post -p 8983 -c books ./example/exampledocs/books.json

which is equivalent to:

$ curl -s 'http://localhost:8983/solr/books/update?commit=true' --data-binary @example/exampledocs/books.json -H 'Content-type:application/json'
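
The post command works out the content type from the file extension, whereas a curl-based script has to set the header itself. The sketch below shows one way to do that. The function name content_type_for is hypothetical, the JSON and CSV types are the ones used in the curl commands of this post, and application/xml and the fallback type are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: map a file name to the Content-type header for /update.
content_type_for() {
  case "$1" in
    *.json) echo application/json ;;
    *.csv)  echo application/csv ;;
    *.xml)  echo application/xml ;;
    *)      echo application/octet-stream ;;
  esac
}

# Dry run: print the curl command instead of executing it.
file=example/exampledocs/books.json
printf '%s\n' "curl -s 'http://localhost:8983/solr/books/update?commit=true' --data-binary @$file -H 'Content-type:$(content_type_for "$file")'"
```

For books.json this prints the same curl command shown above; replacing printf with an actual curl call would send the file.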

Finally, we can also add XML data, here to an exampledocs collection:

$ cd /opt/solr6/solr-6.6.0/

$ ./bin/solr create -c exampledocs -shards 1 -replicationFactor 2 -p 8983 -d server/solr/configsets/data_driven_schema_configs

$ ./bin/post -c exampledocs -p 8983 ./example/exampledocs/*.xml

As the previous examples show, it is possible to add CSV, JSON or XML data, and the post command also supports indexing PDF or Word files. Other examples of post commands are:

$ ./bin/post -c gettingstarted *.csv
$ ./bin/post -c gettingstarted *.xml
$ ./bin/post -c gettingstarted *.json
$ ./bin/post -c gettingstarted -params "separator=%09" -type text/csv data.tsv
$ ./bin/post -c gettingstarted sample.doc
$ ./bin/post -u solr:SolrRocks -c gettingstarted a.pdf
$ ./bin/post -c gettingstarted -filetypes doc,pdf samplefolder/
$ ./bin/post -c gettingstarted -d '<delete><id>23</id></delete>'
