Integrating Solr Search

Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

In CM1, Solr integration is disabled unless a configuration file is placed on the server. The location for this file is <CM1_Install>\rxconfig\DeliveryServer\solr-servers.xml

The XSD schema for this file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v5 rel. 2 U (http://www.xmlspy.com) by Ben Chen (Percussion Software) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="SolrConfig">
<xs:annotation>
<xs:documentation>Comment describing your root element</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element name="SolrServer" type="SolrServer" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="SolrServer">
<xs:complexContent>
<xs:extension base="PSAbstractDataObject">
<xs:sequence>
<xs:element name="serverType" type="xs:string"
minOccurs="0" maxOccurs="1" />
<xs:element name="solrHost" type="xs:string" />
<xs:element name="defaultCollection" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="saslContextName" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="maxErrors" type="xs:boolean" minOccurs="0" maxOccurs="1"/>
<xs:element name="cleanAllOnFullPublish" type="xs:boolean" minOccurs="0" maxOccurs="1"/>
<xs:element name="metadataMap">
<xs:complexType>
<xs:sequence>
<xs:element name="entry"
type="SolrMetaMapEntry"
minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="enabledSites">
<xs:complexType>
<xs:sequence>
<xs:element name="site"
type="xs:string"
minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="SolrMetaMapEntry">

<xs:attribute name="key" type="xs:string"></xs:attribute>
<xs:attribute name="value" type="xs:string"></xs:attribute>
</xs:complexType>
<xs:complexType name="PSAbstractDataObject"/>

<xs:complexType name="SiteRootEntry">

<xs:attribute name="siteName" type="xs:string"></xs:attribute>
<xs:attribute name="sitePrefix" type="xs:string"></xs:attribute>
</xs:complexType>
</xs:schema>

All Configuration Options

The following options are available when configuring CM1 for Solr.

SolrServer:  Multiple SolrServer sections can be created inside SolrConfig.  The first section that matches the server type and lists the site being published is used for Solr.

serverType:  PRODUCTION or STAGING.  Allows separate configurations for staging and production publishing.

solrHost:  The hostname and port of an Apache ZooKeeper instance, or the HTTP URL of a Solr core.  If the value does not start with http, Apache ZooKeeper is assumed.

defaultCollection:  Used for Apache ZooKeeper connections.  Selects the collection that CM1 should use.

saslContextName: Only used for Apache ZooKeeper installations and currently untested.  If set, a SASL login configuration can be used with this context name.  The login configuration would be defined in <CM1_Install>\AppServer\server\rx\conf\login-config.xml.  Without this setting, an anonymous Apache ZooKeeper connection is required.

maxErrors: The number of connection or other Solr errors allowed before the server gives up trying to update Solr for further published items.

cleanAllOnFullPublish:  If set to true, a full publish sends a request within the publish transaction to delete all existing content before adding new content.  This can be used to ensure that no orphaned data is left in the Solr collection.

metadataMap: Provides the ability to remap Percussion-delivered metadata keys to Solr index keys.

<metadataMap>

<entry key="fromKey" value="toKey"/>

</metadataMap>
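
For illustration, a map with more than one entry might look like the following. The key and value names here are hypothetical placeholders, not names confirmed by this document; substitute the metadata keys your content actually delivers and the field names your Solr index expects.

```xml
<metadataMap>
  <!-- key: metadata name as delivered by Percussion (hypothetical) -->
  <!-- value: field name to write into the Solr index (hypothetical) -->
  <entry key="dcterms:title" value="title"/>
  <entry key="dcterms:abstract" value="summary"/>
</metadataMap>
```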

siteRoots: A list of the sites that this configuration applies to.  If a site is not listed in any SolrServer section, it will not publish to Solr.  The sitePrefix is also specified here; it tells Solr where to find the root folder of the published File and Image assets so that metadata can be extracted from the binary content.

<siteRoots>
  <entry siteName="Tester" sitePrefix="C:\DevEnv\Installs\dev\Deployment\Server\Testerapps\ROOT"/>
</siteRoots>
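
As a sketch of how these options might combine, the file below defines one SolrServer section for staging and one for production, both for the Tester site. The collection name cm1_staging and the pairing of hosts with server types are illustrative assumptions, not values shipped with CM1. Child elements follow the order given in the schema, and sites are listed with the enabledSites element that the schema defines. The optional maxErrors and saslContextName elements are left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<SolrConfig>
  <!-- Staging publishes go to a ZooKeeper-managed collection.
       "cm1_staging" is a hypothetical collection name. -->
  <SolrServer>
    <serverType>STAGING</serverType>
    <!-- No http prefix, so the value is treated as a ZooKeeper host:port -->
    <solrHost>localhost:9983</solrHost>
    <defaultCollection>cm1_staging</defaultCollection>
    <cleanAllOnFullPublish>true</cleanAllOnFullPublish>
    <metadataMap>
    </metadataMap>
    <enabledSites>
      <site>Tester</site>
    </enabledSites>
  </SolrServer>
  <!-- Production publishes go directly to a single Solr core over HTTP -->
  <SolrServer>
    <serverType>PRODUCTION</serverType>
    <solrHost>http://localhost:8983/solr/gettingstarted</solrHost>
    <cleanAllOnFullPublish>false</cleanAllOnFullPublish>
    <metadataMap>
      <entry key="fromKey" value="toKey"/>
    </metadataMap>
    <enabledSites>
      <site>Tester</site>
    </enabledSites>
  </SolrServer>
</SolrConfig>
```

Because the first matching section wins, a site that appears in two sections with the same serverType would only ever use the earlier one.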

Solr Cloud (Apache ZooKeeper Configuration)

The following configuration is an example that publishes metadata to a Solr Cloud (ZooKeeper) instance for the site Tester, which publishes to the default DTS location for this site.

When connecting to a ZooKeeper instance, specify the solrHost field as {host}:{port} and also specify a defaultCollection.

<SolrConfig>
<SolrServer>
<solrHost>localhost:9983</solrHost>
<defaultCollection>gettingstarted</defaultCollection>
<cleanAllOnFullPublish>true</cleanAllOnFullPublish>
<metadataMap>
</metadataMap>
<enabledSites>
<site>Tester</site>
</enabledSites>
</SolrServer>
</SolrConfig>

Running an out-of-the-box Solr instance that works with the above configuration

Download solr-5.2.1 from http://archive.apache.org/dist/lucene/solr/5.2.1/

Extract the archive. At a command prompt, change to the bin directory within the extracted folder. Make sure the JAVA_HOME environment variable is set to a current JDK.

Run the following

solr -e cloud

Follow prompts and select all default options. 

Example startup.

==========================================

To begin, how many Solr nodes would you like to run in your local cluster (specify 1-4 nodes) [2]:
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Creating Solr home "E:\Users\stephenbolton\Downloads\solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\example\cloud\node1\solr"
1 file(s) copied.
1 file(s) copied.
Cloning "E:\Users\stephenbolton\Downloads\solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\example\cloud\node1" into "E:\Users\stephenbolton\Downloads\solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\example\cloud\node2"
2 File(s) copied
Please enter the port for node1 [8983]:
node1 port: 8983

Starting node1 on port 8983 using command:
solr -cloud -p 8983 -s example\node1\solr

Waiting for 3 seconds, press a key to continue ...
Please enter the port for node2 [7574]:
node2 port: 7574

Starting node2 on port 7574 using command:
solr -cloud -p 7574 -s example\node2\solr -z localhost:9983

Waiting for 6 seconds, press a key to continue ...

Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
gettingstarted

How many shards would you like to split gettingstarted into? [2]
2

How many replicas per shard would you like to create? [2]
2

Please choose a configuration for the gettingstarted collection, available options are: basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
data_driven_schema_configs
Connecting to ZooKeeper at localhost:9983
Uploading E:\Users\stephenbolton\Downloads\solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\server\solr\configsets\data_driven_schema_configs\conf for config gettingstarted to ZooKeeper at localhost:9983

Creating new collection 'gettingstarted' using command:
http://10.10.9.161:8983/solr/admin/collections?action=CREATE&name=gettingstarted&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=gettingstarted

{
"responseHeader":{
"status":0,
"QTime":7843},
"success":{"":{
"responseHeader":{
"status":0,
"QTime":7559},
"core":"gettingstarted_shard2_replica1"}}}


SolrCloud example is running, please visit http://localhost:8983/solr

==========================

Browse to http://localhost:8983/solr to check that the instance is running. With the configuration file in place, publish the configured "Tester" site.

Go to http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query

Click the Execute Query button to return all results.  This should show metadata for all published content.

Stop Solr

solr stop -all

Remove the cloud configuration by deleting the example/cloud folder.  Solr Cloud can then be reconfigured with solr -e cloud.

Solr HTTP configuration

The following configuration is an example that publishes metadata to a single Solr instance for the site Tester, which publishes to the default DTS location for this site.

<SolrConfig>
<SolrServer>
<solrHost>http://localhost:8983/solr/gettingstarted</solrHost>
<cleanAllOnFullPublish>false</cleanAllOnFullPublish>
<metadataMap>
</metadataMap>

<enabledSites>
<site>Tester</site>
</enabledSites>


</SolrServer>
</SolrConfig>

Solr HTTP example setup

Download solr-5.2.1 from http://archive.apache.org/dist/lucene/solr/5.2.1/

Extract the archive. At a command prompt, change to the bin directory within the extracted folder. Make sure the JAVA_HOME environment variable is set to a current JDK.

Run the following to set up an example system

solr -e schemaless

Follow prompts and select all default options.

Browse to http://localhost:8983/solr to check that the server is running.  With the configuration file in place, publish the configured "Tester" site.

Go to 

http://localhost:8983/solr/#/gettingstarted/query

Click the Execute Query button to return all results.  This should show metadata for all published content.

Solr Commands

Stop server

solr stop -all

or 

solr stop -p 8983

Restart server

solr start -p 8983

Cleanup

Delete the example/schemaless folder and rerun

solr -e schemaless
