Integrating Solr Search

The Solr search engine feature is supported starting with CM1 version 5.3.0. Customers running an older version should upgrade to the latest version before attempting to integrate the Solr search engine.

Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

In CM1, the Solr indexes are generated when content is published by the publishing engine. During publishing, the engine extracts the metadata from each page, then creates the index entries and sends them to the Solr server at the same time that it sends the content to the DTS.

Solr acts like a database to which CM1 pushes the content of each item (the page itself) as a "document". Solr treats the body as text for full-text search. Alongside the body is a series of metadata fields, extracted in the same way CM1 extracts them for DTS publishing, so content can be searched on those metadata fields independently of the body of the document.

In CM1, Solr is ignored unless a configuration file is placed on the server. The location of this file is <CM1_Install>\rxconfig\DeliveryServer\solr-servers.xml

The XML Schema (XSD) for this file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v5 rel. 2 U (http://www.xmlspy.com) by Ben Chen (Percussion Software) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="SolrConfig">
    <xs:annotation>
      <xs:documentation>Comment describing your root element</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="SolrServer" type="SolrServer" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="SolrServer">
    <xs:complexContent>
      <xs:extension base="PSAbstractDataObject">
        <xs:sequence>
          <xs:element name="serverType" type="xs:string" minOccurs="0" maxOccurs="1"/>
          <xs:element name="solrHost" type="xs:string"/>
          <xs:element name="defaultCollection" type="xs:string" minOccurs="0" maxOccurs="1"/>
          <xs:element name="saslContextName" type="xs:string" minOccurs="0" maxOccurs="1"/>
          <xs:element name="maxErrors" type="xs:boolean" minOccurs="0" maxOccurs="1"/>
          <xs:element name="cleanAllOnFullPublish" type="xs:boolean" minOccurs="0" maxOccurs="1"/>
          <xs:element name="metadataMap">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="entry" type="SolrMetaMapEntry" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="enabledSites">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="site" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="SolrMetaMapEntry">
    <xs:attribute name="key" type="xs:string"></xs:attribute>
    <xs:attribute name="value" type="xs:string"></xs:attribute>
  </xs:complexType>
  <xs:complexType name="PSAbstractDataObject"/>

  <xs:complexType name="SiteRootEntry">
    <xs:attribute name="siteName" type="xs:string"></xs:attribute>
    <xs:attribute name="sitePrefix" type="xs:string"></xs:attribute>
  </xs:complexType>
</xs:schema>
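
As a quick sanity check before restarting the server, a configuration file can be tested against the element rules in the schema above. The following Python sketch uses only the standard library. The function name check_solr_config and the sample values are illustrative and not part of CM1, and the check covers only the presence of the elements that the schema does not mark as optional; it is not a full XSD validation.

```python
import xml.etree.ElementTree as ET

# Children of <SolrServer> that the schema does not mark minOccurs="0".
REQUIRED_CHILDREN = ("solrHost", "metadataMap", "enabledSites")


def check_solr_config(xml_text):
    """Return a list of problems found in a solr-servers.xml document."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "SolrConfig":
        problems.append("root element must be SolrConfig, found " + root.tag)
        return problems
    for index, server in enumerate(root.findall("SolrServer"), start=1):
        for name in REQUIRED_CHILDREN:
            if server.find(name) is None:
                problems.append("SolrServer #%d is missing <%s>" % (index, name))
    return problems


# Hypothetical sample: the second SolrServer section is incomplete on purpose.
SAMPLE = """
<SolrConfig>
  <SolrServer>
    <solrHost>localhost:9983</solrHost>
    <defaultCollection>gettingstarted</defaultCollection>
    <metadataMap/>
    <enabledSites><site>Tester</site></enabledSites>
  </SolrServer>
  <SolrServer>
    <defaultCollection>gettingstarted</defaultCollection>
  </SolrServer>
</SolrConfig>
"""

for problem in check_solr_config(SAMPLE):
    print(problem)
```

An empty result means every SolrServer section contains the three non-optional elements.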

All Configuration Options

The following configuration options are available for CM1 Solr.

SolrConfig: The root element. Multiple SolrServer sections can be created within it. The first section matching the serverType and an enabled site is used for Solr.

serverType: PRODUCTION or STAGING. Allows separate configurations for staging and production publishing. Default is PRODUCTION.

solrHost: Hostname and port of the Apache Zookeeper instance, or the HTTP URL of the Solr core. If the value does not start with http, Apache Zookeeper is assumed.
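
The rule for interpreting solrHost can be sketched in a few lines of Python. This illustrates the rule as described above and is not CM1 source code; the function name connection_mode and the example HTTP URL are assumptions made for the illustration.

```python
def connection_mode(solr_host):
    """Classify a solrHost value using the rule described above."""
    if solr_host.startswith("http"):
        return "http"
    # Anything else is treated as a Zookeeper host:port address.
    return "zookeeper"


print(connection_mode("localhost:9983"))
print(connection_mode("http://localhost:8983/solr/gettingstarted"))
```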

defaultCollection: Used for Apache Zookeeper connections. Selects the collection that CM1 uses.

saslContextName: Only used for Apache Zookeeper installations and currently untested. If set, a SASL login configuration with this context name can be used; it would be configured in <CM1_Install>\AppServer\server\rx\conf\login-config.xml. Without this, an anonymous Apache Zookeeper connection is required.

maxErrors: The number of connection or other Solr errors allowed before the server gives up trying to update Solr for further published items.

cleanAllOnFullPublish: If set to true, a full publish sends a request in the publish transaction to delete all existing content before adding the new content. This can be used to ensure that no orphaned data is left in the Solr collection.
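
For reference, Solr's own update API expresses "delete everything" as a delete-by-query on *:*. The sketch below builds that request body with the Python standard library. How CM1 actually issues the delete within its publish transaction is not described here, so treat this only as an illustration of the equivalent Solr request; the helper name delete_all_payload is hypothetical.

```python
import json


def delete_all_payload():
    """Build the JSON body of a Solr delete-by-query that matches every document."""
    return json.dumps({"delete": {"query": "*:*"}})


# The body would be POSTed to the update handler of the core or collection
# with Content-Type: application/json, followed by a commit.
print(delete_all_payload())
```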

metadataMap: Provides the ability to remap Percussion-delivered metadata keys into Solr index keys.

<metadataMap>
  <entry key="fromKey" value="toKey"/>
</metadataMap>
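
The effect of a metadataMap can be pictured as a key rename applied to each item's metadata before it is sent to Solr. The following Python sketch assumes that keys without an entry pass through unchanged, which the description above does not state explicitly; the function name and the sample keys are illustrative.

```python
def remap_metadata(metadata, metadata_map):
    """Rename keys listed in metadata_map; other keys are left as they are (assumed)."""
    return {metadata_map.get(key, key): value for key, value in metadata.items()}


# Equivalent to <entry key="fromKey" value="toKey"/>
mapping = {"fromKey": "toKey"}
item = {"fromKey": "example value", "otherKey": "unchanged"}
print(remap_metadata(item, mapping))
```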

enabledSites: A list of the sites that this configuration should be used with. If a site is not listed in any SolrServer section, it will not publish to Solr.

<enabledSites>
  <site>siteName</site>
</enabledSites>
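
The selection rule (the first SolrServer section that matches the serverType and lists the site) can be sketched in Python. This illustrates the documented behavior and is not CM1's implementation; the function name find_server and the host staging-host:9983 are made up for the example. A section without a serverType is treated as PRODUCTION, following the default noted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration with one STAGING and one PRODUCTION section.
CONFIG = """
<SolrConfig>
  <SolrServer>
    <serverType>STAGING</serverType>
    <solrHost>staging-host:9983</solrHost>
    <metadataMap/>
    <enabledSites><site>Tester</site></enabledSites>
  </SolrServer>
  <SolrServer>
    <solrHost>localhost:9983</solrHost>
    <metadataMap/>
    <enabledSites><site>Tester</site></enabledSites>
  </SolrServer>
</SolrConfig>
"""


def find_server(xml_text, site_name, server_type="PRODUCTION"):
    """Return the solrHost of the first matching SolrServer section, or None."""
    root = ET.fromstring(xml_text)
    for server in root.findall("SolrServer"):
        configured_type = server.findtext("serverType", default="PRODUCTION")
        sites = [site.text for site in server.findall("enabledSites/site")]
        if configured_type == server_type and site_name in sites:
            return server.findtext("solrHost")
    return None  # the site is not enabled anywhere, so nothing is published


print(find_server(CONFIG, "Tester"))
print(find_server(CONFIG, "Tester", "STAGING"))
print(find_server(CONFIG, "OtherSite"))
```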

Solr Cloud (Apache Zookeeper Configuration)

The following configuration is an example that publishes metadata to a Cloud (Zookeeper) Solr instance for the site Tester, which publishes to the default DTS location for this site. The example sets <cleanAllOnFullPublish>true</cleanAllOnFullPublish> and is intended for a Solr instance started with the solr -e cloud command.

When connecting to a Zookeeper instance, specify the solrHost field as {host}:{port} and also specify a defaultCollection.

<SolrConfig>
  <SolrServer>
    <solrHost>localhost:9983</solrHost>
    <defaultCollection>gettingstarted</defaultCollection>
    <cleanAllOnFullPublish>true</cleanAllOnFullPublish>
    <metadataMap>
    </metadataMap>
    <enabledSites>
      <site>Tester</site>
    </enabledSites>
  </SolrServer>
</SolrConfig>

Running an out-of-the-box (OOB) Solr instance that works with the above configuration

Download solr-5.2.1 from http://archive.apache.org/dist/lucene/solr/5.2.1/

Extract the archive. At a command prompt, change directory to the bin directory within the extracted folder. Make sure the JAVA_HOME environment variable is set to a current JDK.

Run the following to start Solr in cloud mode. For simple testing of Solr startup, run through the wizard with all default choices.

solr -e cloud

Follow prompts and select all default options. 

Example startup.

==========================================

To begin, how many Solr nodes would you like to run in your local cluster (specify 1-4 nodes) [2]:
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Creating Solr home "solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\example\cloud\node1\solr"
1 file(s) copied.
1 file(s) copied.
Cloning "solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\example\cloud\node1" into "stephenbolton\Downloads\solr-5.1.0 (1)\solr-5.1.0-example\solr-5.1.0-example\example\cloud\node2"
2 File(s) copied
Please enter the port for node1 [8983]:
node1 port: 8983

Starting node1 on port 8983 using command:
solr -cloud -p 8983 -s example\node1\solr

Waiting for 3 seconds, press a key to continue ...
Please enter the port for node2 [7574]:
node2 port: 7574

Starting node2 on port 7574 using command:
solr -cloud -p 7574 -s example\node2\solr -z localhost:9983

Waiting for 6 seconds, press a key to continue ...

Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
gettingstarted

How many shards would you like to split gettingstarted into? [2]
2

How many replicas per shard would you like to create? [2]
2

Please choose a configuration for the gettingstarted collection, available options are: basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
data_driven_schema_configs
Connecting to ZooKeeper at localhost:9983
Uploading solr-5.1.0 \solr-5.1.0-example\solr-5.1.0-example\server\solr\configsets\data_driven_schema_configs\conf for config gettingstarted to ZooKeeper at localhost:9983

Creating new collection 'gettingstarted' using command:
http://10.10.9.161:8983/solr/admin/collections?action=CREATE&name=gettingstarted&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=gettingstarted

{
"responseHeader":{
"status":0,
"QTime":7843},
"success":{"":{
"responseHeader":{
"status":0,
"QTime":7559},
"core":"gettingstarted_shard2_replica1"}}}


SolrCloud example is running, please visit http://localhost:8983/solr"

==========================

Browse to http://localhost:8983/solr to check that the instance is running. With the configuration file in place, publish the configured "Tester" site.

Go to http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query

Click the Execute Query button to return all results. This should show metadata for all published content.
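
The same check can be scripted against Solr's standard select handler instead of the admin UI. The sketch below uses only the Python standard library and assumes the example collection gettingstarted on localhost:8983; the function names select_url and count_documents are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, collection, query="*:*", rows=10):
    """Build a URL for Solr's /select handler that returns JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/%s/select?%s" % (base_url.rstrip("/"), collection, params)


def count_documents(base_url, collection):
    """Query Solr and return the number of indexed documents (needs a running server)."""
    with urlopen(select_url(base_url, collection, rows=0)) as response:
        return json.load(response)["response"]["numFound"]


print(select_url("http://localhost:8983/solr", "gettingstarted"))
# With Solr running: count_documents("http://localhost:8983/solr", "gettingstarted")
```

A count greater than zero after publishing the Tester site indicates that documents reached the collection.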

To stop all running Solr nodes, run:

solr stop -all

To remove the cloud configuration, delete the solr-5.2.1\example\cloud folder. Solr Cloud can then be reconfigured with solr -e cloud.

Solr HTTP (schemaless) configuration

The following configuration is an example that publishes metadata to a single Solr instance for the site Tester, which publishes to the default DTS location for this site.

<SolrConfig>
  <SolrServer>
    <solrHost>http://localhost:8983/solr/gettingstarted</solrHost>
    <defaultCollection>gettingstarted</defaultCollection>
    <cleanAllOnFullPublish>false</cleanAllOnFullPublish>
    <metadataMap>
    </metadataMap>
    <enabledSites>
      <site>Tester</site>
    </enabledSites>
  </SolrServer>
</SolrConfig>

Solr HTTP (schemaless) example setup

Download solr-5.2.1 from http://archive.apache.org/dist/lucene/solr/5.2.1/

Extract the archive. At a command prompt, change directory to the bin directory within the extracted folder. Make sure the JAVA_HOME environment variable is set to a current JDK.

Run the following for HTTP Solr startup. For simple testing of Solr startup, run through the wizard with all default choices.

solr -e schemaless

Follow prompts and select all default options.

Browse to http://localhost:8983/solr to check that the server is running.

With the configuration file in place, publish the configured "Tester" site.

Go to http://localhost:8983/solr/#/gettingstarted/query

Click the Execute Query button to return all results. This should show metadata for all published content.

Solr Commands

Stop server

solr stop -all

or 

solr stop -p 8983

Restart server

solr start -p 8983

Cleanup

Delete the solr-5.2.1\example\schemaless\example\schemaless folder.

Start Solr server

solr -e cloud 

or

solr -e schemaless