In this blog post I will show how to get a distributed search cluster up and running using LucidWorks Enterprise 2.1 (LWE) and SolrCloud. LucidWorks Enterprise 2.1 is the most recent release as of this post.
I want to start with a few definitions of the terms I will be using in this article. The documentation for SolrCloud is confusing and seems to have multiple definitions for the same term. For example, the SolrCloud wiki page defines a shard as a partition of the index and the NewSolrCloudDesign wiki page seems to refer to it as a replica.
For the purpose of this article we will use the following definitions:
- collection: a search index composed of the total set of documents being searched.
- shard: a partition of a collection. A shard contains a subset of the documents being searched, a collection is composed of one or more shards.
- replica: a copy of a collection. If a collection is composed of N shards, a single replica means each of those shards will have one copy.
- node: a single instance of LucidWorks Enterprise or Solr. A single node can contain multiple collections where each collection has a different data source.
The basic requirements for this test setup are:
- Start with a single node
- Create an index with two shards and one replica
- Index some documents that should be split between the two shards
- Bring a new node online and move one of the shards and replicas to this new node
Step 1: Installation
Download and install LucidWorks Enterprise. For this first node, use all the default settings provided by the installer. Installation instructions can be found on Lucid Imagination’s official documentation.
Step 2: Verify Installation and Stop LWE
If you used all the default settings and options during installation of LucidWorks Enterprise, you should have access to the following:
- LWE Admin: http://localhost:8989/ (user=admin, password=admin)
- Solr Admin: http://localhost:8888/solr
After verifying you have a working installation, stop LWE. To do this, browser to your installation directory and run:
Step 3: Bootstrap Zookeeper
Running LucidWorks Enterprise and Solr in a distributed mode requires the use of Apache Zookeeper. Lucid Imagination’s documentation recommends running a separate Zookeeper ensemble for production deployments. That is outside the scope of this article, so we will use Solr’s embedded version of Zookeeper that is intended for development purposes only.
Since this is the first time we are running Zookeeper, we need to “bootstrap” it with LWE’s configuration files. To do this start LWE with the bootstrap flags:
$LWE/app/bin/start.sh -lwe_core_java_opts "-Dbootstrap_conf=true -DzkRun"
This bootstrap process only needs to be done once. In the future, you can start LWE with the zkRun flag:
$LWE/app/bin/start.sh -lwe_core_java_opts "-DzkRun"
Once bootstrapped, you can head over to the Solr cloud admin page (http://localhost:8888/solr/#/cloud) and see if the default LWE configs were uploaded to Zookeeper. Verify that you have configs called collection1 and LucidWorksLogs.
Step 4: Create A Test Collection
To keep things simple, we are going to use LucidWorks Enterprise’s “collection1″ configuration. This is the out of the box schema and solrconfig settings that ship with LucidWorks Enterprise. In most situations you will need to create a schema specifically for the content you are indexing but this default configuration is fine for our test collection.
According to Lucid Imagination’s documentation, it is not yet possible to create a collection containing multiple shards via their admin interface or REST api. Due to this limitation, we will need to do things manually by using Solr’s Core Admin API.
Update 05/05/12: Mark Miller pointed out that this is a lapse in the LucidWorks Enterprise documentation. You can specify the numShards parameter via the LucidWorks REST api or if using the UI, it will honor the numShards system property. This is nice but does not simplify the steps of this post, see comments below.
I wish I could say this is as easy as executing an api call specifying that we would like to create a new collection with two shards and one replica, but I can’t. In it’s current form everything related to creating shards and replicas needs to be done manually.
The SolrCloud documentation mentions the use of the numShards parameter which I assumed would be used to automatically split new collections. In my testing this was not the case, all it does it create a new Zookeeper entry for a second shard but you still need to manually create a Solr core for that shard using the Core Admin API.
So, now that we know we need to do everything manually, execute the four core admin api calls to create a single collection. The four api calls are:
Create the first primary shard:
Create a replica of the first shard:
Create the second primary shard:
Create a replica of the second shard:
Alright, now that we have the collection created time to check that everything was successful. Head back to the Solr Cloud interface (http://localhost:8888/solr/#/cloud) and view the clusterstate.json entry.
In this json output, you should see the new “testcollection” collection and that it is composed of two shards, “shard1″ and “shard2″. Expanding those shards will show our replicas, “replica1″ and “replica2″ for each shard.
You may be wondering why we are looking at the clusterstate.json file over the nice LWE Admin interface. Well, that is because when you create collections manually via Solr’s Core Admin API they do not show up in LWE. This is bug that I hope is addressed in a future version of LucidWorks Enterprise.
Update 05/05/12: If you start LWE with the numShards parameter and use the GUI/Rest API to create the initial collection, it will show up in the UI.
Step 5: Index Data
I had intended on using the example crawler that ships with LucidWorks Enterprise, but Lucid Imagination states that data sources do not work when running LWE in SolrCloud mode. That and the fact I can’t see my collection in the LWE admin interface in order to assign a data source to my test collection.
So, for quick testing purposes I will resort to using the sample documents that ship with a standard download of Apache Solr. Once you download Solr, browse to the
Edit the file post.sh to point to our test collection update handler.
Save the file and run:
Now that we have data fed, lets check that it was distributed between the two shards we created and that our replicas contain the same data. Head back to to Solr admin page at http://localhost:8888/solr/#/.
- click on the first shard “testcollection_shard1_replica1″, you should have 10 documents in this shard
- click on the second shard “testcollection_shard2_replica1″, you should have 11 documents in this shard
- check the replicas for each shard, they should have the same counts
At this point, we can start issues some queries against the collection:
Get all documents in the collection:
Get all documents in the collection belonging to shard1:
Get all documents in the collection belonging to shard2:
Step 6: Add A New Node
Now time for the fun part, adding a new node. We want to create a new node and have the shards and replicas be split between the two nodes. This is going to be yet another manual process because SolrCloud and LucidWorks Enterprise do not automatically rebalance the shards as new nodes come and go.
To keep things simple, we going to run multiple instances of LWE on the same machine. So, run the LWE installer again but this time do not use the defaults. Select a new installation directory (I will refer to this as $LWE2 below), use port 7777 for Solr, and 7878 for LWE UI. Uncheck the box that starts LWE automatically.
Now we need to start our new instance of LucidWorks Enterprise and connect to our existing Zookeeper instance. To do this you need to set the zkHost parameter to the host and port of your existing Zookeeper instance. Unfortunately, Lucid’s documentation does not specify what port Zookeeper is running on. However, on the SolrCloud wiki page, I found that Zookeeper starts on Solr Port + 1000. In our case Zookeeper should be running on port 9888. Run the following command to start the new instance of LWE:
$LWE2/app/bin/start.sh -lwe_core_java_opts "-DzkHost=localhost:9888"
Now execute the two Solr Core Admin API calls to create our shard and replica on this new node since they are not automatically migrated from the first server.
Create a new replica of shard1
Create a new replica of shard2
At this point, take a look at the cluster state like we did at the end of step 4 above. You should still see our two shards, but each shard should now have three replicas. Two on the first node and one of the new node.
Also take a look at the new node’s admin interface at http://localhost:7777/solr/#/. If you look at the core status for our new shards you should see that our documents were automatically sent over from the first node. Finally something I did not need to do manually!!!
Issuing the same queries from step 5 above against the new node should yeild the same results.
Step 7: Delete A Replica
Now that we have a new node in the cluster we can kill the extra shard replicas we had created on the first node. Issue the following Solr Core Admin API commands:
Unload and delete shard1 replica
Unload and delete shard2 replica
Take a look at the cluster state again and observe that we have finally achieved our desired outcome, a single collection with two shards and a replica.
As you can see it is possible to get LucidWorks Enterprise up and running with SolrCloud but it is not a trivial process. Hopefully future versions of LWE will make this process easier and address some of the bugs I mentioned above. At his point SolrCloud feels half-baked and it’s integration into LucidWorks Enterprise even less. Considering all the LWE features that do not work when running in SolrCloud mode, you would probably be better off running a nightly version of Solr 4.0 which will have the latest SolrCloud patches.