Upgrading Apache Solr from 1.4 to 3.5 and its implications
You don't *just* install a new version!
I'm sure you have been in this situation before: "A new version arrives, they promise you heaven, and when you take the dive you actually end up in hell. Everything is broken and you don't really understand why." A very common case of diving in at the deep end. To prevent this I was asked, during my internship at Acquia, to verify that the new Solr 3.5 would perform at least as well as Solr 1.4 on the exact same searches. A lot of testing should happen before upgrading so that nobody is surprised by sudden problems.
During this process I learned a bunch about Solr server administration, master/slave replication and load testing. Hopefully I've saved you some time in your exploration of the solrconfig and its mergePolicies! And moreover I'd like to thank Acquia and especially Peter Wolanin for his guidance!
What we do know is that Solr 3.x can read the Solr 1.4 index format. This is crucial when upgrading existing indexes. Be warned: in a replication setup, the order in which you upgrade masters and slaves matters. Always upgrade your slaves first! If you upgrade the master first and a 3.5 index gets replicated to a 1.4 slave, you are asking for trouble.
As soon as the first commit/write action is made, Solr will execute an index upgrade process. A fresh index or a re-index is recommended, but an upgraded index will certainly still work.
This blog post was published some time ago, but I am re-publishing this since we have finished the migration of Acquia Search to Solr 3.5 with success so hopefully this will be of interest for some of you.
Drupal integrates very deeply with Apache Solr and updates the index during cron runs (every 30 minutes, for example). This implies that indexing speed does not need to be very high, but search speed does. Apache Solr has a concept of segments: your index is spread over multiple segments, and when a search is executed it has to go through all of them. Logically, more segments = slower results.
Solr 3.5 came with a new default MergePolicy and that required some testing to see if we could trust this new MergePolicy (TieredMergePolicy).
Information regarding these policies can be found here: http://java.dzone.com/news/merge-policy-internals-solr
And read up on the following docs: LogByteSizeMergePolicy, LogDocMergePolicy, TieredMergePolicy
Steps taken to execute these tests
- Load existing index files in to a new core.
- Extract Documents from this index
- Use the extracted documents to insert them in a clean and new core with different configuration
- Replay the searches from that subscription's access log: discard everything except the select queries, use 3000 queries per access log, replay the log twice, and repeat the whole process 3 times to make sure we have a balanced result set
- If you have more questions about these tests, please leave a comment and I'll be happy to provide you with an answer!
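The log-filtering part of the steps above can be sketched in Python. This is only a minimal sketch under assumptions, not the script used for these tests: it assumes an Apache-style access log where the request line appears in double quotes, and that search requests contain `/select` in their path. The names `extract_select_queries`, `REQUEST_RE` and `MAX_QUERIES` and the sample log lines are made up for illustration.

```python
import re
from itertools import islice

# Assumption: request lines look like "GET /solr/core1/select?q=foo HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')
MAX_QUERIES = 3000  # queries kept per access log, as described in the steps


def extract_select_queries(lines, limit=MAX_QUERIES):
    """Return up to `limit` request paths that are Solr select queries."""
    def select_paths():
        for line in lines:
            match = REQUEST_RE.search(line)
            # Keep only select (search) requests; drop updates, pings, etc.
            if match and "/select" in match.group(1):
                yield match.group(1)
    return list(islice(select_paths(), limit))


# Hypothetical sample lines, only to show the filtering behaviour.
sample_log = [
    '10.0.0.1 - - [01/Mar/2012:10:00:00 +0000] "GET /solr/core1/select?q=drupal HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Mar/2012:10:00:01 +0000] "POST /solr/core1/update HTTP/1.1" 200 41',
    '10.0.0.2 - - [01/Mar/2012:10:00:02 +0000] "GET /solr/core1/select?q=solr&rows=10 HTTP/1.1" 200 734',
]
queries = extract_select_queries(sample_log)
```

The resulting list of paths can then be replayed against each core (Solr 1.4 and Solr 3.5, with the different merge policies) using whatever load-testing tool you prefer, so that every configuration receives exactly the same searches.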
Conclusions
If you want to migrate from Solr 1.4 to Solr 3.5 with a low risk of behavior changes, you should keep using the LogByteSizeMergePolicy with a mergeFactor of 4 (the default in the Drupal configs). However, the TieredMergePolicy is interesting when understood correctly. I'd love some more comments on that topic from people who know more about it.
The big result of this test is that Solr 3.5 is a big performance win over Solr 1.4. Also good to know: the merge policy should be set explicitly when you set luceneMatchVersion, because otherwise the default merge policy depends on that version.
I cautiously dare to say that the difference between RHEL5 and Ubuntu 10.04 is immense. I still have to do some extra testing to be sure that this result actually holds.
Charts and extra Legend information
- S14 stands for Solr 1.4
- S35 stands for Solr 3.5
- LB stands for Load Balancer (C1.medium)
- SL stands for Slave, this means that the attack happened from the LB to the SL (these runs were repeated 3 times to reduce the effect of variable delays)
- MA stands for Master, this means that the attack happened from the LB to the MA (these runs were repeated 3 times to reduce the effect of variable delays)
- mergeFactor for LogByteSizeMergePolicy and LogDocMergePolicy is set to 4
- Default means the default merge policy: for Solr 1.4 this is LogByteSizeMergePolicy, and for Solr 3.5 this depends on the luceneMatchVersion
- L35 means that the Lucene match version has been set to Lucene 3.5 instead of the default
- When Lucene 3.5 is set for Solr 3.5 and no merge policy was set, this defaults to TieredMergePolicy
- When Settings is defined, it applies to specific TieredMergePolicy settings
- maxMergeAtOnce says how many segments can be merged at a time for "normal" (not optimize) merging
- segmentsPerTier controls how many segments you can tolerate in the index (bigger number means more segments)
- Distro / kernel version for most of them: CentOS 5 2 32 2 6 18 200906190310 / 2.6.18-xenU-ec2-v1.0
- U stands for Ubuntu: Ubuntu 10.04.4 LTS / 2.6.32-341-ec2
Specifications
Specifications of the Master
Large Instance (M1.large): 7.5 GB memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
Specifications of the Slave
High-CPU Medium Instance (C1.medium): 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
Taking away extreme values for better visibility

Comments
This is awesome Nick! Thanks for taking the time to write this up and share your findings!
Thanks for the write-up, but it is still unclear how you were able to "upgrade" the 1.4 indexes to 3.x. From your description it seems that you rebuilt the 3.6 core. Am I missing something?
P.S. Another pet trick of the Solr 3.6 upgrade is that you can break all your DIHs if you are not careful. See the bug https://issues.apache.org/jira/browse/SOLR-2907. Basically, 3.x is much pickier for embedded DIH queries - you have to supply the pk field.
Best,
C
Hi,
I recently migrated from Solr 1.3 to 3.6.1 and have observed slightly worse performance for indexing in 3.6.1 with the default MP and about 2 times query latency over 3.6.1. We were using compound files in 1.3 and have set the same here. But this seems to have little impact on the combination of multifiles and compound files in the index.
On using LogByteSize MP also, I saw similar files being created in 3.6.1, and of course this is attributed to the noCFS ratio default setting in code.
Could it be that this performance degradation is due to Tiered MP over LogbyteSize MP or could it be that I am missing any important config setting?
Thanks again, I've already read another post of yours about Solr, and I'm continuing :-) I like the way you write articles.