Apache Solr Release Notes
Introduction
------------
Solr is the popular, blazing fast open source enterprise search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, and rich document (e.g., Word, PDF) handling. Solr is highly
scalable, providing distributed search and index replication, and it powers the
search and navigation features of many of the world's largest internet sites.
Solr is written in Java and runs as a standalone full-text search server within
a servlet container such as Tomcat. Solr uses the Lucene Java search library at
its core for full-text indexing and search, and has REST-like HTTP/XML and JSON
APIs that make it easy to use from virtually any programming language. Solr's
powerful external configuration allows it to be tailored to almost any type of
application without Java coding, and it has an extensive plugin architecture
when more advanced customization is required.
See README.txt and http://lucene.apache.org/solr for more information
on how to get started.
================== 3.3.0 ==================
Upgrading from Solr 3.2.0
----------------------
* SolrCore's CloseHook API has been changed in a backward-incompatible way. It
has been changed from an interface to an abstract class. Any custom
components which use the SolrCore.addCloseHook method will need to
be modified accordingly. To migrate, put your old CloseHook#close impl into
CloseHook#preClose.
New Features
----------------------
* SOLR-2378: A new, automaton-based, implementation of suggest (autocomplete)
component, offering an order of magnitude smaller memory consumption
compared to ternary trees and jaspell and very fast lookups at runtime.
(Dawid Weiss)
* SOLR-2400: Field- and DocumentAnalysisRequestHandler now provide a position
history for each token, so you can follow the token through all analysis stages.
The output contains a separate int[] attribute containing all positions from
previous Tokenizers/TokenFilters (called "positionHistory").
(Uwe Schindler)
* SOLR-2524: (SOLR-236, SOLR-237, SOLR-1773, SOLR-1311) Grouping / Field collapsing
using the Lucene grouping contrib. The search result can be grouped by field and query.
(Martijn van Groningen, Emmanuel Keller, Shalin Shekhar Mangar, Koji Sekiguchi,
Iván de Prado, Ryan McKinley, Marc Sturlese, Peter Karich, Bojan Smid,
Charles Hornberger, Dieter Grad, Dmitry Lihachev, Doug Steigerwald,
Karsten Sperling, Michael Gundlach, Oleg Gnatovskiy, Thomas Traeger,
Harish Agarwal, yonik, Michael McCandless, Bill Bell)
* SOLR-1331: Added a srcCore parameter to CoreAdminHandler's mergeindexes action
to merge one or more cores' indexes to a target core (shalin)
* SOLR-2610: Add an option to delete index through CoreAdmin UNLOAD action (shalin)
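As a rough illustration of SOLR-1331 above, the following sketch builds a CoreAdmin request URL that merges two source cores into a target core. Only the mergeindexes action and the srcCore parameter come from the entry itself; the host, port, /solr/admin/cores path, the "core" parameter naming the target, and the core names are assumptions made for the example.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, source_cores):
    """Build a CoreAdmin mergeindexes URL with one srcCore parameter per source core."""
    # "core" as the target parameter name is an assumption of this sketch.
    params = [("action", "mergeindexes"), ("core", target_core)]
    # srcCore is repeated so that more than one core's index can be merged.
    params += [("srcCore", name) for name in source_cores]
    return base_url + "?" + urlencode(params)


# Hypothetical host and core names.
url = merge_indexes_url(
    "http://localhost:8983/solr/admin/cores", "core0", ["core1", "core2"]
)
print(url)
```

Passing the parameters as a list of pairs rather than a dict is what allows srcCore to appear more than once in the query string.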
Optimizations
----------------------
* SOLR-2567: Solr now defaults to TieredMergePolicy. See http://s.apache.org/merging
for more information. (rmuir)
Bug Fixes
----------------------
* SOLR-2519: Improve text_* fieldTypes in example schema.xml: improve
cross-language defaults for text_general; break out separate
English-specific fieldTypes (Jan Høydahl, hossman, Robert Muir,
yonik, Mike McCandless)
* SOLR-2462: Fix extremely high memory usage problems with spellcheck.collate.
Separately, an additional spellcheck.maxCollationEvaluations (default=10000)
parameter is added to avoid excessive CPU time in extreme cases (e.g. long
queries with many misspelled words). (James Dyer via rmuir)
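A minimal sketch of how a client might set the parameters named in SOLR-2462 above. The spellcheck.collate and spellcheck.maxCollationEvaluations names and the default of 10000 are taken from the entry; the spellcheck=true switch, the helper name, and the sample query are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs


def collation_params(query, max_evaluations=10000):
    """Return request parameters that enable collation with a bounded evaluation count."""
    return {
        "q": query,
        "spellcheck": "true",  # assumed switch that turns the spellchecker on
        "spellcheck.collate": "true",
        # Cap on collation evaluations; lower it to limit CPU time on long queries.
        "spellcheck.maxCollationEvaluations": str(max_evaluations),
    }


# Hypothetical misspelled query with a tighter cap than the default.
query_string = urlencode(collation_params("delll ultrashar", max_evaluations=2000))
print(query_string)
```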
Other Changes
----------------------
* SOLR-2620: Removed unnecessary log4j jar from clustering contrib (Dawid Weiss).
* SOLR-2571: Add a commented out example of the spellchecker's thresholdTokenFrequency
parameter to the example solrconfig.xml, and also add a unit test for this feature.
(James Dyer via rmuir)
* SOLR-2576: Deprecate SpellingResult.add(Token token, int docFreq); please use
SpellingResult.addFrequency(Token token, int docFreq) instead.
(James Dyer via rmuir)
* SOLR-2574: Upgrade slf4j to v1.6.1 (shalin)
* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
users of the generate-maven-artifacts target no longer have to manually
place this jar in the Ant classpath. NOTE: when Ant looks for the
maven-ant-tasks jar, it looks first in its pre-existing classpath, so
any copies it finds will be used instead of the copy included in the
Lucene/Solr source tree. For this reason, it is recommended to remove
any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)
* SOLR-2611: Fix typos in the example configuration (Eric Pugh via rmuir)
================== 3.2.0 ==================
Versions of Major Components
---------------------
Apache Tika 0.8
Carrot2 3.5.0
Upgrading from Solr 3.1
----------------------
* The updateRequestProcessorChain for a RequestHandler is now defined
with update.chain rather than update.processor. The latter still works,
but has been deprecated.
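For client code that still sends the deprecated parameter, a migration helper could look roughly like the sketch below. The update.processor and update.chain names come from the note above; the helper itself, the sample chain name, and the choice to keep an explicit update.chain when both are present are assumptions of this example, not Solr behavior.

```python
def migrate_update_params(params):
    """Return a copy of params with update.processor renamed to update.chain."""
    migrated = dict(params)
    if "update.processor" in migrated:
        old_value = migrated.pop("update.processor")
        # If the caller already set update.chain, keep it (a choice of this sketch).
        migrated.setdefault("update.chain", old_value)
    return migrated


# "dedupe" is a hypothetical chain name.
print(migrate_update_params({"update.processor": "dedupe", "commit": "true"}))
```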
Detailed Change List
----------------------
New Features
----------------------
* SOLR-2496: Add ability to specify overwrite and commitWithin as request
parameters (e.g. specified in the URL) when using the JSON update format,
and added a simplified format for specifying multiple documents.
Example: [{"id":"doc1"},{"id":"doc2"}]
(yonik)
* SOLR-2113: Add TermQParserPlugin, registered as "term". This is useful
when generating filter queries from terms returned from field faceting or
the terms component. Example: fq={!term f=weight}1.5 (hossman, yonik)
* SOLR-1915: DebugComponent now supports using a NamedList to model
Explanation objects in its responses instead of
Explanation.toString (hossman)
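The SOLR-2496 entry above can be sketched as follows: overwrite and commitWithin travel as URL parameters, and the request body is the simplified list-of-documents format. The parameter names and the document list are from the entry; the host, the /solr/update/json path, the helper name, and the 5000 value are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def json_update_request(base_url, docs, overwrite=True, commit_within=None):
    """Return (url, body) for a JSON update using URL parameters and a bare document list."""
    params = {"overwrite": "true" if overwrite else "false"}
    if commit_within is not None:
        params["commitWithin"] = str(commit_within)
    url = base_url + "?" + urlencode(params)
    # Simplified multi-document format: a plain JSON array of documents.
    body = json.dumps(docs)
    return url, body


docs = [{"id": "doc1"}, {"id": "doc2"}]
# Hypothetical endpoint; adjust to wherever the JSON update handler is registered.
url, body = json_update_request(
    "http://localhost:8983/solr/update/json", docs, commit_within=5000
)
print(url)
print(body)
```

The sketch only builds the request; sending it (with a JSON content type) is left to whichever HTTP client the application already uses.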
Optimizations
----------------------
Bug Fixes
----------------------
* SOLR-2445: Change the default qt to blank in form.jsp, because there is no "standard"
request handler unless you have it in your solrconfig.xml explicitly. (koji)
* SOLR-2455: Prevent double submit of forms in admin interface.
(Jeffrey Chang via uschindler)
* SOLR-2464: Fix potential slowness in QueryValueSource (the query() function) when
the query is very sparse and may not match any documents in a segment. (yonik)
* SOLR-2469: When using java replication with replicateAfter=startup, the first
commit point on server startup is never removed. (yonik)
* SOLR-2466: SolrJ's CommonsHttpSolrServer would retry requests on failure, regardless
of the configured maxRetries, due to HttpClient having its own retry mechanism
by default. The retryCount of HttpClient is now set to 0, and SolrJ does
the retry. (yonik)
* SOLR-2409: edismax parser - treat the text of a fielded query as a literal if the
fieldname does not exist. For example Mission: Impossible should not search on
the "Mission" field unless it's a valid field in the schema. (Ryan McKinley, yonik)
* SOLR-2403: facet.sort=index reported incorrect results for distributed search
in a number of scenarios when facet.mincount>0. This patch also adds some
performance/algorithmic improvements when (facet.sort=count && facet.mincount=1
&& facet.limit=-1) and when (facet.sort=index && facet.mincount>0) (yonik)
* SOLR-2333: The "rename" core admin action does not persist the new name to solr.xml
(Rasmus Hahn, Paul R. Brown via Mark Miller)
* SOLR-2390: Performance of usePhraseHighlighter is terrible on very large Documents,
regardless of hl.maxDocCharsToAnalyze. (Mark Miller)
* SOLR-2474: The helper TokenStreams in analysis.jsp and AnalysisRequestHandlerBase
did not clear all attributes so they displayed incorrect attribute values for tokens
in later filter stages. (uschindler, rmuir, yonik)
* SOLR-2467: Fix initialization so any errors
are logged properly. (hossman)
* SOLR-2493: SolrQueryParser was fixed to not parse the SolrConfig DOM tree on each
instantiation which is a huge slowdown. (Stephane Bailliez via uschindler)
* SOLR-2495: The JSON parser could hang on corrupted input and could fail
to detect numbers that were too large to fit in a long. (yonik)
* SOLR-2520: Make JSON response format escape \u2029 as well as \u2028
in strings since those characters are not valid in javascript strings
(although they are valid in JSON strings). (yonik)
* SOLR-2536: Add ReloadCacheRequestHandler to fix ExternalFileField bug (if reopenReaders
set to true and no index segments have been changed, commit cannot trigger reload
external file). (koji)
* SOLR-2539: VectorValueSource.floatVal incorrectly used byteVal on sub-sources.
(Tom Liu via yonik)
* SOLR-2554: RandomSortField didn't work when used in a function query. (yonik)
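To illustrate the issue behind SOLR-2520 above (and SOLR-1936 in 3.1.0), the sketch below shows a JSON serializer that leaves U+2028 and U+2029 unescaped, followed by the extra escaping step. This is a Python stand-in written for this example, not Solr's response writer; the function name is hypothetical.

```python
import json


def dumps_js_safe(value):
    """Serialize to JSON, escaping U+2028 and U+2029 so the text is safe to embed in JavaScript."""
    # With ensure_ascii=False these two characters are emitted raw, which is valid JSON.
    text = json.dumps(value, ensure_ascii=False)
    # Replace the raw characters with their \uXXXX escapes; the decoded value is unchanged.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


sample = {"title": "line\u2028separator and paragraph\u2029separator"}
safe = dumps_js_safe(sample)
print(safe)
```

Because the escapes decode back to the same characters, a JSON parser sees identical data before and after the extra step.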
Other Changes
----------------------
* SOLR-2061: Pull base tests out into a new Solr Test Framework module,
and publish binary, javadoc, and source test-framework jars.
(Drew Farris, Robert Muir, Steve Rowe)
* SOLR-2105: Rename RequestHandler param 'update.processor' to 'update.chain'.
(Jan Høydahl via Mark Miller)
* SOLR-2485: Deprecate BaseResponseWriter, GenericBinaryResponseWriter, and
GenericTextResponseWriter. These classes will be removed in 4.0. (ryan)
* SOLR-2451: Enhance assertJQ to allow individual tests to specify the
tolerance delta used in numeric equalities. This allows for slight
variance in asserting score comparisons in unit tests.
(David Smiley, Chris Hostetter)
* SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
because html encoding confuses non-ascii users. (koji)
Build
----------------------
* LUCENE-3006: Building javadocs will fail on warnings by default. Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
Documentation
----------------------
================== 3.1.0 ==================
Versions of Major Components
---------------------
Apache Lucene 3.1.0
Apache Tika 0.8
Carrot2 3.4.2
Velocity 1.6.1 and Velocity Tools 2.0-beta3
Apache UIMA 2.3.1-SNAPSHOT
Upgrading from Solr 1.4
----------------------
* The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no longer be able to read your indices.
In a master/slave configuration, all searchers/slaves should be upgraded
before the master. If the master were to be updated first, the older
searchers would not be able to read the new index format.
* The Solr JavaBin format has changed as of Solr 3.1. If you are using the
JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034)
* The experimental ALIAS command has been removed (SOLR-1637)
* Using solr.xml is recommended for single cores also (SOLR-1621)
* Old syntax of configuration in solrconfig.xml
is deprecated (SOLR-1696)
* The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory were removed. To strip HTML tags,
HTMLStripCharFilter should be used instead, and it works with any
Tokenizer of your choice. (SOLR-1657)
* Field compression is no longer supported. Fields that were formerly
compressed will be uncompressed as index segments are merged. For
shorter fields, this may actually be an improvement, as the compression
used was not very good for short text. Some indexes may get larger though.
* SOLR-1845: The TermsComponent response format was changed so that the
"terms" container is a map instead of a named list. This affects
response formats like JSON, but not XML. (yonik)
* SOLR-1876: All Analyzers and TokenStreams are now final to enforce
the decorator pattern. (rmuir, uschindler)
* LUCENE-2608: Added the ability to specify the accuracy on a per request basis.
It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker
methods using the new SpellingOptions class, but are not required to. While this change is
backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers)
* readercycle script was removed. (SOLR-2046)
* In previous releases, sorting or evaluating function queries on
fields that were "multiValued" (either by explicit declaration in
schema.xml or by implicit behavior because the "version" attribute on
the schema was less than 1.2) did not generally work, but it would
sometimes silently act as if it succeeded and order the docs
arbitrarily. Solr will now fail on any attempt to sort, or apply a
function to, multi-valued fields.
* The DataImportHandler jars are no longer included in the solr
WAR and should be added in Solr's lib directory, or referenced
via the directive in solrconfig.xml.
Detailed Change List
----------------------
New Features
----------------------
* SOLR-1302: Added several new distance based functions, including
Great Circle (haversine), Manhattan, Euclidean and String (using the
StringDistance methods in the Lucene spellchecker).
Also added geohash(), deg() and rad() convenience functions.
See http://wiki.apache.org/solr/FunctionQuery. (gsingers)
* SOLR-1553: New dismax parser implementation (accessible as "edismax")
that supports full lucene syntax, improved reserved char escaping,
fielded queries, improved proximity boosting, and improved stopword
handling. Note: status is experimental for now. (yonik)
* SOLR-1574: Add many new functions from java Math (e.g. sin, cos) (yonik)
* SOLR-1569: Allow functions to take in literal strings by modifying the
FunctionQParser and adding LiteralValueSource (gsingers)
* SOLR-1571: Added unicode collation support through Lucene's CollationKeyFilter
(Robert Muir via shalin)
* SOLR-785: Distributed Search support for SpellCheckComponent
(Matthew Woytowitz, shalin)
* SOLR-1625: Add regexp support for TermsComponent (Uri Boness via noble)
* SOLR-1297: Add sort by Function capability (gsingers, yonik)
* SOLR-1139: Add TermsComponent Query and Response Support in SolrJ (Matt Weber via shalin)
* SOLR-1177: Distributed Search support for TermsComponent (Matt Weber via shalin)
* SOLR-1621, SOLR-1722: Allow current single core deployments to be specified by solr.xml (Mark Miller, noble)
* SOLR-1532: Allow StreamingUpdateSolrServer to use a provided HttpClient (Gabriele Renzi via shalin)
* SOLR-1653: Add PatternReplaceCharFilter (koji)
* SOLR-1131: FieldTypes can now output multiple Fields per Type and still be searched. This can be handy for hiding the details of a particular
implementation such as in the spatial case. (Chris Mattmann, shalin, noble, gsingers, yonik)
* SOLR-1586: Add support for Geohash and Spatial Tile FieldType (Chris Mattmann, gsingers)
* SOLR-1697: PluginInfo should load plugins w/o class attribute also (noble)
* SOLR-1268: Incorporate FastVectorHighlighter (koji)
* SOLR-1750: SolrInfoMBeanHandler added for simpler programmatic access
to info currently available from registry.jsp and stats.jsp
(ehatcher, hossman)
* SOLR-1815: SolrJ now preserves the order of facet queries. (yonik)
* SOLR-1677: Add support for choosing the Lucene Version for Lucene components within
Solr. (Uwe Schindler, Mark Miller)
* SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage.
(Alex Baranov via yonik)
* SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory
and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms.
Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the
performance of SnowballPorterFilterFactory. (rmuir)
* SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr
TokenFilters now support custom Attributes, and some have improved performance:
especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler)
* SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator"
parameters for controlling the minimum shingle size produced by the filter, and
the separator string that it uses, respectively. (Steven Rowe via rmuir)
* SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles"
parameter, to output unigrams if the number of input tokens is fewer than
minShingleSize, and no shingles can be generated.
(Chris Harris via Steven Rowe)
* SOLR-1923: PhoneticFilterFactory now has support for the
Caverphone algorithm. (rmuir)
* SOLR-1957: The VelocityResponseWriter contrib moved to core.
Example search UI now available at http://localhost:8983/solr/browse
(ehatcher)
* SOLR-1974: Add LimitTokenCountFilterFactory. (koji)
* SOLR-1966: QueryElevationComponent can now return just the included results in the elevation file (gsingers, yonik)
* SOLR-1556: TermVectorComponent now supports per field overrides. Also, it now throws an error
if passed in fields do not exist, and returns warnings for fields whose
term vector options (termVectors, offsets, positions) do not
align with the schema declaration. (gsingers)
* SOLR-1985: FastVectorHighlighter: add wrapper class for Lucene's SingleFragListBuilder (koji)
* SOLR-1984: Add HyphenationCompoundWordTokenFilterFactory. (PB via rmuir)
* SOLR-397: Date Faceting now supports a "facet.date.include" param
for specifying when the upper & lower end points of computed date
ranges should be included in the range. Legal values are: "all",
"lower", "upper", "edge", and "outer". For backwards compatibility
the default value is the set: [lower,upper,edge], so that all ranges
between start and end are inclusive of their endpoints, but the
"before" and "after" ranges are not.
* SOLR-945: JSON update handler that accepts add, delete, commit
commands in JSON format. (Ryan McKinley, yonik)
* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
autoGeneratePhraseQueries="true" (the default) causes the query parser to
generate phrase queries if multiple tokens are generated from a single
non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11
will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11).
Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace
delimited languages. (yonik)
* SOLR-1925: Add CSVResponseWriter (use wt=csv) that returns the list of documents
in CSV format. (Chris Mattmann, yonik)
* SOLR-1240: "Range Faceting" has been added. This is a generalization
of the existing "Date Faceting" logic so that it now supports
all stock numeric field types that support range queries in addition
to dates. facet.date is now deprecated in favor of this generalized mechanism.
(Gijs Kunze, hossman)
* SOLR-2021: Add SolrEncoder plugin to Highlighter. (koji)
* SOLR-2030: Make FastVectorHighlighter use SolrEncoder. (koji)
* SOLR-2053: Add support for custom comparators in Solr spellchecker, per LUCENE-2479 (gsingers)
* SOLR-2049: Add hl.multiValuedSeparatorChar for FastVectorHighlighter, per LUCENE-2603. (koji)
* SOLR-2059: Add "types" attribute to WordDelimiterFilterFactory, which
allows you to customize how WordDelimiterFilter tokenizes text with
a configuration file. (Peter Karich, rmuir)
* SOLR-2099: Add ability to throttle rsync based replication using rsync option --bwlimit.
(Brandon Evans via koji)
* SOLR-1316: Create autosuggest component.
(Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab)
* SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See
http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial.
Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers)
* SOLR-2128: Full parameter substitution for function queries.
Example: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0
(yonik)
* SOLR-2133: Function query parser can now parse multiple comma separated
value sources. It also now fails if there is extra unexpected text
after parsing the functions, instead of silently ignoring it.
This allows expressions like q=dist(2,vector(1,2),$pt)&pt=3,4 (yonik)
* SOLR-2157: Suggester should return alpha-sorted results when onlyMorePopular=false (ab)
* SOLR-2010: Added ability to verify that spell checking collations have
actual results in the index. (James Dyer via gsingers)
* SOLR-2188: Added "maxTokenLength" argument to the factories for ClassicTokenizer,
StandardTokenizer, and UAX29URLEmailTokenizer. (Steven Rowe)
* SOLR-2129: Added a Solr module for dynamic metadata extraction/indexing with Apache UIMA.
See contrib/uima/README.txt for more information. (Tommaso Teofili via rmuir)
* SOLR-2325: Allow tagging and exclusion of main query for faceting. (yonik)
* SOLR-2263: Add ability for RawResponseWriter to stream binary files as well as
text files. (Eric Pugh via yonik)
* SOLR-860: Add debug output for MoreLikeThis. (koji)
* SOLR-1057: Add PathHierarchyTokenizerFactory. (ryan, koji)
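The parameter substitution example from SOLR-2128 above contains characters ($, parentheses, commas) that need URL encoding when sent over HTTP. The sketch below encodes that exact example and checks that it round-trips. The helper name is hypothetical, and how the q parameter gets routed to the function query parser is outside the scope of this example.

```python
from urllib.parse import urlencode, parse_qs


def function_query_string(expression, **references):
    """Encode a function query plus the parameters it dereferences with $name."""
    params = {"q": expression}
    # Each keyword becomes a request parameter that $name in the expression refers to.
    params.update(references)
    return urlencode(params)


# The example from the SOLR-2128 entry: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0
query_string = function_query_string(
    "add($v1,$v2)", v1="mul(popularity,5)", v2="20.0"
)
print(query_string)
print(parse_qs(query_string))
```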
Optimizations
----------------------
* SOLR-1679: Don't build up string messages in SolrCore.execute unless they
are necessary for the current log level.
(Fuad Efendi and hossman)
* SOLR-1874: Optimize PatternReplaceFilter for better performance. (rmuir, uschindler)
* SOLR-1968: speed up initial filter cache population for facet.method=enum and
also big terms for multi-valued facet.method=fc. The resulting speedup
for the first facet request is anywhere from 30% to 32x, depending on how many
terms are in the field and how many documents match per term. (yonik)
* SOLR-2089: Speed up UnInvertedField faceting (facet.method=fc for
multi-valued fields) when facet.limit is both high, and a high enough
percentage of the number of unique terms in the field. Extreme cases
yield speedups over 3x. (yonik)
* SOLR-2046: add common functions to scripts-util. (koji)
Bug Fixes
----------------------
* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble)
* SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate
to the original ValueSource.getValues(reader) so custom sources
will work. (yonik)
* SOLR-1572: FastLRUCache correctly implemented the LRU policy only
for the first 2B accesses. (yonik)
* SOLR-1582: copyField was ignored for BinaryField types (gsingers)
* SOLR-1563: Binary fields, including trie-based numeric fields, caused null
pointer exceptions in the luke request handler. (yonik)
* SOLR-1577: The example solrconfig.xml defaulted to a solr data dir
relative to the current working directory, even if a different solr home
was being used. The new behavior changes the default to a zero length
string, which is treated the same as if no dataDir had been specified,
hence the "data" directory under the solr home will be used. (yonik)
* SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added
fl=score to the parameter list instead of appending score to the
existing field list. (yonik)
* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always
uses Lucene default. (Lance Norskog via Mark Miller)
* SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs
(i.e. code points outside of the BMP), resulting in incorrect
matching. This change requires reindexing for any content with
such characters. (Robert Muir, yonik)
* SOLR-1596: A rollback operation followed by the shutdown of Solr
or the close of a core resulted in a warning:
"SEVERE: SolrIndexWriter was not closed prior to finalize()" although
there were no other consequences. (yonik)
* SOLR-1595: StreamingUpdateSolrServer used the platform default character
set when streaming updates, rather than using UTF-8 as the HTTP headers
indicated, leading to an encoding mismatch. (hossman, yonik)
* SOLR-1587: A distributed search request with fl=score, didn't match
the behavior of a non-distributed request since it only returned
the id,score fields instead of all fields in addition to score. (yonik)
* SOLR-1601: Schema browser does not indicate presence of charFilter. (koji)
* SOLR-1615: Backslash escaping did not work in quoted strings
for local param arguments. (Wojtek Piaseczny, yonik)
* SOLR-1628: log contains incorrect number of adds and deletes.
(Thijs Vonk via yonik)
* SOLR-343: Date faceting now respects facet.mincount limiting
(Uri Boness, Raiko Eckstein via hossman)
* SOLR-1624: Highlighter only highlights values from the first field value
in a multivalued field when term positions (term vectors) are stored.
(Chris Harris via yonik)
* SOLR-1635: Fixed error message when numeric values can't be parsed by
DOMUtils - notably for plugin init params in solrconfig.xml.
(hossman)
* SOLR-1651: Fixed incorrect dataimport handler package name in SolrResourceLoader
(Akshay Ukey via shalin)
* SOLR-1660: CapitalizationFilter crashes if you use the maxWordCountOption
(Robert Muir via shalin)
* SOLR-1667: PatternTokenizer does not reset attributes such as positionIncrementGap
(Robert Muir via shalin)
* SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that
could halt the streaming of documents. The original patch to fix this
(never officially released) introduced another hanging bug due to
connections not being released.
(Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik)
* SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers
retrieved from ContentStreams are not closed in various places, resulting
in file descriptor leaks.
(Christoff Brill, Mark Miller)
* SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search
(Janne Majaranta via koji)
* SOLR-1736: In the slave, if moving a file does not succeed, copy the file (noble)
* SOLR-1579: Fixes to XML escaping in stats.jsp
(David Bowen and hossman)
* SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can
result in incorrectly sorted results. (yonik)
* SOLR-1798: Small memory leak (~100 bytes) in fastLRUCache for every
commit. (yonik)
* SOLR-1823: Fixed XMLResponseWriter (via XMLWriter) so it no longer throws
a ClassCastException when a Map containing a non-String key is used.
(Frank Wesemann, hossman)
* SOLR-1797: fix ConcurrentModificationException and potential memory
leaks in ResourceLoader. (yonik)
* SOLR-1850: change KeepWordFilter so a new word set is not created for
each instance (John Wang via yonik)
* SOLR-1706: fixed WordDelimiterFilter for certain combinations of options
where it would output incorrect tokens. (Robert Muir, Chris Male)
* SOLR-1936: The JSON response format needed to escape unicode code point
U+2028 - 'LINE SEPARATOR' (Robert Hofstra, yonik)
* SOLR-1914: Change the JSON response format to output float/double
values of NaN,Infinity,-Infinity as strings. (yonik)
* SOLR-1948: PatternTokenizerFactory should use parent's args (koji)
* SOLR-1870: Indexing documents using the 'javabin' format no longer
fails with a ClassCastException when SolrInputDocuments contain field
values which are Collections or other classes that implement
Iterable. (noble, hossman)
* SOLR-1981: Solr will now fail correctly if solr.xml attempts to
specify multiple cores that have the same name (hossman)
* SOLR-1791: Fix messed up core names on admin gui (yonik via koji)
* SOLR-1995: Change date format from "hour in am/pm" to "hour in day"
in CoreContainer and SnapShooter. (Hayato Ito, koji)
* SOLR-2008: avoid possible RejectedExecutionException w/autoCommit
by making SolrCore close the UpdateHandler before closing the
SearchExecutor. (NarasimhaRaju, hossman)
* SOLR-2036: Avoid expensive fieldCache ram estimation for the
admin stats page. (yonik)
* SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji)
* SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers)
* SOLR-2100: The replication handler backup command didn't save the commit
point and hence could fail when a newer commit caused the older commit point
to be removed before it was finished being copied. This did not affect
normal master/slave replication. (Peter Sturge via yonik)
* SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers)
* SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers)
* SOLR-2111: Change exception handling in distributed faceting to work more
like non-distributed faceting, change facet_counts/exception from a String
to a List to enable listing all exceptions that happened, and
prevent an exception in one facet command from affecting another
facet command. (yonik)
* SOLR-2110: Remove the restriction on names for local params
substitution/dereferencing. Properly encode local params in
distributed faceting. (yonik)
* SOLR-2135: Fix behavior of ConcurrentLRUCache when asking for
getLatestAccessedItems(0) or getOldestAccessedItems(0).
(David Smiley via hossman)
* SOLR-2148: Highlighter doesn't support q.alt. (koji)
* SOLR-2180: It was possible for EmbeddedSolrServer to leave searchers
open if a request threw an exception. (yonik)
* SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab)
* SOLR-2081: BaseResponseWriter.isStreamingDocs causes
SingleResponseWriter.end to be called 2x
(Chris A. Mattmann via hossman)
* SOLR-2219: The init() method of every SolrRequestHandler was being
called twice. (ambikeshwar singh and hossman)
* SOLR-2285: duplicate SolrEventListeners no longer created (hossman)
* SOLR-1993: fix String cast assumption in JavaBinCodec - specifically
addresses "commitWithin" option on Update requests.
(noble, hossman, and Maxim Valyanskiy)
* SOLR-2261: fix velocity template layout.vm that referred to an older
version of jquery. (Eric Pugh via rmuir)
* SOLR-2307: fix bug in PHPSerializedResponseWriter (wt=phps) when
dealing with SolrDocumentList objects -- ie: sharded queries.
(Antonio Verni via hossman)
* SOLR-2127: Fixed serialization of default core and indentation of solr.xml when serializing.
(Ephraim Ofir, Mark Miller)
* SOLR-2320: Fixed ReplicationHandler detail reporting for masters
(hossman)
* SOLR-482: Provide more exception handling in CSVLoader (gsingers)
* SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception.
(Julien Coloos, hossman, yonik)
* SOLR-2085: Improve SolrJ behavior when FacetComponent comes before
QueryComponent (Tomas Salfischberger via hossman)
* SOLR-1940: Fix SolrDispatchFilter behavior when Content-Type is
unknown (Lance Norskog and hossman)
* SOLR-1983: snappuller fails when modifiedConfFiles is not empty and
full copy of index is needed. (Alexander Kanarsky via yonik)
* SOLR-2156: SnapPuller fails to clean Old Index Directories on Full Copy
(Jayendra Patil via yonik)
* SOLR-96: Fix XML parsing in XMLUpdateRequestHandler and
DocumentAnalysisRequestHandler to respect charset from XML file and only
use HTTP header's "Content-Type" as a "hint". (uschindler)
* SOLR-2339: Fix sorting to explicitly generate an error if you
attempt to sort on a multiValued field. (hossman)
* SOLR-2348: Fix field types to explicitly generate an error if you
attempt to get a ValueSource for a multiValued field. (hossman)
* SOLR-2380: Distributed faceting could miss values when facet.sort=index
and when facet.offset was greater than 0. (yonik)
* SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader
are fixed to be resolved using the URI standard (RFC 2396). The system
identifier is no longer a plain filename with path, it gets initialized
using a custom URI scheme "solrres:". This scheme is resolved using a
EntityResolver that utilizes ResourceLoader
(org.apache.solr.common.util.SystemIdResolver). This makes all relative
paths in Solr's config files behave as expected. This change
introduces some backwards breaks in the API: Some config classes
(Config, SolrConfig, IndexSchema) were changed to take
org.xml.sax.InputSource instead of InputStream. There may also be some
backwards breaks in existing config files; it is recommended to check
your config files / XSLTs and replace all XIncludes/HREFs that were
hacked to use absolute paths to use relative ones. (uschindler)
* SOLR-309: Fix FieldType so setting an analyzer on a FieldType that
doesn't expect it will generate an error. Practically speaking this
means that Solr will now correctly generate an error on
initialization if the schema.xml contains an analyzer configuration
for a fieldType that does not use TextField. (hossman)
* SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not
thread safe and could throw an exception. (yonik)
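The charset handling described in SOLR-96 can be illustrated with plain JAXP.
The following is a minimal sketch, not Solr's actual handler code: the class
name XmlCharsetDemo and the helper parseText are hypothetical. It assumes only
that a standard XML parser, given a raw byte stream with no encoding forced on
it, reads the encoding from the XML declaration in the file itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlCharsetDemo {

    // Hypothetical helper: parse raw XML bytes and return the root element's text.
    // The InputSource wraps a byte stream and no encoding is set on it, so the
    // parser detects the charset from the XML declaration.
    static String parseText(byte[] xmlBytes) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // The same logical document, encoded in two different charsets.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><field>caf\u00e9</field>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><field>caf\u00e9</field>"
                .getBytes(StandardCharsets.UTF_8);

        // Both byte sequences decode to the same text because the declaration is honored.
        System.out.println(parseText(latin1).equals(parseText(utf8)));
    }
}
```

Wrapping the byte stream in a Reader with a fixed charset would instead
override the declaration, which is roughly the situation when an HTTP
Content-Type header is treated as authoritative rather than as a hint.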
Other Changes
----------------------
* SOLR-1602: Refactor SOLR package structure to include o.a.solr.response
and move QueryResponseWriters in there
(Chris A. Mattmann, ryan, hoss)
* SOLR-1516: Addition of an abstract BaseResponseWriter class to simplify the
development of QueryResponseWriter implementations.
(Chris A. Mattmann via noble)
* SOLR-1592: Refactor XMLWriter startTag to allow arbitrary attributes to be written
(Chris A. Mattmann via noble)
* SOLR-1561: Added Lucene 2.9.1 spatial contrib jar to lib. (gsingers)
* SOLR-1570: Log warnings if uniqueKey is multi-valued or not stored (hossman, shalin)
* SOLR-1558: QueryElevationComponent only works if the uniqueKey field is
implemented using StrField. In previous versions of Solr no warning or
error would be generated if you attempted to use QueryElevationComponent
with any other field type; it would just fail in unexpected ways. This has
been changed so that it will fail with a clear error message on
initialization. (hossman)
* SOLR-1611: Added Lucene 2.9.1 collation contrib jar to lib (shalin)
* SOLR-1608: Extract base class from TestDistributedSearch to make
it easy to write test cases for other distributed components. (shalin)
* Upgraded to Lucene 2.9-dev r888785 (shalin)
* SOLR-1610: Generify SolrCache (Jason Rutherglen via shalin)
* SOLR-1637: Remove ALIAS command
* SOLR-1662: Added Javadocs in BufferedTokenStream and fixed incorrect cloning
in TestBufferedTokenStream (Robert Muir, Uwe Schindler via shalin)
* SOLR-1674: Improve analysis tests and cut over to new TokenStream API.
(Robert Muir via Mark Miller)
* SOLR-1661: Remove adminCore from CoreContainer. Removed deprecated methods
setAdminCore(), getAdminCore() (noble)
* SOLR-1704: Google collections moved from clustering to core (noble)
* SOLR-1268: Add Lucene 2.9-dev r888785 FastVectorHighlighter contrib jar to lib. (koji)
* SOLR-1538: Reordering of object allocations in ConcurrentLRUCache to eliminate
(an extremely small) potential for deadlock.
(gabriele renzi via hossman)
* SOLR-1588: Removed some very old dead code.
(Chris A. Mattmann via hossman)
* SOLR-1696: Deprecate old syntax and move configuration to HighlightComponent (noble)
* SOLR-1727: SolrEventListener should extend NamedListInitializedPlugin (noble)
* SOLR-1771: Improved error message when StringIndex cannot be initialized
for a function query (hossman)
* SOLR-1695: Improved error messages when adding a document that does not
contain exactly one value for the uniqueKey field (hossman)
* SOLR-1776: DismaxQParser and ExtendedDismaxQParser now use the schema.xml
"defaultSearchField" as the default value for the "qf" param instead of failing
with an error when "qf" is not specified. (hossman)
* SOLR-1851: luceneAutoCommit no longer has any effect - it has been removed (Mark Miller)
* SOLR-1865: SolrResourceLoader.getLines ignores Byte Order Markers (BOMs) at the
beginning of input files. These are often created by editors such as Windows
Notepad. (rmuir, hossman)
* SOLR-1938: ElisionFilterFactory will use a default set of French contractions
if you do not supply a custom articles file. (rmuir)
* SOLR-2003: SolrResourceLoader will report any encoding errors, rather than
silently using replacement characters for invalid inputs (blargy via rmuir)
* SOLR-1804: Google collections updated to Google Guava (which is a superset of collections and contains bug fixes) (gsingers)
* SOLR-2034: Switch to JavaBin codec version 2. Strings are now serialized
as the number of UTF-8 bytes, followed by the bytes in UTF-8. Previously
Strings were serialized as the number of UTF-16 chars, followed by the
bytes in Modified UTF-8. (hossman, yonik, rmuir)
* SOLR-2013: Add mapping-FoldToASCII.txt to example conf directory.
(Steven Rowe via koji)
* SOLR-2213: Upgrade to jQuery 1.4.3 (Erick Erickson via ryan)
* SOLR-1826: Add unit tests for highlighting with termOffsets=true
and overlapping tokens. (Stefan Oestreicher via rmuir)
* SOLR-2340: Add version information to the message in JavaBinCodec when
throwing an exception. (koji)
* SOLR-2350: Since Solr no longer requires XML files to be in UTF-8
(see SOLR-96) SimplePostTool (aka: post.jar) has been improved to
work with files of any mime-type or charset. (hossman)
* SOLR-2365: Move DIH jars out of solr.war (David Smiley via yonik)
* SOLR-2381: Include a patched version of Jetty (6.1.26 + JETTY-1340)
to fix problematic UTF-8 handling for supplementary characters.
(Bernd Fehling, uschindler, yonik, rmuir)
* SOLR-2391: The preferred Content-Type for XML was changed to
application/xml. XMLResponseWriter now only delivers using this
type; updating documents and analyzing documents is still supported
using text/xml as Content-Type, too. If you have clients that are
hardcoded to expect text/xml as the response Content-Type, you have to
change them. (uschindler, rmuir)
* SOLR-2414: All ResponseWriters now use only ServletOutputStreams
and wrap their own Writer around it when serializing. This fixes
the bug in PHPSerializedResponseWriter that produced a wrong string
length if the servlet container had a broken UTF-8 encoding that was
in fact CESU-8 (see SOLR-1091). The system property to enable the
CESU-8 byte counting in PHPSerializedResponseWriter for broken
servlet containers was therefore removed and is now ignored if set.
Output is always UTF-8. (uschindler, yonik, rmuir)
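The length-prefix differences mentioned in SOLR-2034 and SOLR-2414 come down
to how a string is counted. The following is a minimal sketch, not code from
JavaBinCodec or PHPSerializedResponseWriter: the class name StringLengthDemo
and its methods are hypothetical. It compares the number of UTF-16 chars, the
number of UTF-8 bytes, and the number of CESU-8 bytes for the same string,
assuming the usual CESU-8 rule that each surrogate is encoded separately as
three bytes.

```java
import java.nio.charset.StandardCharsets;

public class StringLengthDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes in standard UTF-8 (supplementary characters take 4 bytes).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in CESU-8: every UTF-16 code unit is encoded on its own,
    // so a surrogate pair costs 3 + 3 bytes instead of 4.
    static int cesu8Bytes(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                count += 1;
            } else if (c < 0x800) {
                count += 2;
            } else {
                count += 3;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' (ASCII), U+00E9 (two UTF-8 bytes), U+1D11E (supplementary character).
        String s = "a\u00e9\uD834\uDD1E";
        System.out.println("UTF-16 chars: " + utf16Units(s)); // 4
        System.out.println("UTF-8 bytes:  " + utf8Bytes(s));  // 7
        System.out.println("CESU-8 bytes: " + cesu8Bytes(s)); // 8
    }
}
```

The three counts agree only for pure ASCII input, which is why a reader and a
writer that disagree on the counting rule tend to fail only on non-ASCII or
supplementary characters.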
Build
----------------------
* SOLR-1522: Automated release signing process. (gsingers)
* SOLR-1891: Make lucene-jars-to-solr fail if copying any of the jars fails, and
update clean to remove the jars in that directory (Mark Miller)
* LUCENE-2466: Commons-Codec was upgraded from 1.3 to 1.4. (rmuir)
* SOLR-2042: Fixed some Maven deps (Drew Farris via gsingers)
* LUCENE-2657: Switch from using Maven POM templates to full POMs when
generating Maven artifacts (Steven Rowe)
Documentation
----------------------
* SOLR-1590: Javadoc for XMLWriter#startTag
(Chris A. Mattmann via hossman)
* SOLR-1792: Documented peculiar behavior of TestHarness.LocalRequestFactory
(hossman)
================== Release 1.4.1 ==================
Release Date: See http://lucene.apache.org/solr for the official release date.
Upgrading from Solr 1.4
-----------------------
This is a bug fix release - no changes are required when upgrading from Solr 1.4.
However, a reindex is needed for some of the analysis fixes to take effect.
Versions of Major Components
----------------------------
Apache Lucene 2.9.3
Apache Tika 0.4
Carrot2 3.1.0
Lucene Information
----------------
Since Solr is built on top of Lucene, many people add customizations to Solr
that are dependent on Lucene. Please see http://lucene.apache.org/java/2_9_3/,
especially http://lucene.apache.org/java/2_9_3/changes/Changes.html for more
information on the version of Lucene used in Solr.
Bug Fixes
----------------------
* SOLR-1934: Upgrade to Apache Lucene 2.9.3 to obtain several bug
fixes from the previous 2.9.1. See the Lucene 2.9.3 release notes
for details. (hossman, Mark Miller)
* SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate
to the original ValueSource.getValues(reader) so custom sources
will work. (yonik)
* SOLR-1572: FastLRUCache correctly implemented the LRU policy only
for the first 2B accesses. (yonik)
* SOLR-1595: StreamingUpdateSolrServer used the platform default character
set when streaming updates, rather than using UTF-8 as the HTTP headers
indicated, leading to an encoding mismatch. (hossman, yonik)
* SOLR-1660: CapitalizationFilter crashes if you use the maxWordCountOption
(Robert Muir via shalin)
* SOLR-1662: Added Javadocs in BufferedTokenStream and fixed incorrect cloning
in TestBufferedTokenStream (Robert Muir, Uwe Schindler via shalin)
* SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that
could halt the streaming of documents. The original patch to fix this
(never officially released) introduced another hanging bug due to
connections not being released. (Attila Babo, Erik Hetzner via yonik)
* SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers
retrieved from ContentStreams are not closed in various places, resulting
in file descriptor leaks.
(Christoff Brill, Mark Miller)
* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always
uses Lucene default. (Lance Norskog via Mark Miller)
* SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can
result in incorrectly sorted results. (yonik)
* SOLR-1797: fix ConcurrentModificationException and potential memory
leaks in ResourceLoader. (yonik)
* SOLR-1798: Small memory leak (~100 bytes) in FastLRUCache for every
commit. (yonik)
* SOLR-1522: Show proper message if