mirror of https://github.com/djohnlewis/stackdump synced 2025-12-17 05:13:32 +00:00

150 Commits
jython ... v1.2

Author SHA1 Message Date
Sam
722d4125e7 Added section in README re new PowerShell scripts.
Also fixed formatting and wording.
2013-12-01 03:43:58 +11:00
Sam
ce3eb04270 Updated README with v1.2 changes and SO import stats. 2013-12-01 03:33:40 +11:00
Samuel Lai
9613caa8d1 Changed settings so Solr now only listens on localhost, not all interfaces. 2013-11-29 15:18:55 +11:00
Samuel Lai
2583afeb90 Removed more redundant date/time parsing. 2013-11-29 15:11:32 +11:00
Samuel Lai
522e1ff4f2 Fixed bug in script where the directory change was not reverted when script exited. 2013-11-29 15:06:10 +11:00
Samuel Lai
36eb8d3980 Changed the name of the stackdump schema to something better than 'Example'. 2013-11-29 15:05:31 +11:00
Samuel Lai
a597b2e588 Merge import-perf-improvements branch to default. 2013-11-29 13:01:41 +11:00
Samuel Lai
4a9c4504b3 Updated bad docs. 2013-11-29 12:57:06 +11:00
Samuel Lai
77dd2def42 Oops, forgot to re-instate the comment index during the backout. 2013-11-29 01:42:17 +11:00
Samuel Lai
75a216f5a4 Backed out the comments-batching change.
It was causing weird perf issues and errors. Didn't really seem like it made things faster; if anything, things became slower.
2013-11-29 01:12:09 +11:00
Samuel Lai
bf09e36928 Changed other models to avoid unnecessary date/time parsing.
Added PRAGMA statements for comments table and changed flow so the siteId_postId index is now created after data has been inserted.
2013-11-29 00:18:54 +11:00
Samuel Lai
cdb8d96508 Comments are now committed in batches and using a 'prepared' statement via executemany.
Also fixed a Windows compatibility bug with the new temp comments db and a bug with the webapp now that the Comment model has moved. Dates are also no longer parsed from their ISO form for comments; instead left as strings and parsed by SQLObject internally as needed.
2013-11-28 23:51:53 +11:00
Samuel Lai
5868c8e328 Fixed settings for Windows compatibility. 2013-11-28 22:06:33 +11:00
Samuel Lai
8e3d21f817 Fixed settings for Windows compatibility. 2013-11-28 22:06:33 +11:00
Samuel Lai
2fea457b06 Added PowerShell equivalents to launch and manage Stackdump on Windows. 2013-11-28 21:53:45 +11:00
Samuel Lai
6469691e4b Added PowerShell equivalents to launch and manage Stackdump on Windows. 2013-11-28 21:53:45 +11:00
Samuel Lai
65394ac516 More minor fixes. Really should get Stackdump set-up on my dev machine. 2013-11-28 15:07:05 +11:00
Samuel Lai
bcf1d7c71a Again. Forgot to fix site->siteId rename. 2013-11-28 14:39:25 +11:00
Samuel Lai
d36146ae46 More bugs - forgot to rename uses when renaming Comment.site to siteId 2013-11-28 14:38:21 +11:00
Samuel Lai
e1272ce58a Oops, bug with closing temp_db file handle. 2013-11-28 14:35:24 +11:00
Samuel Lai
bff7e13d83 Comment data used during importing is now stored in a separate database to make it easier to delete them afterwards. 2013-11-28 14:23:55 +11:00
Samuel Lai
c0766de8d4 Skips valid XML character scrubbing if configured for faster performance. 2013-11-28 14:01:00 +11:00
Samuel Lai
644269dd5d Added PyCharm project files to the ignore list. 2013-11-28 13:54:47 +11:00
Sam
6bbf0d7b28 Removed a big duplicate file in Solr. 2013-10-22 23:36:46 +11:00
Sam
71c875437e Added tag v1.1 for changeset 3ad1ff15b528 2013-10-22 23:21:20 +11:00
Sam
e78e70e5ac Updated README for v1.1. 2013-10-22 23:20:45 +11:00
Sam
77c76ea9d4 Grrr, forgot to add a file to the previous changeset.
This adds the template that is used when bad Solr syntax is encountered.
2013-10-22 23:20:23 +11:00
Sam
7dc7b7b5bd Solr syntax errors are now handled nicely.
Fixes #4.
2013-10-22 22:43:57 +11:00
Sam
645b24f370 Question permalinks are now recognised and internally linked.
Fixes #3.
2013-10-22 22:03:09 +11:00
Sam
f067353f62 Added answer permalinks and ability to rewrite internal answer permalinks.
This includes a new field in the Solr schema, so a re-index is required after this changeset.

Fixes #1
2013-10-22 21:59:49 +11:00
Sam
4e924f6bd8 Removed the extracted solr-webapp files from the repository.
The files are automatically extracted on launch from the war file.
2013-10-22 20:56:42 +11:00
Sam
bc5642af71 Removed the Solr log file.
Luckily there was nothing sensitive in there... I don't think.
2013-10-22 20:49:19 +11:00
Sam
09edf16128 Added missing rule to render external links in comments differently. 2013-10-22 08:45:07 +11:00
Sam
436b86b234 Upgrade Apache Solr to 4.5.0 and PySolr to 3.1.0.
All Solr indexes will need to be re-created.
2013-10-22 08:44:37 +11:00
Sam
e88e6a31a0 Added a comment about configuring SELinux to allow JRE 7 to run. 2013-10-14 07:44:10 +11:00
Sam
706fd5ef9d Fixed a bug where site names with non-ASCII characters caused a crash. 2013-10-14 07:32:45 +11:00
Sam
9cac41929b Added quotes in a bunch of places so things work with paths containing spaces. 2013-10-14 07:31:50 +11:00
Sam
3615a72310 Removed the -server arg for starting Solr.
This causes issues with JVMs that don't have the 'server' VM built-in,
e.g. the Windows i386 one. The JVM automatically selects the server
VM when the host characteristics warrant it anyway.
2013-10-14 07:24:10 +11:00
Sam
a472517736 Refactored the import_site command; now caters for filenames of different case (the case changed in 2013 dumps). 2013-09-24 18:07:55 +10:00
Samuel Lai
70fa72b04e Added new external components to README. 2012-12-15 22:53:21 +11:00
Samuel Lai
b667ea0165 Added Mathjax support for math.stackexchange.com.
Equations and expressions are only rendered in question view at the moment.
2012-12-15 22:47:46 +11:00
Samuel Lai
993bee4fc1 Added markdown parsing for comments so links in comments now appear properly.
Also rewrote part of the HTML rewriting code so it doesn't introduce an additional wrapping element in the output, which had been added due to an html5lib requirement on input.
2012-12-15 21:43:06 +11:00
Samuel Lai
5ac8492f38 Improved the README file with details on disk space requirements and configuration. 2012-08-25 17:05:13 +10:00
Samuel Lai
36a605711e Added StackExchange question and user URLs to pages as a tooltip to comply with attribution requirements.
Attribution requirements actually state that hyperlinked URLs should be used, but they would be rather useless in this app so this is an alternative.
2012-08-25 16:57:01 +10:00
Samuel Lai
e4b2ee80a0 Fixed a bug where extra html/head/body tags were added for every question and answer due to the HTML re-writing process. 2012-08-25 16:37:03 +10:00
Samuel Lai
af28d3e403 Added in a setting to control the number of random questions to show on the search pages. 2012-08-25 16:20:57 +10:00
Samuel Lai
c240356a7b Added a check and a nicer message for when the management commands can't connect to solr. 2012-08-24 18:48:17 +10:00
Samuel Lai
fb38b02758 Added tag v1.0 for changeset 3684617407bb 2012-08-24 18:17:56 +10:00
Samuel Lai
96b1e49311 Added some missing libraries/projects to the credits list in the README. 2012-08-19 13:33:56 +10:00
Samuel Lai
1f9546e4b9 Made some minor amendments to the instructions in the README. 2012-08-19 12:53:42 +10:00
Samuel Lai
049e857159 Handled another exception that may occur if no data has been imported. 2012-08-19 12:47:42 +10:00
Samuel Lai
16e5530a82 Modified download_site_info script to create the data directory if it doesn't exist. 2012-08-19 12:30:33 +10:00
Samuel Lai
c1ae870e3d Startup scripts now create the data directory if it doesn't exist. 2012-08-19 12:27:45 +10:00
Samuel Lai
651f97255e More rendering fixes to README. 2012-08-19 12:15:35 +10:00
Samuel Lai
527d5deb05 Fixed some minor bugs with README and it being rendered by bitbucket. 2012-08-19 12:13:06 +10:00
Samuel Lai
1e6718d850 Merged the cpython-only branch into the default branch.
The cPython will be the default version; not really much need for the Jython version anymore.
2012-08-19 11:49:38 +10:00
Samuel Lai
bffe0fd8f5 Added a README. 2012-08-19 11:22:41 +10:00
Samuel Lai
d5bd74feae Changed the default user for the init.d scripts to an arbitrary 'stackdump' user. 2012-08-19 09:50:09 +10:00
Samuel Lai
1b27784a8c Added an error page for when Stackdump fails to connect to Solr.
Also unified the error pages and added a generic 500 error page.
2012-08-19 00:09:35 +10:00
Samuel Lai
e0c96a5c5f Fixed a minor styling issue with question titles on search result pages. 2012-08-18 23:38:15 +10:00
Samuel Lai
f25f25019c Created init.d scripts for the Solr and web apps. Compatible with RHEL5. 2012-08-18 23:36:13 +10:00
Samuel Lai
01b0dcae39 Fixed a minor CSS spacing issue between the 'show more comments' block and the moderation message. 2012-08-18 21:40:29 +10:00
Samuel Lai
c1a5382622 Modified app to use a settings file.
This allows users to change the Solr URL and other things in one spot.
2012-08-18 21:39:17 +10:00
Samuel Lai
46100e7f01 Fixed a small bug where the 'serving media from' message was printed twice. 2012-08-18 21:08:28 +10:00
Samuel Lai
f4940cd1af Fixed a bug in manage.sh where quoted arguments were not passed on with quotes. 2012-08-18 21:01:27 +10:00
Samuel Lai
4430997467 Amended import instructions to account for the command changes in previous commit. 2012-08-18 21:00:58 +10:00
Samuel Lai
e8adaa9b54 Renamed the commands directory and added a script to make them easier to call.
Also deleted the get_sites script as it wasn't very useful, and renamed others
to be more meaningful.
2012-08-18 20:50:13 +10:00
Samuel Lai
e776e95d84 Added an alias for questions for StackExchange style URLs.
This means site_key/questions/question_id is redirected to site_key/question_id.
2012-08-18 20:23:14 +10:00
Samuel Lai
5fc56e4329 Added check to ensure the accepted answer to a question actually exists.
There may be times when it does not exist, e.g. when a question has been merged.
2012-08-18 20:18:26 +10:00
Samuel Lai
9b9b71077c Added informative message when Stackdump is disabled during a site import. 2012-08-18 20:17:15 +10:00
Samuel Lai
2954dd47ba Added a message for questions that have been closed. 2012-08-18 18:40:18 +10:00
Samuel Lai
6181d83cf3 Added a confirmation prompt when importing so the user can confirm site details. 2012-08-18 18:17:00 +10:00
Samuel Lai
9bcac3f92a Fixed some grammar errors in the footer text. 2012-08-18 17:48:55 +10:00
Samuel Lai
ad5f11260a Changed wording from 'posts' to 'questions' on search results pages. 2012-08-18 17:44:57 +10:00
Samuel Lai
827445105b Excess comments (defaults to any over 3) are now hidden by default.
They can be shown by clicking on the 'show comments' link.
2012-08-18 17:44:01 +10:00
Samuel Lai
3d515f51b1 For results with lots of pages, only a limited set of page numbers are rendered. 2012-08-12 16:32:42 +10:00
Samuel Lai
3944261eef Fixed a bug where uncommitted entries from a previously failed import were committed in a later, successful import. 2012-08-12 16:31:30 +10:00
Samuel Lai
1f29fd9113 Modified import.py so it no longer relies on readme.txt.
readme.txt files were dropped as of the August 2012 data dump.
2012-08-12 15:40:48 +10:00
Samuel Lai
dd24d98b39 Upgraded Bottle.py to 0.10.11 and CherryPy to 3.2.2. 2012-08-12 14:57:25 +10:00
Samuel Lai
6156d69af0 Further adjusted start_solr.sh for optimal performance. 2012-08-12 14:17:02 +10:00
Samuel Lai
26b803e119 Improved import speed by ~9-fold by actually committing every 1000 questions.
There was an error made where although questions were only checked for completion every 1000 rows, each completed question was committed separately, resulting in far too many solr calls.

Also modified process to only commit entries in solr at the end, after the database transaction is committed. This means if the process is aborted mid-way through, there won't be orphaned data in solr any more.
2012-08-12 14:13:15 +10:00
Samuel Lai
fdf31a3ef6 Fixed a bug with the previous commit - site object cannot be accessed after transaction has been committed.
Also added an argument so deletes are expunged from the index immediately.
2012-08-12 14:11:16 +10:00
Samuel Lai
6ef90b2ad2 Orphaned entries in solr are also deleted.
Entries are considered orphaned if there is no corresponding database entry, e.g. if an import operation was aborted and hence the database entries were rolled back, but the solr ones were not.
2012-08-07 22:28:21 +10:00
Samuel Lai
9c2e530eff Ordered the list of sites by name. 2012-08-07 22:27:24 +10:00
Samuel Lai
7d6bace28a Stackdump now handles cases where the user has been deleted and does not exist anymore. 2012-08-07 22:07:22 +10:00
Samuel Lai
b1a5977012 Fixed minor styling issue with page title. 2012-08-07 22:06:10 +10:00
Samuel Lai
755a2f2c1f Added some sane default limits for Solr so it doesn't chew up all the RAM.
That tends to happen when importing large datasets.
2012-08-07 19:07:07 +10:00
Samuel Lai
ae208b49ca Fixed a bug where the topbar search box did not have the name attribute.
This caused the search to fail.
2012-08-07 18:52:17 +10:00
Samuel Lai
e38a6e6d0d Fixed a minor bug where if the incorrect version of Python was specified in PYTHON_CMD, no error message was printed and the script just aborted. 2012-08-07 18:51:45 +10:00
Samuel Lai
6013a706d7 Added custom 404 page. 2012-02-12 21:40:35 +11:00
Samuel Lai
9807f0c076 Built licenses pages into the app. 2012-02-12 21:32:48 +11:00
Samuel Lai
1da980424c Links in question view are now parsed and links are re-written where possible to point to the stackdump instance. They are also styled differently to highlight this.
Images are also replaced with a placeholder.
2012-02-12 21:16:07 +11:00
Samuel Lai
5bfcfd2f1a Added an error message when start_python.sh can't find a valid version. 2012-02-12 21:12:33 +11:00
Samuel Lai
105be00e9e Added confirmation when deleting sites. 2012-02-12 21:11:42 +11:00
Samuel Lai
3b1393d9c2 Reduced the number of database calls for the retrieve_sites view helper. 2012-02-12 16:51:14 +11:00
Samuel Lai
f4e2c64de7 Fixed a minor rendering bug where there was no bottom margin for lists. 2012-02-12 16:50:38 +11:00
Samuel Lai
d1ac676db5 Added the site name to the title text for the site logo too. 2012-02-12 15:57:29 +11:00
Samuel Lai
cab37377f7 Added indicator for accepted answer. 2012-02-12 15:54:03 +11:00
Samuel Lai
67f1ac7a3a Added check in the import script to make sure sites aren't inserted with a site key that clashes with Stackdump URLs, e.g. /search or /media. 2012-02-12 14:23:59 +11:00
Samuel Lai
84f5a951ed Re-ordered the Stackdump-wide search method in code so it is processed before the site-specific methods. This means even if a site is imported with the key 'search', it can't replace that page. 2012-02-12 14:19:50 +11:00
Samuel Lai
caebcabd89 Refactored the 'no data' page so the import instructions are accessible even after you've imported your initial dump(s) via a link in the footer. 2012-02-12 14:14:15 +11:00
Samuel Lai
ea0c592fed Very minor adjustment of margins for the site tagline. 2012-02-12 13:58:08 +11:00
Samuel Lai
db3bf11310 Added a 'no data' page with instructions on how to import data into Stackdump. 2012-02-12 13:56:37 +11:00
Samuel Lai
adccd41724 Renamed dataproc management commands to better names. 2012-02-12 13:55:18 +11:00
Samuel Lai
f075580a2e Added indexes to database to speed things up. 2012-02-12 13:54:49 +11:00
Samuel Lai
43f49d1827 Fixed some minor CSS issues, and fixed issues when there is only one site imported in Stackdump. 2012-02-11 22:57:57 +11:00
Samuel Lai
06e210d37a Implemented random questions on home pages. Also consolidated the index pages into a single template. 2012-02-11 22:32:23 +11:00
Samuel Lai
6c58938d44 Changed the default operator to AND properly this time. It has been changed in Solr configuration, instead of hacking the query string. 2012-02-11 21:09:40 +11:00
Samuel Lai
888d7d2e94 Fixed bug where sites without logos were not being correctly served the unknown site logo. 2012-02-11 19:25:21 +11:00
Samuel Lai
b4b2a536e0 Changed from using the route decorator to get because it shows the HTTP methods allowed for that method more clearly. 2012-02-11 19:24:04 +11:00
Samuel Lai
ded9a52d02 Fixed issue where answers were in a difficult-to-read light grey colour. 2012-02-11 19:23:33 +11:00
Samuel Lai
1638617c3e Fixed bug where site_index.html was POSTing the query instead of using GET.
Also fixed issue where errors when searching were not being propagated properly.
2012-02-11 19:20:19 +11:00
Samuel Lai
7363c666d8 Implemented the question view. 2012-02-11 19:08:51 +11:00
Samuel Lai
7dac0dcdea Fixed bug where searching on site search pages would bounce you back to the all sites search page. 2012-02-11 19:08:28 +11:00
Samuel Lai
a4dd607dcc Added missing padding before footer for pages longer than screen height. 2012-02-11 16:44:09 +11:00
Samuel Lai
cda1690be4 Adjusted styling of the 'search all sites' link. 2012-02-11 16:43:34 +11:00
Samuel Lai
68106831ea Fixed the footer at the bottom of the page regardless of the page height. 2012-02-11 16:41:16 +11:00
Samuel Lai
2979714a0c Added a nicer message when no results are returned. 2012-02-11 16:30:36 +11:00
Samuel Lai
101e36d9e0 Query is now automatically modified to default to ANDing keywords together, unless another operator is specified in the query string. 2012-02-11 16:17:40 +11:00
Samuel Lai
7e87726b74 Implemented site-specific search.
Added site logos to searches across all sites for easier identification.
Added hints to make it more obvious which site you are searching.
Minor CSS tweaks.
2012-02-05 17:54:13 +11:00
Samuel Lai
6d32f93452 Changed the indexed field from site name to site key because site key is relatively more unique. 2012-02-05 17:52:44 +11:00
Samuel Lai
3e02dcf151 Fixed bug with paging where the paginator thought the current page was the next page.
Also changed the sort button URLs so when changing sort, you are thrown back to page 1.
2012-02-04 18:38:15 +11:00
Samuel Lai
5078e7369f User details are now retrieved and shown on the results page. 2012-02-04 18:33:24 +11:00
Samuel Lai
68e5e17f74 Minor CSS fixes to make positively voted questions stand out and fixed line height issue with long titles. 2012-02-04 17:10:22 +11:00
Samuel Lai
6511759213 Implemented sorting on the search page. 2012-02-04 17:05:43 +11:00
Samuel Lai
ea2170edb9 Changed name of field from 'score' to 'votes' as score is a keyword in Lucene. 2012-02-04 17:04:54 +11:00
Samuel Lai
4fc9f4e780 Fixed bug where no or negative search result range parameters resulted in HTTP 500. 2012-02-04 15:43:20 +11:00
Samuel Lai
c4c3835841 Implemented paging on results page. 2012-01-29 00:32:15 +11:00
Samuel Lai
96cb924c52 Improved styling of results page with :hover, links and more spacing. 2012-01-28 23:48:00 +11:00
Samuel Lai
90e46dfacf Dates are now formatted using a custom template filter, format_datetime. 2011-11-06 18:14:31 +11:00
Samuel Lai
045b50fe6c Tags are now parsed during import, and inserted into the index as an array field.
Also changed names of multivalued Solr fields so they are plural.
2011-11-06 18:02:06 +11:00
Samuel Lai
098a4f2fa9 Started implementing the search results view. 2011-11-06 17:20:11 +11:00
Samuel Lai
f8a6e7c455 Enabled template autoescaping by default. 2011-11-05 18:52:49 +11:00
Samuel Lai
a2614220ae Created resource decorators that create connections as needed by the current thread. 2011-11-05 18:43:54 +11:00
Samuel Lai
365acbe80a Added method to make site logos appear if they exist, otherwise show unknown icon. 2011-11-05 12:13:16 +11:00
Samuel Lai
dc7a923fa9 Fleshed initial HTML views and pages. 2011-11-01 22:03:22 +11:00
Samuel Lai
5e930bbc08 Added scripts for deleting a site from the system, and getting site info from the net. 2011-11-01 22:02:25 +11:00
Samuel Lai
489d9aec22 Added some new fields for sites, and ability to look up details from the sites RSS feed. 2011-11-01 17:36:14 +11:00
Samuel Lai
97740a206d Added methods to render templates. 2011-10-30 18:37:33 +11:00
Samuel Lai
a83bed32b5 Extracted models out of dataproc/insert.py so they can be reused elsewhere. 2011-10-30 17:12:01 +11:00
Samuel Lai
18850e5bb5 Fixed media root path due to file having moved directories. 2011-10-30 17:11:41 +11:00
Samuel Lai
d541bdeabc Added missing __init__ file to make stackdump dir a module. 2011-10-30 17:11:00 +11:00
Samuel Lai
65130d5415 Moved code into a single stackdump package so code can be shared more easily. 2011-10-30 16:54:49 +11:00
Samuel Lai
513829e255 Oops, forgot to remove the import statement for the now deprecated servers.py. 2011-10-23 18:20:20 +11:00
Samuel Lai
63da7cd3ca Added Twitter bootstrap and jQuery to static media. 2011-10-23 18:18:48 +11:00
Samuel Lai
bc1572be52 Added static files serving function, and cleaned up more Jython hacks. 2011-10-23 18:17:23 +11:00
Samuel Lai
810e2e5fe3 Removed package modded markers that are no longer needed. 2011-10-23 17:42:23 +11:00
Samuel Lai
61b3ba0c94 Fixed bug where Solr data was being stored one directory too far back. 2011-10-23 17:42:06 +11:00
Samuel Lai
3626ea34ae One last file to modify for Jython hack removal - the start_python.sh script. 2011-10-23 17:32:44 +11:00
Samuel Lai
61579cb807 Removed the Jython hacks. We're going with CPython only now. 2011-10-23 17:07:59 +11:00
1029 changed files with 101607 additions and 23203 deletions


@@ -17,5 +17,11 @@ testsuite/.*$
tutorial/.*$
# Solr/Jetty
^java/solr/server/work/.*
^java/solr/server/solr/data/.*
^java/solr/server/solr-webapp/.*
^java/solr/server/logs/.*
# ignore the downloaded logos
^python/media/images/logos/.*
# PyCharm project files
^.idea/

BIN
List-StackdumpCommands.ps1 Normal file

Binary file not shown.

179
README.textile Normal file

@@ -0,0 +1,179 @@
h1. Stackdump - an offline browser for StackExchange sites.
Stackdump was conceived for those who work in environments that do not have easy access to the StackExchange family of websites. It allows you to host a read-only instance of the StackExchange sites locally, accessible via a web browser.
Stackdump comprises two components - the search indexer ("Apache Solr":http://lucene.apache.org/solr/) and the web application. It uses the "StackExchange Data Dumps":http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/, published quarterly by StackExchange, as its source of data.
h2. Screenshots
"Stackdump home":http://edgylogic.com/dynmedia/301/
"Stackdump search results":http://edgylogic.com/dynmedia/303/
"Stackdump question view":http://edgylogic.com/dynmedia/302/
h2. System Requirements
Stackdump was written in Python and requires Python 2.5 or later (but not Python 3). It leverages Apache Solr, which requires the Java runtime (JRE), version 6 or later.
Beyond that, there are no OS-specific dependencies, and it should work on any platform that Python and Java run on (although it only comes bundled with Linux scripts at the moment). It was, however, developed and tested on CentOS 5 running Python 2.7 and JRE 6 update 27.
You will also need "7-zip":http://www.7-zip.org/ to extract the data dump files, but Stackdump does not use it directly so you can perform the extraction on another machine first.
It is recommended that Stackdump be run on a system with at least 3GB of RAM, particularly if you intend to import StackOverflow into Stackdump. Apache Solr requires a fair bit of memory during the import process. It should also have a fair bit of disk space available; at least roughly the space used by the raw, extracted data dump XML files is a good rule of thumb (note that once imported, the raw data dump XML files are no longer needed by Stackdump).
Finally, Stackdump has been tested and works in the latest browsers (IE9, FF10+, Chrome, Safari). It degrades fairly gracefully in older browsers, although some will have rendering issues, e.g. IE8.
h2. Changes and upgrading to v1.1
Version 1.1 fixes a few bugs, the major one being the inability to import the 2013 data dumps due to changes in the case of the filenames. It also adds a couple of minor features, including support for resolving and rewriting short question and answer permalinks.
Because changes have been made to the search schema and the search indexer has been upgraded (to Solr 4.5), all data will need to be re-indexed. Therefore there is no upgrade path; follow the instructions below to set up Stackdump again. It is recommended to install this new version in a new directory, instead of overwriting the existing one.
h2. Changes and upgrading from v1.1 to v1.2.
The major change in the v1.2 release is a set of improvements to import speed. There are some other smaller changes, including new PowerShell scripts to start and manage Stackdump on Windows, as well as a few bug fixes when running on Windows. The search indexing side of things has not changed, so data imported using v1.1 will continue to work in v1.2. _Data from older versions, however, needs to be re-indexed. See the above section on upgrading to v1.1 for more details._
h3. Importing the StackOverflow data dump, September 2013
The StackOverflow data dump has grown significantly since I started this project back in 2011. With the improvements in v1.2, on a VM with two cores and 4GB of RAM running CentOS 5.7 on a single, standard hard drive containing spinning pieces of metal,
* it took *84719.565491 seconds* to import it, or *23 hours, 31 minutes and 59.565491 seconds*
* once completed, it used up *20GB* of disk space
* during the import, roughly *30GB* of disk space was needed
* the import process used, at max, around *2GB* of RAM.
In total, the StackOverflow data dump has *15,933,529 posts* (questions and answers), *2,332,403 users* and a very large number of comments.
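The quoted duration can be sanity-checked with a line of shell integer arithmetic (dropping the fractional 0.565491 seconds):

```shell
# Convert the import duration (84719 s, ignoring the fractional part)
# into hours, minutes and seconds using shell integer arithmetic.
secs=84719
echo "$((secs / 3600)) h $((secs % 3600 / 60)) min $((secs % 60)) s"
# prints: 23 h 31 min 59 s
```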
h2. Setting up
Stackdump was designed for offline environments or environments with poor internet access, therefore it is bundled with all the dependencies it requires (with the exception of Python, Java and 7-zip).
As long as you have:
* "Python":http://python.org/download/,
* "Java":http://java.com/en/download/manual.jsp,
* "Stackdump":https://bitbucket.org/samuel.lai/stackdump/downloads,
* the "StackExchange Data Dump":http://www.clearbits.net/creators/146-stack-exchange-data-dump (Note: this is only available as a torrent), and
* "7-zip":http://www.7-zip.org/ (needed to extract the data dump files)
...you should be able to get an instance up and running.
To provide a better experience, Stackdump can use the RSS feed content to pre-fill some of the required details during the import process, as well as to display the site logos in the app. Stackdump comes bundled with a script that downloads and places these bits in the right places. If you're in a completely offline environment, however, it may be worth running this script on a connected machine first.
h3. Windows users
If you're using Windows, you will need to substitute the appropriate PowerShell equivalent command for the Stackdump scripts used below. These equivalent PowerShell scripts are in the Stackdump root directory, alongside their Unix counterparts. The names are roughly the same, with the exception of @manage.sh@, which in PowerShell has been broken up into two scripts, @List-StackdumpCommands.ps1@ and @Run-StackdumpCommand.ps1@.
Remember to set your PowerShell execution policy to at least @RemoteSigned@ first as these scripts are not signed. Use the @Get-ExecutionPolicy@ cmdlet to see the current policy, and @Set-ExecutionPolicy@ to set it. You will need to have administrative privileges to set it.
h3. Extract Stackdump
Stackdump was designed to be self-contained, so to get it up and running, simply extract the Stackdump download to an appropriate location.
h3. Verify dependencies
Next, you should verify that the required Java and Python versions are accessible in the PATH. (If you haven't installed them yet, now is a good time to do so.)
Type @java -version@ and check that it is at least version 1.6.
bq. If you're using Java 7 on Linux and you see an error similar to the following -
@ Error: failed /opt/jre1.7.0_40/lib/i386/server/libjvm.so, because /opt/jre1.7.0_40/lib/i386/server/libjvm.so: cannot restore segment prot after reloc: Permission denied @
this is because you have SELinux enabled. You will need to tell SELinux to allow Java to run by using the following command as root (amending the path as necessary) -
@chcon -t textrel_shlib_t /opt/jre1.7.0_40/lib/i386/server/libjvm.so@
Then type @python -V@ and check that it is version 2.5 or later (and not Python 3).
If you would rather not put these versions in the PATH (e.g. you don't want to override the default version of Python in your Linux distribution), you can tell Stackdump which Java and/or Python to use explicitly by creating a file named @JAVA_CMD@ or @PYTHON_CMD@ respectively in the Stackdump root directory, and placing the path to the executable in there.
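For example, to pin Stackdump to specific interpreters without touching the PATH (the interpreter paths below are examples only, and a temporary directory stands in for the Stackdump root):

```shell
# Create JAVA_CMD / PYTHON_CMD override files in the Stackdump root.
# The interpreter paths are examples; point them at your own installs.
STACKDUMP_DIR="$(mktemp -d)"   # stand-in for the real Stackdump root
echo /opt/jre1.7.0_40/bin/java > "$STACKDUMP_DIR/JAVA_CMD"
echo /usr/local/bin/python2.7  > "$STACKDUMP_DIR/PYTHON_CMD"
cat "$STACKDUMP_DIR/PYTHON_CMD"
# prints: /usr/local/bin/python2.7
```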
h3. Download additional site information
As mentioned earlier, Stackdump can use additional information available in the StackExchange RSS feed to pre-fill required details during the site import process and to show the logos for each site.
To start the download, execute the following command in the Stackdump root directory -
@./manage.sh download_site_info@
If Stackdump will be running in a completely offline environment, it is recommended that you extract and run this command in a connected environment first. If that is not possible, you can manually download the required pieces -
* download the "RSS feed":http://stackexchange.com/feeds/sites to a file
* for each site you will be importing, work out the __site key__ and download the logo by substituting it into this URL: @http://sstatic.net/site_key/img/icon-48.png@. The site key is generally the bit in the URL before .stackexchange.com, or just the domain without the TLD, e.g. for the Salesforce StackExchange at http://salesforce.stackexchange.com, it is just __salesforce__, while for Server Fault at http://serverfault.com, it is __serverfault__.
The RSS feed file should be copied to the file @stackdump_dir/data/sites@ (create the @data@ directory if it doesn't exist), and the logos should be copied to the @stackdump_dir/python/media/images/logos@ directory and named with the site key and file type extension, e.g. @serverfault.png@.
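The logo URL pattern can be captured in a small helper; @logo_url@ is a function name of our choosing, while the URL scheme is the one given above:

```shell
# Build the logo download URL for a given site key, following the
# http://sstatic.net/site_key/img/icon-48.png pattern described above.
logo_url() {
  echo "http://sstatic.net/$1/img/icon-48.png"
}
logo_url serverfault
# prints: http://sstatic.net/serverfault/img/icon-48.png
```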
h3. Import sites
Each data dump for a StackExchange site is a "7-zip":http://www.7-zip.org/ file. Extract the file corresponding to the site you wish to import into a temporary directory. It should have a bunch of XML files in it when complete.
Now make sure you have the search indexer up and running. This can be done by simply executing the @stackdump_dir/start_solr.sh@ command.
To start the import process, execute the following command -
@stackdump_dir/manage.sh import_site --base-url site_url --dump-date dump_date path_to_xml_files@
... where site_url is the URL of the site you're importing, e.g. __android.stackexchange.com__; dump_date is the date of the data dump you're importing, e.g. __August 2012__, and finally path_to_xml_files is the path to the XML files you just extracted. The dump_date is a text string that is shown in the app only, so it can be in any format you want.
For example, to import the August 2012 data dump of the Android StackExchange site, you would execute -
@stackdump_dir/manage.sh import_site --base-url android.stackexchange.com --dump-date "August 2012" /tmp/android@
It is normal to get messages about unknown PostTypeIds and missing comments and answers. These errors are likely due to those posts being hidden via moderation.
This can take anywhere between a minute and 10 hours or more depending on the site you're importing. As a rough guide, __android.stackexchange.com__ took a minute on my VM, while __stackoverflow.com__ took just over 10 hours.
Repeat these steps for each site you wish to import. Do not attempt to import multiple sites at the same time; it will not work and you may end up with half-imported sites.
The import process can be cancelled at any time without any adverse effects; however, the next run will have to start from scratch.
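Repeating the steps above for several dumps can be scripted. The sketch below is a dry run only (it echoes the commands rather than executing them, and the site list and dump paths are examples), deriving each site key from the domain as described earlier:

```shell
# Derive the site key from a site URL: the text before the first dot
# (which for serverfault.com is also the domain minus the TLD).
site_key() { echo "${1%%.*}"; }

# Dry run: print the import command for each site, one at a time
# (imports must never run in parallel). Remove "echo" to execute.
for site in android.stackexchange.com serverfault.com; do
  echo ./manage.sh import_site --base-url "$site" \
       --dump-date "August 2012" "/tmp/dumps/$(site_key "$site")"
done
```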
h3. Start the app
To start Stackdump, execute the following command -
@stackdump_dir/start_web.sh@
... and visit port 8080 on that machine. That's it - your own offline, read-only instance of StackExchange.
If you need to change the port that it runs on, modify @stackdump_dir/python/src/stackdump/settings.py@ and restart the app.
The aforementioned @settings.py@ file also contains some other settings that control how Stackdump works.
Stackdump also comes bundled with init.d scripts, which were tested on CentOS 5; these are located in the @init.d@ directory. To use them, you will need to modify them to specify the path to the Stackdump root directory and the user to run under.
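A sketch of that customisation step follows. It runs against scratch stand-ins so it is safe to dry-run; the install paths (@/srv/stackdump/@) and service user (@sdservice@) are placeholder assumptions, and a real install would instead edit copies of @init.d/stackdump_solr@ and @init.d/stackdump_web@ placed in @/etc/init.d@ as root, then register them with @chkconfig --add@.

```shell
# Sketch: point the bundled init.d scripts at your install.
# TARGET defaults to a scratch directory; the scripts written here are
# minimal stand-ins for the real ones, which set the same two variables.
TARGET="${TARGET:-$(mktemp -d)}"

for svc in stackdump_solr stackdump_web; do
    # stand-in for the bundled script; only the two variables matter here
    printf '%s\n' '#!/bin/bash' \
        'STACKDUMP_HOME=/opt/stackdump/' \
        'STACKDUMP_USER=stackdump' > "$TARGET/$svc"

    # set the Stackdump root directory and the user to run under
    # (both values below are placeholders for your own)
    sed -i 's|^STACKDUMP_HOME=.*|STACKDUMP_HOME=/srv/stackdump/|' "$TARGET/$svc"
    sed -i 's|^STACKDUMP_USER=.*|STACKDUMP_USER=sdservice|' "$TARGET/$svc"
    chmod +x "$TARGET/$svc"
done
# then, on CentOS: chkconfig --add stackdump_solr && chkconfig --add stackdump_web
```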
Both the search indexer and the app need to be running for Stackdump to work.
h2. Maintenance
Stackdump stores all its data in the @data@ directory under its root directory. If you want to start fresh, just stop the app and the search indexer, delete that directory and restart the app and search indexer.
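The reset described above might look like the following. It is sketched against a scratch directory so it is safe to run as-is; the @service@ commands assume the bundled init.d scripts are installed, so they are left as comments.

```shell
# Sketch: reset Stackdump by removing its data directory.
# Everything imported lives under data/, so deleting it starts fresh.
STACKDUMP_DIR="${STACKDUMP_DIR:-$(mktemp -d)}"
mkdir -p "$STACKDUMP_DIR/data"            # stand-in for an existing install

# 1. stop both the app and the search indexer first, e.g.:
#      service stackdump_web stop && service stackdump_solr stop
# 2. delete all imported data:
rm -rf "$STACKDUMP_DIR/data"
# 3. restart both, e.g.:
#      service stackdump_solr start && service stackdump_web start
```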
To delete certain sites from Stackdump, use the manage_sites management command -
@stackdump_dir/manage.sh manage_sites -l@ to list the sites (and their site keys) currently in the system;
@stackdump_dir/manage.sh manage_sites -d site_key@ to delete a particular site.
It is not necessary to delete a site before importing a new data dump of it though; the import process automatically purges the old copy first.
h2. Credits
Stackdump leverages several open-source projects to do various things, including -
* "twitter-bootstrap":http://github.com/twitter/bootstrap for the UI
* "jQuery":http://jquery.com for the UI
* "bottle.py":http://bottlepy.org for the web framework
* "cherrypy":http://cherrypy.org for the built-in web server
* "pysolr":https://github.com/toastdriven/pysolr/ to connect from Python to the search indexer, Apache Solr
* "html5lib":http://code.google.com/p/html5lib/ for parsing HTML
* "Jinja2":http://jinja.pocoo.org/ for templating
* "SQLObject":http://www.sqlobject.org/ for writing and reading from the database
* "iso8601":http://pypi.python.org/pypi/iso8601/ for date parsing
* "markdown":http://pypi.python.org/pypi/Markdown for rendering comments
* "mathjax":http://www.mathjax.org/ for displaying mathematical expressions properly
* "httplib2":http://code.google.com/p/httplib2/ as a dependency of pysolr
* "Apache Solr":http://lucene.apache.org/solr/ for search functionality
h2. Things not supported... yet
* searching or browsing by tags
* tag wiki pages
* badges
* post history, e.g. reasons why a post was closed are not listed
h2. License
Stackdump is licensed under the "MIT License":http://en.wikipedia.org/wiki/MIT_License.

New files (binary contents not shown): Run-StackdumpCommand.ps1, Start-Python.ps1, Start-Solr.ps1, Start-StackdumpWeb.ps1

init.d/stackdump_solr (new executable file)

@@ -0,0 +1,142 @@
#! /bin/bash
#
# stackdump_solr: Starts the Solr instance for Stackdump
#
# chkconfig: 345 99 01
# description: This daemon provides the search engine capability for Stackdump.\
# It is a required part of Stackdump; Stackdump will not work \
# without it.
# Source function library.
. /etc/init.d/functions
# this needs to be the path of the Stackdump root directory.
STACKDUMP_HOME=/opt/stackdump/
# this is the user that Stackdump runs under
STACKDUMP_USER=stackdump
SOLR_PID_FILE=/var/run/stackdump_solr.pid
if [ ! -d "$STACKDUMP_HOME" ]
then
echo "The STACKDUMP_HOME variable does not point to a valid directory."
exit 1
fi
base=${0##*/}
start() {
echo -n $"Starting Stackdump - Solr... "
# create the logs directory if it doesn't already exist
if [ ! -d "$STACKDUMP_HOME/logs" ]
then
runuser -s /bin/bash $STACKDUMP_USER -c "mkdir $STACKDUMP_HOME/logs"
fi
# check if it is already running
SOLR_PID=`cat $SOLR_PID_FILE 2>/dev/null`
if [ ! -z "$SOLR_PID" ]
then
if [ ! -z "$(pgrep -P $SOLR_PID)" ]
then
echo
echo "Stackdump - Solr is already running."
exit 2
else
# the PID is stale.
rm $SOLR_PID_FILE
fi
fi
# run it!
runuser -s /bin/bash $STACKDUMP_USER -c "$STACKDUMP_HOME/start_solr.sh >> $STACKDUMP_HOME/logs/solr.log 2>&1" &
SOLR_PID=$!
RETVAL=$?
if [ $RETVAL = 0 ]
then
echo $SOLR_PID > $SOLR_PID_FILE
success $"$base startup"
else
failure $"$base startup"
fi
echo
return $RETVAL
}
stop() {
# check if it is running
SOLR_PID=`cat $SOLR_PID_FILE 2>/dev/null`
if [ -z "$SOLR_PID" ] || [ -z "$(pgrep -P $SOLR_PID)" ]
then
echo "Stackdump - Solr is not running."
exit 2
fi
echo -n $"Shutting down Stackdump - Solr... "
# it is running, so shut it down.
# there are many levels of processes here and the kill signal needs to
# be sent to the actual Java process for the process to stop, so let's
# just kill the whole process group.
RUNUSER_CMD_PID=`pgrep -P $SOLR_PID`
RUNUSER_CMD_PGRP=`ps -o pgrp --no-headers -p $RUNUSER_CMD_PID`
pkill -g $RUNUSER_CMD_PGRP
RETVAL=$?
[ $RETVAL = 0 ] && success $"$base shutdown" || failure $"$base shutdown"
rm -f $SOLR_PID_FILE
echo
return $RETVAL
}
status() {
# check if it is running
SOLR_PID=`cat $SOLR_PID_FILE 2>/dev/null`
if [ -z "$SOLR_PID" ]
then
echo "Stackdump - Solr is not running."
exit 0
else
if [ -z "$(pgrep -P $SOLR_PID)" ]
then
rm -f $SOLR_PID_FILE
echo "Stackdump - Solr is not running."
exit 0
else
echo "Stackdump - Solr is running."
exit 0
fi
fi
}
restart() {
stop
start
}
RETVAL=0
# See how we were called.
case "$1" in
start)
start
;;
stop)
stop
;;
status)
status
;;
restart)
restart
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
exit 1
esac
exit $RETVAL

init.d/stackdump_web (new file)

@@ -0,0 +1,141 @@
#! /bin/bash
#
# stackdump_web: Starts the Stackdump web app
#
# chkconfig: 345 99 01
# description: This daemon is the web server for Stackdump.\
# It requires the Solr instance to be running to function.
# Source function library.
. /etc/init.d/functions
# this needs to be the path of the Stackdump root directory.
STACKDUMP_HOME=/opt/stackdump/
# this is the user that Stackdump runs under
STACKDUMP_USER=stackdump
WEB_PID_FILE=/var/run/stackdump_web.pid
if [ ! -d "$STACKDUMP_HOME" ]
then
echo "The STACKDUMP_HOME variable does not point to a valid directory."
exit 1
fi
base=${0##*/}
start() {
echo -n $"Starting Stackdump - Web... "
# create the logs directory if it doesn't already exist
if [ ! -d "$STACKDUMP_HOME/logs" ]
then
runuser -s /bin/bash $STACKDUMP_USER -c "mkdir $STACKDUMP_HOME/logs"
fi
# check if it is already running
WEB_PID=`cat $WEB_PID_FILE 2>/dev/null`
if [ ! -z "$WEB_PID" ]
then
if [ ! -z "$(pgrep -P $WEB_PID)" ]
then
echo
echo "Stackdump - Web is already running."
exit 2
else
# the PID is stale.
rm $WEB_PID_FILE
fi
fi
# run it!
runuser -s /bin/bash $STACKDUMP_USER -c "$STACKDUMP_HOME/start_web.sh >> $STACKDUMP_HOME/logs/web.log 2>&1" &
WEB_PID=$!
RETVAL=$?
if [ $RETVAL = 0 ]
then
echo $WEB_PID > $WEB_PID_FILE
success $"$base startup"
else
failure $"$base startup"
fi
echo
return $RETVAL
}
stop() {
# check if it is running
WEB_PID=`cat $WEB_PID_FILE 2>/dev/null`
if [ -z "$WEB_PID" ] || [ -z "$(pgrep -P $WEB_PID)" ]
then
echo "Stackdump - Web is not running."
exit 2
fi
echo -n $"Shutting down Stackdump - Web... "
# it is running, so shut it down.
# there are many levels of processes here and the kill signal needs to
# be sent to the actual Python process for the process to stop, so let's
# just kill the whole process group.
RUNUSER_CMD_PID=`pgrep -P $WEB_PID`
RUNUSER_CMD_PGRP=`ps -o pgrp --no-headers -p $RUNUSER_CMD_PID`
pkill -g $RUNUSER_CMD_PGRP
RETVAL=$?
[ $RETVAL = 0 ] && success $"$base shutdown" || failure $"$base shutdown"
rm -f $WEB_PID_FILE
echo
return $RETVAL
}
status() {
# check if it is running
WEB_PID=`cat $WEB_PID_FILE 2>/dev/null`
if [ -z "$WEB_PID" ]
then
echo "Stackdump - Web is not running."
exit 0
else
if [ -z "$(pgrep -P $WEB_PID)" ]
then
rm -f $WEB_PID_FILE
echo "Stackdump - Web is not running."
exit 0
else
echo "Stackdump - Web is running."
exit 0
fi
fi
}
restart() {
stop
start
}
RETVAL=0
# See how we were called.
case "$1" in
start)
start
;;
stop)
stop
;;
status)
status
;;
restart)
restart
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
exit 1
esac
exit $RETVAL


@@ -1,117 +1,120 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Welcome to the Apache Solr project!
-----------------------------------
Solr is the popular, blazing fast open source enterprise search platform
from the Apache Lucene project.
For a complete description of the Solr project, team composition, source
code repositories, and other details, please see the Solr web site at
http://lucene.apache.org/solr
Getting Started
---------------
See the "example" directory for an example Solr setup. A tutorial
using the example setup can be found at
http://lucene.apache.org/solr/tutorial.html
or in "docs/tutorial.html" in a binary distribution.
Files included in an Apache Solr binary distribution
----------------------------------------------------
example/
A self-contained example Solr instance, complete with a sample
configuration, documents to index, and the Jetty Servlet container.
Please see example/README.txt for information about running this
example.
dist/apache-solr-XX.war
The Apache Solr Application. Deploy this WAR file to any servlet
container to run Apache Solr.
dist/apache-solr-XX.jar
The Apache Solr Libraries. This JAR file is needed to compile
Apache Solr Plugins (see http://wiki.apache.org/solr/SolrPlugins for
more information).
docs/index.html
The contents of the Apache Solr website.
docs/api/index.html
The Apache Solr Javadoc API documentation.
Instructions for Building Apache Solr from Source
-------------------------------------------------
1. Download the J2SE 5.0 JDK (Java Development Kit) or later from http://java.sun.com.
You will need the JDK installed, and the %JAVA_HOME%\bin directory included
on your command path. To test this, issue a "java -version" command from your
shell and verify that the Java version is 5.0 or later.
2. Download the Apache Ant binary distribution (1.7.x, not 1.6.x, not 1.8.x) from http://ant.apache.org.
You will need Ant installed and the %ANT_HOME%\bin directory included on your
command path. To test this, issue a "ant -version" command from your
shell and verify that Ant is available.
3. Download the Apache Solr distribution, linked from the above
web site. Expand the distribution to a folder of your choice, e.g. c:\solr.
Alternately, you can obtain a copy of the latest Apache Solr source code
directly from the Subversion repository:
http://lucene.apache.org/solr/version_control.html
4. Navigate to the "solr" folder and issue an "ant" command to see the available options
for building, testing, and packaging Solr.
NOTE:
To see Solr in action, you may want to use the "ant example" command to build
and package Solr into the example/webapps directory. See also example/README.txt.
Export control
-------------------------------------------------
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
extracting text content and metadata from encrypted PDF files.
See http://www.bouncycastle.org/ for more details on Bouncy Castle.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Welcome to the Apache Solr project!
-----------------------------------
Solr is the popular, blazing fast open source enterprise search platform
from the Apache Lucene project.
For a complete description of the Solr project, team composition, source
code repositories, and other details, please see the Solr web site at
http://lucene.apache.org/solr
Getting Started
---------------
See the "example" directory for an example Solr setup. A tutorial
using the example setup can be found at
http://lucene.apache.org/solr/tutorial.html
or linked from "docs/index.html" in a binary distribution.
Also, there are Solr clients for many programming languages, see
http://wiki.apache.org/solr/IntegratingSolr
Files included in an Apache Solr binary distribution
----------------------------------------------------
example/
A self-contained example Solr instance, complete with a sample
configuration, documents to index, and the Jetty Servlet container.
Please see example/README.txt for information about running this
example.
dist/solr-XX.war
The Apache Solr Application. Deploy this WAR file to any servlet
container to run Apache Solr.
dist/solr-<component>-XX.jar
The Apache Solr libraries. To compile Apache Solr Plugins,
one or more of these will be required. The core library is
required at a minimum. (see http://wiki.apache.org/solr/SolrPlugins
for more information).
docs/index.html
The Apache Solr Javadoc API documentation and Tutorial
Instructions for Building Apache Solr from Source
-------------------------------------------------
1. Download the Java SE 6 JDK (Java Development Kit) or later from http://java.sun.com/
You will need the JDK installed, and the $JAVA_HOME/bin (Windows: %JAVA_HOME%\bin)
folder included on your command path. To test this, issue a "java -version" command
from your shell (command prompt) and verify that the Java version is 1.6 or later.
2. Download the Apache Ant binary distribution (1.8.2+) from
http://ant.apache.org/ You will need Ant installed and the $ANT_HOME/bin (Windows:
%ANT_HOME%\bin) folder included on your command path. To test this, issue a
"ant -version" command from your shell (command prompt) and verify that Ant is
available.
You will also need to install Apache Ivy binary distribution (2.2.0) from
http://ant.apache.org/ivy/ and place ivy-2.2.0.jar file in ~/.ant/lib -- if you skip
this step, the Solr build system will offer to do it for you.
3. Download the Apache Solr distribution, linked from the above web site.
Unzip the distribution to a folder of your choice, e.g. C:\solr or ~/solr
Alternately, you can obtain a copy of the latest Apache Solr source code
directly from the Subversion repository:
http://lucene.apache.org/solr/versioncontrol.html
4. Navigate to the "solr" folder and issue an "ant" command to see the available options
for building, testing, and packaging Solr.
NOTE:
To see Solr in action, you may want to use the "ant example" command to build
and package Solr into the example/webapps directory. See also example/README.txt.
Export control
-------------------------------------------------
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
extracting text content and metadata from encrypted PDF files.
See http://www.bouncycastle.org/ for more details on Bouncy Castle.


@@ -0,0 +1,13 @@
# System Requirements
Apache Solr runs on Java 6 or greater. When using Java 7, be sure to
install at least Update 1! With all Java versions it is strongly
recommended not to use experimental `-XX` JVM options. It is also
recommended to always use the latest update version of your Java VM,
because bugs may affect Solr. An overview of known JVM bugs can be
found on http://wiki.apache.org/lucene-java/JavaBugs.
CPU, disk and memory requirements are based on the many choices made in
implementing Solr (document size, number of documents, and number of
hits retrieved to name a few). The benchmarks page has some information
related to performance on particular platforms.

New vendored binary files under java/solr/dist/ (contents not shown), including:
solr-cell-4.5.0.jar, solr-clustering-4.5.0.jar, solr-core-4.5.0.jar,
solr-langid-4.5.0.jar, solr-solrj-4.5.0.jar, solr-uima-4.5.0.jar,
solr-velocity-4.5.0.jar and solrj-lib/noggit-0.5.jar, among others.

@@ -0,0 +1,6 @@
The Solr test-framework provides base classes and utility classes for
writing JUnit tests exercising Solr functionality.
This test framework relies on the Lucene components found in the
./lucene-libs/ directory, as well as the third-party libraries found
in the ./lib directory.


@@ -1,51 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Solr example configuration
--------------------------
To run this example configuration, use
java -jar start.jar
in this directory, and when Solr is started connect to
http://localhost:8983/solr/admin/
To add documents to the index, use the post.sh script in the exampledocs
subdirectory (while Solr is running), for example:
cd exampledocs
sh post.sh *.xml
See also README.txt in the solr subdirectory, and check
http://wiki.apache.org/solr/SolrResources for a list of tutorials and
introductory articles.
NOTE: This Solr example server references certain Solr jars outside of
this server directory for non-core modules with <lib> statements in
solrconfig.xml. If you make a copy of this example server and wish
to use the ExtractingRequestHandler (SolrCell), DataImportHandler (DIH),
UIMA, the clustering component, or other modules in "contrib",
you will need to copy the required jars into solr/lib or update the paths to
the jars in your solrconfig.xml.
By default, start.jar starts Solr in Jetty using the default solr home
directory of "./solr/" -- To run other example configurations, you can
specify the solr.solr.home system property when starting Jetty...
java -Dsolr.solr.home=multicore -jar start.jar
java -Dsolr.solr.home=example-DIH -jar start.jar


@@ -0,0 +1,8 @@
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
<Set name="contextPath"><SystemProperty name="hostContext" default="/solr"/></Set>
<Set name="war"><SystemProperty name="jetty.home"/>/webapps/solr.war</Set>
<Set name="defaultsDescriptor"><SystemProperty name="jetty.home"/>/etc/webdefault.xml</Set>
<Set name="tempDirectory"><Property name="jetty.home" default="."/>/solr-webapp</Set>
</Configure>


@@ -0,0 +1,37 @@
#!/bin/bash -ex
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
############
# This script shows how the solrtest.keystore file used for solr tests
# and these example configs was generated.
#
# Running this script should only be necessary if the keystore file
# needs to be replaced, which shouldn't be required until sometime around
# the year 4751.
#
# NOTE: the "-ext" option used in the "keytool" command requires that you have
# the java7 version of keytool, but the generated key will work with any
# version of java
echo "### remove old keystore"
rm -f solrtest.keystore
echo "### create keystore and keys"
keytool -keystore solrtest.keystore -storepass "secret" -alias solrtest -keypass "secret" -genkey -keyalg RSA -dname "cn=localhost, ou=SolrTest, o=lucene.apache.org, c=US" -ext "san=ip:127.0.0.1" -validity 999999


@@ -1,227 +1,205 @@
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN" "http://jetty.mortbay.org/configure.dtd">
<!-- =============================================================== -->
<!-- Configure the Jetty Server -->
<!-- -->
<!-- Documentation of this file format can be found at: -->
<!-- http://docs.codehaus.org/display/JETTY/jetty.xml -->
<!-- -->
<!-- =============================================================== -->
<Configure id="Server" class="org.mortbay.jetty.Server">
<!-- Increase the maximum POST size to 1 MB to be able to handle large shard requests -->
<Call class="java.lang.System" name="setProperty">
<Arg>org.mortbay.jetty.Request.maxFormContentSize</Arg>
<Arg>1000000</Arg>
</Call>
<!-- =========================================================== -->
<!-- Server Thread Pool -->
<!-- =========================================================== -->
<Set name="ThreadPool">
<New class="org.mortbay.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="lowThreads">20</Set>
</New>
<!-- Optional Java 5 bounded threadpool with job queue
<New class="org.mortbay.thread.concurrent.ThreadPool">
<Set name="corePoolSize">50</Set>
<Set name="maximumPoolSize">50</Set>
</New>
-->
</Set>
<!-- =========================================================== -->
<!-- Set connectors -->
<!-- =========================================================== -->
<!-- One of each type! -->
<!-- =========================================================== -->
<!-- Use this connector for many frequently idle connections
and for threadless continuations.
-->
<!--
<Call name="addConnector">
<Arg>
<New class="org.mortbay.jetty.nio.SelectChannelConnector">
<Set name="host"><SystemProperty name="jetty.host" /></Set>
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
<Set name="maxIdleTime">30000</Set>
<Set name="Acceptors">2</Set>
<Set name="statsOn">false</Set>
<Set name="confidentialPort">8443</Set>
<Set name="lowResourcesConnections">5000</Set>
<Set name="lowResourcesMaxIdleTime">5000</Set>
</New>
</Arg>
</Call>
-->
<!-- This connector is currently being used for Solr because it
showed better performance than nio.SelectChannelConnector
for typical Solr requests. -->
<Call name="addConnector">
<Arg>
<New class="org.mortbay.jetty.bio.SocketConnector">
<Set name="host"><SystemProperty name="jetty.host" default="localhost" /></Set>
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
<Set name="maxIdleTime">50000</Set>
<Set name="lowResourceMaxIdleTime">1500</Set>
<Set name="statsOn">false</Set>
</New>
</Arg>
</Call>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<!-- To add a HTTPS SSL listener -->
<!-- see jetty-ssl.xml to add an ssl connector. use -->
<!-- java -jar start.jar etc/jetty.xml etc/jetty-ssl.xml -->
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<!-- To allow Jetty to be started from xinetd -->
<!-- mixin jetty-xinetd.xml: -->
<!-- java -jar start.jar etc/jetty.xml etc/jetty-xinetd.xml -->
<!-- -->
<!-- See jetty-xinetd.xml for further instructions. -->
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<!-- =========================================================== -->
<!-- Set up global session ID manager -->
<!-- =========================================================== -->
<!--
<Set name="sessionIdManager">
<New class="org.mortbay.jetty.servlet.HashSessionIdManager">
<Set name="workerName">node1</Set>
</New>
</Set>
-->
<!-- =========================================================== -->
<!-- Set handler Collection Structure -->
<!-- =========================================================== -->
<Set name="handler">
<New id="Handlers" class="org.mortbay.jetty.handler.HandlerCollection">
<Set name="handlers">
<Array type="org.mortbay.jetty.Handler">
<Item>
<New id="Contexts" class="org.mortbay.jetty.handler.ContextHandlerCollection"/>
</Item>
<Item>
<New id="DefaultHandler" class="org.mortbay.jetty.handler.DefaultHandler"/>
</Item>
<Item>
<New id="RequestLog" class="org.mortbay.jetty.handler.RequestLogHandler"/>
</Item>
</Array>
</Set>
</New>
</Set>
<!-- =========================================================== -->
<!-- Configure the context deployer -->
<!-- A context deployer will deploy contexts described in -->
<!-- configuration files discovered in a directory. -->
<!-- The configuration directory can be scanned for hot -->
<!-- deployments at the configured scanInterval. -->
<!-- -->
<!-- This deployer is configured to deploy contexts configured -->
<!-- in the $JETTY_HOME/contexts directory -->
<!-- -->
<!-- =========================================================== -->
<Call name="addLifeCycle">
<Arg>
<New class="org.mortbay.jetty.deployer.ContextDeployer">
<Set name="contexts"><Ref id="Contexts"/></Set>
<Set name="configurationDir"><SystemProperty name="jetty.home" default="."/>/contexts</Set>
<Set name="scanInterval">5</Set>
</New>
</Arg>
</Call>
<!-- =========================================================== -->
<!-- Configure the webapp deployer. -->
<!-- A webapp deployer will deploy standard webapps discovered -->
<!-- in a directory at startup, without the need for additional -->
<!-- configuration files. It does not support hot deploy or -->
<!-- non standard contexts (see ContextDeployer above). -->
<!-- -->
<!-- This deployer is configured to deploy webapps from the -->
<!-- $JETTY_HOME/webapps directory -->
<!-- -->
<!-- Normally only one type of deployer need be used. -->
<!-- -->
<!-- =========================================================== -->
<Call name="addLifeCycle">
<Arg>
<New class="org.mortbay.jetty.deployer.WebAppDeployer">
<Set name="contexts"><Ref id="Contexts"/></Set>
<Set name="webAppDir"><SystemProperty name="jetty.home" default="."/>/webapps</Set>
<Set name="parentLoaderPriority">false</Set>
<Set name="extract">true</Set>
<Set name="allowDuplicates">false</Set>
<Set name="defaultsDescriptor"><SystemProperty name="jetty.home" default="."/>/etc/webdefault.xml</Set>
</New>
</Arg>
</Call>
<!-- =========================================================== -->
<!-- Configure Authentication Realms -->
<!-- Realms may be configured for the entire server here, or -->
<!-- they can be configured for a specific web app in a context -->
<!-- configuration (see $(jetty.home)/contexts/test.xml for an -->
<!-- example). -->
<!-- =========================================================== -->
<!--
<Set name="UserRealms">
<Array type="org.mortbay.jetty.security.UserRealm">
<Item>
<New class="org.mortbay.jetty.security.HashUserRealm">
<Set name="name">Test Realm</Set>
<Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
<Set name="refreshInterval">0</Set>
</New>
</Item>
</Array>
</Set>
-->
<!-- =========================================================== -->
<!-- Configure Request Log -->
<!-- Request logs may be configured for the entire server here, -->
<!-- or they can be configured for a specific web app in a -->
<!-- contexts configuration (see $(jetty.home)/contexts/test.xml -->
<!-- for an example). -->
<!-- =========================================================== -->
<!--
<Ref id="RequestLog">
<Set name="requestLog">
<New id="RequestLogImpl" class="org.mortbay.jetty.NCSARequestLog">
<Set name="filename"><SystemProperty name="jetty.logs" default="./logs"/>/yyyy_mm_dd.request.log</Set>
<Set name="filenameDateFormat">yyyy_MM_dd</Set>
<Set name="retainDays">90</Set>
<Set name="append">true</Set>
<Set name="extended">false</Set>
<Set name="logCookies">false</Set>
<Set name="LogTimeZone">GMT</Set>
</New>
</Set>
</Ref>
-->
<!-- =========================================================== -->
<!-- extra options -->
<!-- =========================================================== -->
<Set name="stopAtShutdown">true</Set>
<Set name="sendServerVersion">false</Set>
<Set name="sendDateHeader">false</Set>
<Set name="gracefulShutdown">1000</Set>
</Configure>
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<!-- =============================================================== -->
<!-- Configure the Jetty Server -->
<!-- -->
<!-- Documentation of this file format can be found at: -->
<!-- http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax -->
<!-- -->
<!-- =============================================================== -->
<Configure id="Server" class="org.eclipse.jetty.server.Server">
<!-- =========================================================== -->
<!-- Server Thread Pool -->
<!-- =========================================================== -->
<Set name="ThreadPool">
<!-- Default queued blocking threadpool -->
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
<!-- =========================================================== -->
<!-- Set connectors -->
<!-- =========================================================== -->
<!--
<Call name="addConnector">
<Arg>
<New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
<Set name="host"><SystemProperty name="jetty.host" /></Set>
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
<Set name="maxIdleTime">50000</Set>
<Set name="Acceptors">2</Set>
<Set name="statsOn">false</Set>
<Set name="confidentialPort">8443</Set>
<Set name="lowResourcesConnections">5000</Set>
<Set name="lowResourcesMaxIdleTime">5000</Set>
</New>
</Arg>
</Call>
-->
<!-- This connector is currently being used for Solr because it
showed better performance than nio.SelectChannelConnector
for typical Solr requests. -->
<Call name="addConnector">
<Arg>
<New class="org.eclipse.jetty.server.bio.SocketConnector">
<Call class="java.lang.System" name="setProperty"> <Arg>log4j.configuration</Arg> <Arg>etc/log4j.properties</Arg> </Call>
<Set name="host"><SystemProperty name="jetty.host" /></Set>
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
<Set name="maxIdleTime">50000</Set>
<Set name="lowResourceMaxIdleTime">1500</Set>
<Set name="statsOn">false</Set>
</New>
</Arg>
</Call>
<!-- if the connector below is uncommented, then jetty will also accept SSL
         connections on port 8984, using a self-signed certificate, and can
         optionally require the client to authenticate with a certificate
         (which can be the same as the server certificate).
# Run solr example with SSL on port 8984
java -jar start.jar
#
# Run post.jar so that it trusts the server cert...
java -Djavax.net.ssl.trustStore=../etc/solrtest.keystore -Durl=https://localhost:8984/solr/update -jar post.jar *.xml
# Run solr example with SSL requiring client certs on port 8984
java -Djetty.ssl.clientAuth=true -jar start.jar
#
# Run post.jar so that it trusts the server cert,
# and authenticates with a client cert
java -Djavax.net.ssl.keyStorePassword=secret -Djavax.net.ssl.keyStore=../etc/solrtest.keystore -Djavax.net.ssl.trustStore=../etc/solrtest.keystore -Durl=https://localhost:8984/solr/update -jar post.jar *.xml
-->
<!--
<Call name="addConnector">
<Arg>
<New class="org.eclipse.jetty.server.ssl.SslSelectChannelConnector">
<Arg>
<New class="org.eclipse.jetty.http.ssl.SslContextFactory">
<Set name="keyStore"><SystemProperty name="jetty.home" default="."/>/etc/solrtest.keystore</Set>
<Set name="keyStorePassword">secret</Set>
<Set name="needClientAuth"><SystemProperty name="jetty.ssl.clientAuth" default="false"/></Set>
</New>
</Arg>
<Set name="port"><SystemProperty name="jetty.ssl.port" default="8984"/></Set>
<Set name="maxIdleTime">30000</Set>
</New>
</Arg>
</Call>
-->
<!-- =========================================================== -->
<!-- Set handler Collection Structure -->
<!-- =========================================================== -->
<Set name="handler">
<New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
<Set name="handlers">
<Array type="org.eclipse.jetty.server.Handler">
<Item>
<New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
</Item>
<Item>
<New id="DefaultHandler" class="org.eclipse.jetty.server.handler.DefaultHandler"/>
</Item>
<Item>
<New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler"/>
</Item>
</Array>
</Set>
</New>
</Set>
<!-- =========================================================== -->
<!-- Configure Request Log -->
<!-- =========================================================== -->
<!--
<Ref id="Handlers">
<Call name="addHandler">
<Arg>
<New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler">
<Set name="requestLog">
<New id="RequestLogImpl" class="org.eclipse.jetty.server.NCSARequestLog">
<Set name="filename">
logs/request.yyyy_mm_dd.log
</Set>
<Set name="filenameDateFormat">yyyy_MM_dd</Set>
<Set name="retainDays">90</Set>
<Set name="append">true</Set>
<Set name="extended">false</Set>
<Set name="logCookies">false</Set>
<Set name="LogTimeZone">UTC</Set>
</New>
</Set>
</New>
</Arg>
</Call>
</Ref>
-->
<!-- =========================================================== -->
<!-- extra options -->
<!-- =========================================================== -->
<Set name="stopAtShutdown">true</Set>
<Set name="sendServerVersion">false</Set>
<Set name="sendDateHeader">false</Set>
<Set name="gracefulShutdown">1000</Set>
<Set name="dumpAfterStart">false</Set>
<Set name="dumpBeforeStop">false</Set>
<Call name="addBean">
<Arg>
<New id="DeploymentManager" class="org.eclipse.jetty.deploy.DeploymentManager">
<Set name="contexts">
<Ref id="Contexts" />
</Set>
<Call name="setContextAttribute">
<Arg>org.eclipse.jetty.server.webapp.ContainerIncludeJarPattern</Arg>
<Arg>.*/servlet-api-[^/]*\.jar$</Arg>
</Call>
<!-- Add a customize step to the deployment lifecycle -->
<!-- uncomment and replace DebugBinding with your extended AppLifeCycle.Binding class
<Call name="insertLifeCycleNode">
<Arg>deployed</Arg>
<Arg>starting</Arg>
<Arg>customise</Arg>
</Call>
<Call name="addLifeCycleBinding">
<Arg>
<New class="org.eclipse.jetty.deploy.bindings.DebugBinding">
<Arg>customise</Arg>
</New>
</Arg>
</Call>
-->
</New>
</Arg>
</Call>
<Ref id="DeploymentManager">
<Call name="addAppProvider">
<Arg>
<New class="org.eclipse.jetty.deploy.providers.ContextProvider">
<Set name="monitoredDirName"><SystemProperty name="jetty.home" default="."/>/contexts</Set>
<Set name="scanInterval">0</Set>
</New>
</Arg>
</Call>
</Ref>
</Configure>
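The connectors above read their bind address and port from the `jetty.host` and `jetty.port` system properties. A minimal sketch of overriding them at startup (the `start.jar` invocation and config path are illustrative, not taken from this repository):

```shell
# Bind the Solr connector to localhost only, overriding the values read by
# <SystemProperty name="jetty.host"/> and
# <SystemProperty name="jetty.port" default="8983"/> in the config above.
java -Djetty.host=127.0.0.1 -Djetty.port=8983 -jar start.jar etc/jetty.xml
```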


@@ -0,0 +1,38 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# To use this log config, start solr with the following system property:
# -Djava.util.logging.config.file=etc/logging.properties
## Default global logging level:
.level = INFO
## Log every update command (add, delete, commit, ...)
#org.apache.solr.update.processor.LogUpdateProcessor.level = FINE
## Where to log (space separated list).
handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.level = FINE
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
# 1 GB limit per file
java.util.logging.FileHandler.limit = 1073741824
# Log to the logs directory, with log files named solrxxx.log
java.util.logging.FileHandler.pattern = ./logs/solr%u.log


File diff suppressed because it is too large.



@@ -1,36 +0,0 @@
==============================================================
Jetty Web Container
Copyright 1995-2009 Mort Bay Consulting Pty Ltd
==============================================================
The Jetty Web Container is Copyright Mort Bay Consulting Pty Ltd
unless otherwise noted. It is licensed under the apache 2.0
license.
The javax.servlet package used by Jetty is copyright
Sun Microsystems, Inc and Apache Software Foundation. It is
distributed under the Common Development and Distribution License.
You can obtain a copy of the license at
https://glassfish.dev.java.net/public/CDDLv1.0.html.
The UnixCrypt.java code implements the one way cryptography used by
Unix systems for simple password protection. Copyright 1996 Aki Yoshida,
modified April 2001 by Iris Van den Broeke, Daniel Deville.
Permission to use, copy, modify and distribute UnixCrypt
for non-commercial or commercial purposes and without fee is
granted provided that the copyright notice appears in all copies.
The default JSP implementation is provided by the Glassfish JSP engine
from project Glassfish http://glassfish.dev.java.net. Copyright 2005
Sun Microsystems, Inc. and portions Copyright Apache Software Foundation.
Some portions of the code are Copyright:
2006 Tim Vernum
1999 Jason Gilbert.
The jboss integration module contains some LGPL code.
The win32 Java Service Wrapper (v3.2.3) is Copyright (c) 1999, 2006
Tanuki Software, Inc. and 2001 Silver Egg Technology. It is
covered by an open license which is viewable at
http://svn.codehaus.org/jetty/jetty/branches/jetty-6.1/extras/win32service/LICENSE.txt



@@ -0,0 +1,24 @@
# Logging level
solr.log=logs/
log4j.rootLogger=INFO, file, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x \u2013 %m%n
#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN
# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF
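This log4j configuration is located via the `log4j.configuration` system property (set programmatically in the Jetty config earlier in this diff). A sketch of supplying it by hand instead; the path is illustrative and relative to the working directory:

```shell
# Point log4j at the rolling-file configuration above. The logs/ destination
# comes from the solr.log=logs/ variable defined inside the properties file.
java -Dlog4j.configuration=file:etc/log4j.properties -jar start.jar
```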


@@ -1,54 +1,63 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Example "Solr Home" Directory
=============================
This directory is provided as an example of what a "Solr Home" directory
should look like.
It's not strictly necessary that you copy all of the files in this
directory when setting up a new instance of Solr, but it is recommended.
Basic Directory Structure
-------------------------
The Solr Home directory typically contains the following subdirectories...
conf/
This directory is mandatory and must contain your solrconfig.xml
and schema.xml. Any other optional configuration files would also
be kept here.
data/
This directory is the default location where Solr will keep your
index, and is used by the replication scripts for dealing with
snapshots. You can override this location in the solrconfig.xml
and scripts.conf files. Solr will create this directory if it
does not already exist.
lib/
This directory is optional. If it exists, Solr will load any Jars
found in this directory and use them to resolve any "plugins"
specified in your solrconfig.xml or schema.xml (ie: Analyzers,
Request Handlers, etc...). Alternatively you can use the <lib>
syntax in solrconfig.xml to direct Solr to your plugins. See the
example solrconfig.xml file for details.
bin/
This directory is optional. It is the default location used for
keeping the replication scripts.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Example Solr Home Directory
=============================
This directory is provided as an example of what a "Solr Home" directory
should look like.
It's not strictly necessary that you copy all of the files in this
directory when setting up a new instance of Solr, but it is recommended.
Basic Directory Structure
-------------------------
The Solr Home directory typically contains the following...
* solr.xml *
This is the primary configuration file Solr looks for when starting.
This file specifies the list of "SolrCores" it should load, and high
level configuration options that should be used for all SolrCores.
Please see the comments in ./solr.xml for more details.
If no solr.xml file is found, then Solr assumes that there should be
a single SolrCore named "collection1" and that the "Instance Directory"
for collection1 should be the same as the Solr Home Directory.
* Individual SolrCore Instance Directories *
Although solr.xml can be configured to look for SolrCore Instance Directories
in any path, simple sub-directories of the Solr Home Dir using relative paths
are common for many installations. In this directory you can see the
"./collection1" Instance Directory.
* A Shared 'lib' Directory *
Although solr.xml can be configured with an optional "sharedLib" attribute
that can point to any path, it is common to use a "./lib" sub-directory of the
Solr Home Directory.
* ZooKeeper Files *
When using SolrCloud using the embedded ZooKeeper option for Solr, it is
common to have a "zoo.cfg" file and "zoo_data" directories in the Solr Home
Directory. Please see the SolrCloud wiki page for more details...
https://wiki.apache.org/solr/SolrCloud
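The solr.xml layout described above might look like the following minimal sketch (legacy pre-5.0 format; the core name and paths are illustrative defaults, not taken from this repository):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative solr.xml: one SolrCore whose instance directory is a
     sub-directory of the Solr Home, plus a shared "lib" directory. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```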

File diff suppressed because it is too large.


@@ -1,2 +0,0 @@
pizza
history


@@ -1,184 +0,0 @@
#macro(param $key)$request.params.get($key)#end
#macro(url_for_solr)/solr#if($request.core.name != "")/$request.core.name#end#end
#macro(url_for_home)#url_for_solr/browse#end
#macro(q)&q=$!{esc.url($params.get('q'))}#end
#macro(fqs $p)#foreach($fq in $p)#if($velocityCount>1)&#{end}fq=$esc.url($fq)#end#end
#macro(debug)#if($request.params.get('debugQuery'))&debugQuery=true#end#end
#macro(boostPrice)#if($request.params.get('bf') == 'price')&bf=price#end#end
#macro(annotate)#if($request.params.get('annotateBrowse'))&annotateBrowse=true#end#end
#macro(annTitle $msg)#if($annotate == true)title="$msg"#end#end
#macro(spatial)#if($request.params.get('sfield'))&sfield=store#end#if($request.params.get('pt'))&pt=$request.params.get('pt')#end#if($request.params.get('d'))&d=$request.params.get('d')#end#end
#macro(qOpts)#set($queryOpts = $request.params.get("queryOpts"))#if($queryOpts && $queryOpts != "")&queryOpts=$queryOpts#end#end
#macro(lensNoQ)?#if($request.params.getParams('fq') and $list.size($request.params.getParams('fq')) > 0)&#fqs($request.params.getParams('fq'))#end#debug#boostPrice#annotate#spatial#qOpts#end
#macro(lens)#lensNoQ#q#end
#macro(url_for_lens)#{url_for_home}#lens#end
#macro(url_for_start $start)#url_for_home#lens&start=$start#end
#macro(url_for_filters $p)#url_for_home?#q#boostPrice#spatial#qOpts#if($list.size($p) > 0)&#fqs($p)#end#debug#end
#macro(url_for_nested_facet_query $field)#url_for_home#lens&fq=$esc.url($field)#end
## TODO: convert to use {!raw f=$field}$value (with escaping of course)
#macro(url_for_facet_filter $field $value)#url_for_home#lens&fq=$esc.url($field):%22$esc.url($value)%22#end
#macro(url_for_facet_date_filter $field $value)#url_for_home#lens&fq=$esc.url($field):$esc.url($value)#end
#macro(url_for_facet_range_filter $field $value)#url_for_home#lens&fq=$esc.url($field):$esc.url($value)#end
#macro(link_to_previous_page $text)
#if($page.current_page_number > 1)
#set($prev_start = $page.start - $page.results_per_page)
<a class="prev-page" href="#url_for_start($prev_start)">$text</a>
#end
#end
#macro(link_to_next_page $text)
#if($page.current_page_number < $page.page_count)
#set($next_start = $page.start + $page.results_per_page)
<a class="next-page" href="#url_for_start($next_start)">$text</a>
#end
#end
#macro(link_to_page $page_number $text)
#if($page_number == $page.current_page_number)
$text
#else
#if($page_number <= $page.page_count)
#set($page_start = $page_number * $page.results_per_page - $page.results_per_page)
<a class="page" href="#url_for_start($page_start)">$text</a>
#end
#end
#end
#macro(display_facet_query $field, $display, $fieldName)
#if($field.size() > 0)
<span class="facet-field">$display</span>
<ul>
#foreach ($facet in $field)
#if ($facet.value > 0)
#set($facetURL = "#url_for_nested_facet_query($facet.key)")
#if ($facetURL != '')
<li><a href="$facetURL">$facet.key</a> ($facet.value)</li>
#end
#end
#end
</ul>
#end
#end
#macro(display_facet_range_date $field, $display, $fieldName)
<span class="facet-field">$display</span>
##Note: even if mincount is 1, you can still get a '0' before & after
##Note: We assume facet.range.include='lower'
<ul>
#if ($field.before && $field.before > 0)
#set($value = "[* TO " + $date.format("yyyy-MM-dd'T'HH:mm:ss'Z'", $field.start) + "-1MILLIS]")
#set($facetURL = "#url_for_facet_date_filter($fieldName, $value)")
<li><a href="$facetURL">Before</a> ($field.before)</li>
#end
#foreach ($facet in $field.counts)
#set($theDate = $date.toDate("yyyy-MM-dd'T'HH:mm:ss'Z'", $facet.key))
#set($value = '["' + $facet.key + '" TO "' + $facet.key + $field.gap + '-1MILLIS"]')
#set($facetURL = "#url_for_facet_date_filter($fieldName, $value)")
#if ($facetURL != '')
<li><a href="$facetURL">$date.format('MMM yyyy', $theDate)</a> ($facet.value)</li>
#end
#end
#if ($field.after && $field.after > 0)
#set($value = "[" + $date.format("yyyy-MM-dd'T'HH:mm:ss'Z'", $field.after) + " TO *]")
#set($facetURL = "#url_for_facet_date_filter($fieldName, $value)")
<li><a href="$facetURL">After</a> ($field.after)</li>
#end
</ul>
#end
#macro(display_facet_range $field, $display, $fieldName, $start, $end, $gap, $before, $after)
<span class="facet-field">$display</span>
<ul>
#if($before && $before != "")
#set($value = "[* TO " + $start + "]")
#set($facetURL = "#url_for_facet_range_filter($fieldName, $value)")
<li><a href="$facetURL">Less than $start</a> ($before)</li>
#end
#foreach ($facet in $field)
#set($rangeEnd = $math.add($facet.key, $gap))
#set($value = "[" + $facet.key + " TO " + $rangeEnd + "]")
#set($facetURL = "#url_for_facet_range_filter($fieldName, $value)")
#if ($facetURL != '')
<li><a href="$facetURL">$facet.key</a> ($facet.value)</li>
#end
#end
#if($end && $end != "")
#set($value = "[" + $end + " TO *]")
#set($facetURL = "#url_for_facet_range_filter($fieldName, $value)")
<li><a href="$facetURL">More than $math.toNumber($end)</a> ($after)</li>
#end
</ul>
#end
## <lst name="facet_pivot">
## <arr name="cat,inStock">
## <lst>
## <str name="field">cat</str>
## <str name="value">electronics</str>
## <int name="count">17</int>
## <arr name="pivot">
## <lst>
## <str name="field">inStock</str>
## <str name="value">true</str>
## <int name="count">13</int>
## </lst>
## <lst>
## <str name="field">inStock</str>
## <str name="value">false</str>
## <int name="count">4</int>
## </lst>
## </arr>
## </lst>
## $pivots is a list of facet_pivot
#macro(display_facet_pivot $pivots, $display)
#if($pivots.size() > 0)
<span class="facet-field">$display</span>
<ul>
#foreach ($pivot in $pivots)
#foreach ($entry in $pivot.value)
<a href="#url_for_facet_filter($entry.field, $entry.value)">$entry.field::$entry.value</a> ($entry.count)
<ul>
#foreach($nest in $entry.pivot)
<a href="#url_for_facet_filter($entry.field, $entry.value)&fq=$esc.url($nest.field):%22$esc.url($nest.value)%22">$nest.field::$nest.value</a> ($nest.count)
#end
</ul>
#end
#end
</ul>
#end
#end
#macro(field $f)
#if($response.response.highlighting.get($docId).get($f).get(0))
$!response.response.highlighting.get($docId).get($f).get(0)
#else
#foreach($v in $doc.getFieldValues($f))
$v
#end
#end
#end


@@ -1,45 +0,0 @@
#set($searcher=$request.searcher)
#set($params=$request.params)
#set($clusters = $response.response.clusters)
#set($mltResults = $response.response.get("moreLikeThis"))
#set($annotate = $params.get("annotateBrowse"))
#parse('query.vm')
#if($response.response.spellcheck.suggestions and $response.response.spellcheck.suggestions.size() > 0)
Did you mean <a href="#url_for_home?q=$esc.url($response.response.spellcheck.suggestions.collation)#if($list.size($request.params.getParams('fq')) > 0)&#fqs($request.params.getParams('fq'))#end#debug">$response.response.spellcheck.suggestions.collation</a>?
#end
<div class="navigators">
#parse("facets.vm")
</div>
<div class="pagination">
#if($response.response.get('grouped'))
<span><span class="results-found">$response.response.get('grouped').size() group(s)</span> found in ${response.responseHeader.QTime} ms</span>
#else<span><span class="results-found">$page.results_found</span> results found in ${response.responseHeader.QTime} ms</span>
Page <span class="page-num">$page.current_page_number</span> of <span
class="page-count">$page.page_count</span>#end
</div>
<div class="results">
#if($response.response.get('grouped'))
#foreach($grouping in $response.response.get('grouped'))
#parse("hitGrouped.vm")
#end
#else
#foreach($doc in $response.results)
#parse("hit.vm")
#end
#end
</div>
<div class="pagination">
#if($response.response.get('grouped'))
#else
#link_to_previous_page("previous")
<span class="results-found">$page.results_found</span> results found.
Page <span class="page-num">$page.current_page_number</span> of <span
class="page-count">$page.page_count</span>
#link_to_next_page("next")
#end
<br/>
</div>


@@ -1,26 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<h2 #annTitle("Clusters generated by Carrot2 using the /clustering RequestHandler")>Clusters</h2>
<div id="clusters">
Run Solr with java -Dsolr.clustering.enabled=true -jar start.jar to see results
</div>
<script type="text/javascript">
$('#clusters').load("#url_for_solr/clustering#lens",
{'wt':'velocity', 'v.template':"clusterResults"});
</script>


@@ -1,29 +0,0 @@
#foreach ($clusters in $response.response.clusters)
#set($labels = $clusters.get('labels'))
#set($docs = $clusters.get('docs'))
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<h3>#foreach ($label in $labels)$label#if( $foreach.hasNext ),#end#end</h3>
<ol>
#foreach ($cluDoc in $docs)
<li><a href="#url_for_home?q=id:$cluDoc">$cluDoc</a></li>
#end
</ol>
#end


@@ -1,42 +0,0 @@
<div class="result-title"><b>#field('name')</b><span class="mlt">#if($params.getBool('mlt', false) == false)<a href="#lensNoQ&q=id:$docId&mlt=true">More Like This</a>#end</span></div>
##do we have a physical store for this product
#set($store = $doc.getFieldValue('store'))
#if($store)<div class="map"><img src="http://maps.google.com/maps/api/staticmap?&zoom=12&size=150x80&maptype=roadmap&markers=$doc.getFieldValue('store')&sensor=false" /><div><small><a target="_map" href="http://maps.google.com/?q=$store&amp;source=embed">Larger Map</a></small></div></div>#end
<div>Price: $!number.currency($doc.getFieldValue('price'))</div>
<div>Features: #field('features')</div>
<div>In Stock: #field('inStock')</div>
<div class="mlt">
#set($mlt = $mltResults.get($docId))
#set($mltOn = $params.getBool('mlt'))
#if($mltOn == true)<div class="field-name">Similar Items</div>#end
#if ($mltOn && $mlt && $mlt.size() > 0)
<ul>
#foreach($mltHit in $mlt)
#set($mltId = $mltHit.getFieldValue('id'))
<li><div><a href="#url_for_home?q=id:$mltId">$mltId</a></div><div><span class="field-name">Name:</span> $mltHit.getFieldValue('name')</div>
<div><span class="field-name">Price:</span> $!number.currency($mltHit.getFieldValue('price')) <span class="field-name">In Stock:</span> $mltHit.getFieldValue('inStock')</div>
</li>
#end
</ul>
#elseif($mltOn && $mlt.size() == 0)
<div>No Similar Items Found</div>
#end
</div>
#if($params.getBool("debugQuery",false))
<a href="#" onclick='jQuery(this).siblings("pre").toggle(); return false;'>toggle explain</a>
<pre style="display:none">$response.getExplainMap().get($doc.getFirstValue('id'))</pre>
<a href="#" onclick='jQuery(this).siblings("pre2").toggle(); return false;'>toggle all fields</a>
<pre2 style="display:none">
#foreach($fieldname in $doc.fieldNames)
<br>
<span class="field-name">$fieldname :</span>
<span>
#foreach($value in $doc.getFieldValues($fieldname))
$value
#end
</span>
#end
</br>
</pre2>
#end

Some files were not shown because too many files have changed in this diff.