mod_estraier is an Apache module that registers web pages processed by Apache into Hyper Estraier, using Hyper Estraier's node API, so that they can be searched later. Its main purpose is to index and search documents that pass through a proxy, or dynamic content such as wikis and BBSes.
See the following URL for Hyper Estraier:
http://hyperestraier.sourceforge.net/
The project page of mod_estraier on sf.net is:
http://sourceforge.net/projects/modestraier/
mod_estraier is distributed under GPL2.
mod_estraier is tested in the following environments:
Extract the mod_estraier archive and run:
$ ./configure
$ make
# make install
Use the --with-tidy configure option to set the location of tidy, and the --with-apxs configure option to set the location of apxs.
Next, configure Apache. For example, to use mod_estraier with a forward proxy, add something like the following to httpd.conf:
LoadModule estraier_module modules/mod_estraier.so
ProxyRequests On
<Proxy *>
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    SetOutputFilter estraier
    EstraierNode http://localhost:1978/node/test
    EstraierUser admin
    EstraierPass admin
    EstraierDenyURI http://[a-z]*.?google.co
    EstraierAllowURI http://labs.google.com/
    EstraierDenyURI favicon.ico
    EstraierUseWeight On
    EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml
    EstraierFilterCommand ^application/msword H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
    EstraierFilterCommand ^application/vnd.ms-(excel|powerpoint) H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
</Proxy>
Or, to use it with a reverse proxy, add something like the following to httpd.conf:
<Location /my_web/>
    SetOutputFilter estraier
    EstraierNode http://localhost:1978/node/test
    EstraierUser admin
    EstraierPass admin
    EstraierDenyURI favicon.ico
    EstraierUseWeight On
</Location>
Each option is described in detail in the section below. More sample configurations are in doc/recipes.conf in the package.
Then restart Apache, for example as follows:
If you don't have a DB yet, run
estmaster init casket
to initialize it. After initialization, run
estmaster start casket
to start estmaster. The configuration above requires a node named "test"; create it in the master UI at http://localhost:1978/master_ui
Set your browser's proxy host to "localhost" and its port to "80". As you browse web sites as usual, the pages you view are registered and your DB grows.
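To check the setup without reconfiguring a browser, you can also push a page through the proxy from a script. The Python sketch below assumes the forward-proxy configuration above listening on localhost port 80; `make_proxy_opener` and `fetch_via_proxy` are hypothetical helper names, not part of mod_estraier.

```python
import urllib.request

# Assumption: the forward proxy from the sample httpd.conf listens on localhost:80.
PROXY = "http://localhost:80"


def make_proxy_opener(proxy=PROXY):
    """Return an opener that sends plain-HTTP requests through the given proxy."""
    # Only the "http" scheme is mapped, so only http:// URLs go through the proxy.
    return urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))


def fetch_via_proxy(url, proxy=PROXY, timeout=10):
    """Fetch url through the proxy so that mod_estraier sees the response."""
    with make_proxy_opener(proxy).open(url, timeout=timeout) as resp:
        return resp.status, resp.read()
```

For example, `fetch_via_proxy("http://example.org/")` fetches one page through Apache; if that URI is not denied, it should then show up in the node's search results.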
You can search the DB with any node API client. For example, the node master provides a search interface at the following URL:
http://localhost:1978/node/test/search_ui
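Other node API clients talk to the same node over HTTP. As a rough sketch, the Python below builds (but does not send) a search request. It assumes the node server's `search` command with the `phrase` and `max` parameters and HTTP Basic authentication, as described in Hyper Estraier's P2P guide; check that guide for the full parameter set and the response format. The admin/admin credentials come from the sample configuration, and `build_search_request` is a hypothetical name.

```python
import base64
import urllib.parse
import urllib.request


def build_search_request(node_url, phrase, user, password, max_results=10):
    """Build, but do not send, a search request for a Hyper Estraier node.

    Assumes the node server's "search" command with "phrase" and "max"
    parameters, authenticated with HTTP Basic authentication.
    """
    query = urllib.parse.urlencode({"phrase": phrase, "max": max_results})
    req = urllib.request.Request(node_url.rstrip("/") + "/search?" + query)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req
```

Sending it with `urllib.request.urlopen(req, timeout=5)` returns the node's raw response; parsing that response is left out here.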
The following directives in httpd.conf configure mod_estraier.
The EstraierNode directive specifies the Hyper Estraier node server and node. If it is not set, mod_estraier does not work.
EstraierUser and EstraierPass specify the username and password for the node server. If they are not set, mod_estraier does not work.
Specifies the HTTP proxy that mod_estraier uses to access the node server. If it is not set, mod_estraier connects directly, without a proxy.
Specifies the timeout for accessing the node server. The default is 5 seconds.
EstraierDenyURI specifies URIs that mod_estraier does not register. For example, to keep mod_estraier from registering Google pages, specify:
EstraierDenyURI http://[a-z]*.?google.co
You can use this option several times.
EstraierAllowURI specifies URIs that mod_estraier registers. For example, to register only Google pages, specify:
EstraierDenyURI .*
EstraierAllowURI http://[a-z]*.?google.co
You can use this option several times.
When a URI matches both EstraierDenyURI and EstraierAllowURI, the directive specified later takes effect.
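The Python sketch below models how such rules could be evaluated. It assumes that rules are checked in configuration order and that the last matching rule wins (one reading of the precedence rule), that a pattern may match anywhere in the URI, and that a URI matching no rule is registered. Python's `re` is only an approximation of the regular expression engine Apache uses, and `should_register` is a hypothetical name.

```python
import re

# Hypothetical rule list mirroring the sample proxy configuration.
# Assumptions: rules are checked in configuration order, a pattern may match
# anywhere in the URI, and the last matching rule decides.
RULES = [
    ("deny", r"http://[a-z]*.?google.co"),
    ("allow", r"http://labs.google.com/"),
    ("deny", r"favicon.ico"),
]


def should_register(uri, rules=RULES):
    """Return True if uri would be registered under the assumptions above."""
    register = True  # assumption: a URI that matches no rule is registered
    for action, pattern in rules:
        if re.search(pattern, uri):
            register = action == "allow"
    return register
```

With these rules, http://labs.google.com/sets is registered, while http://www.google.com/search?q=x and any favicon.ico are not. Note that the unescaped dots in these patterns match any character, so they are looser than they look.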
EstraierLanguage specifies the language Hyper Estraier uses when registering documents. For Japanese, specify:
EstraierLanguage ja
Available values are en, ja, zh, ko, and misc. The default is en.
If this directive is set to On, mod_estraier registers documents in a detached thread, which may improve response time. The default is Off.
EstraierDenyRequestHeader specifies request headers that prevent a page from being registered. For example, with
EstraierDenyRequestHeader Authorization .*
in place, mod_estraier does not register pages requested with authorization. Both the header name and the header value are given as regular expressions.
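A sketch of this header check in Python, assuming each rule is a (header-name regex, header-value regex) pair, that a page is skipped when any single header matches both, and that header names are compared case-insensitively, as HTTP header names are. The same shape would apply to the response-header variant below. `denied_by_headers` is a hypothetical name.

```python
import re

# Hypothetical rule list: (header-name regex, header-value regex) pairs,
# mirroring "EstraierDenyRequestHeader Authorization .*".
DENY_REQUEST_HEADERS = [
    (r"Authorization", r".*"),
]


def denied_by_headers(headers, rules=DENY_REQUEST_HEADERS):
    """Return True if some header matches both regexes of some rule.

    Header names are matched case-insensitively (an assumption, since HTTP
    header names are case-insensitive); values are matched as given.
    """
    for name_pattern, value_pattern in rules:
        for name, value in headers.items():
            if re.search(name_pattern, name, re.IGNORECASE) and re.search(value_pattern, value):
                return True
    return False
```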
Specifies response headers that prevent a page from being registered, in the same way as for request headers.
EstraierUseWeight: if set to On, mod_estraier adds score weighting information to URLs viewed more than once. The default is Off.
EstraierFilterCommand specifies the filter command for a given Content-Type. For example, with the following setting:
EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml
mod_estraier uses estfxpdftohtml to convert PDF to HTML. Use the H@ prefix when the filter outputs HTML, T@ when it outputs plain text, and no prefix when it outputs a Hyper Estraier document draft.
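To make the prefix convention concrete, the Python sketch below splits an EstraierFilterCommand setting into its output type and command path, and picks the first filter whose Content-Type pattern matches. First-match selection and the helper names `parse_filter` and `pick_filter` are assumptions for illustration, not taken from mod_estraier's source.

```python
import re

# Output-type prefixes described above; a command without a prefix is
# treated as emitting a Hyper Estraier document draft.
PREFIXES = {"H@": "html", "T@": "text"}


def parse_filter(content_type_regex, command):
    """Split an EstraierFilterCommand setting into (regex, output type, path)."""
    for prefix, kind in PREFIXES.items():
        if command.startswith(prefix):
            return re.compile(content_type_regex), kind, command[len(prefix):]
    return re.compile(content_type_regex), "draft", command


def pick_filter(content_type, filters):
    """Return (output type, path) of the first matching filter, or None."""
    for regex, kind, path in filters:
        if regex.search(content_type):
            return kind, path
    return None
```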
EstraierFilterTmpdir specifies the temporary directory used by the filter commands above. A typical example is:
EstraierFilterTmpdir /tmp
Specifies the maximum size of a document that mod_estraier processes. The default is 10000000 bytes, that is, 10 MB.
mod_estraier_search is an additional module for searching a Hyper Estraier DB.
Here is a configuration example:
LoadModule estraier_search_module modules/mod_estraier_search.so
<Location /moe/>
    SetHandler estraier_search
    EstsearchNode http://localhost:1978/node/test
    EstsearchUser admin
    EstsearchPass admin
    EstsearchTimeout 5
    EstsearchNodeDepth 0
    EstsearchTemplateHead /home/i/wrk/mod_estraier/tmpl/estseek.head
    EstsearchTemplateFoot /home/i/wrk/mod_estraier/tmpl/estseek.foot
</Location>
Change the EstsearchTemplate* paths to the correct locations on your system.
With this configuration, the search engine is available at http://localhost/moe/
The search engine accepts Google-like query syntax. Use "OR" or "|" for OR search. Prefix a word with "-" to exclude it from the results. Use link:URI to select documents that contain a link to URI, and site:URI to select documents whose URI contains the given URI.
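The Python sketch below shows one way to break such a query into its parts: AND-ed groups of alternatives joined by OR or |, excluded words, link: targets, and site: targets. It is a hypothetical illustration of the syntax rules above, not mod_estraier_search's actual parser; for instance, it only recognizes OR and | when they stand alone between spaces.

```python
def parse_query(query):
    """Split a Google-like query into AND-ed OR groups and special operators.

    Hypothetical illustration: OR and | are recognized only as separate words.
    """
    parsed = {"groups": [], "exclude": [], "link": [], "site": []}
    pending_or = False  # True right after an OR / | token
    for token in query.split():
        if token in ("OR", "|"):
            pending_or = True
            continue
        if token.startswith("-") and len(token) > 1:
            parsed["exclude"].append(token[1:])
        elif token.startswith("link:"):
            parsed["link"].append(token[len("link:"):])
        elif token.startswith("site:"):
            parsed["site"].append(token[len("site:"):])
        elif pending_or and parsed["groups"]:
            parsed["groups"][-1].append(token)  # extend the previous OR group
        else:
            parsed["groups"].append([token])  # start a new AND-ed group
        pending_or = False
    return parsed
```

A search module would then translate each part into the corresponding Hyper Estraier search condition; that step is not shown.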
mod_estraier_cache is an experimental module that generates documents from the Hyper Estraier DB.
Here is a configuration example:
LoadModule estraier_cache_module modules/mod_estraier_cache.so
Listen *:8081
<VirtualHost *:8081>
    SetHandler estraier_cache
    EstcacheNode http://localhost:1978/node/web
    EstcacheUser admin
    EstcachePass admin
</VirtualHost>
Then set this server (port 8081 in the example above) as your browser's proxy and browse.