mod_estraier


Abstract

mod_estraier is an Apache module that registers the web pages handled by Apache in Hyper Estraier and searches them through Hyper Estraier's node API. Its main purpose is to index and search documents that pass through a proxy, and dynamic content such as a Wiki or BBS.

See the following URL for Hyper Estraier:

http://hyperestraier.sourceforge.net/

The mod_estraier project page on sf.net is:

http://sourceforge.net/projects/modestraier/

mod_estraier is distributed under the GPL version 2.

mod_estraier-0.3.2.tar.gz

Environment

mod_estraier is tested in the following environments:

How to run

Compile

Extract the mod_estraier archive and run:

$ ./configure
$ make
# make install

Use the --with-tidy configure option to set the location of tidy, and the --with-apxs configure option to set the location of apxs.

Configuration

Next, configure Apache. For example, to use mod_estraier on a proxy, add something like the following to httpd.conf:

LoadModule estraier_module modules/mod_estraier.so

ProxyRequests On
<Proxy *>
 Order deny,allow
 Deny from all
 Allow from 127.0.0.1
 SetOutputFilter estraier
 EstraierNode http://localhost:1978/node/test
 EstraierUser admin
 EstraierPass admin
 EstraierDenyURI http://[a-z]*.?google.co
 EstraierAllowURI http://labs.google.com/
 EstraierDenyURI favicon.ico
 EstraierUseWeight On
 EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml
 EstraierFilterCommand ^application/msword H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
 EstraierFilterCommand ^application/vnd.ms-(excel|powerpoint) H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
</Proxy>

Or, to use it on a reverse proxy, add something like the following to httpd.conf:

<Location /my_web/>
 SetOutputFilter estraier
 EstraierNode http://localhost:1978/node/test
 EstraierUser admin
 EstraierPass admin
 EstraierDenyURI favicon.ico
 EstraierUseWeight On
</Location>

Each option is described in detail in the Options section below. The package contains more samples in doc/recipes.conf.

Restarting Apache

Then restart Apache so that it loads the new configuration, for example with apachectl restart.

Running estmaster

If you do not have a DB yet, run

estmaster init casket

to initialize the DB. After initialization, run

estmaster start casket

to start estmaster. The configuration above needs a node named "test", which you should create through http://localhost:1978/master_ui

Run

Set the proxy host of your browser to "localhost" and the port to "80". As you browse web sites as usual, the pages you view are registered and your DB grows.
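Pages can also be sent through the proxy from a script instead of a browser. The following is a minimal sketch, assuming the forward proxy configuration shown above is listening on localhost port 80. The function names make_proxy_opener and fetch_through_proxy are hypothetical and are not part of mod_estraier.

```python
import urllib.request


def make_proxy_opener(host="localhost", port=80):
    """Build a urllib opener that sends HTTP requests through the Apache proxy."""
    # Assumption: Apache with mod_estraier is the proxy at host:port.
    proxy = urllib.request.ProxyHandler({"http": f"http://{host}:{port}"})
    return urllib.request.build_opener(proxy)


def fetch_through_proxy(url, opener=None):
    """Fetch one page via the proxy so that the output filter sees the response."""
    opener = opener or make_proxy_opener()
    with opener.open(url, timeout=10) as response:
        return response.read()
```

Calling fetch_through_proxy("http://hyperestraier.sourceforge.net/") should have the same effect as viewing that page in a browser configured as described above, provided the URI is not excluded by an EstraierDenyURI rule.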

You can search the DB with any node API client. For example, you can use the search interface of the node master at the following URL:

http://localhost:1978/node/test/search_ui
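A node API client can also be a small script. The sketch below only builds the HTTP request and does not send it. It assumes that the node's search command is reachable at the node URL followed by /search, that it accepts form parameters named phrase and max, and that the node server uses HTTP basic authentication. Check these assumptions against the Hyper Estraier node API documentation before relying on them. The function name build_search_request is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_search_request(node_url, user, password, phrase, max_results=10):
    """Build (but do not send) a POST request for the node's search command."""
    # Assumption: the command URL is <node URL>/search and the parameters
    # are named "phrase" and "max".
    body = urllib.parse.urlencode({"phrase": phrase, "max": max_results})
    request = urllib.request.Request(
        node_url.rstrip("/") + "/search", data=body.encode("ascii")
    )
    # Assumption: the node server expects HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request
```

With estmaster running, the request could be sent with urllib.request.urlopen(build_search_request("http://localhost:1978/node/test", "admin", "admin", "apache")).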

Options

The following directives can be used in httpd.conf to configure mod_estraier.

EstraierNode

The EstraierNode directive specifies the node server and the node of Hyper Estraier. If it is not set, mod_estraier does not work.

EstraierUser, EstraierPass

Specify the user name and the password for the node server. If they are not set, mod_estraier does not work.

EstraierProxyHost, EstraierProxyPort

Specify the HTTP proxy that mod_estraier uses when it accesses the node server. If they are not set, mod_estraier uses no proxy.

EstraierTimeout

Specifies the timeout for accessing the node-server. The default value is 5 seconds.

EstraierDenyURI

Specifies a URI pattern (a regular expression) that mod_estraier does not register. For example, to keep mod_estraier from registering Google pages, specify the following:

EstraierDenyURI http://[a-z]*.?google.co

You can use this option several times.

EstraierAllowURI

Specifies a URI pattern (a regular expression) that mod_estraier registers. For example, to make mod_estraier register only Google pages, specify the following:

EstraierDenyURI .*
EstraierAllowURI http://[a-z]*.?google.co

You can use this option several times.

When both EstraierDenyURI and EstraierAllowURI directives match a URI, the one that appears later takes effect.
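The interaction of the two directives can be illustrated with a short sketch. It assumes that the rule above means "the last matching directive wins", that patterns are matched anywhere in the URI (unanchored), and that a URI matching no rule is registered. The rule list and the function should_register are hypothetical and only mirror the proxy example in the Configuration section. This is not the module's actual code.

```python
import re

# Hypothetical rule list in httpd.conf order: (action, pattern).
RULES = [
    ("deny", r"http://[a-z]*.?google.co"),
    ("allow", r"http://labs.google.com/"),
    ("deny", r"favicon.ico"),
]


def should_register(uri, rules=RULES):
    """Return True if the URI would be registered under a last-match-wins reading."""
    decision = True  # assumption: a URI that matches no rule is registered
    for action, pattern in rules:
        if re.search(pattern, uri):  # assumption: unanchored match
            decision = action == "allow"
    return decision
```

Under these assumptions, http://www.google.com/ is denied by the first rule, while http://labs.google.com/papers is denied by the first rule and then allowed again by the second.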

EstraierLanguage

Specifies the language that Hyper Estraier uses when registering documents. For Japanese, specify the following:

EstraierLanguage ja

The available values are en, ja, zh, ko, and misc. The default is en.

EstraierDetachThread

If this directive is set to On, mod_estraier registers documents in a detached thread, which may make responses faster. The default value is Off.

EstraierDenyRequestHeader

Specifies a request header and a value for which mod_estraier does not register the document. For example, with the

EstraierDenyRequestHeader Authorization .*

setting, mod_estraier does not register pages that were requested with an Authorization header, that is, pages that require authentication. Both the header name and the header value are given as regular expressions.
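The header check can be sketched as follows. This is a minimal illustration, assuming that a request is skipped as soon as one header matches both the name pattern and the value pattern of a rule, and that matching is unanchored. The function is_denied is hypothetical and is not the module's actual implementation.

```python
import re


def is_denied(headers, deny_rules):
    """Return True if any header matches a (name_pattern, value_pattern) rule."""
    for name_pattern, value_pattern in deny_rules:
        for name, value in headers.items():
            # Both the header name and its value must match the rule.
            if re.search(name_pattern, name) and re.search(value_pattern, value):
                return True
    return False
```

With the rule ("Authorization", ".*"), a request that carries any Authorization header is skipped, and a request without one is still registered.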

EstraierDenyResponseHeader

Specifies a response header and a value for which mod_estraier does not register the document.

EstraierUseWeight

If this directive is set to On, mod_estraier adds score weight information to URLs that are viewed more than once. The default value is Off.

EstraierFilterCommand

Specifies the filter command for a given Content-Type. For example, with the following setting:

EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml

mod_estraier uses estfxpdftohtml to convert PDF to HTML. Use the H@ prefix when the filter outputs HTML, the T@ prefix when it outputs plain text, and no prefix when it outputs a Hyper Estraier document draft.
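The prefix convention can be summarized in a few lines. This sketch only shows how the second argument of EstraierFilterCommand is read according to the description above. The function parse_filter_target and the format labels "html", "text" and "draft" are hypothetical names chosen for the illustration.

```python
def parse_filter_target(value):
    """Split an EstraierFilterCommand target into (output_format, command_path)."""
    prefixes = {"H@": "html", "T@": "text"}
    for prefix, output_format in prefixes.items():
        if value.startswith(prefix):
            return output_format, value[len(prefix):]
    # No prefix: the filter is expected to emit a Hyper Estraier document draft.
    return "draft", value
```

For example, parse_filter_target("H@/usr/local/share/hyperestraier/filter/estfxpdftohtml") separates the HTML marker from the path of the filter command.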

EstraierFilterTmpdir

Specifies the temporary directory for the above filter commands. A typical example is the following:

EstraierFilterTmpdir /tmp

EstraierDocumentSizeLimit

Specifies the maximum size of a document that mod_estraier processes. The default value is 10000000 bytes, that is, 10 MB.

mod_estraier_search

mod_estraier_search is an additional module with which you can search a Hyper Estraier DB.

A configuration example follows:

LoadModule estraier_search_module modules/mod_estraier_search.so

<Location /moe/>
 SetHandler estraier_search
 EstsearchNode http://localhost:1978/node/test
 EstsearchUser admin
 EstsearchPass admin
 EstsearchTimeout 5
 EstsearchNodeDepth 0
 EstsearchTemplateHead /home/i/wrk/mod_estraier/tmpl/estseek.head
 EstsearchTemplateFoot /home/i/wrk/mod_estraier/tmpl/estseek.foot
</Location>

Change the EstsearchTemplate* paths to the location of the template files on your system.

With this configuration, you can use the search engine at http://localhost/moe/

The search engine accepts a Google-like syntax for search words. Use "OR" or "|" for an OR search. Prefix a word with "-" to exclude it from the results. Use the link:URI syntax to select documents that contain a link to URI, and the site:URI syntax to select documents whose URI contains the specified URI.
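To make the syntax concrete, the sketch below splits a query string into its parts. It is an illustration of the syntax only and not the parser used by mod_estraier_search. It assumes that words are separated by whitespace and that words not joined by "OR" or "|" are combined with AND, as in Google. The function parse_query and the result keys are hypothetical.

```python
def parse_query(query):
    """Split a Google-like query into OR groups, exclusions, and link:/site: filters."""
    parsed = {"terms": [], "exclude": [], "link": [], "site": []}
    group = []         # words joined by OR / |
    join_next = False  # True right after an OR operator
    for token in query.split():
        if token in ("OR", "|"):
            join_next = bool(group)
            continue
        if token.startswith("link:") and len(token) > 5:
            parsed["link"].append(token[5:])
        elif token.startswith("site:") and len(token) > 5:
            parsed["site"].append(token[5:])
        elif token.startswith("-") and len(token) > 1:
            parsed["exclude"].append(token[1:])
        else:
            if group and not join_next:
                # Assumption: adjacent words without OR start a new AND group.
                parsed["terms"].append(group)
                group = []
            group.append(token)
        join_next = False
    if group:
        parsed["terms"].append(group)
    return parsed
```

Each inner list in "terms" holds alternatives joined by OR, so the query "apache OR nginx -iis" yields one group with two alternatives and one excluded word.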

mod_estraier_cache

mod_estraier_cache is an experimental module that generates documents from the DB of Hyper Estraier.

A configuration example follows:

LoadModule estraier_cache_module modules/mod_estraier_cache.so

Listen *:8081
<VirtualHost *:8081>
  SetHandler estraier_cache
  EstcacheNode http://localhost:1978/node/web
  EstcacheUser admin
  EstcachePass admin
</VirtualHost>

Then set the proxy of your browser to this virtual host (port 8081 in the example above) and browse.