I use the pg_search gem in Rails as a lightweight tool for full-text search on Postgres, but Postgres full-text search does not natively support Chinese/Mandarin (nor Japanese or Vietnamese), so third-party software is needed. After looking for solutions, I settled on a combination of three tools: the pg_search gem, SCWS, and zhparser. If you follow the steps and links below, you should be able to set up Chinese full-text search in Postgres too.
Remark (20160131): This tutorial requires postgresql-server-devel-{VERSION} on Ubuntu and Mac because the build uses PGXS.
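Before building anything, it can help to confirm that the development package is really in place. This is a minimal sketch, assuming pg_config is on your PATH: `pg_config --pgxs` prints where the PGXS makefile should be, and the helper name pgxs_available is made up for this example.

```shell
# Hypothetical helper: succeeds when the PGXS makefile reported by pg_config exists.
pgxs_available() {
  local pgxs
  # pg_config --pgxs prints the path of the PGXS makefile
  pgxs=$(pg_config --pgxs 2>/dev/null) || return 1
  [ -n "$pgxs" ] && [ -f "$pgxs" ]
}

if pgxs_available; then
  echo "PGXS found"
else
  echo "PGXS missing: install the PostgreSQL server development package"
fi
```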
Steps:
Similar content to the above: http://www.rails365.net/articles/2015-09-30-postgresql-de-quan-wen-jian-suo-xi-tong-zhi-zhong-wen-zhi-chi-san
wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2
tar xvjf scws-1.2.2.tar.bz2
cd scws-1.2.2
./configure --prefix=/usr/local/scws
make
make install
To check whether it installed successfully, run the following command:
`ls -al /usr/local/scws/lib/libscws.la`
git clone https://github.com/amutu/zhparser.git
cd zhparser
SCWS_HOME=/usr/local/scws/include make && make install
20160131 Update:
If you use Mac OS Yosemite, set SCWS_HOME to /usr/local/include/scws/ (or /usr/local/scws/include, the value used in the command above). If you use Ubuntu 14.04 LTS, set SCWS_HOME to /usr/local/scws instead.
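Since the right SCWS_HOME differs between systems, one way to take out some of the guesswork is to look for where scws.h actually landed. The sketch below is only a helper under assumptions: the function name find_scws_prefix and the candidate list are mine, and it assumes SCWS puts its header at <prefix>/include/scws/scws.h.

```shell
# Hypothetical helper: print the first candidate prefix that contains
# include/scws/scws.h (an assumed header location, check your own install).
find_scws_prefix() {
  for prefix in "$@"; do
    if [ -f "$prefix/include/scws/scws.h" ]; then
      echo "$prefix"
      return 0
    fi
  done
  return 1
}

# Candidates: the --prefix used in the SCWS build above and the default /usr/local
find_scws_prefix /usr/local/scws /usr/local || echo "scws.h not found"
```

If it prints a prefix, that is a reasonable first value to try for SCWS_HOME; if the build still cannot find the header, fall back to the values in the note above.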
Log in to your Postgres database from the terminal/command line:
psql yourdatabasename
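Create the extension in Postgres, then a text search configuration that uses it. You can pick whatever name you want for the configuration; it is the dictionary name that pg_search refers to later.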
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION dictionarynameyouwant (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION dictionarynameyouwant ADD MAPPING FOR n,v,a,i,e,l WITH simple;
If you followed the steps above, Postgres full-text search now works with Chinese/Mandarin text.
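To check that the configuration really segments Chinese text, you can ask Postgres to tokenize a sample sentence with to_tsvector. This is a minimal sketch, assuming the database is yourdatabasename and the configuration is dictionarynameyouwant as in the steps above; the helper name segment_sql and the sample text are made up for illustration.

```shell
# Hypothetical helper: build a query that tokenizes sample text with the
# given text search configuration.
# Note: no quote escaping is done, so keep both arguments free of single quotes.
segment_sql() {
  printf "SELECT to_tsvector('%s', '%s');\n" "$1" "$2"
}

# Print the query; to run it, pipe it into psql
# (needs a running server with zhparser installed):
#   segment_sql dictionarynameyouwant '全文搜尋測試' | psql yourdatabasename
segment_sql dictionarynameyouwant '全文搜尋測試'
```

The result should list the words zhparser split the sentence into; what exactly comes back depends on the dictionary in use.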
The pg_search gem uses the extension created from the 1st link (francs3.blog.163.com): http://www.rails365.net/articles/2015-09-30-postgresql-quan-wen-jian-suo-xi-tong-pg-search-shi-xian-xian-si
Extra step (not necessary) in Rails for using the pg_search gem: configure the dictionary name in the :dictionary attribute of :tsearch in app/models/yourmodel.rb
class YourOwnClass < ActiveRecord::Base
...
include PgSearch
# Replace the scope name and the columns with your own.
# :dictionary is the name you chose in CREATE TEXT SEARCH CONFIGURATION;
# other :tsearch options can follow it.
pg_search_scope :functionnameyoulike,
  :against => [:columnsyoulike1, :columnsyoulike2],
  :using => { :tsearch => { :dictionary => "dictionarynameyouwant" } }
end
Concepts you should know:
A. SCWS - Simple Chinese Word Segmentation http://www.xunsearch.com/scws/
B. Postgres - Create extension http://www.postgresql.org/docs/9.4/static/sql-createextension.html
C. Postgres - Create text search configuration http://www.postgresql.org/docs/9.4/static/sql-createtsconfig.html
D. Postgres - Packaging related objects into an extension http://www.postgresql.org/docs/9.4/static/extend-extensions.html
E. Setting up full text searching in other languages http://shisaa.jp/postset/postgresql-full-text-search-part-2.html
F. Other Mandarin/Chinese Thesaurus http://www.oschina.net/project/tag/264/segment?lang=0&os=0&sort=view&p=1
By the way, you can find dictionaries for Simplified and Traditional Chinese in GBK and UTF-8 encodings on the download page of the SCWS website. That page also has conversion tools (PHP scripts) for converting between XDB-based and TXT dictionaries.
Related material at this blog: