
Make Postgres support full-text search in Mandarin/Chinese for Rails

I use the pg_search gem in Rails as a lightweight tool for full-text search on Postgres, but Postgres full-text search does not natively support Chinese (nor Japanese or Vietnamese), so third-party software is needed. After looking for solutions, I settled on a combination of three tools: the pg_search gem, SCWS, and zhparser. With these I got Chinese full-text search working in a Rails environment. If you follow the content in the links below, you should be able to set it up too.

Remark (2016-01-31): This tutorial requires postgresql-server-devel-{VERSION} on Ubuntu and Mac, because the build uses PGXS. The package name may differ by platform; on Ubuntu it is postgresql-server-dev-{VERSION}.
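Before building anything, it can help to confirm that PGXS is actually visible. The sketch below is only an illustration: `check_pgxs` is a helper name I made up, and it relies on `pg_config --pgxs`, which prints the path of the PGXS makefile. It assumes `pg_config` is on your PATH, or that you point `PG_CONFIG` at it.

```shell
#!/bin/sh
# Hypothetical helper: check that the given pg_config command exists and that
# the PGXS makefile it reports is present on disk.
check_pgxs() {
    pg_config_cmd=$1
    if ! command -v "$pg_config_cmd" >/dev/null 2>&1; then
        echo "$pg_config_cmd not found: install the PostgreSQL server development package" >&2
        return 1
    fi
    # pg_config --pgxs prints the location of the extension makefile
    pgxs_makefile=$("$pg_config_cmd" --pgxs)
    if [ -f "$pgxs_makefile" ]; then
        echo "PGXS makefile: $pgxs_makefile"
        return 0
    fi
    echo "PGXS makefile missing at $pgxs_makefile" >&2
    return 1
}

# Use $PG_CONFIG if it is set, otherwise whatever pg_config is on the PATH.
check_pgxs "${PG_CONFIG:-pg_config}" || echo "PGXS is not available yet"
```

If the check fails, installing the server development package mentioned above is the first thing to try.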

Steps:

  1. Set up SCWS and zhparser. Detailed manual: http://francs3.blog.163.com/blog/static/405767272015065565069/

Similar content to the above: http://www.rails365.net/articles/2015-09-30-postgresql-de-quan-wen-jian-suo-xi-tong-zhi-zhong-wen-zhi-chi-san

  • Install SCWS
wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2
tar xvjf scws-1.2.2.tar.bz2
cd scws-1.2.2
./configure --prefix=/usr/local/scws 
make
make install

 

To check whether it installed successfully, run the following command:

`ls -al /usr/local/scws/lib/libscws.la`

 

  • Install Zhparser
git clone https://github.com/amutu/zhparser.git
cd zhparser
SCWS_HOME=/usr/local/scws/include make && make install

20160131 Update:

If you use Mac OS Yosemite, set SCWS_HOME to /usr/local/include/scws/ (the original Chinese text of this note gives /usr/local/scws/include instead, as in the command above). If you use Ubuntu 14.04 LTS, set SCWS_HOME to /usr/local/scws.
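Since the right SCWS_HOME value differs between machines, a quick check of a candidate directory can save a failed build. This is a rough sketch under an assumption you should verify against the Makefile in your zhparser checkout: that it looks for headers under $SCWS_HOME/include/scws and libraries under $SCWS_HOME/lib. `scws_prefix_ok` is a hypothetical helper name, and the two candidate directories are only examples.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR contains include/scws/scws.h and
# at least one lib/libscws.* file (libscws.la, libscws.so, ...).
scws_prefix_ok() {
    dir=$1
    [ -f "$dir/include/scws/scws.h" ] || return 1
    for lib in "$dir"/lib/libscws.*; do
        # When nothing matches, the pattern stays literal and -e fails.
        [ -e "$lib" ] && return 0
    done
    return 1
}

# Example candidates: the --prefix used above, and the default /usr/local.
for candidate in /usr/local/scws /usr/local; do
    if scws_prefix_ok "$candidate"; then
        echo "SCWS_HOME candidate: $candidate"
    fi
done
```

If no candidate is printed, look for where scws.h was installed and adjust the list.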

 

  2. Configure a new extension using zhparser in Postgres
  • Log in to your Postgres database from the terminal/command line

    psql yourdatabasename

  • Create the extension in Postgres. You can choose any name you like for the text search configuration (called the dictionary name here).

CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION dictionarynameyouwant (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION dictionarynameyouwant ADD MAPPING FOR n,v,a,i,e,l WITH simple;

After following the steps above, you can use Postgres full-text search on Chinese/Mandarin text.
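To see whether the new configuration really segments Chinese text, you can ask Postgres for the tsvector of a sample sentence. The sketch below just builds that query and runs it through psql when psql is available. `segmentation_sql` is a hypothetical helper, `yourdatabasename` and `dictionarynameyouwant` are the placeholders used above, and the sample sentence is arbitrary. I do not show expected output because the tokens depend on the dictionary SCWS is using.

```shell
#!/bin/sh
# Hypothetical helper: print a query showing how TEXT is segmented under CONFIG.
# Neither argument may contain a single quote; this sketch does no escaping.
segmentation_sql() {
    printf "SELECT to_tsvector('%s', '%s');\n" "$1" "$2"
}

sql=$(segmentation_sql dictionarynameyouwant '讓Postgres開始支援中文全文檢索')
echo "$sql"

# Run the query only when psql is on the PATH.
if command -v psql >/dev/null 2>&1; then
    psql yourdatabasename -c "$sql" || echo "query failed: check the database name and the extension"
fi
```

If the result comes back as a list of separate Chinese words rather than one long token, the parser is doing its job.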

 

  3. Make the pg_search gem use the extension created by following the first link (francs3.blog.163.com). Guide: http://www.rails365.net/articles/2015-09-30-postgresql-quan-wen-jian-suo-xi-tong-pg-search-shi-xian-xian-si

Extra step (not necessary) in Rails when using the pg_search gem: set the dictionary name in the :dictionary option of :tsearch in app/models/yourmodel.rb

class YourOwnClass < ActiveRecord::Base
  # ...
  include PgSearch

  # Scope name and column names are placeholders; :dictionary is the
  # text search configuration name you created in Postgres above.
  pg_search_scope :functionnameyoulike,
                  :against => [:columnsyoulike1, :columnsyoulike2], # ..., etc.
                  :using => { :tsearch => { :dictionary => "dictionarynameyouwant" } } # plus any other :tsearch options
end

 

Concepts you need to know:

A. SCWS - Simple Chinese Word Segmentation http://www.xunsearch.com/scws/

B. Postgres - Create extension http://www.postgresql.org/docs/9.4/static/sql-createextension.html

C. Postgres - Create text search configuration http://www.postgresql.org/docs/9.4/static/sql-createtsconfig.html

D. Postgres - Extend extension http://www.postgresql.org/docs/9.4/static/extend-extensions.html

E. Setting up full text searching in other languages http://shisaa.jp/postset/postgresql-full-text-search-part-2.html

F. Other Mandarin/Chinese Thesaurus http://www.oschina.net/project/tag/264/segment?lang=0&os=0&sort=view&p=1

 

By the way, you can find dictionaries for simplified and traditional Chinese in GBK and UTF-8 encodings on the download page of the SCWS website. That page also has conversion tools (PHP scripts) for converting between XDB and text dictionaries.

 

Related material at this blog:

  1. https://howardlee.cloud/blog/122
  2. https://howardlee.cloud/blog/111 (written in Chinese)