Howard Lee Cloud

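Scraping and filtering data often rely on regular expressions (RE below), and anchors come up a lot, so I decided to write down how to use regular expression anchors in Ruby. The official Ruby regular expression documentation is at http://ruby-doc.org/core-2.1.1/Regexp.html; for anchor material, jump to its Anchors section. This post covers only the lookahead and lookbehind parts.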

The following is copied from the official Ruby documentation:

(?=pat) Positive lookahead assertion: ensures that the following characters match pat, but doesn’t include those characters in the matched text
(?!pat) Negative lookahead assertion: ensures that the following characters do not match pat, but doesn’t include those characters in the matched text
(?<=pat) Positive lookbehind assertion: ensures that the preceding characters match pat, but doesn’t include those characters in the matched text
(?<!pat) Negative lookbehind assertion: ensures that the preceding characters do not match pat, but doesn’t include those characters in the matched text

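The one I use most is the first, the positive lookahead assertion. Given a match for some RE (call it RE1), the engine keeps looking forward from where RE1 ended (to the right, for strings written left to right as in English) and checks that the text there matches the RE after the ?= symbol, pat in this case; call that RE2. If both hold, so that RE entire = RE1 + RE2, Ruby reports a match, but the matched text it returns contains only RE1, and in the source text that RE1 is guaranteed to be followed by RE2. RE1, RE2, and RE entire are names I made up to keep the writing readable. A positive lookahead is usually placed after RE1, but it can also be placed before it.

It is worth mentioning that RE2 in a lookahead may be a variable-length RE, for example \d+($|\.\d+), which describes an unsigned floating-point number. You may scoff: "Can't an RE always be variable-length?" In a Ruby lookahead assertion, yes. But as discussed later in this post, RE2 in a lookbehind assertion must have a fixed length, for example the literal pat, or a POSIX bracket expression such as [[:digit:]], which matches exactly one digit.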
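Here is a minimal sketch of the positive lookahead described above. The method names and sample strings are made up for illustration; only the (?=pat) syntax comes from the Ruby documentation. The first method uses a fixed RE2 (px), the second a variable-length RE2.

```ruby
# RE1 = \d+ (the digits we want back), RE2 = px (must follow, but is not returned).
def numbers_before_px(text)
  text.scan(/\d+(?=px)/)
end

# RE2 may be variable-length inside a lookahead: a colon, optional spaces,
# then an unsigned integer or decimal number.
def keys_with_numeric_values(text)
  text.scan(/\w+(?=:\s*\d+(?:\.\d+)?)/)
end

css = 'width: 100px; height: 20em; margin: 5px' # made-up sample input
p numbers_before_px(css)                          # => ["100", "5"]
p keys_with_numeric_values('a: 12.5, b: x, c: 7') # => ["a", "c"]
```

Note that "px" and the ": 12.5" part never show up in the results: the lookahead only checks what follows, it does not consume it.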

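The second, the negative lookahead assertion, is used in much the same way as the positive lookahead assertion, so I won't repeat the details. Its meaning is the opposite: it returns the text matched by RE1 only where that RE1 is not followed by RE2 in the source text.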
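A small sketch of the negative lookahead, again with made-up names and sample data. It also shows a trap that is easy to fall into: because \d+ can backtrack, a naive pattern may return a shortened RE1 instead of failing, so the guarded version also forbids a digit right after RE1.

```ruby
# Naive: when "px" follows, \d+ backs off one digit and then succeeds.
NAIVE_NOT_PX = /\d+(?!px)/
# Guarded: RE1 must be followed by neither "px" nor another digit.
GUARDED_NOT_PX = /\d+(?!px|\d)/

def numbers_not_before_px(text)
  text.scan(GUARDED_NOT_PX)
end

p '100px'[NAIVE_NOT_PX]   # => "10"
p '100px'[GUARDED_NOT_PX] # => nil
p numbers_not_before_px('width: 100px; height: 20em; margin: 5px') # => ["20"]
```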

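The third is the positive lookbehind assertion. Given a match for RE1, the engine looks backward from where RE1 starts (to the left, for strings written left to right as in English) and checks that the text there matches the RE after ?<=, pat in this case. If both RE1 and RE2 hold, so that RE entire = RE2 + RE1, Ruby reports a match and returns only the text matched by RE1, which is guaranteed to be preceded by RE2 in the source text. In practice I find Ruby's lookbehind awkward to use, because:

Ruby's lookbehind cannot use a variable-length RE.
A lookahead attached to RE1 can often find a similar result, and lookahead is easier to reason about than lookbehind.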
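The sketch below illustrates the positive lookbehind and the fixed-length restriction. All names and sample strings are hypothetical. The compiles? helper builds the pattern at run time with Regexp.new so that a rejected pattern surfaces as a RegexpError instead of stopping the whole script; on the Ruby versions I am aware of, the variable-length pattern is rejected, but treat that as an assumption and check it on your own interpreter. The last method is a plain capture group, swapped in as a workaround when RE2 needs a variable length; it is not a lookbehind.

```ruby
# RE2 = \$ (fixed length, must precede), RE1 = an unsigned integer or decimal.
def dollar_amounts(text)
  text.scan(/(?<=\$)\d+(?:\.\d+)?/)
end

# Returns true if Ruby accepts the pattern source, false if it raises RegexpError.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Workaround (capture group, not a lookbehind): allow optional spaces
# after the dollar sign and return only capture group 1.
def amount_after_dollar(text)
  text[/\$\s*(\d+(?:\.\d+)?)/, 1]
end

p dollar_amounts('cost: $42.50, qty 3, tax $4') # => ["42.50", "4"]
p compiles?('(?<=\$)\d+')                       # => true
p compiles?('(?<=\$\s*)\d+')                    # variable-length RE2; result depends on the engine
p amount_after_dollar('cost: $  42.50')         # => "42.50"
```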

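The fourth, the negative lookbehind assertion, is used in much the same way as the positive lookbehind assertion, so I won't repeat the details. Its meaning is the opposite: it returns the text matched by RE1 only where that RE1 is not preceded by RE2 in the source text.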
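A minimal sketch of the negative lookbehind, with hypothetical names and data. As with negative lookahead, a naive pattern can start the match one character later instead of failing, so the guarded version also rejects a preceding digit. The character class is still exactly one character wide, so it satisfies the fixed-length rule.

```ruby
# Integers that are not part of a dollar amount.
# RE2 = [\$\d]: neither "$" nor another digit may precede RE1.
def non_dollar_integers(text)
  text.scan(/(?<![\$\d])\d+/)
end

p '$42'[/(?<!\$)\d+/]                             # => "2" (naive: starts after the "4")
p '$42'[/(?<![\$\d])\d+/]                         # => nil
p non_dollar_integers('cost: $42, qty 3, tax $4') # => ["3"]
```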

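I have not yet tried putting a lookbehind inside a lookahead, or a lookahead inside a lookbehind assertion, so I don't know whether either is allowed; just thinking about it feels complicated. Also, in the complex real-world cases I deal with, RE sometimes still falls short, though maybe I just don't know how to write the RE yet. I'll write up the cases I ran into another day, and hopefully that will sort out conditional/exclusion matching.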

Ruby regular expression anchor about lookahead and lookbehind