The Inarticulate | regex

The N Ways to Write 茴: Regex Text Extraction with sed vs awk vs grep \K


Task: extract "1.2.3-beta" from the following multi-line file, info.txt:

Project: SuperApp
Current version is v1.2.3-beta
Build date: 2021-07-03
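To try the commands below, the sample file can be recreated with a heredoc. This is only a convenience sketch: the file name info.txt and its three lines are copied from the task above.

```shell
# Write the task's sample text to info.txt.
# Quoting the delimiter ('EOF') disables variable and command expansion inside.
cat > info.txt <<'EOF'
Project: SuperApp
Current version is v1.2.3-beta
Build date: 2021-07-03
EOF
```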

sed -nE 's/.*version is v(.*)/\1/p' info.txt
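A sketch of what -n and the p flag contribute to the sed one-liner: with both, only the line where the substitution succeeded is printed; without them, sed echoes every other line unchanged. The variable names with_n and without_n are illustrative only.

```shell
# Recreate the sample info.txt from the task.
printf 'Project: SuperApp\nCurrent version is v1.2.3-beta\nBuild date: 2021-07-03\n' > info.txt

# -E: extended regex, so ( ) is a capture group without backslashes.
# -n: suppress sed's default printing of every input line.
# s/.*version is v(.*)/\1/p: .* consumes everything up to the prefix,
#   (.*) captures the rest, \1 replaces the whole line with that capture,
#   and p prints the line only if the substitution actually happened.
with_n=$(sed -nE 's/.*version is v(.*)/\1/p' info.txt)

# Same substitution without -n and p: non-matching lines pass through as-is.
without_n=$(sed -E 's/.*version is v(.*)/\1/' info.txt)

printf '%s\n---\n%s\n' "$with_n" "$without_n"
```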

awk '/version is v/ {print $4}' info.txt | sed 's/^v//'
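The awk version relies on the version being the fourth whitespace-separated field. As an alternative (not from the original post), awk's built-in sub() can strip the leading v by itself, which removes the extra sed process. The variable names piped and single are hypothetical.

```shell
# Recreate the sample info.txt from the task.
printf 'Project: SuperApp\nCurrent version is v1.2.3-beta\nBuild date: 2021-07-03\n' > info.txt

# Original pipeline: awk splits the matching line on whitespace
# ($1=Current $2=version $3=is $4=v1.2.3-beta), then sed drops the leading "v".
piped=$(awk '/version is v/ {print $4}' info.txt | sed 's/^v//')

# Single-process variant: sub() removes the "v" from field 4 inside awk.
single=$(awk '/version is v/ { sub(/^v/, "", $4); print $4 }' info.txt)

printf '%s\n%s\n' "$piped" "$single"
```

Both forms share the same weakness: field positions break as soon as the wording changes, for example if an extra word appears before "version".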

grep -o 'version is v[0-9].*' info.txt | awk '{print $3}' | cut -c 2-

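Pros: grep -o is the standard tool for printing only the part of each line that matches a regex.
Cons: the match includes the prefix that serves as the anchor (version is v), so you have to pipe the result through awk and cut just to strip the characters you don't want.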
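Printing each stage of the grep -o pipeline separately makes the cons concrete: grep keeps the anchor prefix, and the next two commands exist only to remove it. The stage1 to stage3 variables are illustrative names.

```shell
# Recreate the sample info.txt from the task.
printf 'Project: SuperApp\nCurrent version is v1.2.3-beta\nBuild date: 2021-07-03\n' > info.txt

# Stage 1: grep -o prints only the matched text, anchor prefix included.
stage1=$(grep -o 'version is v[0-9].*' info.txt)
# Stage 2: awk keeps the third whitespace-separated word.
stage2=$(printf '%s\n' "$stage1" | awk '{print $3}')
# Stage 3: cut -c 2- drops the first character (the "v").
stage3=$(printf '%s\n' "$stage2" | cut -c 2-)

printf '%s\n' "$stage1" "$stage2" "$stage3"
```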

grep -Po 'version is v\K.*' info.txt

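Pros: a single command extracts exactly the text you want. \K still requires the prefix to match, but it resets the start of the reported match, so everything before it is discarded from the output.
Cons: it depends on PCRE (-P), so it does not work out of the box on macOS: the BSD grep that ships with macOS does not support the -P option at all.
- macOS workaround 1: brew install grep (then run it as ggrep -Po).
- macOS workaround 2: use Perl directly: perl -nle 'print $& if m/version is v\K.*/' info.txt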
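If a script has to run on both Linux and macOS, one option is to pick whichever PCRE-capable tool is available at run time, combining the two workarounds. This is a hedged sketch: the helper name pcre_extract is hypothetical, and the Perl fallback prints only the first match on each line, whereas grep -o prints every match.

```shell
# Recreate the sample info.txt from the task.
printf 'Project: SuperApp\nCurrent version is v1.2.3-beta\nBuild date: 2021-07-03\n' > info.txt

# Choose a PCRE-capable extractor (pcre_extract is a hypothetical helper).
if printf 'x\n' | grep -Pq 'x' 2>/dev/null; then
  # GNU grep built with PCRE support (typical on Linux).
  pcre_extract() { grep -Po "$1" "$2"; }
elif command -v ggrep >/dev/null 2>&1; then
  # GNU grep installed on macOS via Homebrew (brew install grep).
  pcre_extract() { ggrep -Po "$1" "$2"; }
else
  # Fallback: Perl, where $& excludes everything matched before \K.
  # The pattern is passed through the environment to avoid quoting issues.
  pcre_extract() {
    PAT="$1" perl -nle 'BEGIN { $re = $ENV{PAT} } print $& if /$re/' "$2"
  }
fi

pcre_extract 'version is v\K.*' info.txt
```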

grep -Po '(?<=version is v).*' info.txt

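Pros: lookbehind is a standard regex concept, supported by many engines that lack \K (Python's re module and JavaScript, for example).
Cons: like \K, it still needs grep -P. More importantly, in PCRE a lookbehind must have a fixed (or at least bounded) length, so you cannot put unbounded quantifiers such as * or + inside it; (?<=version:\s*), for example, is rejected with an error. \K has no such restriction and handles variable-length prefixes, which makes it far more practical.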

#regex #shell
