I am working on a LaTeX file from which I need to pick out the citations marked with \citep{}. This is what I am doing with sed:
cat file.tex | grep citep | sed 's/.*citep{\(.*\)}.*/\1/g'
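This works fine if a line contains only one instance of the pattern. It fails if a line contains more than one, i.e. more than one \citep. It also fails when there is only one pattern but more than one closing brace } on the line. What should I do so that it works for all the patterns in a line and stops at the closing brace that belongs to each \citep?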
I am on bash. Part of the file looks like this:
of the Asian crust further north \citep{TapponnierM76, WangLiu2009}. This has led to widespread deformation both within and
\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic
across the Dawki fault and Naga Hills, increasing eastwards from $\sim$3~mm/yr to $\sim$13~mm/yr \citep{Vernantetal2014}.
GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim Himalaya lies at the transition between this relatively simple
this transition includes deviation of the Himalaya from a perfect arc beyond 89\deg\ longitude \citep{BendickB2001}, reduction
\citep{BhattacharyaM2009, Mitraetal2010}. Rivers Tista, Rangit and Rangli run through Sikkim eroding the MCT and Ramgarh
thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}. Within this sinuous physiography,
\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
field results corroborate well with seismic studies in this region \citep{Actonetal2011, Arunetal2010}. From studies of
On one line, I get output like this:
BilhamE01, TapponnierM76} and by distributed seismicity across the region (Fig. \ref{fig1_2
whereas I am looking for:
BilhamE01, TapponnierM76
Another example with multiple \citep patterns gives output like this:
Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
whereas I am looking for:
Pauletal2015 Mitraetal2005
Can anyone help?
Answer 0 (score: 3)
This is a greedy match. Change the regex so that it stops at the first closing brace:
.*citep{\([^}]*\)}
Note that it will only match one instance per line.
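A minimal sketch of that fix dropped into the full command (the function name extract_citep_last is hypothetical). With sed -n and the p flag, only lines where the substitution succeeded are printed, so the separate grep citep step is no longer needed. Because the leading .* is still greedy, a line with several \citep{...} yields only the last one.

```shell
#!/bin/sh
# Hypothetical helper: print the argument list of one \citep{...} per line.
# [^}]* cannot cross a }, so the capture stops at the brace that closes
# \citep{...}. The leading .* is still greedy, so if a line holds several
# \citep{...}, only the LAST one is printed. -n plus the p flag prints only
# lines where the substitution succeeded, so no separate grep is needed.
extract_citep_last() {
  sed -n 's/.*\\citep{\([^}]*\)}.*/\1/p'
}

printf '%s\n' '\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic' \
  | extract_citep_last
# BilhamE01, Mitraetal2005
```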
Answer 1 (score: 2)
If you're using grep anyway, you might as well stick with it (assuming GNU grep):
$ echo "$str" | grep -oP '(?<=\\citep{)[^}]+(?=})'
BilhamE01, TapponnierM76
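Not every grep supports -P (GNU grep can be built without PCRE, and BSD-derived greps such as the one on macOS generally lack it). A sketch of the same idea without lookarounds: let grep -o print every whole \citep{...} match on its own line, then strip the wrapper with sed. The function name extract_citep_all is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of every \citep{...}, one per line.
# grep -o emits each non-overlapping match separately, and [^}]* keeps a match
# from running past its own closing brace; sed then strips "\citep{" and "}".
extract_citep_all() {
  grep -o '\\citep{[^}]*}' | sed 's/^\\citep{//; s/}$//'
}

printf '%s\n' 'see \citep{TapponnierM76, WangLiu2009} and \citep{Vernantetal2014}.' \
  | extract_citep_all
# TapponnierM76, WangLiu2009
# Vernantetal2014
```

To get one key per line, append something like | tr ',' '\n' | sed 's/^ *//'.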
Answer 2 (score: 1)
For what it's worth, this can be done with sed:
printf '%s\n' '\citep{string} xyz {abc} \citep{string2},foo' | \
sed 's/\\citep{\([^}]*\)}/\n\1\n\n/g; s/^[^\n]*\n//; s/\n\n[^\n]*\n/, /g; s/\n.*//g'
Output:
string, string2
But wow, that's ugly. The sed script is easier to follow in this form, which happens to be suitable for passing to sed via its -f option:
# change every \citep{string} to <newline>string<newline><newline>
s/\\citep{\([^}]*\)}/\n\1\n\n/g
# remove any leading text before the first wanted string
s/^[^\n]*\n//
# replace text between wanted strings with comma + space
s/\n\n[^\n]*\n/, /g
# remove any trailing unwanted text
s/\n.*//
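One way to use the commented form, assuming GNU sed (the \n in the replacement and inside [^\n] are GNU extensions): write the script to a file and pass it with -f. The temporary-file handling and the first /\\citep{/!d line are additions, not part of the answer; without that guard, lines that contain no \citep pass through unchanged. Note that \citet{...} is not matched, so Mitraetal2005 from the question's second example is not reported.

```shell
#!/bin/sh
# Sketch, assuming GNU sed (\n in the replacement and inside [^\n] are GNU
# extensions). The script is written to a temporary file and run with -f.
CITEP_SED=$(mktemp)
trap 'rm -f "$CITEP_SED"' EXIT
cat > "$CITEP_SED" <<'EOF'
# added guard: delete lines with no \citep{, which would otherwise print unchanged
/\\citep{/!d
# change every \citep{string} to <newline>string<newline><newline>
s/\\citep{\([^}]*\)}/\n\1\n\n/g
# remove any leading text before the first wanted string
s/^[^\n]*\n//
# replace text between wanted strings with comma + space
s/\n\n[^\n]*\n/, /g
# remove any trailing unwanted text
s/\n.*//
EOF

printf '%s\n' 'GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim' \
              'no citation on this line' \
              '\citep{Pauletal2015} and also \citet{Mitraetal2005} for' \
  | sed -f "$CITEP_SED"
# TapponnierM76, WangLiu2009
# Pauletal2015
```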
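This takes advantage of the fact that sed can match and replace newline characters, even though reading a new input line never leaves a newline in the pattern space. The newline is therefore the one character we can be sure appears in the pattern space (or hold space) only if sed put it there on purpose.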
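A quick way to see this property in action (a sketch; N and \n in a regex are standard sed, so this should not need GNU sed):

```shell
#!/bin/sh
# sed removes each input line's trailing newline before running the script,
# so a freshly read line contains no \n and this substitution changes nothing:
printf 'a\nb\n' | sed 's/\n/X/'
# a
# b

# N appends the next input line after an embedded newline; that newline was
# put there by sed itself, so now s/\n/X/ finds it:
printf 'a\nb\n' | sed 'N;s/\n/X/'
# aXb
```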
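The initial substitution is there purely to make the problem manageable by reducing the delimiters around each target string to something simple. In principle the remaining steps could be done without that simplification, but the regular expressions involved would be horrendous.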
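To see the simplified delimiters, the sketch below (GNU sed; the function name show_delimited is hypothetical) runs only the first substitution and then uses sed's l command, which prints the pattern space with each embedded newline shown as \n and the end marked with $:

```shell
#!/bin/sh
# Hypothetical helper (GNU sed): apply only the first substitution, then use
# the l command to print the pattern space with embedded newlines shown as \n
# and the end of the pattern space marked with $.
show_delimited() {
  sed -n 's/\\citep{\([^}]*\)}/\n\1\n\n/g; l'
}

printf '%s\n' '\citep{string} xyz {abc} \citep{string2},foo' | show_delimited
# \nstring\n\n xyz {abc} \nstring2\n\n,foo$
```

Every wanted string now sits between a single \n and a \n\n pair, which is what the remaining three substitutions key on.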
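This assumes that the string in each \citep{string} contains at least one character; if empty strings must be accommodated, this approach needs further refinement.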
Of course, I can't imagine why anyone would prefer this to @Lev's direct grep approach, but the question does ask specifically for a sed solution.
Answer 3 (score: 0)
f.awk
BEGIN {
    pat = "\\citep"
    latex_tok = "\\\\[A-Za-z_][A-Za-z_]*"   # match a control sequence such as \aBcD
}
{
    f = f $0 "\n"   # store the content of the input file as a single string
}
function store(args,    n, k, i) {   # store the comma-separated `keys' in `d'
    gsub("[ \t\n]", "", args)          # remove spaces, tabs and newlines
    n = split(args, keys, ",")
    for (i = 1; i <= n; i++) {
        k = keys[i]
        if (k != "") d[k]              # referencing d[k] creates the key
    }
}
function ntok() {   # consume the next token; the rest of the text stays in f
    if (match(f, latex_tok)) {
        tok = substr(f, RSTART, RLENGTH)
        f = substr(f, RSTART + RLENGTH)
        return 1
    }
    return 0
}
function parse(    i, rc, args) {
    for (;;) {   # infinite loop
        while ((rc = ntok()) && tok != pat)
            ;
        if (!rc) return
        i = index(f, "{")
        if (!i) return          # saw `pat' but no '{'
        f = substr(f, i + 1)
        i = index(f, "}")
        if (!i) return          # no closing '}'
        # extract `args' from \citep{`args'}
        args = substr(f, 1, i - 1)
        store(args)
    }
}
END {
    parse()
    for (k in d)
        print k
}
f.example
of the Asian crust further north \citep{TapponnierM76, WangLiu2009}. This has led to widespread deformation both within and
\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic
across the Dawki fault and Naga Hills, increasing eastwards from $\sim$3~mm/yr to $\sim$13~mm/yr \citep{Vernantetal2014}.
GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim Himalaya lies at the transition between this relatively simple
this transition includes deviation of the Himalaya from a perfect arc beyond 89\deg\ longitude \citep{BendickB2001}, reduction
\citep{BhattacharyaM2009, Mitraetal2010}. Rivers Tista, Rangit and Rangli run through Sikkim eroding the MCT and Ramgarh
thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}. Within this sinuous physiography,
\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
field results corroborate well with seismic studies in this region \citep{Actonetal2011, Arunetal2010}. From studies of
Usage:
awk -f f.awk f.example
Expected output (the order is unspecified, because for (k in d) visits keys in no defined order):
BendickB2001
Arunetal2010
Pauletal2015
Mitraetal2005
BilhamE01
Mukuletal2009
TapponnierM76
WangLiu2009
BhattacharyaM2009
Mitraetal2010
Actonetal2011
Vernantetal2014
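For comparison, a shorter line-by-line awk sketch (not part of the answer above; the function name extract_citep_keys is hypothetical). It calls match() in a loop so every \citep on a line is handled, and prints each key the first time it is seen, so the output follows input order. Unlike f.awk, it does not handle a \citep{...} whose braces are split across two lines.

```shell
#!/bin/sh
# Hypothetical helper, not part of the answer above: a line-by-line awk
# alternative. match() finds the next \citep{...}; the text after it is kept
# for the next loop iteration, so every \citep on a line is handled. Keys are
# printed the first time they are seen, in input order. Unlike f.awk, this
# does not handle a \citep{...} whose braces are split across two lines.
extract_citep_keys() {
  awk '
  {
    line = $0
    while (match(line, /\\citep[{][^}]*[}]/)) {
      args = substr(line, RSTART + 7, RLENGTH - 8)   # drop "\citep{" and "}"
      line = substr(line, RSTART + RLENGTH)
      n = split(args, keys, ",")
      for (i = 1; i <= n; i++) {
        k = keys[i]
        gsub(/[ \t]/, "", k)
        if (k != "" && !(k in seen)) {
          seen[k] = 1
          print k
        }
      }
    }
  }'
}

printf '%s\n' 'north \citep{TapponnierM76, WangLiu2009}. This has led' \
              '\citep{Pauletal2015} and also \citet{Mitraetal2005} for' \
  | extract_citep_keys
# TapponnierM76
# WangLiu2009
# Pauletal2015
```

On the file itself, this would be run as extract_citep_keys < f.example.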