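I have a set of directory path names that look like this:
foo/bar/baz
or like this, with a trailing slash:
foo/bar/baz/
The directory paths can also be arbitrarily deep. There is no guarantee of exactly 3 levels as shown here; there may be more.
I want to write a regular expression that captures the rightmost subdirectory name, whichever of these two forms appears.
I can write a regular expression for grep that works for the first case:
'[^/]*$'
How do I extend this to cover the second case? It seems I need to match 0 or more slashes on the right (i.e., next to the "$"), but then throw them away and keep only what is to the left of them. I can't figure out the correct syntax.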
Answer 0 (score: 1)
You can use this awk command:
awk -F/ '{sub(/\/$/, ""); print $NF}' <<< "foo/bar/baz"
baz
awk -F/ '{sub(/\/$/, ""); print $NF}' <<< "foo/bar/baz/"
baz
awk -F/ '{sub(/\/$/, ""); print $NF}' <<< "abc/xyz/foo/bar/baz/"
baz
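The question mentions "0 or more" trailing slashes, while the command above removes only one. The following is a minimal sketch of a variation, not part of the original answer: it changes the pattern to /\/+$/ so that any run of trailing slashes is removed. The function name last_dir is hypothetical, and the function reads one path per line from standard input.

```shell
# last_dir: hypothetical helper name; reads one path per line on stdin.
last_dir() {
    # /\/+$/ strips every trailing slash (the answer's /\/$/ strips only one).
    # Changing $0 with sub() makes awk re-split the fields, so $NF is the
    # last non-empty path component.
    awk -F/ '{ sub(/\/+$/, ""); print $NF }'
}

printf '%s\n' 'foo/bar/baz' 'foo/bar/baz/' 'abc/xyz/foo/bar/baz//' | last_dir
```

All three sample paths should print baz.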
Answer 1 (score: 1)
You can match an optional /? before the EOL anchor $ inside a lookahead, so that the slash is checked but not included in the match:
/[^\/]+(?=\/?$)/
https://regex101.com/r/mHzLx0/1
Explanation
[^\/]+ matches one or more characters other than /, and the lookahead (?=\/?$) requires that they be followed by an optional / and then the end of the line.
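Answer 2 (score: 1)
Converting my comment into an answer:
sed
sed -E 's@.*/([^/]+).*@\1@'
-E (or -r, depending on the OS) enables POSIX ERE syntax.
Pattern details:
.* - any 0 or more characters, as many as possible, up to the last occurrence of the subsequent subpatterns
/ - a / symbol
([^/]+) - Group 1: one or more characters other than /
.* - any 0 or more characters, as many as possible, up to the end of the line.
The \1 in the replacement part inserts the contents stored in the Group 1 memory buffer into the result.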
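To see how the greedy .* and the capture group interact, here is a small sketch that runs the same sed command over a few sample paths. The wrapper name last_dir_sed is hypothetical and exists only for illustration; it reads one path per line from standard input.

```shell
# last_dir_sed: hypothetical wrapper around the sed command from this answer.
last_dir_sed() {
    # The first .* backtracks to the last "/" that is still followed by at
    # least one non-slash character; group 1 then holds the final component.
    sed -E 's@.*/([^/]+).*@\1@'
}

printf '%s\n' 'foo/bar/baz' 'foo/bar/baz/' 'abc/xyz/foo/bar/baz/' | last_dir_sed
```

Each of these inputs should print baz. Note that a line with no / before the final name (for example a bare baz) does not match the pattern at all, so sed prints it unchanged.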
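grep
If you have access to a PCRE-powered grep (e.g. GNU grep), you can use
grep -oP '[^/]+(?=/?$)'
where the -o option extracts each match (instead of printing the lines that contain a match) and -P forces grep to use the PCRE regex engine to parse the pattern, which enables lookarounds. Lookarounds are non-consuming patterns, i.e. the text they match is not added to the match value and the regex index does not advance, so they are useful for checking various conditions inside a regex.
Pattern details:
[^/]+ - a negated bracket expression matching any character but /, 1 or more times, as many as possible
(?=/?$) - a positive lookahead requiring an optional / (the ? quantifier matches 1 or 0 occurrences) at the end of the line ($).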
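The effect of the non-consuming lookahead is easiest to see next to a pattern that consumes the slash. The sketch below assumes a grep built with PCRE support (the -P option, as in GNU grep); the function names last_dir_grep and last_dir_consuming are hypothetical and used only for this comparison.

```shell
# Assumes a grep that supports -P (PCRE), e.g. GNU grep.
# last_dir_grep: hypothetical wrapper around the command from this answer.
last_dir_grep() {
    # The lookahead checks for an optional "/" plus end of line
    # without adding the slash to the match.
    grep -oP '[^/]+(?=/?$)'
}

# last_dir_consuming: hypothetical ERE counterpart for comparison; here the
# optional "/" is part of the match, so it appears in the output.
last_dir_consuming() {
    grep -oE '[^/]+/?$'
}

printf '%s\n' 'foo/bar/baz' 'foo/bar/baz/' | last_dir_grep       # baz, baz
printf '%s\n' 'foo/bar/baz' 'foo/bar/baz/' | last_dir_consuming  # baz, baz/
```

The second function shows why a plain ERE is not enough with -o alone: without a lookahead the trailing slash becomes part of the extracted text.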