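Please help me with this regex: I need all the components of the first Meta Mapping for each phrase.
Phrase:.*\nMeta Mapping.* What should this be? I only started learning regex today.
That is what I have so far, and I'm a bit stuck. Below are my document and the output I want. Main file: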
Phrase: "is"
Phrase: "normal."
Meta Mapping (1000):
1000 % Normal (Mean Percent of Normal) [Quantitative Concept]
Meta Mapping (1000):
1000 Normal [Qualitative Concept]
Meta Mapping (1000):
1000 % normal (Percent normal) [Quantitative Concept]
Processing 00000000.tx.8: The EKG shows nonspecific changes.
Phrase: "The EKG"
Meta Mapping (1000):
1000 EKG (Electrocardiogram) [Finding]
Meta Mapping (1000):
1000 EKG (Electrocardiography) [Diagnostic Procedure]
Phrase: "shows"
Meta Mapping (1000):
1000 Show [Intellectual Product]
Phrase: "nonspecific changes."
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changed status) [Quantitative Concept]
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changing) [Functional Concept]
Meta Mapping (901):
694 Non-specific (Unspecified) [Qualitative Concept]
861 changes (Changed status) [Quantitative Concept]
Meta Mapping (901):
694 Non-specific (Unspecified) [Qualitative Concept]
861 changes (Changing) [Functional Concept]
I want the result to contain only one Meta Mapping per phrase.
So:
Phrase: "normal."
Meta Mapping (1000):
1000 % Normal (Mean Percent of Normal) [Quantitative Concept]
Phrase: "The EKG"
Meta Mapping (1000):
1000 EKG (Electrocardiogram) [Finding]
Phrase: "shows"
Meta Mapping (1000):
1000 Show [Intellectual Product]
Phrase: "nonspecific changes."
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changed status) [Quantitative Concept]
Please help me with this regex; I need all the components of the first Meta Mapping for each phrase. Thanks!
Answer 0 (score: 2)
I think this might work for you. It is just a regex, nothing to do with awk. You can test it at regex101.com/
Phrase.*\nMeta.*\n^((?!Meta|\n).*\n)*
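To see what a pattern of this shape matches, here is a minimal sketch in Python (my choice of language, not something from the question). It uses the pattern above written with a `(?!Meta|\n)` lookahead and a non-capturing group. The sample text is made up from the question's data, and it assumes the real file has a blank line between phrase blocks; without those blank lines the continuation part would run into the next Phrase line. The names `SAMPLE`, `FIRST_MAPPING` and `first_mappings` are hypothetical.

```python
import re

# Hypothetical sample built from the question's data.
# ASSUMPTION: blocks are separated by blank lines in the real file.
SAMPLE = '''Phrase: "is"

Phrase: "The EKG"
Meta Mapping (1000):
1000 EKG (Electrocardiogram) [Finding]
Meta Mapping (1000):
1000 EKG (Electrocardiography) [Diagnostic Procedure]

Phrase: "nonspecific changes."
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changed status) [Quantitative Concept]
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changing) [Functional Concept]
'''

# A Phrase line, the first Meta Mapping line, then every following line
# that is neither empty nor the start of another Meta Mapping.
FIRST_MAPPING = re.compile(
    r'^Phrase.*\nMeta.*\n(?:(?!Meta|\n).*\n)*',
    re.MULTILINE,  # lets ^ match at the start of every line
)


def first_mappings(text):
    """Return one string per phrase: the Phrase line plus its first Meta Mapping."""
    return [m.group(0) for m in FIRST_MAPPING.finditer(text)]


for block in first_mappings(SAMPLE):
    print(block)
```

The phrase "is" has no Meta Mapping, so it is skipped, which agrees with the desired output in the question.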
GNU awk version:
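awk '
BEGIN {
    FS = "\n"      # each line of a record is one field
    RS = "\n\n"    # records are separated by a blank line (a multi-character RS is not POSIX)
    OFS = "\n"
}
NF > 1 {
    print $1, $2                   # the Phrase line and the first Meta Mapping line
    for (i = 3; i <= NF; i++) {
        if ($i ~ /Meta Mapping/)
            break                  # next Meta Mapping reached: stop printing this record
        print $i
    }
    print ""                       # blank line between output blocks
}
' your_data_file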
Answer 1 (score: 0)
A commented, POSIX-compliant awk solution:
awk -v RS='' -F'\n' -v re='^Meta Mapping \\(' '
  # Only process non-empty records:
  # those that have at least 1 "Meta Mapping" line.
  $2 ~ re {
    print $1 # print the "Phrase: " line
    print $2 # print the 1st "Meta Mapping" line.
    # Print the remaining lines, if any, up to but not including
    # the next "Meta Mapping" line.
    for (i=3;i<=NF;++i) {
      if ($i ~ re) break # next "Meta Mapping" found; ignore and terminate block.
      print $i
    }
    print "" # print empty line between output blocks
  }
' file
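RS='' is an awk idiom that splits the input into records at blank lines; in other words, each run of non-empty lines forms one record.
-F'\n' splits each record into fields by line; that is, $1 refers to the 1st line of each record, $2 to the 2nd line, and so on; NF contains the number of lines (fields) in the current record.
re='...' defines an awk variable containing a regex that identifies the Meta Mapping lines in each record.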
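For readers who don't know awk, the same record and field logic can be sketched roughly in Python. This is only an approximation of what awk does with an empty RS and a newline field separator, not a replacement for the command above. The sample text is made up from the question's data and assumes blank lines between blocks; `META_RE`, `SAMPLE` and `first_mapping_blocks` are hypothetical names.

```python
import re

# Same idea as the awk variable `re`: recognizes a "Meta Mapping (" line.
META_RE = re.compile(r'^Meta Mapping \(')

# Hypothetical sample built from the question's data.
# ASSUMPTION: blocks are separated by blank lines in the real file.
SAMPLE = '''Processing 00000000.tx.8: The EKG shows nonspecific changes.

Phrase: "shows"
Meta Mapping (1000):
1000 Show [Intellectual Product]

Phrase: "nonspecific changes."
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changed status) [Quantitative Concept]
Meta Mapping (901):
694 Nonspecific [Idea or Concept]
861 changes (Changing) [Functional Concept]
'''


def first_mapping_blocks(text):
    """Return, for each phrase, the Phrase line plus its first Meta Mapping."""
    blocks = []
    # Splitting on runs of blank lines approximates awk's empty RS:
    # every run of non-empty lines becomes one record.
    for record in re.split(r'\n{2,}', text.strip('\n')):
        fields = record.split('\n')      # one field per line, like a newline FS
        # Equivalent of the `$2 ~ re` condition: skip records whose
        # second line is not a Meta Mapping line.
        if len(fields) < 2 or not META_RE.search(fields[1]):
            continue
        kept = fields[:2]                # $1 (Phrase) and $2 (first Meta Mapping)
        for line in fields[2:]:
            if META_RE.search(line):     # next Meta Mapping found: stop
                break
            kept.append(line)
        blocks.append(kept)
    return blocks


for block in first_mapping_blocks(SAMPLE):
    print('\n'.join(block))
    print()                              # empty line between output blocks
```

The "Processing ..." line forms a one-line record, so it fails the second-line check and is dropped, just as it is by the awk condition.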