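I have a file containing many numbers, zero-padded with 10 leading digits, each with an "A" placed before it and a "Z" after it so that a script cannot misidentify where a number starts and ends. E.g.: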
A00000000001Z
A00000000003Z,A00000000004Z;A00000000005Z
A00000000004Z A00000000005Zsome wordsA00000000001Z
A00000000006Z;A00000000005Z
A00000000001Z
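I need to search for a specific number, but only output the lines where that number is found and where no other number on the same line is appearing for the first time.
For example, if I search for "0000000001", it should print lines 1, 3 and 5: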
A00000000001Z
A00000000004Z A00000000005Zsome wordsA00000000001Z
A00000000001Z
It can print line 3 because the other numbers, "00000000004" and "00000000005", already appeared on line 2.
If I search for "00000000005", it should print line 3:
A00000000004Z A00000000005Zsome wordsA00000000001Z
It does not print line 2, because the other numbers on that line, "00000000003" and "00000000004", had never appeared before.
So far I have come up with this:
# search for the number and write the matching line, plus everything before it, to a temporary file
grep -B 10000000 0000000001 file.txt > output.temp
# put the last (matching) line in one file and the preceding lines in another
tail -n 1 output.temp > output.temp1
sed '$ d' output.temp > output.temp2
# look for every other number in output.temp2
for i in $(seq 1 1000000); do
    a=$(printf '%010d' "$i")
    # skip the number being searched for
    [ "$a" = 0000000001 ] && continue
    if grep -q "$a" output.temp2; then
        # check whether it is also on the matching line
        if ! grep -q "$a" output.temp1; then
            :   # not worked out yet
        else
            :   # not worked out yet
        fi
    fi
done
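How can I print only the lines that contain a specific number, while excluding lines that also contain other numbers that have not appeared earlier in the file?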
Answer 0 (score: 1)
This is not strictly bash, but in Python 2 you can do it with a script that you run from the shell:
#!/usr/bin/env python
import re
import sys


def find_valid_ids(input_file, target_id):
    """Print lines containing target_id whose other IDs all appeared on earlier lines."""
    found_ids = set()  # every ID seen on the lines read so far
    with open(input_file) as f:
        for line in f:
            ids = set(re.findall(r'A\d+Z', line))
            # Every ID on the line other than the target must already have been seen.
            if target_id in ids and (ids - {target_id}) <= found_ids:
                print(line.rstrip('\n'))
            found_ids |= ids


if __name__ == "__main__":
    try:
        find_valid_ids(sys.argv[1], sys.argv[2])
    except IndexError:
        print('Usage: ./find_valid_ids.py input_file target_id')
So if you save the above as find_valid_ids.py, make it executable with
$ chmod +x find_valid_ids.py
and then run it as
$ ./find_valid_ids.py your_input_file.txt A00000000001Z
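To check the logic against the sample from the question without creating a file, here is a minimal sketch of the same set-based idea as a pure function. The names `filter_lines` and `SAMPLE` are hypothetical and not part of the script above. The condition is written as a subset test, on the assumption that the intended rule is "every number on the line other than the target has been seen on an earlier line".

```python
import re

# Sample input copied from the question.
SAMPLE = """\
A00000000001Z
A00000000003Z,A00000000004Z;A00000000005Z
A00000000004Z A00000000005Zsome wordsA00000000001Z
A00000000006Z;A00000000005Z
A00000000001Z
"""


def filter_lines(lines, target_id):
    """Return the lines that contain target_id and whose other IDs
    all appeared on earlier lines (hypothetical helper)."""
    seen = set()   # IDs found on the lines processed so far
    kept = []
    for line in lines:
        ids = set(re.findall(r'A\d+Z', line))
        others = ids - {target_id}
        # Assumed rule: the target is present and nothing else is new.
        if target_id in ids and others <= seen:
            kept.append(line)
        seen |= ids
    return kept


if __name__ == "__main__":
    for line in filter_lines(SAMPLE.splitlines(), "A00000000001Z"):
        print(line)
```

With the sample above, searching for A00000000001Z keeps lines 1, 3 and 5, and searching for A00000000005Z keeps only line 3, which matches the expected output in the question.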