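I need to delete all lines in a file that contain a match for read (symbol), where (symbol) is any CJK character. However, if read (symbol) in the match is immediately preceded by A-Z or a-z, the line should not be deleted. For example, here are some sample lines and the expected results: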
Do you like to read books? (not deleted)
Can you read 书? (deleted)
.read 书. (deleted)
This is some thread 线. (not deleted)
How can I delete only the lines that match (not A-Z or a-z)read (CJK symbol)?
Answer 0 (score: 1)
awk '$0~/ read [a-zA-Z]+/' your_file
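Answer 1 (score: 1)
I'm not entirely sure how to match CJK characters, but if you match non-ASCII characters instead, you may get the result you're looking for: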
grep -vP "[^A-Za-z]read [\x80-\xFF]" file.txt
In theory, you should be able to do this:
grep -vP "[^A-Za-z]read [\x{2E80}-\x{9FBB}]+" file.txt
However, in my testing I get this error:
grep: character value in \x{...} sequence is too large
http://en.wikipedia.org/wiki/List_of_Unicode_characters#CJK_unified_ideographs
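One likely cause of that error, offered as an assumption rather than a confirmed diagnosis: PCRE only accepts \x{...} values above 0xFF when it runs in UTF-8 mode, and GNU grep -P enables that mode only under a UTF-8 locale. A minimal sketch with the same pattern as above, assuming a reasonably recent GNU grep built with PCRE support and an installed C.UTF-8 locale (substitute en_US.UTF-8 or whatever your system provides). The function name drop_read_cjk is hypothetical.

```shell
# Hypothetical helper: print every line EXCEPT those where "read " is
# preceded by a non-letter and followed by a character in U+2E80..U+9FBB.
# Assumptions: GNU grep with -P support, and a C.UTF-8 locale is installed.
drop_read_cjk() {
  # Forcing a UTF-8 locale puts PCRE in UTF mode, so \x{2E80} is accepted
  # as a code point instead of being rejected as "too large".
  LC_ALL=C.UTF-8 grep -vP '[^A-Za-z]read [\x{2E80}-\x{9FBB}]+' "$@"
}
```

Called as drop_read_cjk file.txt, this should keep the two "not deleted" sample lines from the question. Like the original pattern, it does not match read at the very start of a line, because [^A-Za-z] needs a preceding character.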
Edit:
LC_ALL="POSIX" sed -r '/[^A-Za-z]read [\o200-\o377]+/d' file.txt
Results:
Do you like to read books? (not deleted)
This is some thread 线. (not deleted)
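The byte-oriented idea behind the sed command can also be narrowed from "any non-ASCII byte" to the UTF-8 lead bytes that CJK characters use, and extended to catch read at the very start of a line. A sketch under the assumption that the file is UTF-8 encoded; drop_read_cjk_bytes is a hypothetical name, and the byte ranges cover U+2E80 through U+9FFF, which is slightly wider than the range in the grep attempt above.

```shell
# Hypothetical helper: print every line EXCEPT those containing
# "read <CJK char>" where "read" is at line start or follows a non-letter.
# Assumption: input is UTF-8. LC_ALL=C makes awk compare raw bytes.
drop_read_cjk_bytes() {
  # \342[\272-\277]  -> bytes E2 BA..BF, i.e. U+2E80..U+2FFF (CJK radicals)
  # [\343-\351]      -> lead bytes E3..E9, i.e. U+3000..U+9FFF
  #                     (CJK punctuation, kana, unified ideographs)
  LC_ALL=C awk '!/(^|[^A-Za-z])read (\342[\272-\277]|[\343-\351])/' "$@"
}
```

Called as drop_read_cjk_bytes file.txt. Working on bytes avoids any dependence on the installed locales, at the cost of spelling out the UTF-8 encoding by hand.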
See also:
How to delete all CJK text appearing immediately after a particular symbol?