I'm trying to extract field values from a text file formatted like this:
{fieldvalue1} {fieldvalue2} {fieldvalue3}
However, a field value can itself contain subfields that are also delimited by curly braces, for example:
{abc} {xyz} {efg {123} {pqx}}
So for the example above, the desired output is:
* fieldvalue1 = abc
* fieldvalue2 = xyz
* fieldvalue3 = efg {123} {pqx}
I tried the following filter:
sed 's/^{//g;s/}$//g' | awk -F"} {"
However, this obviously fails to parse fieldvalue3 in the example above correctly.
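To make the failure concrete, here is a rough Python imitation of that filter. It is only an illustration, and it assumes the awk step does nothing more than split each line on the literal string `} {`:

```python
import re

line = "{abc} {xyz} {efg {123} {pqx}}"

# Mimic sed 's/^{//g;s/}$//g': drop the leading '{' and the trailing '}'.
stripped = re.sub(r"\}$", "", re.sub(r"^\{", "", line))

# Mimic awk -F"} {": split on the literal separator "} {".
fields = stripped.split("} {")

print(fields)  # ['abc', 'xyz', 'efg {123', 'pqx}']
```

The separator also occurs inside the third field, so that field is cut in two and four pieces come back instead of three.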
Answer 0 (score: 0)
You can brute-force it by counting characters:
import regex

def findall_over_file_with_caveats(pattern, file):
    # Caveats:
    # - doesn't support ^ or backreferences, and might not play well with
    #   advanced features I'm not aware of that regex provides and re doesn't.
    # - Doesn't do the careful handling that zero-width matches would need,
    #   so consider behavior undefined in case of zero-width matches.
    # - I have not bothered to implement findall's behavior of returning groups
    #   when the pattern has groups.
    # Unlike findall, produces an iterator instead of a list.

    # bytes window for bytes pattern, unicode window for unicode pattern
    # We assume the file provides data of the same type.
    window = pattern[:0]
    chunksize = 8192
    sentinel = object()

    last_chunk = False

    while not last_chunk:
        chunk = file.read(chunksize)
        if not chunk:
            last_chunk = True
        window += chunk

        match = sentinel
        for match in regex.finditer(pattern, window, partial=not last_chunk):
            if not match.partial:
                yield match.group()

        if match is sentinel or not match.partial:
            # No partial match at the end (maybe even no matches at all).
            # Discard the window. We don't need that data.
            # The only cases I can find where we do this are if the pattern
            # uses unsupported features or if we're on the last chunk, but
            # there might be some important case I haven't thought of.
            window = window[:0]
        else:
            # Partial match at the end.
            # Discard all data not involved in the match.
            window = window[match.start():]
            if match.start() == 0:
                # Our chunks are too small. Make them bigger.
                chunksize *= 2
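The character-counting idea mentioned at the top of this answer can also be written directly as a brace-depth counter. What follows is a minimal sketch rather than the answerer's code: the name `split_brace_fields` is made up for illustration, and it assumes braces are never escaped and that any text outside the top-level braces (such as the separating spaces) can be ignored.

```python
def split_brace_fields(line):
    """Return the contents of each top-level {...} group in line."""
    fields = []
    depth = 0      # how many '{' are currently open
    start = None   # index just after the outermost '{' of the current field
    for pos, ch in enumerate(line):
        if ch == "{":
            if depth == 0:
                start = pos + 1
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError(f"unmatched '}}' at column {pos}")
            depth -= 1
            if depth == 0:
                # Back at top level: the field is complete.
                fields.append(line[start:pos])
    if depth != 0:
        raise ValueError("unbalanced braces: missing '}'")
    return fields


line = "{abc} {xyz} {efg {123} {pqx}}"
for n, value in enumerate(split_brace_fields(line), start=1):
    print(f"* fieldvalue{n} = {value}")
```

For the sample line this prints the three fields in the format requested in the question, with the nested `{123} {pqx}` left intact inside the third field.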
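Answer 1 (score: 0)
The input looks like a Tcl list of lists :) Tcl handles that nicely.
The example below reads the file in.txt line by line and prints the fields in the desired output format.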
#!/bin/sh
# the next line restarts using tclsh \
exec tclsh "$0" "$@"

# open file in.txt
set fd [open in.txt]
# loop till end of file
while {![eof $fd]} {
    # read line
    set line [gets $fd]
    set i 0
    # iterate over all elements
    foreach elm $line {
        incr i
        puts "* fieldvalue$i = $elm"
    }
}
close $fd
Or a one-liner that processes a single line of data. expect is used here because it lets you pass Tcl commands on the command line:
echo '{abc} {xyz} {efg {123} {pqx}}' | expect -c 'puts [join [lmap _ [gets stdin] {incr i; set _ "* fieldvalue$i = $_"}] \n]'
Answer 2 (score: 0)
Another quick one:
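#!/usr/bin/awk -f
{
    for (i = 1; i <= NF; i++)
    {
        # prepend any unfinished field carried over from earlier tokens
        $i = e (e ? FS : "") $i
        # split() returns the number of pieces, i.e. brace count + 1
        l = split($i, a, "{")
        r = split($i, a, "}")
        if (l == r)
        {
            # opening and closing braces balance: the field is complete
            print "* fieldvalue" ++c, $i
            e = ""
        }
        else
            e = $i
    }
}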
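For readers less familiar with awk, the same token-joining approach can be sketched in Python. This is an illustrative port, not part of the original answer: the name `join_balanced_tokens` is hypothetical, and like the awk script it keeps the outer braces and collapses runs of whitespace inside a field to a single space.

```python
def join_balanced_tokens(line):
    """Glue whitespace-separated tokens together until braces balance."""
    fields = []
    pending = ""  # partially assembled field, like the awk variable e
    for token in line.split():
        pending = f"{pending} {token}" if pending else token
        if pending.count("{") == pending.count("}"):
            # Same number of '{' and '}': the field is complete.
            fields.append(pending)
            pending = ""
    return fields


line = "{abc} {xyz} {efg {123} {pqx}}"
for n, value in enumerate(join_balanced_tokens(line), start=1):
    print(f"* fieldvalue{n} {value}")
```

Stripping the outer braces afterwards (for example with `value[1:-1]`) would give the exact output asked for in the question.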