Given that infile contains:
aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB
aawerwefrewqa"pic02.jpg"bbbertebbbsize 100KB
atyrtyruraa"pic03.jpg"bbbwtrwtbbbsize 190KB
how can I get this outfile:
pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
My code is:
with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        lines_set1 = line.split('"')
        lines_set2 = line.split(' ')
        for item_set1 in lines_set1:
            for item_set2 in lines_set2:
                if item_set1.endswith('.jpg'):
                    if item_set2.endswith('KB'):
                        outfile.write(item_set1 + ' ' + item_set2 + '\n')
But the code produces an empty file. What is wrong?
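Answer 0 (score: 3)
Your code has only one major problem: the if item_set2.endswith('KB') check does not work, because every line ends with a newline character, so the last field is '110KB\n' rather than '110KB'. Replace it with the following (note the strip() call):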
if item_set2.strip().endswith('KB'):
Also, you don't need the + '\n', because item_set2 already ends with a newline as long as you write it without stripping it:
outfile.write(item_set1 + ' ' + item_set2)
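To see the effect of the trailing newline in isolation, here is a minimal sketch of the corrected loop. It works on a list of strings instead of test.txt so it can be run as-is; the helper name extract_pairs is made up for this illustration, and it collects stripped values in a list rather than writing a file.

```python
def extract_pairs(lines):
    """Return 'name size' strings using the corrected nested-loop approach."""
    results = []
    for line in lines:
        lines_set1 = line.split('"')
        lines_set2 = line.split(' ')
        for item_set1 in lines_set1:
            for item_set2 in lines_set2:
                # strip() removes the trailing '\n' so endswith('KB') can match
                if item_set1.endswith('.jpg') and item_set2.strip().endswith('KB'):
                    results.append(item_set1 + ' ' + item_set2.strip())
    return results


sample = [
    'aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n',
    'aawerwefrewqa"pic02.jpg"bbbertebbbsize 100KB\n',
]

# The original check fails because the last field still carries the newline.
print('110KB\n'.endswith('KB'))          # False
print('110KB\n'.strip().endswith('KB'))  # True
print(extract_pairs(sample))
```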
FYI, you can use a regular expression with capturing groups to extract the data:
import re
with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        match = re.search(r'"(.*)"\w+\s(\w+)', line)
        outfile.write(' '.join(match.groups()) + "\n")
Contents of outfile.txt after running the code:
pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
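As a rough illustration of what the capturing groups hold, the sketch below applies the same pattern to a single sample line. The helper parse_line and its None return value for non-matching lines are assumptions added here, not part of the answer.

```python
import re

# Same pattern as above: group 1 is the text between the quotes,
# group 2 is the word that follows the whitespace.
PATTERN = re.compile(r'"(.*)"\w+\s(\w+)')


def parse_line(line):
    """Return (filename, size) for a matching line, or None otherwise."""
    match = PATTERN.search(line)
    if match is None:
        # re.search returns None when nothing matches; calling .groups()
        # on None would raise AttributeError.
        return None
    return match.groups()


print(parse_line('aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'))
print(parse_line('\n'))
```

A guard like this matters mainly when the input may contain blank or malformed lines, which the sample data does not.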
Answer 1 (score: 3)
A solution that does not need to import re. The two conditions can also be combined into a single-line condition.
with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        filename = line.strip().split('"')[1]
        size = line.rsplit(None, 1)[-1]
        if filename.endswith('.jpg') and size.endswith('KB'):
            outfile.write('%s %s\n' % (filename, size))
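The two split calls can be inspected on a single line. This short sketch only prints the intermediate values; the variable names quoted_parts and tail_parts are illustrative.

```python
line = 'aawerwefrewqa"pic02.jpg"bbbertebbbsize 100KB\n'

# Splitting on the double quote puts the filename at index 1.
quoted_parts = line.strip().split('"')
print(quoted_parts)

# rsplit(None, 1) splits once from the right on any whitespace; with
# sep=None the trailing newline is discarded, so no strip() is needed.
tail_parts = line.rsplit(None, 1)
print(tail_parts)

filename, size = quoted_parts[1], tail_parts[-1]
print('%s %s' % (filename, size))
```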
Answer 2 (score: 2)
You should use a regular expression, which will simplify your code. Something like this:
import re
with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        # raw string, so \. \s and \d are passed to the regex engine unchanged
        obj = re.match(r'.+"(.+\.jpg)".+\s(\d+KB)', line)
        if obj:
            outfile.write(obj.group(1) + ' ' + obj.group(2) + '\n')
The outfile.txt produced by this script:
pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
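One consequence of the if obj: check is that lines which do not fit the pattern are skipped silently. A minimal sketch of that behavior, assuming an in-memory list of lines and a made-up helper name filter_jpg_sizes; the second and third input lines are invented for the example:

```python
import re

# Raw string so that \. \s and \d reach the regex engine unchanged.
PATTERN = re.compile(r'.+"(.+\.jpg)".+\s(\d+KB)')


def filter_jpg_sizes(lines):
    """Collect 'name size' for lines that match; ignore the rest."""
    found = []
    for line in lines:
        obj = PATTERN.match(line)
        if obj:
            found.append(obj.group(1) + ' ' + obj.group(2))
    return found


lines = [
    'aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n',
    'zzz"notes.txt"bbbsize 12KB\n',                    # not a .jpg, skipped
    'atyrtyruraa"pic03.jpg"bbbwtrwtbbbsize 190MB\n',   # not in KB, skipped
]
print(filter_jpg_sizes(lines))
```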
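Answer 3 (score: 2)
First, split the line on whitespace and take the second item (index 1 in a 0-based list); this gives the size part.
Next, split the first item on the " character and take the second item. This gives the filename.
See the online demo if you want to check how the line is split.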
with open('test.txt', 'r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile:
        parts = line.split()
        outfile.write(parts[0].split('"')[1] + " " + parts[1] + "\n")
Output:
pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
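To see how each split behaves, the intermediate lists can be printed directly. This sketch assumes, as in the sample data, that the only whitespace in a line is the space before the size; the name name_parts is illustrative.

```python
line = 'atyrtyruraa"pic03.jpg"bbbwtrwtbbbsize 190KB\n'

# split() with no argument splits on any whitespace and drops the newline.
parts = line.split()
print(parts)

# The first item still contains the quoted filename; index 1 after
# splitting on '"' is the text between the quotes.
name_parts = parts[0].split('"')
print(name_parts)

print(name_parts[1] + ' ' + parts[1])
```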
Answer 4 (score: 0)
Using sed:
$ sed 's/.*"\(.*\)".*size \(.*\)/\1 \2/' foo.txt
pic01.jpg 110KB
pic02.jpg 100KB
pic03.jpg 190KB
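For comparison with the Python answers, the same substitution can be approximated with re.sub. This is a sketch rather than an exact port: sed's basic-regex groups \( \) become plain parentheses in Python, and the function name sed_like is invented here.

```python
import re


def sed_like(line):
    """Rewrite a line to 'name size', mirroring the sed expression above."""
    # Drop the newline first so the final .* only sees the size text.
    return re.sub(r'.*"(.*)".*size (.*)', r'\1 \2', line.rstrip('\n'))


print(sed_like('aaaaaaa"pic01.jpg"bbbwrtwbbbsize 110KB\n'))
```

As with sed, a line that does not match the pattern is returned unchanged.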