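I have two text files in the CWD, a.txt and b.txt. From a.txt, I want to delete every line whose first 5 characters do not appear as the first 5 characters of any line in b.txt. (Put another way: keep only those lines of a.txt whose first 5 characters occur as the first 5 characters of some line in b.txt.) Everything after the 5th character is irrelevant.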
For example, a.txt:
abcde000
0123456xxx
xyzxyzxyz
kkkkkkkkkkk
b.txt:
012345aabbcc
kkkkkkkhhkkvv
nnnnnnnnnnn
Result (the lines of a.txt whose characters 1-5 match the first 5 characters of some line in b.txt):
0123456xxx
kkkkkkkkkkk
The script I'm working on (I'm stuck on how to search for an exact match of the first 5 characters):
with open('a.txt', 'r') as file1:
    with open('b.txt', 'r') as file2:
        same = set(file1).intersection(file2)
        file1[0][4]
        file1[0][4]
same.discard('\n')
with open('same_start.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
Any suggestions?
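The rule in the question can be written as a small pure function over lists of lines. This is a minimal sketch, not any poster's code: the name filter_by_prefix and the parameter n are hypothetical, and it assumes the lines have already been read into lists (with or without trailing newlines). Lines shorter than n characters are skipped, since they have no full 5-character prefix to compare.

```python
def filter_by_prefix(a_lines, b_lines, n=5):
    """Hypothetical helper: keep the lines of a_lines whose first n
    characters are also the first n characters of some line in b_lines."""
    # Collect the n-character prefixes of b_lines, ignoring lines that
    # are too short (newline excluded) to have a full prefix.
    prefixes = {line[:n] for line in b_lines if len(line.rstrip('\n')) >= n}
    # Keep a line only if it is long enough and its prefix was seen in b_lines.
    return [line for line in a_lines
            if len(line.rstrip('\n')) >= n and line[:n] in prefixes]


a = ['abcde000', '0123456xxx', 'xyzxyzxyz', 'kkkkkkkkkkk']
b = ['012345aabbcc', 'kkkkkkkhhkkvv', 'nnnnnnnnnnn']
print(filter_by_prefix(a, b))  # ['0123456xxx', 'kkkkkkkkkkk']
```

Using a set for the prefixes keeps each membership test constant time on average, so the filter stays linear in the total number of lines.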
Answer 0 (score: 1)
You can try this:
with open('a.txt') as f1, open('b.txt') as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()
result = []
for line1 in lines1:
    for line2 in lines2:
        # require at least 5 non-whitespace-padded characters, then compare prefixes
        if len(line1.strip()) >= 5 and line1[:5] == line2[:5]:
            result.append(line1)
            break  # stop at the first match so line1 is not added twice
with open('a.txt', 'w') as f1:
    f1.writelines(result)
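Note that Python slicing is sneaky: for a string of 100 characters or fewer, s[:100] simply returns the whole string rather than raising an error. So you should check that each line contains enough characters. In the approach above, this is done by the condition len(line1.strip()) >= 5, which guarantees that the approach removes lines shorter than 5 characters as well as long lines made up only of whitespace.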
For example:
a.txt
---------------
abcde000
0123456xxx
xyzxyzxyz
kkkkkkkkkkk
1
# <== 10 spaces here
2
3
b.txt
---------------
012345aabbcc
kkkkkkkhhkkvv
nnnnnnnnnnn
# <== 12 spaces here
1
2
3
result (a.txt)
---------------
0123456xxx
kkkkkkkkkkk
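The pitfall can be reproduced in memory with the lines of the example above, as readlines() would return them. This sketch uses a hypothetical helper keep with a guard flag to compare Answer 0's nested-loop check with and without the len(line1.strip()) >= 5 condition.

```python
# Slicing past the end of a string does not raise; it returns what exists.
print('abc'[:100])  # abc

# Example lines, with trailing newlines as readlines() would return them.
a_lines = ['abcde000\n', '0123456xxx\n', 'xyzxyzxyz\n', 'kkkkkkkkkkk\n',
           '1\n', ' ' * 10 + '\n', '2\n', '3\n']
b_lines = ['012345aabbcc\n', 'kkkkkkkhhkkvv\n', 'nnnnnnnnnnn\n',
           ' ' * 12 + '\n', '1\n', '2\n', '3\n']


def keep(a_lines, b_lines, guard):
    """Hypothetical helper: nested-loop prefix check, optionally guarded."""
    result = []
    for line1 in a_lines:
        if guard and len(line1.strip()) < 5:
            continue  # skip short or whitespace-only lines
        if any(line1[:5] == line2[:5] for line2 in b_lines):
            result.append(line1.rstrip('\n'))
    return result


naive = keep(a_lines, b_lines, guard=False)
guarded = keep(a_lines, b_lines, guard=True)
print(naive)
print(guarded)  # ['0123456xxx', 'kkkkkkkkkkk']
```

Without the guard, '1\n'[:5] equals '1\n' and the first 5 of the 10 spaces equal the first 5 of the 12 spaces, so the lines '1', '2', '3' and the blank line are wrongly kept.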
Answer 1 (score: 1)
As a two-line solution:
b_file = {line[:5] for line in open('b.txt', 'r')}  # first 5 characters of each line in b.txt
a_file = [line.rstrip('\n') for line in open('a.txt', 'r') if line[:5] in b_file]
Then you can write the contents of a_file back to a.txt:
with open('a.txt', 'w') as f:
    for item in a_file:
        f.write(item + '\n')
Detailed solution:
compare = []
with open('b.txt') as f:
    for line in f:
        compare.append(line.rstrip('\n'))
new = []
with open('a.txt') as f1:
    for line in f1:
        for j in compare:
            # compare exactly the first 5 characters, then keep the a.txt line
            if j[:5] == line[:5]:
                new.append(line.rstrip('\n'))
                break
with open('a.txt', 'w') as f3:
    for j in new:
        f3.write(j + '\n')
Answer 2 (score: 0)
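You can store the first 5 characters of each line of file2 in a list, then check whether the first 5 characters of each line of file1 are in that list!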
>>> with open('c.txt','w') as out:
...     with open('a.txt','r') as inp1, open('b.txt','r') as inp2:
...         infile2_content=[line[:5] for line in inp2.readlines()]
...         [out.write(line) for line in inp1 if line[:5] in infile2_content]
...
[None, None]
>>>
>>> with open('c.txt','r') as f:
...     print f.read().splitlines()
...
['0123456xxx', 'kkkkkkkkkkk']
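The same idea works with a set instead of a list, which makes each membership test constant time on average instead of a scan of the whole list. Below is a minimal Python 3 sketch, assuming a hypothetical helper filter_file and the example data written to a temporary directory; like the transcript, it does not guard against lines shorter than 5 characters.

```python
import os
import tempfile


def filter_file(a_path, b_path, out_path, n=5):
    """Hypothetical helper: write to out_path the lines of a_path whose
    first n characters start some line of b_path."""
    # Set of n-character prefixes from b_path (average O(1) lookups).
    with open(b_path) as inp2:
        prefixes = {line[:n] for line in inp2}
    # Stream a_path and copy matching lines unchanged.
    with open(a_path) as inp1, open(out_path, 'w') as out:
        for line in inp1:
            if line[:n] in prefixes:
                out.write(line)


# Demo with the example data in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, 'a.txt')
    b_path = os.path.join(tmp, 'b.txt')
    c_path = os.path.join(tmp, 'c.txt')
    with open(a_path, 'w') as f:
        f.write('abcde000\n0123456xxx\nxyzxyzxyz\nkkkkkkkkkkk\n')
    with open(b_path, 'w') as f:
        f.write('012345aabbcc\nkkkkkkkhhkkvv\nnnnnnnnnnnn\n')
    filter_file(a_path, b_path, c_path)
    with open(c_path) as f:
        result = f.read().splitlines()

print(result)  # ['0123456xxx', 'kkkkkkkkkkk']
```

Using a plain for loop for the writes also avoids building a throwaway list of write() return values.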
Answer 3 (score: 0)
with open('a.txt', 'r') as file1:
    a_lines = file1.readlines()
with open('b.txt', 'r') as file2:
    short_b = [b[:5] for b in file2.readlines()]
keepers = [a.strip() for a in a_lines if a[:5] in short_b]
keepers
['0123456xxx', 'kkkkkkkkkkk']
Answer 4 (score: 0)
You can try this code:
list2 = ['pankajsdhasjfdvajshf', 'mithsfggasjfasjhf']
list1 = ['pankajddsdsdf', 'mithilieshdsfsfsfsdf']
for i in range(len(list1)):
    for j in range(len(list2)):
        # compare exactly the first 5 characters ([0:6] would take 6)
        if list1[i][:5] == list2[j][:5]:
            print('true')
            print(list1[i])
        else:
            print('fail')