Compare and delete lines based on a match of the first x characters (between two files)

Time: 2017-11-01 05:44:39

Tags: python windows list text compare

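I have two text files, a.txt and b.txt, in the CWD. From a.txt, I want to delete every line whose first 5 characters do not appear as the first 5 characters of any line in b.txt. (In other words, keep only those lines of a.txt whose first 5 characters occur as the first 5 characters of some line in b.txt.) Whatever comes after the 5th character is irrelevant.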

For example:

a.txt

abcde000
0123456xxx
xyzxyzxyz
kkkkkkkkkkk

b.txt

012345aabbcc
kkkkkkkhhkkvv
nnnnnnnnnnn

Result (the lines of a.txt whose characters 1-5 also start a line in b.txt):

0123456xxx
kkkkkkkkkkk

Script in progress (I am confused about how to search for an exact match on the first 5 characters):

with open('a.txt', 'r') as file1:
    with open('b.txt', 'r') as file2:
        same = set(file1).intersection(file2)

        file1[0][4]
        file1[0][4]

same.discard('\n')

with open('same_start.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

Any suggestions?

5 answers:

Answer 0 (score: 1)

You can try this:

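with open('a.txt') as f1, open('b.txt') as f2:

    lines1 = f1.readlines()
    lines2 = f2.readlines()

    result = []

    for line1 in lines1:
        for line2 in lines2:
            # Require a real 5-character prefix, then compare prefixes exactly
            if len(line1.strip()) >= 5 and line1[:5] == line2[:5]:
                result.append(line1)
                break  # stop at the first match so line1 is kept only once


# Overwrite a.txt with the lines that survived the filter
with open('a.txt', 'w') as f1:
    f1.writelines(result)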

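Note that Python slicing is quietly forgiving: for a string s shorter than 101 characters, s[:100] simply returns the whole string instead of raising an error. You should therefore check that each line contains enough characters. In the approach above this is done by the condition len(line1.strip()) >= 5, which ensures that lines shorter than 5 characters, as well as long lines made up only of spaces, are dropped.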
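The slicing behaviour described above can be checked with a small sketch. The helper name has_full_prefix is hypothetical and is introduced here only for illustration; it is not part of the original answer.

```python
# Slicing past the end of a string does not raise; it returns what is available.
short = "abc"
sliced = short[:5]          # the whole string, since it has fewer than 5 characters
print(repr(sliced))


def has_full_prefix(line, n=5):
    """Return True if the line, ignoring surrounding whitespace, has at least n characters.

    Hypothetical helper that mirrors the check len(line.strip()) >= 5.
    """
    return len(line.strip()) >= n


print(has_full_prefix("0123456xxx\n"))      # long enough
print(has_full_prefix("1\n"))               # too short
print(has_full_prefix(" " * 10 + "\n"))     # only spaces, so strip() leaves nothing
```

Without such a check, two short lines such as "1\n" in both files would compare equal on their "first 5 characters" even though neither has five.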

For example:

a.txt
---------------
abcde000
0123456xxx
xyzxyzxyz
kkkkkkkkkkk

1
          # <== 10 spaces here
2
3
b.txt
---------------
012345aabbcc
kkkkkkkhhkkvv
nnnnnnnnnnn
            # <== 12 spaces here

1
2
3
result (a.txt)
---------------
0123456xxx
kkkkkkkkkkk
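As a rough check of the example above, the same filter can be run on in-memory lists instead of files. This is only a sketch: the function name keep_matching and the list literals are assumptions made for illustration, with the file contents transcribed from the listing above.

```python
def keep_matching(a_lines, b_lines, n=5):
    """Keep each line of a_lines (once) if its first n characters equal
    the first n characters of at least one line of b_lines.

    Lines with fewer than n characters after strip() are dropped.
    """
    kept = []
    for a in a_lines:
        if len(a.strip()) < n:
            continue  # too short, or whitespace only
        if any(a[:n] == b[:n] for b in b_lines):
            kept.append(a)
    return kept


# Contents of a.txt and b.txt from the example, including the short
# lines and the lines that consist only of spaces.
a_lines = ["abcde000\n", "0123456xxx\n", "xyzxyzxyz\n", "kkkkkkkkkkk\n",
           "\n", "1\n", " " * 10 + "\n", "2\n", "3\n"]
b_lines = ["012345aabbcc\n", "kkkkkkkhhkkvv\n", "nnnnnnnnnnn\n",
           " " * 12 + "\n", "\n", "1\n", "2\n", "3\n"]

result = keep_matching(a_lines, b_lines)
print(result)
```

Using any() stops at the first matching line of b.txt, so a line of a.txt cannot be added more than once.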

Answer 1 (score: 1)

As a two-line solution:

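b_file=[line.rstrip('\n')[:5] for line in open('b.txt','r')]  # first 5 characters of each line in b.txt
a_file=[line.rstrip('\n') for line in open('a.txt','r') if line.rstrip('\n')[:5] in b_file]  # keep matching lines of a.txt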

Then you can write the contents of a_file back to a.txt:

with open('a.txt','w') as f:
    for item in a_file:
        f.write(item + '\n')
  

Detailed solution:

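compare=[]
with open('b.txt') as f:
    for line in f:
        # Store only the first 5 characters of each line in b.txt
        compare.append(line.rstrip('\n')[:5])
new=[]
with open('a.txt') as f1:
    for line in f1:
        line = line.rstrip('\n')
        # Exact prefix match; keep the line from a.txt, not the one from b.txt
        if line[:5] in compare:
            new.append(line)


with open('a.txt','w') as f3:
    for j in new:
        f3.write(j+'\n')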

Answer 2 (score: 0)

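You can store the first 5 characters of each line of file 2 in a list, and then check whether the first 5 characters of each line of file 1 are present in that list!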

>>> with open('c.txt','w') as out:
...     with open('a.txt','r') as inp1, open('b.txt','r') as inp2:
...             infile2_content=[line[:5] for line in inp2.readlines()]
...             [out.write(line) for line in inp1 if line[:5] in infile2_content]
...
[None, None]
>>>
>>> with open('c.txt','r') as f:
...     print f.read().splitlines()
...
['0123456xxx', 'kkkkkkkkkkk']
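The same prefix-lookup idea can also be sketched with a set instead of a list, which makes each membership test constant time on average and may matter for large files. The function name filter_file and the temporary file paths below are assumptions for illustration only; the sample data is taken from the question.

```python
import os
import tempfile


def filter_file(a_path, b_path, out_path, n=5):
    """Write to out_path the lines of a_path whose first n characters
    are also the first n characters of some line in b_path."""
    with open(b_path) as fb:
        prefixes = {line[:n] for line in fb}  # set of b.txt prefixes
    with open(a_path) as fa, open(out_path, "w") as out:
        for line in fa:
            if line[:n] in prefixes:
                out.write(line)


# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.txt")
    b_path = os.path.join(tmp, "b.txt")
    c_path = os.path.join(tmp, "c.txt")
    with open(a_path, "w") as f:
        f.write("abcde000\n0123456xxx\nxyzxyzxyz\nkkkkkkkkkkk\n")
    with open(b_path, "w") as f:
        f.write("012345aabbcc\nkkkkkkkhhkkvv\nnnnnnnnnnnn\n")
    filter_file(a_path, b_path, c_path)
    with open(c_path) as f:
        kept = f.read().splitlines()

print(kept)
```

Writing to a separate output file, as this answer does with c.txt, leaves a.txt untouched until the result has been checked.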

Answer 3 (score: 0)

with open('a.txt', 'r') as file1:
    a_lines = file1.readlines()
    with open('b.txt', 'r') as file2:
        short_b = [b[:5] for b in file2.readlines()]
        keepers = [a.strip() for a in a_lines if a[:5] in short_b]

keepers
['0123456xxx', 'kkkkkkkkkkk']

Answer 4 (score: 0)

You can try this code:

list2=['pankajsdhasjfdvajshf','mithsfggasjfasjhf']
list1=['pankajddsdsdf','mithilieshdsfsfsfsdf']

for i in range(len(list1)):
    for j in range(len(list2)):
        # Compare the first 5 characters exactly ([0:6] would take 6 characters)
        if list1[i][:5] == list2[j][:5]:
            print('true')
            print(list1[i])
        else:
            print('fail')