Question

I have a file that looks like this:

a/b/X/Y/1
a/b/X/Y/2
a/b/X/Y/3
...
a/b/X/Z/1
a/b/X/Z/2
a/b/X/Z/3
...
a/c/M/N/1
a/c/M/N/2
a/c/M/N/3
...
a/d/F/G/123
a/d/F/G/124
a/d/F/G/125

The symbols are placeholders.

I am interested in the unique substrings made up of the first and second symbols, separated by "/". In other words, for the example above I want to build the list "a/b", "a/c", "a/d".

What is the idiomatic Python way to do this?

Answer 1

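I would split each line on "/" with a maximum of 2 splits, discard the last item (the rest of the string, which you don't need), join the remaining pieces back together, and put that in a set comprehension to remove duplicates: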

with open("file.txt") as lines:
    result = {"/".join(s.split("/",maxsplit=2)[:-1]) for s in lines}

Result:

>>> result
{'a/b', 'a/c', 'a/d'}
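
To make the individual steps visible, here is a minimal sketch that runs the same idea on an in-memory list instead of a file. The `sample` list and the `prefix` helper are names introduced here for illustration, not part of the answer above, and the helper assumes every line has at least three "/"-separated fields.

```python
# Hypothetical sample data standing in for the lines of file.txt.
sample = ["a/b/X/Y/1\n", "a/b/X/Z/2\n", "a/c/M/N/1\n", "a/d/F/G/123\n"]


def prefix(line):
    """Return the first two "/"-separated fields of line, rejoined with "/"."""
    parts = line.split("/", maxsplit=2)  # e.g. ['a', 'b', 'X/Y/1\n']
    return "/".join(parts[:-1])          # drop the remainder, rejoin the rest


# The set comprehension removes duplicate prefixes.
result = {prefix(line) for line in sample}

# sorted() is used only to get a stable display order.
print(sorted(result))  # ['a/b', 'a/c', 'a/d']
```

Because the remainder of the line is discarded, the trailing newline never reaches the result, so no explicit strip is needed in this case.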

Answer 2

I made an example without much magic. First we write the file, but you can remove that part (it is only there to make it easy to show how this works).

file_content = """a/b/X/Y/1
a/b/X/Y/2
a/b/X/Y/3
a/b/X/Z/1
a/b/X/Z/2
a/b/X/Z/3
a/c/M/N/1
a/c/M/N/2
a/c/M/N/3
a/d/F/G/123
a/d/F/G/124
a/d/F/G/125
"""

# This can be removed as it is just to show how it works
with open('file.txt', 'w') as f:
    f.write(file_content)

with open('file.txt', 'r') as f:
    lines = f.readlines()

result = set()
for line in lines:
    a, b, *rest = line.split('/')
    result.add(f'{a}/{b}')

print(result)
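
The question asks for a list, and a set has no defined order. One possible variation, sketched here under the assumption that first-seen order is what you want, uses `dict.fromkeys` to drop duplicates while keeping that order. The `lines` list and the `first_two` helper are illustrative names, with `lines` standing in for what `f.readlines()` would return.

```python
# Hypothetical stand-in for the lines read from file.txt.
lines = ["a/b/X/Y/1\n", "a/c/M/N/1\n", "a/b/X/Z/2\n", "a/d/F/G/123\n"]


def first_two(line):
    """Return the first two "/"-separated fields of line as "a/b"."""
    a, b, *rest = line.rstrip("\n").split("/")
    return f"{a}/{b}"


# dict keys are unique and keep insertion order (Python 3.7+),
# so this de-duplicates without losing first-seen order.
ordered = list(dict.fromkeys(first_two(line) for line in lines))

print(ordered)  # ['a/b', 'a/c', 'a/d']
```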

Answer 3

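def find_unique(input_file):
    """Return the unique "x/y" prefixes found in input_file as a list."""
    output = set()  # a set drops duplicates automatically
    with open(input_file) as f:
        for line in f:
            # Takes the first three characters, so this assumes every
            # symbol is a single character (as in "a/b").
            output.add(line.strip()[0:3])

    return list(output)  # the order of the list is arbitrary

print(find_unique("input_file"))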

This gives:

['a/b', 'a/d', 'a/c']

for a file containing the following:

a/b/X/Y/1
a/b/X/Y/2
a/b/X/Y/3
a/b/X/Z/1
a/b/X/Z/2
a/b/X/Z/3
a/c/M/N/1
a/c/M/N/2
a/c/M/N/3
a/d/F/G/123
a/d/F/G/124
a/d/F/G/125
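
The slice `[0:3]` relies on each symbol being exactly one character. If the placeholders can be longer, a variant that splits on "/" instead might look like the sketch below. The function name `find_unique_prefixes` and its `depth` parameter are introduced here for illustration only, and the function takes any iterable of lines so it can be tried without a file.

```python
def find_unique_prefixes(lines, depth=2):
    """Collect the unique prefixes made of the first `depth` fields of each line."""
    output = set()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        fields = line.split("/")
        output.add("/".join(fields[:depth]))
    return sorted(output)  # sorted list for a reproducible order


# Hypothetical input with multi-character symbols and a blank line.
print(find_unique_prefixes(["foo/bar/X/1\n", "foo/bar/Y/2\n", "foo/baz/Z/3\n", "\n"]))
# ['foo/bar', 'foo/baz']
```

An open file object can be passed directly as `lines`, since iterating over a file yields its lines.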

Storing unique substrings from multiple lines
