I want to list the strings that follow each header string.
STUDENT
a john
a anny
SUBJECT
b math
b physical
CLASS
a one
a two
a three
STUDENT
a pone
b julia
b sopia
CLASS
a four
a five
PROFESSOR
b uno
b sonovon
PROFESSOR
b jone
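My goal is to remove the duplicate SUBJECT headers and join their contents.
A SUBJECT can be any uppercase string,
but each content line must start with a or b.
How can I do this?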
Answer 0 (score: 1)
Since the only thing known about the SUBJECTs is that they are uppercase strings, you can use the isupper() predicate to split the file like this:
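def split_string(file_name):
    # Read every line of the file, without trailing newlines.
    with open(file_name) as f:
        list_ = f.read().splitlines()
    for i, j in enumerate(list_):
        # The last line has no successor, so guard the lookahead.
        next_is_upper = i + 1 < len(list_) and list_[i + 1].isupper()
        # Skip a header only when the next line is also a header.
        if not (j.isupper() and next_is_upper):
            print(j)

split_string("match.txt")  # file name assumed; use your own path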
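The loop above only prints lines; it does not merge repeated headers. As a minimal sketch of how the same isupper() test could drive the merge, the hypothetical helper group_by_header below (not part of the original answer) treats any all-uppercase line as a header and collects the following lines under it. It assumes the lines are already in memory and that a plain dict preserves insertion order (Python 3.7+).

```python
def group_by_header(lines):
    """Group content lines under the most recent all-uppercase header.

    Hypothetical helper for illustration; repeated headers are merged.
    """
    groups = {}  # a plain dict keeps insertion order on Python 3.7+
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.isupper():
            # A header: start (or resume) its group.
            current = line
            groups.setdefault(current, [])
        elif current is not None:
            # A content line: attach it to the current header.
            groups[current].append(line)
    return groups


sample = ["STUDENT", "a john", "CLASS", "a one", "STUDENT", "a pone", "b julia"]
for header, content in group_by_header(sample).items():
    print(header, content)
# STUDENT ['a john', 'a pone', 'b julia']
# CLASS ['a one']
```

Content lines that appear before any header are dropped here; whether that is the right choice depends on the input, so treat it as an assumption.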
Note: I am assuming here that your strings are stored in a file.
Answer 1 (score: 1)
Just group the elements in a dict, using the subject as the key:
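from collections import OrderedDict

od = OrderedDict()
with open("match.txt") as f:
    # The first line of the file is the first subject header.
    key = next(f)
    for line in f:
        if line.startswith(("a", "b")):
            # Content line: append it to the current subject's list.
            od.setdefault(key, []).append(line)
        else:
            # Any other line is a new (or repeated) subject header.
            key = line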
Output:
for sub, cont in od.items():
    print("{}, {}".format(sub, cont))
STUDENT
, ['a john\n', 'a anny\n', 'a pone\n', 'b julia\n', 'b sopia\n']
SUBJECT
, ['b math\n', 'b physical\n']
CLASS
, ['a one\n', 'a two\n', 'a three\n', 'a four\n', 'a five\n']
PROFESSOR
, ['b uno\n', 'b sonovon\n', 'b jone']
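This groups the data correctly. You wrote "my goal is to remove the duplicate SUBJECT headers and join their contents", so this is clearly what you want.
The OrderedDict preserves insertion order. If you want to write the updated lines back to the file, just reopen it and write while iterating over .items():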
with open("match.txt", "w") as f:
    for sub, cont in od.items():
        f.write(sub)
        f.writelines(cont)
New output, joined by subject:
STUDENT
a john
a anny
a pone
b julia
b sopia
SUBJECT
b math
b physical
CLASS
a one
a two
a three
a four
a five
PROFESSOR
b uno
b sonovon
b jone
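One edge case worth noting: in the printed dict above, the last element 'b jone' has no trailing newline because it was the final line of the file. If such a line ended up in a group that is not written last, f.writelines would fuse it with the next header. The following is a sketch under stated assumptions, not the answer's own code: the function name merge_sections is hypothetical, and a temporary file stands in for match.txt. It strips line endings on read and adds them back on write.

```python
import os
import tempfile
from collections import OrderedDict


def merge_sections(path):
    """Rewrite the file at `path` with repeated headers merged (hypothetical helper)."""
    od = OrderedDict()
    with open(path) as f:
        key = None
        for raw in f:
            line = raw.rstrip("\n")  # drop the line ending, if any
            if not line:
                continue  # skip blank lines
            if line.startswith(("a", "b")):
                # Content line: attach it to the current header, if there is one.
                if key is not None:
                    od.setdefault(key, []).append(line)
            else:
                key = line  # new or repeated header
    with open(path, "w") as f:
        for sub, cont in od.items():
            f.write(sub + "\n")
            # Re-add exactly one newline per content line.
            f.writelines(item + "\n" for item in cont)
    return od


# Demo: the last line has no newline and belongs to the first section.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("STUDENT\na john\nCLASS\na one\nSTUDENT\nb julia")
demo_path = tmp.name

merge_sections(demo_path)
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

In the demo, 'b julia' is moved up under STUDENT and still ends with its own newline, so CLASS starts on a fresh line.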