I am new to Python. I have a text file that contains many data blocks in the following format, along with other blocks I don't need.
NOT REQUIRED :: 123
Connected Part-1:: A ~$
Connected Part-3:: B ~$
Connector Location:: 100 200 300 ~$
NOT REQUIRED :: 456
Connected Part-2:: C ~$
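I want to extract the information (A, B, C, 100 200 300) for each attribute (Connected Part-1, Connector Location, ...) and store it as a list for later use. I wrote the following code to read the file, clean the lines, and store the values as lists.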
import fileinput
with open('C:/Users/file.txt') as f:
    content = f.readlines()
for line in content:
    if 'Connected Part-1' in line or 'Connected Part-3' in line:
        if 'Connected Part-1' in line:
            connected_part_1 = [s.strip(' \n ~ $ Connected Part -1 ::') for s in content]
            print ('PART_1:',connected_part_1)
        if 'Connected Part-3' in line:
            connected_part_3 = [s.strip(' \n ~ $ Connected Part -3 ::') for s in content]
            print ('PART_3:',connected_part_3)
    if 'Connector Location' in line:
        # removing unwanted characters and converting into the list
        content_clean_1 = [s.strip('\n ~ $ Connector Location::') for s in content]
        #converting a single string item in list to a string
        s = " ".join(content_clean_1)
        # splitting the string and converting into a list
        weld_location= s.split(" ")
        print ('POSITION',weld_location)
This is the output:
PART_1: ['A', '\t\tConnector Location:: 100.00 200.00 300.00', '\t\tConnected Part-3:: C~\t']
POSITION ['d', 'Part-1::', 'A', '\t\tConnector', 'Location::', '100.00', '200.00', '300.00', '\t\tConnected', 'Part-3::', 'C~\t']
PART_3: ['1:: A', '\t\tConnector Location:: 100.00 200.00 300.00', '\t\tConnected Part-3:: C~\t']
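From this output I concluded that, because 'content' is a string containing every character in the file, the program does not read individual lines. Instead, it treats all the text as a single string. Can anyone help with this?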
I expect the following output:
PART_1: ['A']
PART_3: ['C']
POSITION: ['100.00', '200.00','300.00']
(Note) When I use a file that contains only a single line of data, it works correctly. Sorry for the long question.
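The conclusion in the question that content is a single string can be checked directly: readlines() returns a list holding one string per line. What makes every result contain all the lines is the list comprehension over content, which walks the whole list each time it runs. A minimal sketch, using io.StringIO with three sample lines as a stand-in for C:/Users/file.txt (an assumption so the example runs without the real file):

```python
import io

# Stand-in for the real file; io.StringIO behaves like a file opened in text mode.
# The sample lines are an assumption based on the data shown in the question.
sample = io.StringIO(
    "NOT REQUIRED :: 123\n"
    "Connected Part-1:: A ~$\n"
    "Connector Location:: 100 200 300 ~$\n"
)

content = sample.readlines()
print(type(content).__name__)  # list
print(len(content))            # 3
print(repr(content[1]))        # 'Connected Part-1:: A ~$\n'

# A comprehension over `content` visits every line, not just the current one,
# so it always builds a list with one entry per line of the file.
stripped = [s.strip() for s in content]
print(len(stripped))           # 3
```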
Answer 0 (score: 0)
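I'll try to be clear and show how to do this without regex. First, the biggest problem with the provided code is that each list comprehension that calls string.strip iterates over the entire content list (every line of the file) instead of only the current line: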
connected_part_1 = [s.strip(' \n ~ $ Connected Part -1 ::') for s in content]
content holds every line of the file; I think what you want is to strip only the current line:
connected_part_1 = [line.strip(' \n ~ $ Connected Part -1 ::')]
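Even when applied to the current line, str.strip may not do what its argument suggests: it treats the argument as a set of characters to remove from both ends, not as a prefix to cut off. The Part-1 line happens to work because 'A' is not in that set, but for the Part-3 line from the sample data the value 'C' also appears in "Connected", so it is stripped away along with the label. Splitting on the '::' separator, as the parsing code in this answer does, avoids the problem; str.partition is used below as one way to do that:

```python
line = "Connected Part-3:: C ~$\n"

# str.strip() removes any of the given characters from both ends; it does not
# remove a prefix. 'C' is in the character set below, so the value 'C' is lost too.
print(repr(line.strip(' \n ~ $ Connected Part -3 ::')))  # ''

# Splitting on the '::' separator keeps the value intact regardless of its letters.
label, _, value = line.partition('::')
print(label.strip())        # Connected Part-3
print(value.split()[:-1])   # ['C']  (the trailing '~$' token is dropped)
```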
How to parse the file is somewhat subjective, but given the input format you posted, I would do it like this:
templatestr = "{}: {}"
with open('inputreadlines.txt') as f:
    for line in f:
        if '::' not in line:
            continue  # skip blank lines and lines without a "label :: value" pair
        label, value = line.split('::', 1)
        ltokens = label.split()
        if ltokens[0] == 'Connected':
            print(templatestr.format(
                ltokens[-1],          # the last word of the label, e.g. 'Part-1'
                value.split()[:-1]))  # the split value without the last word '~$'
        elif ltokens[0] == 'Connector':
            print(value.split()[:-1])  # the split value without the last word '~$'
        else:  # NOT REQUIRED
            pass
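The question asks for the values to be stored as lists for later use rather than printed. Below is a minimal variant of the loop above, written as a hypothetical helper parse_blocks that collects the Connected values into a dictionary keyed by the last word of the label, and each Connector Location into its own list. It assumes, as in the sample data, that '~$' is a separate trailing token, and it uses io.StringIO in place of the real file:

```python
import io

def parse_blocks(lines):
    """Collect the values of the wanted labels into lists.

    Hypothetical helper: the label names and the trailing '~$' marker are
    assumptions taken from the sample data in the question.
    """
    parts = {}       # e.g. {'Part-1': ['A'], 'Part-3': ['B']}
    locations = []   # one list of coordinates per 'Connector Location' line
    for line in lines:
        if '::' not in line:
            continue  # skip blank lines and anything that is not "label :: value"
        label, value = line.split('::', 1)
        ltokens = label.split()
        if not ltokens:
            continue  # nothing before '::'
        values = value.split()[:-1]  # drop the trailing '~$' token
        if ltokens[0] == 'Connected':
            parts.setdefault(ltokens[-1], []).extend(values)
        elif ltokens[0] == 'Connector':
            locations.append(values)
        # 'NOT REQUIRED' and any other labels are ignored
    return parts, locations

sample = io.StringIO(
    "NOT REQUIRED :: 123\n"
    "Connected Part-1:: A ~$\n"
    "Connected Part-3:: B ~$\n"
    "Connector Location:: 100 200 300 ~$\n"
    "NOT REQUIRED :: 456\n"
    "Connected Part-2:: C ~$\n"
)
parts, locations = parse_blocks(sample)
print(parts)      # {'Part-1': ['A'], 'Part-3': ['B'], 'Part-2': ['C']}
print(locations)  # [['100', '200', '300']]
```

With the real file, the helper can be called as parse_blocks(f) inside with open('inputreadlines.txt') as f:, since a file object yields its lines one at a time.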
You could also use the string.strip function to remove the odd characters '~$' instead of dropping the last token as in the example.
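Here is a sketch of that alternative, written as a hypothetical helper clean_value that is applied to the value part of a line (the text after '::'). It also handles the line that appears as 'C~\t' in the question's output, where '~' is attached to the value; dropping the last token there would discard 'C' itself:

```python
def clean_value(value):
    # Hypothetical helper: remove surrounding whitespace, then any trailing
    # '~' and '$' characters, then split on whitespace.
    # Caveat: rstrip('~$') would also remove a '~' or '$' that belongs to the value.
    return value.strip().rstrip('~$').split()

print(clean_value(" A ~$\n"))            # ['A']
print(clean_value(" 100 200 300 ~$\n"))  # ['100', '200', '300']
print(clean_value(" C~\t\n"))            # ['C']
```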