I am trying to read data from a txt file and load it into a DataFrame. The text file looks like this:
******************************************************************************************************************************
DATE BUY:2018/05/26
****************************************************************************************************************************
STORE: DUBIDUBI SAILOR: 123456
***********************************************************************************************************************
< CLIENT >
NAME CLIENT MEMBER TYPE MEMBER NUMBER: 89101112
ANTONY STARK 1
<PRODUCTS>
NUM-PRODUCTS
6
< ADDRESS TO SEND>
186 FLEET STREET
-----------------------------------------------------------------------------------------------------------------------
< CLIENT >
NAME CLIENT MEMBER TYPE MEMBER NUMBER: 13141516
THOR 2
<PRODUCTS>
NUM-PRODUCTS
2
< ADDRESS TO SEND>
1800 PENNSYLVANIA STREET
<SERVICES>
NUM-SERVICE TYPE OF SERVICE
64 DEVOLUTION
*****************************************************************************************************************************
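I want a DataFrame that lists the information for every client served by the same sailor in the same store.
What works: the code below reads the text line by line and extracts the information from each line.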
import pandas as pd

data = []
global STORE, DATE_BUY, SAILOR, CLIENT, MEMBER_NUM, NUM_PRODUCTS, ADDRESS, NUM_SERVICE, TYPE_MEMB, TYPE_SERV
STORE = ""
DATE_BUY = ""
SAILOR = ""
CLIENT = ""
MEMBER_NUM = ""
NUM_PRODUCTS = ""
ADDRESS = ""
NUM_SERVICE = ""
TYPE_MEMB = ""
TYPE_SERV = ""
with open('myfile', 'r') as txt_file:
    read_file = txt_file.readlines()
    for i in range(0, len(read_file)):
        line = read_file[i]
        # Fixed-width prefixes used to recognise each kind of line
        z = line[0:50]
        a = line[0:9]
        b = line[0:42]
        c = line[112:132]
        d = line[0:14]
        e = line[0:14]
        f = line[0:14]  # not defined in the original post; assumed to match d and e
        dif_client = line[0:58]
        if a == " STORE":
            STORE = line[10:28]
            SAILOR = line[30:45]
        elif c == " DATE BUY":
            DATE_BUY = line[133:145]
        elif b == " NAME CLIENT":
            # The values sit on the line below the column titles
            nextline = read_file[i + 1]
            CLIENT = nextline[0:57]
            MEMBER_NUM = nextline[96:126]
            TYPE_MEMB = nextline[79:86]
        elif d == " < ADDRESS":
            nextline = read_file[i + 1]
            ADDRESS = nextline[0:63]
        elif e == " < PRODUCTS":
            nextline = read_file[i + 1]
            NUM_PRODUCTS = nextline[0:24]
        elif f == " <SERVICES":
            nextline = read_file[i + 1]
            NUM_SERVICE = nextline[:]  # slice bounds were left blank in the original post
            TYPE_SERV = nextline[:]  # slice bounds were left blank in the original post
data.append({'Store': STORE, 'Sailor': SAILOR, 'Date_Buy': DATE_BUY, 'Client': CLIENT, 'Member_Num': MEMBER_NUM,
             'Type_Memb': TYPE_MEMB, 'Address': ADDRESS, 'Products': NUM_PRODUCTS, 'Num_Serv': NUM_SERVICE, 'Type_Serv': TYPE_SERV})
df = pd.DataFrame(data)
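What does not work: when I use a nested while loop to extract the information for each client served by the sailor, my code never finishes running. The code that does not work is: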
import pandas as pd

data = []
global STORE, DATE_BUY, SAILOR, CLIENT, MEMBER_NUM, NUM_PRODUCTS, ADDRESS, NUM_SERVICE, TYPE_MEMB, TYPE_SERV
STORE = ""
DATE_BUY = ""
SAILOR = ""
CLIENT = ""
MEMBER_NUM = ""
NUM_PRODUCTS = ""
ADDRESS = ""
NUM_SERVICE = ""
TYPE_MEMB = ""
TYPE_SERV = ""
with open('myfile', 'r') as txt_file:
    read_file = txt_file.readlines()
    for i in range(0, len(read_file)):
        line = read_file[i]
        z = line[0:50]
        a = line[0:9]
        b = line[0:42]
        c = line[112:132]
        d = line[0:14]
        e = line[0:14]
        f = line[0:14]  # not defined in the original post; assumed to match d and e
        dif_client = line[0:58]
        while dif_client != " < CLIENT >":
            if a == " STORE":
                STORE = line[10:28]
                SAILOR = line[30:45]
            elif c == " DATE BUY":
                DATE_BUY = line[133:145]
            elif b == " NAME CLIENT":
                nextline = read_file[i + 1]
                CLIENT = nextline[0:57]
                MEMBER_NUM = nextline[96:126]
                TYPE_MEMB = nextline[79:86]
            elif d == " < ADDRESS":
                nextline = read_file[i + 1]
                ADDRESS = nextline[0:63]
            elif e == " < PRODUCTS":
                nextline = read_file[i + 1]
                NUM_PRODUCTS = nextline[0:24]
            elif f == " <SERVICES":
                nextline = read_file[i + 1]
                NUM_SERVICE = nextline[:]  # slice bounds were left blank in the original post
                TYPE_SERV = nextline[:]  # slice bounds were left blank in the original post
        data.append({'Store': STORE, 'Sailor': SAILOR, 'Date_Buy': DATE_BUY, 'Client': CLIENT, 'Member_Num': MEMBER_NUM,
                     'Type_Memb': TYPE_MEMB, 'Address': ADDRESS, 'Products': NUM_PRODUCTS, 'Num_Serv': NUM_SERVICE, 'Type_Serv': TYPE_SERV})
df = pd.DataFrame(data)
The desired output should look like the following. I know that each client's information starts wherever < CLIENT > appears in the text.
Date_buy    Store     Sailor  Client        Member_Number  Num_Products  Address_to_send           num_Service  type_serv
2018/05/26  dubidubi  123456  ANTONY STARK  89101112       6             186 FLEET STREET
2018/05/26  dubidubi  123456  THOR          13141516       2             1800 PENNSYLVANIA STREET  64           DEVOLUTION
Answer 0 (score: 0)
Thanks for the description. The problem is the infinite loop you have built:
dif_client = line[0:58]
while dif_client != " < CLIENT >":
    if a == " STORE":
        ...
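dif_client does not change inside the loop. There is no break or other exit, only the while condition (which is good design). So once the loop is entered, there is no way out: dif_client is constant.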
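To make the point concrete, here is a minimal sketch of the same loop shape. The function name count_iterations and the limit parameter are made up for this illustration; the limit exists only so the demo terminates, and the original code has no such safety valve.

```python
def count_iterations(line, limit=5):
    """Mimic the stuck loop, but give up after `limit` passes (demo only)."""
    dif_client = line[0:58]
    passes = 0
    while dif_client != " < CLIENT >":
        passes += 1  # the body runs, but dif_client is never reassigned
        if passes >= limit:  # safety valve added for this demo only
            break
    return passes


# Any line that is not the client marker keeps the condition true forever;
# only the artificial limit stops it here.
stuck = count_iterations(" STORE: DUBIDUBI")
# The marker line itself never enters the loop body at all.
skipped = count_iterations(" < CLIENT >")
print(stuck, skipped)
```

Nothing in the loop body reads a new line or assigns dif_client, so the condition evaluated on the second pass is identical to the one evaluated on the first.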
Your logic is incorrect: you have two loops that both try to iterate over the lines of the file:
for i in range(0, len(read_file)):
    ...
    while dif_client != " < CLIENT >":
        # Process one line
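The body of the while is meant to process a single line. Once that is done, you need to move on to the next iteration of the for to get the next line. Detecting the CLIENT line is a decision for an if, not for a loop.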
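One way this could look is sketched below: a single for loop, with an if that starts a new record whenever the client marker shows up. This is only an illustration under several assumptions. The helper names parse_clients and peek are hypothetical. Lines are matched with strip() and startswith() instead of the fixed column slices from the question, because the real column positions are not recoverable from the post. Values are read from the offsets shown in the sample text (the product count and the service data sit two lines below their markers there), which may differ from the real file.

```python
SAMPLE = """\
**********
DATE BUY:2018/05/26
**********
STORE: DUBIDUBI SAILOR: 123456
**********
< CLIENT >
NAME CLIENT MEMBER TYPE MEMBER NUMBER: 89101112
ANTONY STARK 1
<PRODUCTS>
NUM-PRODUCTS
6
< ADDRESS TO SEND>
186 FLEET STREET
----------
< CLIENT >
NAME CLIENT MEMBER TYPE MEMBER NUMBER: 13141516
THOR 2
<PRODUCTS>
NUM-PRODUCTS
2
< ADDRESS TO SEND>
1800 PENNSYLVANIA STREET
<SERVICES>
NUM-SERVICE TYPE OF SERVICE
64 DEVOLUTION
**********
"""


def parse_clients(lines):
    """Hypothetical helper: one pass over the lines, one dict per client."""
    header = {"Date_Buy": "", "Store": "", "Sailor": ""}
    rows = []
    current = None  # record being filled; None until the first client marker

    def peek(i, offset):
        # Stripped line `offset` positions below line i, or "" past the end
        j = i + offset
        return lines[j].strip() if j < len(lines) else ""

    for i, raw in enumerate(lines):
        line = raw.strip()
        compact = line.replace(" ", "")  # tolerate "< CLIENT >" vs "<CLIENT>"
        if line.startswith("DATE BUY:"):
            header["Date_Buy"] = line.split(":", 1)[1].strip()
        elif line.startswith("STORE:"):
            store_part, _, sailor_part = line.partition("SAILOR:")
            header["Store"] = store_part.split(":", 1)[1].strip()
            header["Sailor"] = sailor_part.strip()
        elif compact == "<CLIENT>":
            # The decision is an `if`: start a new record, then move on
            current = dict(header, Client="", Member_Num="", Type_Memb="",
                           Products="", Address="", Num_Serv="", Type_Serv="")
            rows.append(current)
        elif current is None:
            continue  # separator lines before the first client
        elif line.startswith("NAME CLIENT"):
            current["Member_Num"] = line.rsplit(":", 1)[1].strip()
            name, _, member_type = peek(i, 1).rpartition(" ")
            current["Client"] = name
            current["Type_Memb"] = member_type
        elif compact.startswith("<PRODUCTS"):
            current["Products"] = peek(i, 2)  # value follows the column title
        elif compact.startswith("<ADDRESS"):
            current["Address"] = peek(i, 1)
        elif compact.startswith("<SERVICES"):
            number, _, kind = peek(i, 2).partition(" ")
            current["Num_Serv"] = number
            current["Type_Serv"] = kind
    return rows


rows = parse_clients(SAMPLE.splitlines())
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame, as the question already does with data, should then give one row per client, with the date, store, and sailor repeated on each row.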