Nested while loop to extract text data

Date: 2019-07-10 18:30:17

Tags: python

I'm trying to read data from a txt file and put it into a DataFrame. The text file looks like this:

******************************************************************************************************************************
                                                                                                    DATE BUY:2018/05/26
****************************************************************************************************************************
STORE: DUBIDUBI                                                     SAILOR: 123456
***********************************************************************************************************************
            < CLIENT >                                                                      
NAME CLIENT                                 MEMBER TYPE                                     MEMBER NUMBER: 89101112     
ANTONY STARK                                    1
            <PRODUCTS>
NUM-PRODUCTS
    6
            < ADDRESS TO SEND>
186 FLEET STREET
-----------------------------------------------------------------------------------------------------------------------
            < CLIENT >                                                                      
NAME CLIENT                                 MEMBER TYPE                                     MEMBER NUMBER: 13141516     
THOR                                            2
            <PRODUCTS>
NUM-PRODUCTS
    2
            < ADDRESS TO SEND>
1800 PENNSYLVANIA STREET
            <SERVICES>                  
NUM-SERVICE                             TYPE OF SERVICE
    64                                   DEVOLUTION
*****************************************************************************************************************************

I would like to get a DataFrame listing the information of every client attended by the same sailor at the same store.

What works: the code below reads the text line by line and extracts the information on each line.

    import pandas as pd

    data = []
    STORE = ""
    DATE_BUY = ""
    SAILOR = ""
    CLIENT = ""
    MEMBER_NUM = ""
    NUM_PRODUCTS = ""
    ADDRESS = ""
    NUM_SERVICE = ""
    TYPE_MEMB = ""
    TYPE_SERV = ""

    with open('myfile', 'r') as txt_file:
        read_file = txt_file.readlines()
        for i in range(0, len(read_file)):
            line = read_file[i]
            # Fixed-width slices of the current line (column offsets of the real file)
            z = line[0:50]
            a = line[0:9]
            b = line[0:42]
            c = line[112:132]
            d = line[0:14]
            e = line[0:14]
            f = line[0:14]  # f was never defined; assumed to be the same slice as d and e
            dif_client = line[0:58]

            # Each header line fills in the value found on it or on the line after it
            if a == " STORE":
                STORE = line[10:28]
                SAILOR = line[30:45]
            elif c == " DATE BUY":
                DATE_BUY = line[133:145]
            elif b == " NAME CLIENT":
                nextline = read_file[i + 1]
                CLIENT = nextline[0:57]
                MEMBER_NUM = nextline[96:126]
                TYPE_MEMB = nextline[79:86]
            elif d == " < ADDRESS":
                nextline = read_file[i + 1]
                ADDRESS = nextline[0:63]
            elif e == " < PRODUCTS":
                nextline = read_file[i + 1]
                NUM_PRODUCTS = nextline[0:24]
            elif f == " <SERVICES":
                nextline = read_file[i + 1]
                # The slice offsets were left blank here; split on whitespace instead
                NUM_SERVICE, TYPE_SERV = nextline.strip().split(None, 1)
            # One row is appended per line, holding the latest values seen so far
            data.append({'Store': STORE, 'Sailor': SAILOR, 'Date_Buy': DATE_BUY, 'Client': CLIENT,
                         'Member_Num': MEMBER_NUM, 'Type_Memb': TYPE_MEMB, 'Address': ADDRESS,
                         'Products': NUM_PRODUCTS, 'Num_Serv': NUM_SERVICE, 'Type_Serv': TYPE_SERV})
        df = pd.DataFrame(data)
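The fixed-width slices above only work if every value sits at exactly the same columns in every file. As a hedged alternative for the header lines, a regular expression can look a value up by its label instead of by its position. This sketch assumes the labels appear as STORE:, SAILOR: and DATE BUY: exactly as in the sample, and that the values contain no spaces; the pattern names and first_match are hypothetical.

```python
import re

# Two header lines copied from the sample file in the question.
store_line = "STORE: DUBIDUBI                                                     SAILOR: 123456"
date_line = "                                            DATE BUY:2018/05/26"

# Hypothetical patterns: each value follows its label and contains no spaces.
STORE_RE = re.compile(r"STORE:\s*(\S+)")
SAILOR_RE = re.compile(r"SAILOR:\s*(\S+)")
DATE_RE = re.compile(r"DATE BUY:\s*(\S+)")


def first_match(pattern, line):
    """Return the value captured after the label, or None if the label is absent."""
    m = pattern.search(line)
    return m.group(1) if m else None


print(first_match(STORE_RE, store_line))   # DUBIDUBI
print(first_match(SAILOR_RE, store_line))  # 123456
print(first_match(DATE_RE, date_line))     # 2018/05/26
```

Because the search is by label, the result does not change if the columns shift. A store name containing spaces would need a lazier pattern such as r"STORE:\s*(.*?)\s+SAILOR:".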

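What doesn't work: when I use a nested while loop to extract the information of each client attended by the sailor, my code never finishes running. The code that doesn't work is: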

    import pandas as pd

    data = []
    STORE = ""
    DATE_BUY = ""
    SAILOR = ""
    CLIENT = ""
    MEMBER_NUM = ""
    NUM_PRODUCTS = ""
    ADDRESS = ""
    NUM_SERVICE = ""
    TYPE_MEMB = ""
    TYPE_SERV = ""

    with open('myfile', 'r') as txt_file:
        read_file = txt_file.readlines()
        for i in range(0, len(read_file)):
            line = read_file[i]
            # Fixed-width slices of the current line (column offsets of the real file)
            z = line[0:50]
            a = line[0:9]
            b = line[0:42]
            c = line[112:132]
            d = line[0:14]
            e = line[0:14]
            f = line[0:14]  # f was never defined; assumed to be the same slice as d and e
            dif_client = line[0:58]
            # Intended: keep reading this client's data until the next "< CLIENT >" line
            while dif_client != " < CLIENT >":
                if a == " STORE":
                    STORE = line[10:28]
                    SAILOR = line[30:45]
                elif c == " DATE BUY":
                    DATE_BUY = line[133:145]
                elif b == " NAME CLIENT":
                    nextline = read_file[i + 1]
                    CLIENT = nextline[0:57]
                    MEMBER_NUM = nextline[96:126]
                    TYPE_MEMB = nextline[79:86]
                elif d == " < ADDRESS":
                    nextline = read_file[i + 1]
                    ADDRESS = nextline[0:63]
                elif e == " < PRODUCTS":
                    nextline = read_file[i + 1]
                    NUM_PRODUCTS = nextline[0:24]
                elif f == " <SERVICES":
                    nextline = read_file[i + 1]
                    # The slice offsets were left blank here; split on whitespace instead
                    NUM_SERVICE, TYPE_SERV = nextline.strip().split(None, 1)
            data.append({'Store': STORE, 'Sailor': SAILOR, 'Date_Buy': DATE_BUY, 'Client': CLIENT,
                         'Member_Num': MEMBER_NUM, 'Type_Memb': TYPE_MEMB, 'Address': ADDRESS,
                         'Products': NUM_PRODUCTS, 'Num_Serv': NUM_SERVICE, 'Type_Serv': TYPE_SERV})
        df = pd.DataFrame(data)

The desired output should look like this. I know that each client's information starts where < CLIENT > appears in the text.

Date_buy    Store     Sailor  Client        Member_Number  Num_Products  Address_to_send           num_Service  type_serv
2018/05/26  dubidubi  123456  ANTONY STARK  89101112       6             186 FLEET STREET
2018/05/26  dubidubi  123456  THOR          13141516       2             1800 PENNSYLVANIA STREET  64           DEVOLUTION

1 answer:

Answer 0 (score: 0)

Thanks for the description. The problem is the infinite loop you built:

    dif_client = line[0:58]
    while dif_client != " < CLIENT >":
        if a == " STORE":
            ...

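dif_client never changes inside the loop. There is no break or other exit, only the while condition (which by itself is good design). So once you enter the loop you can never leave: neither line nor i is updated in the loop body, so dif_client is effectively a constant.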
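A while loop can only finish if its body changes something that its condition tests. For comparison, here is a minimal sketch of a nested while loop that does terminate, because the inner loop advances the line index itself. It assumes a client block starts at any line whose stripped text is exactly < CLIENT >; client_blocks is a hypothetical name.

```python
def client_blocks(lines):
    """Yield the lines of each client block, using while loops that always advance."""
    i = 0
    while i < len(lines):
        if lines[i].strip() == "< CLIENT >":
            block = []
            i += 1  # step past the marker line
            # Inner loop: read until the next marker or the end of the input.
            # It terminates because i grows on every pass.
            while i < len(lines) and lines[i].strip() != "< CLIENT >":
                block.append(lines[i])
                i += 1
            yield block
        else:
            i += 1  # header or separator line before the first client


sample = [
    "STORE: DUBIDUBI      SAILOR: 123456",
    "            < CLIENT >",
    "ANTONY STARK        1",
    "            < CLIENT >",
    "THOR                2",
]
for block in client_blocks(sample):
    print(block)  # one list of lines per client
```

Note that the inner loop moves i forward on its own, so the outer loop must not advance it a second time. That is why an inner loop over the same lines does not combine well with for i in range(len(read_file)).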

Your logic is wrong: you have two loops that both try to iterate over the lines of the file:

    for i in range(0, len(read_file)):
        ...
        while dif_client != " < CLIENT >":
            # Process one line

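The body of the while is meant to process a single line. When it is done, you need to move on to the next iteration of the for to get the next line. Finding the < CLIENT > line is a decision for an if, not for a loop.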
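Following that advice, a single for loop can visit each line exactly once, and an if on the < CLIENT > line decides when to close the previous client's record and start a new one. This is a hedged sketch, not the asker's final code. It runs on a trimmed copy of the sample (separators shortened, columns realigned). It also locates fields by their labels (STORE:, MEMBER NUMBER:, NUM-PRODUCTS, ...) and by whitespace rather than by fixed column offsets. The names parse_sales and CLIENT_FIELDS are hypothetical.

```python
import re

# Trimmed copy of the sample file from the question (separator lines shortened).
SAMPLE = """\
*******************************************
                          DATE BUY:2018/05/26
*******************************************
STORE: DUBIDUBI                   SAILOR: 123456
*******************************************
            < CLIENT >
NAME CLIENT          MEMBER TYPE          MEMBER NUMBER: 89101112
ANTONY STARK             1
            <PRODUCTS>
NUM-PRODUCTS
    6
            < ADDRESS TO SEND>
186 FLEET STREET
-------------------------------------------
            < CLIENT >
NAME CLIENT          MEMBER TYPE          MEMBER NUMBER: 13141516
THOR                     2
            <PRODUCTS>
NUM-PRODUCTS
    2
            < ADDRESS TO SEND>
1800 PENNSYLVANIA STREET
            <SERVICES>
NUM-SERVICE              TYPE OF SERVICE
    64                   DEVOLUTION
*******************************************
"""

CLIENT_FIELDS = ("Client", "Member_Num", "Type_Memb", "Products",
                 "Address", "Num_Serv", "Type_Serv")


def parse_sales(text):
    """Return one dict per client; store, sailor and date repeat on every row."""
    lines = text.splitlines()
    header = {"Store": None, "Sailor": None, "Date_Buy": None}
    rows = []
    current = None  # record of the client whose lines are being read

    for i, line in enumerate(lines):  # one loop, one line per iteration
        stripped = line.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""

        if stripped == "< CLIENT >":
            # New client: save the previous record and start an empty one
            if current is not None:
                rows.append(current)
            current = {**header, **dict.fromkeys(CLIENT_FIELDS)}
        elif stripped.startswith("DATE BUY:"):
            header["Date_Buy"] = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("STORE:"):
            m = re.match(r"STORE:\s*(.*?)\s+SAILOR:\s*(\S+)", stripped)
            if m:
                header["Store"], header["Sailor"] = m.groups()
        elif current is None:
            continue  # separator lines before the first client
        elif stripped.startswith("NAME CLIENT"):
            m = re.search(r"MEMBER NUMBER:\s*(\S+)", stripped)
            current["Member_Num"] = m.group(1) if m else None
            # The next line holds the name and the member type, e.g. "THOR   2"
            current["Client"], current["Type_Memb"] = nxt.rsplit(None, 1)
        elif stripped == "NUM-PRODUCTS":
            current["Products"] = nxt
        elif stripped.startswith("< ADDRESS"):
            current["Address"] = nxt
        elif stripped.startswith("NUM-SERVICE"):
            current["Num_Serv"], current["Type_Serv"] = nxt.split(None, 1)

    if current is not None:  # the last client is not followed by a marker
        rows.append(current)
    return rows


for row in parse_sales(SAMPLE):
    print(row["Client"], row["Member_Num"], row["Address"], row["Type_Serv"])
# ANTONY STARK 89101112 186 FLEET STREET None
# THOR 13141516 1800 PENNSYLVANIA STREET DEVOLUTION
```

To get the DataFrame from the question, pass the rows to pandas, for example df = pd.DataFrame(parse_sales(open('myfile').read())). Sections a client does not have, such as <SERVICES> for ANTONY STARK, simply stay None.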