Question

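I am trying to capture the data in the second table (Field Crops), titled "Prices Received, United States, July 2010, with Comparisons", from the USDA report whose URL appears in the code further down. I am using a pandas DataFrame to capture the table from the text file, and then I want to output it to a CSV file.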

My code is as follows:

def find_no_line_start_table(table_title,splited_data):
    found_no_lines = []
    for index, line in enumerate(splited_data):
        if table_title in line:
            found_no_lines.append(index)

    return found_no_lines

def get_start_data_table(table_start, splited_data):
    for index, row in enumerate(splited_data[table_start:]):
        if 'Dollars' in row:
            return table_start + index

def get_end_table(start_table_data, splited_data):
    for index, row in enumerate(splited_data[start_table_data:]):
        if END_TABLE_LINE in row:
            return start_table_data + index

def row(l):
    l = l.split()
    number_columns = 6
    if len(l) >= number_columns: 
        data_row = [''] * number_columns
        first_column_done = False

        index = 0
        for w in l:
            if not first_column_done:
                data_row[0] = ' '.join([data_row[0], w])
                if ':' in w:
                    first_column_done = True
            else:
                index += 1
                data_row[index] = w

        return data_row

def take_table(txt_data):
    comodity = []
    q = []
    w = []
    e = []
    t = []
    p = []

    for r in table:
        data_row = row(r)
        if data_row:
            col_1, col_2, col_3, col_4, col_5, col_6 = data_row
            comodity.append(col_1)
            q.append(col_2)
            w.append(col_3)
            e.append(col_4)
            t.append(col_5)
            p.append(col_6)

    table_data = {'comodity': comodity, 'q': q,
                  'w': w, 'e': e, 't': t}
    return table_data

Then I do this:

import requests
import pandas as pd
txt_data = requests.get("https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/6w924d00c/9z903130m/AgriPric-07-30-2010.txt").text
splited_data = txt_data.split('\n')
table_title = 'Prices Received, United States'
END_TABLE_LINE = '-------------------------------------------'
_, table_start,_ = find_no_line_start_table(table_title,splited_data)
start_line = get_start_data_table(table_start, splited_data)
end_line = get_end_table(start_line, splited_data)
table = splited_data[start_line : end_line]
dict_table = take_table(txt_data)
pd.DataFrame(dict_table)
c = pd.DataFrame(dict_table)

IndexError: list assignment index out of range

However, I get the error above. Can someone help me figure out what I am doing wrong?

Answer 1

Cause of the error:

data_row is a list of 6 elements.

number_columns = 6
# ...
    data_row = [''] * number_columns  # [''] * 6

index is incremented on every iteration once first_column_done is True. However, first_column_done becomes True as soon as a word containing ':' is encountered, i.e.

if ':' in w:
    first_column_done = True
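To make the index arithmetic easier to follow, here is a small sketch that replays the same loop without writing to data_row and records which index each word would be assigned to. The helper name trace_indexes and the sample line are hypothetical and are not taken from the USDA file.

```python
def trace_indexes(line, number_columns=6):
    """Replay the loop in row() and report the index each word would use.

    Returns (word, index, fits) tuples for the words after the label,
    where fits is False when data_row[index] would be out of range.
    """
    first_column_done = False
    index = 0
    trace = []
    for w in line.split():
        if not first_column_done:
            # Still inside the label; the word containing ':' ends it.
            if ':' in w:
                first_column_done = True
        else:
            index += 1
            trace.append((w, index, index < number_columns))
    return trace


# Hypothetical line with six values after the label.
sample = "Wheat, all dol/bu: 4.16 4.89 4.20 4.15 4.30 4.50"
for word, index, fits in trace_indexes(sample):
    print(index, word, "ok" if fits else "IndexError here")
```

With six values after the label, the last word would be assigned to index 6, which does not exist in a 6-element list.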

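So index is incremented on every iteration after first_column_done becomes True, until it goes past the bounds of the 6-element list data_row.

def row(l):
    l = l.split()
    number_columns = 6
    if len(l) >= number_columns:
        data_row = [''] * number_columns
        first_column_done = False

        index = 0
        for w in l:
            if not first_column_done:
                data_row[0] = ' '.join([data_row[0], w])
                if ':' in w:
                    first_column_done = True
            else:
                index += 1
                data_row[index] = w  # error pos.

In other words, you get this error for every line that has more than 5 words (number_columns - 1) after the first word containing ':', because only indexes 1 to 5 remain once index 0 holds the label.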
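To find out which lines of a table would trigger the error, one option is to count the words that follow the first word containing ':' and compare that count with the number of free slots. This is only a diagnostic sketch; the helper names and the sample lines are hypothetical.

```python
def values_after_label(line):
    """Count the words that follow the first word containing ':'."""
    words = line.split()
    for position, word in enumerate(words):
        if ':' in word:
            return len(words) - position - 1
    return 0  # no label found, so row() never leaves the first column


def overflowing_lines(lines, number_columns=6):
    """Return the lines that have more values than data_row has free slots."""
    free_slots = number_columns - 1  # index 0 is taken by the label
    return [line for line in lines if values_after_label(line) > free_slots]


# Hypothetical sample lines, not copied from the report.
sample_lines = [
    "Corn dol/bu: 3.49 3.41 3.45 3.60 3.55",
    "Wheat, all dol/bu: 4.16 4.89 4.20 4.15 4.30 4.50",
]
print(overflowing_lines(sample_lines))
```

Running overflowing_lines on the table slice from the question (the variable table) should show the offending rows, which helps decide whether the extra words are real data or stray footnote markers.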

Fix:

Use split(':') and a list comprehension, together with Python's ternary operator (conditional expression).
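The answer's own code for this fix is not preserved here, so what follows is only a sketch of one way those three ideas could fit together. The name parse_row and the sample lines are hypothetical. Dropping values beyond the fifth and padding short rows with empty strings are assumptions about the desired behavior, not something the answer states.

```python
def parse_row(line, number_columns=6):
    """Split a table line into a label plus number_columns - 1 value cells.

    Returns None for lines without ':' or without any value after it.
    """
    if ':' not in line:
        return None
    # Split only on the first ':' so the label stays in one piece.
    label, rest = line.split(':', 1)
    values = rest.split()
    if not values:
        return None
    # Conditional expression inside a list comprehension: take the value
    # when it exists, otherwise pad with ''. Extra values are ignored
    # (assumption), so no index ever runs past the end of the list.
    cells = [values[i] if i < len(values) else ''
             for i in range(number_columns - 1)]
    return [label.strip()] + cells


# Hypothetical sample lines, not copied from the report.
print(parse_row("Wheat, all dol/bu: 4.16 4.89 4.20 4.15 4.30 4.50"))
print(parse_row("Corn dol/bu: 3.49 3.41"))
```

Because the result always has exactly number_columns entries, the six-way unpacking in take_table keeps working. Note that, unlike the original row(), the label returned here no longer ends with ':'.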
