Question

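I have a simple problem: read an Excel worksheet, treat each row (about 83 columns) as a unique database record, add it to the local data records, and finally append and write the records to a DBF file.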

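I can extract all the values from Excel and add them to a list. But the list is not in the right form, and I don't know how to prepare or convert the list into a database record. I'm using openpyxl, dbf and Python 3.7.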

For now I am only testing and trying to prepare the data from row 3 (hence min_row = max_row = 3).

I know the data should be in the following format: (('', '', '', ... 83 entries), ('', '', '', ... 83 entries))

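But I don't know how to convert the list data into a record or, alternatively, how to read the Excel data directly into a format that can be appended to the DBF table.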

tbl_tst.open(mode=dbf.READ_WRITE) # all fields character string

for everyrow in ws_IntMstDBF.iter_rows(min_row = 3, max_row = 3, max_col = ws_IntMstDBF.max_column-1):
    datum = [] #set([83]), will defining datum as () help solve the problem?
    for idx, cells in enumerate(everyrow):
        if cells.value is None: # for None entries, enter empty string
            datum.append("")
            continue
        datum.append(cells.value) # else enter cell values 

    tbl_tst.append(datum) # append that record to table !!! list is not record error here

tbl_tst.close()

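The error complains that I am appending a list to the table when it should be a record (or similar). Please advise how to convert an Excel row into DBF table data that can be appended.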

raise TypeError("data to append must be a tuple, dict, record, or template; not a %r" % type(data))
TypeError: data to append must be a tuple, dict, record, or template; not a <class 'list'>

Answer 1

Change

tbl_tst.append(datum)

to

tbl_tst.append(tuple(datum))

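and that will get rid of the error. As long as all of your cell data has types appropriate to the table's fields, the append should then work.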
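A minimal sketch of this fix as a reusable helper. It assumes, as the question's comment says, that every field in the table is a character field, so each value is converted with `str()` and empty cells (None) become "". The helper name `row_to_record` is hypothetical; the input is assumed to be plain cell values, such as openpyxl's `iter_rows(values_only=True)` yields.

```python
def row_to_record(values):
    """Convert one worksheet row (a sequence of cell values) into a tuple
    for an all-character DBF table: None becomes "", anything else is
    converted with str()."""
    return tuple("" if v is None else str(v) for v in values)


# Example row with a text cell, an empty cell, an int and a float
row = ("ABC", None, 42, 3.5)
record = row_to_record(row)
print(record)  # ('ABC', '', '42', '3.5')
```

In the question's loop this would become `tbl_tst.append(row_to_record(cell.value for cell in everyrow))`. If the table has numeric or date fields, those values should be kept as numbers or dates rather than passed through `str()`.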

Answer 2

Thanks for the response. Since last night I've been going off on tangents trying different solutions.

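One solution that worked for me: I made sure the worksheet data I was using was all strings/text, converting every empty cell to string type and filling it with a blank (a single space). The following code does this: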

#house keeping
for eachrow in ws_IntMstDBF.iter_rows(min_row=2, max_row=ws_IntMstDBF.max_row, max_col=ws_IntMstDBF.max_column):
    for idx, cells in enumerate(eachrow):
        if cells.value is None: # change every Null cell type to String and put 0x20 (space)
            cells.data_type = 's'
            cells.value = " "

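After saving the worksheet, I reopened it as a pandas DataFrame and verified that the contents were all of string type and that the DataFrame contained no "nan" values. I then took the df2dbf function by Dani Arribas-Bel, modified it to fit the data I was working with, and used it to convert the DataFrame to DBF.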
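The verification step described above can be sketched with pandas. The helper names `to_text_frame` and `is_text_only` are hypothetical, and filling missing values with "" before casting every column to `str` is an assumed way to reach an all-text frame, not necessarily what the author did.

```python
import pandas as pd


def to_text_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where missing values are "" and every cell is a str."""
    return df.fillna("").astype(str)


def is_text_only(df: pd.DataFrame) -> bool:
    """True if df has no NaN/None values and every cell is a Python str."""
    if df.isna().any().any():
        return False
    return all(isinstance(v, str) for v in df.to_numpy().ravel())


# A small frame standing in for the result of pd.read_excel(...)
raw = pd.DataFrame({"code": ["A1", None], "qty": [1, 2]})
clean = to_text_frame(raw)
print(is_text_only(raw))    # False
print(is_text_only(clean))  # True
```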

The code that imports the DataFrame and converts it to DBF format is as follows:

from pathlib import Path
import pandas as pd

abspath = Path(__file__).resolve() # resolve relative path to absolute
rootpath = abspath.parents[3] # project root (my source file is 3 subdirectories deep)
xlspath = rootpath / 'sub-dir1' / 'sub-dir2' / 'sub-dir3' / 'test.xlsx'
# the code above only resolves the file location; ignore it
pd_Mst_df = pd.read_excel(xlspath)
#print(pd_Mst_df) # for debugging
print("... Writing Master DBF file ")
df2dbf(pd_Mst_df, dbfpath) # my modified df2dbf; dbfpath is built the same way as xlspath

The df2dbf function uses pysal to write the DataFrame in DBF format. I modified the code slightly to set the field length and character type, as follows:

import pandas as pd
import pysal as ps
import numpy as np

# ... excerpt from inside function df2dbf (the matching `if` branch is not shown)
else:
    # map Python/NumPy types to DBF field specs: (type, length, decimals)
    type2spec = {int: ('N', 20, 0),
                 np.int64: ('N', 20, 0),
                 float: ('N', 36, 15),
                 np.float64: ('N', 36, 15),
                 str: ('C', 200, 0)
                 }
    # original: types = [type(df[i].iloc[0]) for i in df.columns]
    # modified: treat every column as a string, so each becomes ('C', 200, 0)
    types = [str for _ in df.columns]
    specs = [type2spec[t] for t in types]
db = ps.open(dbf_path, 'w')
# ... code continues in function df2dbf
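The excerpt gives every column the same fixed spec `('C', 200, 0)`. If you would rather size each character field from the data, a sketch like the following computes one spec per column. The helper name is hypothetical, and the 254-character cap reflects the commonly cited limit for dBASE character fields; check it against the library you write with.

```python
def char_field_specs(columns, max_width=254):
    """Return one ('C', width, 0) spec per column of string values.

    width is the length of the longest value in the column, at least 1 and
    at most max_width. len() counts characters; if the file will use a
    multi-byte codepage, measure encoded byte lengths instead.
    """
    specs = []
    for values in columns:
        longest = max((len(v) for v in values), default=0)
        specs.append(("C", min(max(longest, 1), max_width), 0))
    return specs


# Each inner list holds one column's (already string) values
cols = [["ABC", "DE"], ["", ""], ["x" * 300]]
specs = char_field_specs(cols)
print(specs)  # [('C', 3, 0), ('C', 1, 0), ('C', 254, 0)]
```

With an all-text DataFrame the call would be `char_field_specs(df[c] for c in df.columns)`, replacing the `specs` list built from `type2spec`.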

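The pandas DataFrame needed no further modification, because all of the source data was formatted correctly before being committed to the Excel file.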

I'll post links to pysal and df2dbf once I find them again on Stack Overflow.
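If neither the dbf package nor pysal is available, an all-character table can also be written with the standard library alone. This is a sketch of the dBASE III layout under stated assumptions, not a replacement for either library: character fields only, field names of at most 10 ASCII characters, values encoded with a caller-chosen codepage, and no memo fields or codepage marker. The function name `write_char_dbf` is hypothetical.

```python
import datetime
import io
import struct


def write_char_dbf(fh, fields, records, encoding="ascii"):
    """Write a minimal dBASE III table whose fields are all character ('C').

    fh       -- binary file object opened for writing
    fields   -- list of (name, width) pairs; names up to 10 ASCII characters,
                widths 1..254
    records  -- iterable of sequences of str, one value per field
    encoding -- codepage used to encode the values (an assumption; match it
                to whatever reads the file)
    """
    records = [tuple(r) for r in records]
    for rec in records:
        if len(rec) != len(fields):
            raise ValueError("record has %d values, expected %d"
                             % (len(rec), len(fields)))

    header_len = 32 + 32 * len(fields) + 1      # header + descriptors + 0x0D
    record_len = 1 + sum(w for _, w in fields)  # deletion flag + field bytes
    today = datetime.date.today()

    # 32-byte file header: version 0x03, last update (years since 1900, month,
    # day), record count, header length, record length, 20 reserved bytes
    fh.write(struct.pack("<BBBBIHH20x", 0x03, today.year - 1900, today.month,
                         today.day, len(records), header_len, record_len))

    # One 32-byte descriptor per field: name (null-padded to 11 bytes), type,
    # 4 reserved bytes, field length, decimal count, 14 reserved bytes
    for name, width in fields:
        fh.write(struct.pack("<11sc4xBB14x", name.encode("ascii"), b"C",
                             width, 0))
    fh.write(b"\x0d")  # end of header

    for rec in records:
        fh.write(b" ")  # a blank deletion flag marks an active record
        for (_, width), value in zip(fields, rec):
            # Truncate to the field width (may split a multi-byte character)
            # and pad with spaces: character fields are stored fixed-width.
            fh.write(value.encode(encoding)[:width].ljust(width, b" "))
    fh.write(b"\x1a")  # end-of-file marker


# Usage: write two records into an in-memory buffer
buf = io.BytesIO()
write_char_dbf(buf, [("NAME", 5), ("CITY", 3)], [("Ann", "NYC"), ("Bob", "")])
data = buf.getvalue()
```

For the question's data, `fields` would hold 83 (name, width) pairs, `records` the tuples built from the worksheet rows, and `fh` a file opened with `open('test.dbf', 'wb')`. Some DBF readers expect a codepage byte in the reserved header area, which this sketch leaves at zero.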

Answer 3

Take a look at the Python pandas library...

To read Excel data into a pandas DataFrame, you can use pandas.read_excel.

Once the data is in a pandas DataFrame, you can manipulate it and then write it to a database with pandas.DataFrame.to_sql.
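Note that `DataFrame.to_sql` writes to SQL databases (through an SQLAlchemy connectable or a `sqlite3` connection), not to .dbf files, so this route fits only if an SQL table is an acceptable target. A minimal sketch using an in-memory SQLite database; the table name `master` and the small literal frame are hypothetical stand-ins for the question's data.

```python
import sqlite3

import pandas as pd

# In the question's setting this frame would come from pd.read_excel(xlspath);
# a small literal frame keeps the sketch self-contained.
df = pd.DataFrame({"code": ["A1", "B2"], "name": ["alpha", None]})
df = df.fillna("")  # optional: store empty cells as empty strings

con = sqlite3.connect(":memory:")  # a file path works here too
df.to_sql("master", con, index=False, if_exists="replace")

rows = con.execute("SELECT code, name FROM master ORDER BY code").fetchall()
print(rows)  # [('A1', 'alpha'), ('B2', '')]
con.close()
```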

See also this explanation for dealing with database io

Transferring data from an Excel worksheet (openpyxl) to a database table (dbf)
