Question

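I am using pandas.read_excel() to import specific cells from multiple Excel workbooks. In Excel terms, it reads part of row 32, specifically C32:I32. A loop appends these values as one row to an initially empty DataFrame (df), preceded by two values (time and region) that come from the same tuple as the filename.

The following solution works: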

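import pandas as pd

df = pd.DataFrame()
# xtpl holds (filename, date, regio) tuples
for xfl, date, regio in xtpl:
    tmp = pd.read_excel(xfl, sheet_name='sheet_name', header=None)
    try:
        # DataFrame.append needs pandas < 2.0 (it was removed in 2.0)
        df = df.append(pd.DataFrame([[int(date + regio), date, regio,
                      tmp.loc[31,2], tmp.loc[31,3], tmp.loc[31,4],
                      tmp.loc[31,5], tmp.loc[31,6], tmp.loc[31,7],
                      tmp.loc[31,8]]]), ignore_index=True)
    except KeyError as e:
        # skip workbooks where row 31 or one of the columns is missing
        continue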

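But I have two questions about my own code:

1) Why can't I use a slice? I tried tmp.loc[31,2:9] instead of tmp.loc[31,2], tmp.loc[31,3], etc., but this gave wrong results (not all values ended up in the right place in df).

The problem seems to be that tmp.loc[31,2:9] is stored as a pandas.Series. So when the slice (a Series) is assigned to a pandas.DataFrame, the old index is copied along with the values, which leads to strange results.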
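To illustrate the suspicion above, here is a minimal sketch using a made-up 2x10 frame in place of a real sheet (the frame and the three prefix values are assumptions, not data from the workbooks). It shows two things that can produce misplaced values: a .loc slice includes its end label, so 2:9 covers columns 2 through 9 rather than 2 through 8, and the resulting Series keeps the labels 2..8, which collide with the positions of the three leading values.

```python
import pandas as pd

# Hypothetical stand-in for one sheet read with header=None
tmp = pd.DataFrame([[10 * r + c for c in range(10)] for r in range(2)])

row = tmp.loc[1, 2:8]        # a Series; .loc slices include the end label
print(type(row).__name__)    # Series
print(list(row.index))       # column labels 2..8 travel with the values

# Three leading values get labels 0, 1, 2, so label 2 now appears twice
prefix = pd.Series([202001, "2020", "01"])
combined = pd.concat([prefix, row])
print(combined.index.is_unique)

# .tolist() drops the labels and leaves plain positional values
values = row.tolist()
print(values)
```

Converting the slice with .tolist() (or .to_numpy()) before combining it with other values avoids the label alignment entirely.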

I made a workaround that uses a list (l1) to enable slicing, but it is far from elegant:

try:
    print(date, regio)
    # key, date and region first, then columns 2..8 of rows 15, 31 and 32
    l1 = [int(date + regio), date, regio]
    l1 += [tmp.loc[15,x] for x in range(2,9)]
    l1 += [tmp.loc[31,x] for x in range(2,9)]
    l1 += [tmp.loc[32,x] for x in range(2,9)]
    df = df.append(pd.DataFrame([l1]), ignore_index=True)
    files += 1
except KeyError as e:
    continue

Is there a better way to use slices when appending a row?
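One possible approach, sketched here under assumptions: slice each row with .loc, strip the labels with .tolist(), collect the flat lists, and build the DataFrame once at the end instead of appending inside the loop. The helper build_row and the in-memory sheets list are hypothetical stand-ins for the read_excel results; date and regio are assumed to be strings, as int(date + regio) suggests.

```python
import pandas as pd

def build_row(tmp, date, regio, row_labels=(31,), first_col=2, last_col=8):
    """Return one flat list: key, date, regio, then the sliced cells."""
    out = [int(date + regio), date, regio]
    for r in row_labels:
        # .loc slices include last_col; .tolist() drops the Series labels
        out += tmp.loc[r, first_col:last_col].tolist()
    return out

# Hypothetical stand-ins for sheets read with header=None
sheets = [
    (pd.DataFrame([[100 * k + c for c in range(10)] for _ in range(40)]),
     "2020", str(k))
    for k in (1, 2)
]
sheets.append((pd.DataFrame([[0] * 10] * 5), "2020", "3"))  # only 5 rows

records = []
for tmp, date, regio in sheets:
    try:
        records.append(build_row(tmp, date, regio))
    except KeyError:
        continue  # this sheet has no row labelled 31

df = pd.DataFrame(records)  # built once, after the loop
print(df.shape)
```

Passing row_labels=(15, 31, 32) would reproduce the three-row workaround. Building the frame once also sidesteps DataFrame.append, which was removed in pandas 2.0.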

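2) Why does it only work with .loc and not with .iloc? Using tmp.iloc[31,2] etc. raises an IndexError ("IndexError: single positional indexer is out-of-bounds"). Shouldn't it be the other way around, since the values are read from the Excel sheet by position?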
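A possible explanation, assuming at least one workbook has a sheet with fewer than 32 rows: with header=None the row and column labels are the same as the positions, so .loc and .iloc address the same cells. They differ in what they raise when a cell is missing. .loc raises KeyError, which the except clause catches and skips, while .iloc raises IndexError, which is not caught. The short frame below is made up to show this.

```python
import pandas as pd

# Hypothetical sheet with only 5 rows, so row 31 does not exist
tmp = pd.DataFrame([[0] * 10] * 5)

def lookup_error(getter):
    """Return the name of the exception raised by getter, or None."""
    try:
        getter()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None

loc_err = lookup_error(lambda: tmp.loc[31, 2])    # label lookup
iloc_err = lookup_error(lambda: tmp.iloc[31, 2])  # positional lookup
print(loc_err, iloc_err)
```

If this is the cause, catching (KeyError, IndexError) would let the .iloc version skip short sheets in the same way.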

Appending Excel cells together with other values to a pandas.DataFrame()

0 answers: