Question

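My specific question is whether anyone can identify why I get this particular error when I run the code below or, better yet, how to fix it. I am trying to map the department descriptions for the department numbers in df5 onto the TrueDepartment column of a second dataframe (df2). df2 has a column called "Department" that I want to iterate through, searching for substrings that contain 4- or 5-digit dept_nbrs. Dept_Nbr in df5 ascends from 1 to over 10000 with no blank rows. Every Dept_Nbr in df5 has a Dept_Desc_HR, and when the substring (4 or 5 consecutive digits) is found in df2's Department column, I want to write that Dept_Desc into df2's TrueDepartment column.

So, for each dataframe (df2 has 2 columns, df5 has 3): df2 has a Department column that I want to iterate over and a TrueDepartment column that I want to write to. df5 has 3 columns: Dept_Nbr, Dept_Desc_HR and Dept_Desc_AD. df2's Department column has many blank cells and many cells with values. Some of those values contain no numbers, others contain several numbers, and some cells are a mix of digits, letters and special characters.

I want to use the cells that contain 4 or 5 consecutive digits to identify the dept_nbr, and then map that Dept_Nbr's dept_desc onto df2's TrueDepartment column. If the Dept_Nbr has a value in Dept_Desc_AD, I want to use that value and write it to df2's TrueDepartment column. If there is no value in Dept_Desc_AD, I want to write the contents of Dept_Desc_HD to df2's TrueDepartment column instead.

My code works on a sample dataset, but with the larger dataset from the full Excel spreadsheet it gives me the error shown at the bottom. Thank you for any help with this. I am happy to provide the spreadsheets or any other information if needed. Thanks.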

import pandas as pd
import numpy as np
import re

#reading my two data frames from 2 excel files
excel_file='/Users/j0t0174/anaconda3/Depts_sheets_withonlyAD_4columns.xlsx'  
df2 = pd.read_excel(excel_file)

excel_file='/Users/j0t0174/anaconda3/dept_nbr.xlsx'
df5=pd.read_excel(excel_file)

df2=df2.replace(np.nan, "Empty",regex=True)
df5=df5.replace(np.nan, "Empty",regex=True)

numbers = df5['Dept_Nbr'].tolist()#-->adding dept_nbr's to list
df5['Dept_Nbr'] = [int(i) for i in df5['Dept_Nbr']]
df5 = df5.set_index('Dept_Nbr')  #<--setting data frame 5 (df5) to the new index

for n in numbers:
    for i in range(len(df5.index)):  #<--iterate through the number of elements not the elements themselves
        if str(n) == df2.loc[i, 'Department'][-4:]: #<-- convert n to str and slice df2 string for the last 4 chars
            if df5.loc[n, 'Dept_Desc_AD'] != "Empty":  #<--checking against a string, not a NaN
                df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_AD']  #<-- use .loc not .at
            else:
                df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_HD']


TypeError                                 Traceback (most recent call last)
<ipython-input-5-aa578c4c334c>     in <module>()
 17 for n in numbers:
 18     for i in range(len(df5.index)):  #<-- you want to iterate through the number of elements not the elements themselves 
---> 19         if str(n) == df2.loc[i, 'Department'][-4:]: #<-- convert n to str and slice df2 string for the last 4 chars
 20             if df5.loc[n, 'Dept_Desc_AD'] != "Empty":  #<-- you're actually checking against a string, not a NaN
 21                 df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_AD']  #<-- use .loc not .at

TypeError: 'int' object is not subscriptable

Answer 1

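Your error is raised because

df2.loc[i, 'Department']

returns an int, which is not subscriptable. If you want the last 4 digits of that integer, convert it to a str first

str(df2.loc[i, 'Department'])

and then you can subscript it

str(df2.loc[i, 'Department'])[-4:]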
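
To make this concrete, here is a minimal sketch that reproduces the error and then does the lookup without the nested loops. The two small frames are made-up stand-ins for df2 and df5, not the asker's data. The column names follow the question's description (with Dept_Desc_HR as the fallback column). Using a regex for "4 or 5 consecutive digits" and fillna in place of the "Empty" placeholder are assumptions about what is wanted, so treat it as one possible approach rather than a drop-in fix.

```python
import pandas as pd

# Hypothetical stand-ins for df2 and df5 (not the asker's spreadsheets).
df2 = pd.DataFrame(
    {"Department": ["Sales 1234", 5678, None, "no digits", "HR-10001/x"]}
)
df5 = pd.DataFrame(
    {
        "Dept_Nbr": [1234, 5678, 10001],
        "Dept_Desc_HR": ["Sales HR", "Ops HR", "Legal HR"],
        "Dept_Desc_AD": ["Sales AD", None, "Legal AD"],
    }
)

# Reproduce the error: row 1 holds a plain int, and ints cannot be sliced.
try:
    df2.loc[1, "Department"][-4:]
except TypeError as exc:
    error_message = str(exc)

# Convert every cell to text first, so string operations work on all rows.
dept_text = df2["Department"].astype(str)

# Pull out a run of exactly 4 or 5 digits that is not part of a longer number.
# Rows with no such run become NaN.
dept_nbr = dept_text.str.extract(r"(?<!\d)(\d{4,5})(?!\d)", expand=False)

# Prefer Dept_Desc_AD and fall back to Dept_Desc_HR where AD is missing.
description = df5["Dept_Desc_AD"].fillna(df5["Dept_Desc_HR"])
lookup = dict(zip(df5["Dept_Nbr"].astype(str), description))

# Map each extracted number to its description; unmatched rows stay NaN.
df2["TrueDepartment"] = dept_nbr.map(lookup)
print(df2)
```

The point of `astype(str)` here is the same as `str(...)` in the answer: once every cell is text, slicing and regex matching behave the same whether the spreadsheet cell was read as a number or as a string.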
