Question

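I want to populate another column in an existing DataFrame. The column should hold the values of a dictionary, and each value is a list of tokens. So far it does not work. Is there a way to add them to the DataFrame?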

import sys
import pandas as pd

df = pd.read_csv(sys.argv[1], na_values=['no info', '.'], encoding='Cp1252', delimiter=';')
s = pd.DataFrame(dict1).T.reset_index()
print(s)
#result

''' 
     index      0      1     2
0      231   2470  11854  2368
1      236   3132  11130  1236
2      237   4527  14593  1924
3      238   6167   8222  1070
'''
s.columns = ['number','grade1','grade2','grade3']
print(s.head())

#result
''' 
   number  grade1  grade2  grade3
0     231    2470   11854    2368
1     236    3132   11130    1236
2     237    4527   14593    1924
3     238    6167    8222    1070
'''

df=pd.concat([df,s],axis=1)
print(df)
#result
'''
        id  ...        grade3
0      231  ...        2368
1      236  ...        1236
'''
# Write the result to an Excel file
df.to_excel('exit_test2.xlsx')

# Fill a new column with a list of tokens for each cell. The two dicts share the same keys, so I only need the list of tokens.

df['tokens'] = ' '
for k,v in dict2.items():
    df.at[int(k), 'tokens'] = v 
print(df)
# raises an error

Traceback

  File "Script_JDM_sans_sens.py", line 101, in <module>
    df.at[int(k), 'tokens'] = v  #change -1 for verbatim
  File "C:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2287, in __setitem__
    self.obj._set_value(*key, takeable=self._takeable)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2823, in _set_value
    self.loc[index, col] = value
  File "C:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 190, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 611, in _setitem_with_indexer
    raise ValueError('Must have equal len keys and value '
ValueError: Must have equal len keys and value when setting with an iterable

# Contents of dict2

dict2 = {'231': ['look','eat','at'], '236': ['lay','good', 'fun'], ….}

How can I fix this error?

Answer 1

You can convert dict2 to a pandas Series object and then add that Series as a column of the DataFrame df.

Convert dict2 to a Series:

s = pd.Series(dict2)

If the 'id' column in df is of type int, you must convert the Series index to int as well, because the keys of dict2 are strings:

s.index = s.index.astype(int)

Then set the index of df so that it matches the index of the Series:

df.set_index('id', inplace=True)

Finally, add the 'token' column:

df['token'] = s

Here is the result:

        grade1  grade2  grade3  token
id              
231     2470    11854   2368    [look, eat, at]
236     3132    11130   1236    [lay, good, fun]
237     4527    14593   1924    NaN
238     6167    8222    1070    NaN
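
The steps above can be put together into one runnable sketch. The DataFrame here is built by hand from the values shown in the question instead of being read from the CSV file, and dict2 contains only the two entries that are visible in the question; both are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame from the question (assumption: built by hand
# from the printed values instead of being read from the CSV file).
df = pd.DataFrame({
    "id": [231, 236, 237, 238],
    "grade1": [2470, 3132, 4527, 6167],
    "grade2": [11854, 11130, 14593, 8222],
    "grade3": [2368, 1236, 1924, 1070],
})

# Only the two entries visible in the question; the rest is elided there.
dict2 = {"231": ["look", "eat", "at"], "236": ["lay", "good", "fun"]}

# Each dict value (a list) becomes one element of the Series.
s = pd.Series(dict2)

# The dict keys are strings, while the ids in df are integers.
s.index = s.index.astype(int)

# Assignment aligns on the index, so df must be indexed by id.
df = df.set_index("id")
df["token"] = s

print(df)
```

Rows whose id has no entry in dict2 (237 and 238 here) end up with NaN, because the assignment aligns on index labels rather than on position.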

Answer 2

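You need to set the index of df to the id column. It is currently a RangeIndex, and df.at looks rows up by index label, so a label such as 231 does not exist in it. Try this: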

df['tokens'] = ' '
df = df.set_index('id')
for k,v in dict2.items():
    df.at[int(k), 'tokens'] = v
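
A self-contained sketch of this loop, under a few assumptions: the DataFrame is built by hand from the values in the question, dict2 holds only the two entries visible there, and the tokens column is initialised with object dtype (instead of ' ') so that a cell can hold a list regardless of the pandas version's default string handling.

```python
import pandas as pd

# Stand-in for the DataFrame from the question (assumption: built by hand).
df = pd.DataFrame({
    "id": [231, 236, 237, 238],
    "grade1": [2470, 3132, 4527, 6167],
})

# Only the two entries visible in the question.
dict2 = {"231": ["look", "eat", "at"], "236": ["lay", "good", "fun"]}

# Index by id so that df.at[231, ...] refers to an existing row label.
df = df.set_index("id")

# Assumption: an object-dtype placeholder column, so each cell can store a list.
df["tokens"] = pd.Series([None] * len(df), index=df.index, dtype=object)

for k, v in dict2.items():
    # The keys are strings; the index labels are integers.
    df.at[int(k), "tokens"] = v

print(df)
```

This sketch assumes every key in dict2 matches an existing id. A key with no matching row would not fill an existing cell, so it may be worth checking `int(k) in df.index` before assigning.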
