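I want to populate an additional column in an existing DataFrame. The column should hold the values of a dictionary, and those values are lists of tokens. So far it does not work. Is there a way to add them to the DataFrame?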
import sys
import pandas as pd

df = pd.read_csv(sys.argv[1], na_values=['no info', '.'], encoding='Cp1252', delimiter=';')
s = pd.DataFrame(dict1).T.reset_index()
print(s)
#result
'''
index 0 1 2
0 231 2470 11854 2368
1 236 3132 11130 1236
2 237 4527 14593 1924
3 238 6167 8222 1070
'''
s.columns = ['number','grade1','grade2','grade3']
print(s.head())
#result
'''
number grade1 grade2 grade3
0 231 2470 11854 2368
1 236 3132 11130 1236
2 237 4527 14593 1924
3 238 6167 8222 1070
'''
df=pd.concat([df,s],axis=1)
print(df)
#result
'''
id ... grade3
0 231 ... 2368
1 236 ... 1236
'''
# Write the result to an Excel file
df.to_excel('exit_test2.xlsx')
# Fill a new column with a list of tokens for each cell.
# The keys of the two dicts are the same, so I only need the lists of tokens.
df['tokens'] = ' '
for k, v in dict2.items():
    df.at[int(k), 'tokens'] = v
print(df)
# This raises an error
Error traceback:
  File "Script_JDM_sans_sens.py", line 101, in <module>
    df.at[int(k), 'tokens'] = v #change -1 for verbatim
  File "C:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2287, in __setitem__
    self.obj._set_value(*key, takeable=self._takeable)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2823, in _set_value
    self.loc[index, col] = value
  File "C:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 190, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 611, in _setitem_with_indexer
    raise ValueError('Must have equal len keys and value '
ValueError: Must have equal len keys and value when setting with an iterable
#Display of dict2
dict2 = {'231': ['look','eat','at'], '236': ['lay','good', 'fun'], ….}
How can I fix this error?
Answer 0: (score: 0)
You can convert dict2 to a pandas Series object and then add that Series as a column of the DataFrame df.
Convert dict2 to a Series:
s = pd.Series(dict2)
If the 'id' column in df has type int, you must convert the Series index to int as well, because the keys of dict2 are strings:
s.index = s.index.astype(int)
Then set the index of df to 'id' so that it matches the index of the Series:
df.set_index('id', inplace=True)
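Finally, add the 'token' column. The assignment aligns on the index, so ids without an entry in dict2 get NaN: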
df['token'] = s
Here is the result:
grade1 grade2 grade3 token
id
231 2470 11854 2368 [look, eat, at]
236 3132 11130 1236 [lay, good, fun]
237 4527 14593 1924 NaN
238 6167 8222 1070 NaN
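For reference, the steps above can be put together in one runnable sketch. The small df below is a hypothetical stand-in built from the values shown in the question, not the original CSV, and dict2 contains only the two entries that are visible there.

```python
import pandas as pd

# Hypothetical stand-in for the question's df (only the columns shown there).
df = pd.DataFrame({
    "id": [231, 236, 237, 238],
    "grade1": [2470, 3132, 4527, 6167],
    "grade2": [11854, 11130, 14593, 8222],
    "grade3": [2368, 1236, 1924, 1070],
})
# Only the two entries visible in the question; the rest is elided there.
dict2 = {"231": ["look", "eat", "at"], "236": ["lay", "good", "fun"]}

# Each dict value (a list) becomes one element of the Series.
s = pd.Series(dict2)
# dict2 keys are strings, while 'id' holds ints, so align the types.
s.index = s.index.astype(int)

# Index df by 'id' so the assignment aligns row by row on the id.
df = df.set_index("id")
df["token"] = s

print(df)
```

Because the assignment aligns on the index, no explicit loop is needed, and ids that are missing from dict2 are left as NaN instead of raising an error.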
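Answer 1: (score: 0)
You need to set the index of df to the column id. Currently it is a RangeIndex, so the labels 231, 236, ... do not exist in the index. Try this: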
df['tokens'] = ' '
df = df.set_index('id')
for k, v in dict2.items():
    df.at[int(k), 'tokens'] = v
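If you would rather keep 'id' as an ordinary column, one alternative that neither answer uses is Series.map with the dictionary. This is only a sketch: the df below is a hypothetical stand-in with the ids from the question, and dict2 holds only the two entries that are visible there.

```python
import pandas as pd

# Hypothetical stand-in for the question's df; 'id' stays a regular column.
df = pd.DataFrame({
    "id": [231, 236, 237, 238],
    "grade1": [2470, 3132, 4527, 6167],
})
# Only the two entries visible in the question; the rest is elided there.
dict2 = {"231": ["look", "eat", "at"], "236": ["lay", "good", "fun"]}

# dict2 is keyed by strings, so look up each id by its string form.
# Ids with no entry in dict2 map to NaN.
df["tokens"] = df["id"].astype(str).map(dict2)

print(df)
```

This leaves the RangeIndex untouched, which may be convenient if later steps, such as the to_excel call in the question, expect 'id' to remain a column.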