I am working with the following DataFrame:
import pandas as pd

df = pd.DataFrame({'columnA': [1111, 1111, 2222, 3333, 4444, 4444, 5555, 6666],
                   'columnB': ['AAAA', 'AAAA', 'BBBB', 'AAAA', 'BBBB', 'BBBB', 'AAAA', 'BBBB'],
                   'columnC': ['one', 'two', 'one', 'one', 'one', 'sales', 'two', 'one'],
                   'NUM1': [1, 3, 5, 7, 1, 0, 4, 5],
                   'NUM2': [5, 3, 6, 9, 2, 4, 1, 1],
                   'W': list('aaabbbbb')})
I am trying to use dynamic column names in the following code:
# First, aggregate the data
d = {'columnB': 'unique', 'columnC': 'unique'}
df2 = df.groupby('columnA').agg(d)

# Convert the list to a string for each cell of the inventory field
mylist = ["columnB", "columnC"]
for x in mylist:
    columnName = x
    # print("df2." + columnName + ".apply(', '.join)")
    df2[columnName] = df2[columnName].apply(', '.join)
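It works fine in Jupyter. My problem is that it does not work when I run it in Visual Studio. I get this error:
sequence item 0: expected str instance, float found
After printing the type of the DataFrame, I got this:
<class 'pandas.core.frame.DataFrame'>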
Here is the full error message:
Traceback (most recent call last):
  File "stage1.py", line 112, in <module>
    main()
  File "stage1.py", line 57, in main
    templateScenarios[columnName] = templateScenarios[columnName].apply(', '.join)
  File "/Users/apolo.siskos/anaconda3/lib/python3.6/site-packages/pandas/core/series.py", line 2355, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1574, in pandas._libs.lib.map_infer
TypeError: sequence item 0: expected str instance, float found
Answer 0 (score: 1)
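The problem is caused by NaN values: a NaN is a float, so str.join cannot concatenate it. You can remove them with dropna and aggregate with a custom function that calls join: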
import numpy as np
import pandas as pd

df = pd.DataFrame({'columnA': [1111, 1111, 2222, 3333, 4444, 4444, 5555, 6666],
                   'columnB': [np.nan, np.nan, 'BBBB', 'AAAA', 'BBBB', 'BBBB', 'AAAA', 'BBBB'],
                   'columnC': ['one', 'two', 'one', 'one', 'one', 'sales', 'two', 'one'],
                   'NUM1': [1, 3, 5, 7, 1, 0, 4, 5],
                   'NUM2': [5, 3, 6, 9, 2, 4, 1, 1],
                   'W': list('aaabbbbb')})

# Drop NaN first, then join the remaining unique strings of each group
f = lambda x: ', '.join(x.dropna().unique())
d = {'columnB': f, 'columnC': f}
df2 = df.groupby('columnA').agg(d)
print(df2)
        columnB     columnC
columnA
1111               one, two
2222       BBBB         one
3333       AAAA         one
4444       BBBB  one, sales
5555       AAAA         two
6666       BBBB         one
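
If you would rather keep the two-step approach from the question (aggregate with 'unique', then join each cell), the rough sketch below first reproduces the error on a plain list and then filters out missing values at join time. The helper name join_non_null is made up for this illustration and is not part of pandas, and the sketch assumes NaN in the real data is what triggers the error outside Jupyter.

```python
import numpy as np
import pandas as pd


def join_non_null(values, sep=", "):
    """Join the non-missing entries of an iterable into one string.

    Hypothetical helper for illustration; not a pandas function.
    """
    return sep.join(str(v) for v in values if pd.notna(v))


# Reproduce the error: str.join rejects the float NaN.
try:
    ", ".join([np.nan, "AAAA"])
except TypeError as exc:
    error_message = str(exc)

df = pd.DataFrame({
    "columnA": [1111, 1111, 2222, 3333, 4444, 4444, 5555, 6666],
    "columnB": [np.nan, np.nan, "BBBB", "AAAA", "BBBB", "BBBB", "AAAA", "BBBB"],
    "columnC": ["one", "two", "one", "one", "one", "sales", "two", "one"],
})

# Step 1: collect the unique values per group (NaN is still included here).
df2 = df.groupby("columnA").agg({"columnB": "unique", "columnC": "unique"})

# Step 2: join each cell, skipping missing values instead of failing on them.
for column_name in ["columnB", "columnC"]:
    df2[column_name] = df2[column_name].apply(join_non_null)
```

Groups that contain only NaN end up as an empty string, which matches the blank columnB cell for 1111 in the output above.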