New to pandas; is there a better way to do this?
import pandas as pd
import numpy as np
from io import StringIO
devices = StringIO("""name;date;CPU;Freq;Voltage
RPI;201501;arm;700MHz;5V
Galileo;201501;intel;400MHz;3.3V
UNO;201502;atmel;16MHz;5V
""")
d = pd.read_csv(devices, sep=';')
comments = StringIO("""comment;t1;t2;t3
cool;arm;;
great!;atmel;;
great!;intel;5V;
fun;atmel;16MHz;
fun;700MHz;atmel;
""")
c = pd.read_csv(comments, sep=';')
n = d.copy()
n['cool'], n['great!'], n['fun'] = 0, 0, 0
for i, row in n.iterrows():
    for j, com in c.iterrows():
        # Flag the comment when every non-empty tag (t1..t3) appears in the device row.
        if np.all(np.isin(np.array(com[['t1', 't2', 't3']].dropna()), np.array(row))):
            n.loc[i, c.loc[j, 'comment']] = 1
In the end I build a new DataFrame n, which looks like this:
      name    date    CPU    Freq  Voltage  cool  great!  fun
0      RPI  201501    arm  700MHz       5V     1       0    0
1  Galileo  201501  intel  400MHz     3.3V     0       0    0
2      UNO  201502  atmel   16MHz       5V     0       1    1
The other DataFrames, d and c, look like this:
      name    date    CPU    Freq  Voltage
0      RPI  201501    arm  700MHz       5V
1  Galileo  201501  intel  400MHz     3.3V
2      UNO  201502  atmel   16MHz       5V
   comment      t1     t2   t3
0     cool     arm    NaN  NaN
1   great!   atmel    NaN  NaN
2   great!   intel     5V  NaN
3      fun   atmel  16MHz  NaN
4      fun  700MHz  atmel  NaN
I had to use two nested loops to get there, which shatters my pandas dream! Is there a better way? I must be missing something.
Answer 0 (score: 0)
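c['val'] = 1
# Count comments per t1 value; only t1 is used here, so t2 and t3 are ignored.
comments = pd.pivot_table(c, index='t1', columns='comment',
                          values='val', aggfunc='sum').fillna(0)
# Attach the counts to each device by matching its CPU against t1.
df = pd.merge(d, comments, left_on='CPU', right_index=True, how='left')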
comments:
comment  cool  fun  great!
t1
700MHz      0    1       0
arm         1    0       0
atmel       0    1       1
intel       0    0       1
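As an aside on the comments table just shown: the same per-t1 counts can probably be obtained without the helper column val by using pd.crosstab. This is only a sketch of an equivalent step, not part of the answer's code; the variable name counts is made up for illustration.

```python
from io import StringIO
import pandas as pd

c = pd.read_csv(StringIO("""comment;t1;t2;t3
cool;arm;;
great!;atmel;;
great!;intel;5V;
fun;atmel;16MHz;
fun;700MHz;atmel;
"""), sep=';')

# Count how often each comment occurs for each t1 value.
# 'counts' is a hypothetical name; t2 and t3 are still ignored here.
counts = pd.crosstab(c['t1'], c['comment'])
print(counts)
```

crosstab returns integer counts and fills missing combinations with 0, so no fillna step is needed.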
df (the merged result):
      name    date    CPU    Freq  Voltage  cool  fun  great!
0      RPI  201501    arm  700MHz       5V     1    0       0
1  Galileo  201501  intel  400MHz     3.3V     0    0       1
2      UNO  201502  atmel   16MHz       5V     0    1       1
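Note that this result differs from the table in the question: Galileo gets great! = 1 because only t1 is compared with CPU, although the rule "intel;5V" also requires 5V and Galileo runs at 3.3V. If every non-empty tag of a rule has to match, one possible loop-free sketch is to reshape both frames to long form, merge on the tag, and keep the (rule, device) pairs where all tags were found. The names rule, device, token, found and needed are invented for this example, and restricting the lookup to the CPU, Freq and Voltage columns is an assumption (the original loop checked the whole row).

```python
from io import StringIO
import pandas as pd

d = pd.read_csv(StringIO("""name;date;CPU;Freq;Voltage
RPI;201501;arm;700MHz;5V
Galileo;201501;intel;400MHz;3.3V
UNO;201502;atmel;16MHz;5V
"""), sep=';')

c = pd.read_csv(StringIO("""comment;t1;t2;t3
cool;arm;;
great!;atmel;;
great!;intel;5V;
fun;atmel;16MHz;
fun;700MHz;atmel;
"""), sep=';', dtype=str)

# One row per (rule, tag); empty t2/t3 cells are dropped.
tokens = (c.rename_axis('rule').reset_index()
            .melt(id_vars=['rule', 'comment'], value_name='token')
            .dropna(subset=['token'])
            .drop_duplicates(['rule', 'token']))

# One row per (device, value). Assumption: tags only refer to these columns.
attrs = (d[['CPU', 'Freq', 'Voltage']]
           .rename_axis('device').reset_index()
           .melt(id_vars='device', value_name='token')
           .drop_duplicates(['device', 'token']))

# Count how many tags of each rule were found on each device.
hits = (tokens.merge(attrs, on='token')
              .groupby(['rule', 'comment', 'device']).size()
              .reset_index(name='found'))
hits['needed'] = hits['rule'].map(tokens.groupby('rule').size())

# A rule applies only when all of its tags were found.
matched = hits[hits['found'] == hits['needed']]

# Turn the matches into 0/1 columns, one per comment, and attach them to d.
flags = pd.crosstab(matched['device'], matched['comment']).clip(upper=1)
flags = flags.reindex(index=d.index, columns=c['comment'].unique(), fill_value=0)
n = d.join(flags)
print(n)
```

On the sample data this should reproduce the 0/1 columns from the question. For large frames the merge can produce many intermediate rows, so it is worth timing against the loop before adopting it.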