Question

Pandas newbie here; is there a better way to do this?

import pandas as pd
import numpy as np
from io import StringIO

devices = StringIO("""name;date;CPU;Freq;Voltage
RPI;201501;arm;700MHz;5V
Galileo;201501;intel;400MHz;3.3V
UNO;201502;atmel;16MHz;5V
""")
d = pd.read_csv(devices, sep=';')

comments = StringIO("""comment;t1;t2;t3
cool;arm;;
great!;atmel;;
great!;intel;5V;
fun;atmel;16MHz;
fun;700MHz;atmel;
""")
c = pd.read_csv(comments, sep=';')

n = d.copy()
n['cool'], n['great!'], n['fun'] = 0, 0, 0

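# For every device row, set a comment column to 1 when all of that
# comment's non-null tags (t1..t3) appear among the row's values.
for i, row in n.iterrows():
    for j, com in c.iterrows():
        tags = com[['t1', 't2', 't3']].dropna().to_numpy()
        if np.isin(tags, row.to_numpy()).all():
            n.loc[i, com['comment']] = 1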

In the end, I build a new DataFrame n, which looks like this:

    name    date    CPU     Freq    Voltage     cool    great!  fun
0   RPI     201501  arm     700MHz  5V          1       0       0
1   Galileo 201501  intel   400MHz  3.3V        0       0       0
2   UNO     201502  atmel   16MHz   5V          0       1       1

The other DataFrames, d and c, look like this:

    name    date    CPU     Freq    Voltage
0   RPI     201501  arm     700MHz  5V
1   Galileo 201501  intel   400MHz  3.3V
2   UNO     201502  atmel   16MHz   5V

    comment     t1      t2      t3
0   cool        arm     NaN     NaN
1   great!      atmel   NaN     NaN
2   great!      intel   5V      NaN
3   fun         atmel   16MHz   NaN
4   fun         700MHz  atmel   NaN

I had to use two loops to get this done, which shatters my dream of pandas! Is there a better way? I must be missing something.

Answer 1

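# Helper column of ones so the pivot can count tag/comment pairs.
c['val'] = 1

# One row per t1 tag, one column per comment; missing pairs become 0.
comments = pd.pivot_table(c, index='t1', columns='comment',
                          values='val', aggfunc='sum').fillna(0)

# Attach the comment flags to each device by matching t1 against CPU.
df = pd.merge(d, comments, left_on='CPU', right_index=True, how='left')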

comments:

comment  cool  fun  great!
t1                        
700MHz      0    1       0
arm         1    0       0
atmel       0    1       1
intel       0    0       1
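
The table above is a count of how often each t1 tag occurs with each comment. As a minimal sketch, assuming a reasonably recent pandas, the same table can probably be built with pd.crosstab, which avoids the helper val column. The variable name table is only illustrative.

```python
from io import StringIO

import pandas as pd

c = pd.read_csv(StringIO("""comment;t1;t2;t3
cool;arm;;
great!;atmel;;
great!;intel;5V;
fun;atmel;16MHz;
fun;700MHz;atmel;
"""), sep=';')

# Count how many times each (t1, comment) pair occurs; absent pairs are 0.
table = pd.crosstab(c['t1'], c['comment'])
print(table)
```

Like the pivot, this only looks at t1, so the t2 and t3 tags play no part in the counts.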

df:

      name    date    CPU    Freq Voltage  cool  fun  great!
0      RPI  201501    arm  700MHz      5V     1    0       0
1  Galileo  201501  intel  400MHz    3.3V     0    0       1
2      UNO  201502  atmel   16MHz      5V     0    1       1
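
Note that this result matches t1 against CPU only, so Galileo gets great! = 1, whereas the loop in the question gives 0 because that comment row requires both intel and 5V. The following is a rough sketch, not the answerer's method, of one loop-free way to honor every tag column. It assumes the tags within a comment row are distinct and that comparing values as strings is acceptable. The names rules, cells, hits, needed and flags are made up for illustration.

```python
from io import StringIO

import pandas as pd

d = pd.read_csv(StringIO("""name;date;CPU;Freq;Voltage
RPI;201501;arm;700MHz;5V
Galileo;201501;intel;400MHz;3.3V
UNO;201502;atmel;16MHz;5V
"""), sep=';')
c = pd.read_csv(StringIO("""comment;t1;t2;t3
cool;arm;;
great!;atmel;;
great!;intel;5V;
fun;atmel;16MHz;
fun;700MHz;atmel;
"""), sep=';')

# Long form of the comment rows: one (rule, comment, tag) line per non-null tag.
rules = (c.rename_axis('rule').reset_index()
           .melt(id_vars=['rule', 'comment'], value_name='tag')
           .dropna(subset=['tag'])
           .astype({'tag': str}))[['rule', 'comment', 'tag']]

# Long form of the devices: one (device, tag) line per cell, compared as strings.
cells = (d.astype(str).rename_axis('device').reset_index()
           .melt(id_vars='device', value_name='tag'))[['device', 'tag']]

# Pair every rule tag with every device that contains it.
hits = rules.merge(cells.drop_duplicates(), on='tag')

# A rule applies to a device only if all of its tags were found there.
needed = rules.groupby('rule')['tag'].nunique()
found = (hits.groupby(['device', 'rule', 'comment'])['tag']
             .nunique().reset_index(name='found'))
ok = found[found['found'] == found['rule'].map(needed)]

# One 0/1 column per comment, aligned to the device index.
flags = (pd.crosstab(ok['device'], ok['comment']).clip(upper=1)
           .reindex(index=d.index, columns=c['comment'].unique(), fill_value=0))
n = d.join(flags)
print(n)
```

On the sample data this should reproduce the n shown in the question. The merge replaces the two nested loops, at the cost of building intermediate long tables.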
