Inserting rows with duplicated values in Pandas

Date: 2017-07-26 15:26:28

Tags: python pandas

I want to split the 01/31 column for Jobs, Steve. so that [SPGC-9456, 6.0] ends up on its own row.

What my code currently outputs:

                                                  2017-01-31           2017-02-01
Gates, Bill.                             [[SPGC-14075, 0.5]]                  NaN
Jobs, Steve.           [[SPGC-14075, 3.5], [SPGC-9456, 6.0]]                  NaN
White, John ANDERSON.                   [[SPGC-14075, 1.75]]  [[SPGC-9456, 1.75]]

What I want:

                                 2017-01-31           2017-02-01
Gates, Bill.            [[SPGC-14075, 0.5]]                  NaN
Jobs, Steve.             [[SPGC-14075, 3.5]                  NaN
Jobs, Steve.              [SPGC-9456, 6.0]]                  NaN
White, John ANDERSON.  [[SPGC-14075, 1.75]]  [[SPGC-9456, 1.75]]

2 answers:

Answer 0 (score: 2)

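col = '2017-01-31'
v = df[col].values.tolist()             # the nested list stored in each row
l = [len(x) for x in v]                 # how many times each row must be repeated
d = {col: [[x] for y in v for x in y]}  # one single-item list per output row
# Repeat each index label len(x) times, then overwrite the column with the flattened items
df.reindex(df.index.repeat(l)).assign(**d)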

                                 2017-01-31           2017-02-01
Gates, Bill.            [[SPGC-14075, 0.5]]                  NaN
Jobs, Steve.            [[SPGC-14075, 3.5]]                  NaN
Jobs, Steve.             [[SPGC-9456, 6.0]]                  NaN
White, John ANDERSON.  [[SPGC-14075, 1.75]]  [[SPGC-9456, 1.75]]

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame([
        [[['SPGC-14075', .5]], np.nan],
        [[['SPGC-14075', 3.5], ['SPGC-9456', 6.]], np.nan],
        [[['SPGC-14075', 1.75]], [['SPGC-9456', 1.75]]]
    ],
    index='Gates, Bill.|Jobs, Steve.|White, John ANDERSON.'.split('|'),
    columns=['2017-01-31', '2017-02-01']
)
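
For readers on pandas 0.25 or later, the same reshaping can probably be done with `DataFrame.explode`, which did not exist when this answer was written. The following is only a sketch under that assumption, not the answerer's method. `explode` yields the bare inner items, so the last step re-wraps each one in a list to match the output shown above; the name `exploded` is just illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the setup above.
df = pd.DataFrame(
    [
        [[['SPGC-14075', 0.5]], np.nan],
        [[['SPGC-14075', 3.5], ['SPGC-9456', 6.0]], np.nan],
        [[['SPGC-14075', 1.75]], [['SPGC-9456', 1.75]]],
    ],
    index=['Gates, Bill.', 'Jobs, Steve.', 'White, John ANDERSON.'],
    columns=['2017-01-31', '2017-02-01'],
)

col = '2017-01-31'

# explode() emits one row per item of each outer list and repeats the
# index label and the other columns for every emitted row.
exploded = df.explode(col)

# Each cell is now a bare pair such as ['SPGC-14075', 3.5];
# wrap it again so the cell looks like [['SPGC-14075', 3.5]].
exploded[col] = [[item] for item in exploded[col]]

print(exploded)
```

Only one level of nesting is unpacked, so the inner `[ticket, hours]` pairs stay intact.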

Answer 1 (score: 1)

I didn't use your data; you can try it with my temporary data instead.

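import numpy as np
import pandas as pd

Temp = pd.DataFrame({'Index': ['str1', 'str2', 'str3'], 'va': [['x'], [['y'], ['z']], ['z']], 'va2': [np.nan, np.nan, ['YY']]}).set_index('Index')
# Build one (index label, item) pair per element of each list in 'va'.
# Series.items() replaces Series.iteritems(), which was removed in pandas 2.0.
Temp_unnest = pd.DataFrame([[i, x]
              for i, y in Temp['va'].apply(list).items()
                  for x in y], columns=list('IV'))
Temp_unnest['va2'] = Temp_unnest.I.map(Temp.va2)  # look up va2 by the original index label
Temp_unnest.set_index('I', inplace=True)
Temp_unnest.columns = Temp.columns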

Temp_unnest
Out[121]: 
       va   va2
I              
str1    x   NaN
str2  [y]   NaN
str2  [z]   NaN
str3    z  [YY]
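
The answer above copies `va2` across by hand, which gets tedious once the frame has many columns. As a rough sketch, the same comprehension can be wrapped in a helper that carries all other columns along automatically. The function name `unnest_column` is hypothetical, and the sketch assumes the frame's index labels are unique, since it uses `.loc` with repeated labels to duplicate rows.

```python
import numpy as np
import pandas as pd


def unnest_column(frame, column):
    """Return a new frame with one row per item of each list in `column`.

    Assumes `frame` has a unique index (hypothetical helper, not a pandas API).
    """
    # One (index label, item) pair per element of each list.
    pairs = [(label, item)
             for label, items in frame[column].items()
             for item in items]
    labels = [label for label, _ in pairs]

    # Selecting repeated labels with .loc repeats the matching rows.
    out = frame.drop(columns=column).loc[labels]

    # Put the flattened items back at the column's original position.
    out.insert(frame.columns.get_loc(column), column,
               [item for _, item in pairs])
    return out


Temp = pd.DataFrame({
    'Index': ['str1', 'str2', 'str3'],
    'va': [['x'], [['y'], ['z']], ['z']],
    'va2': [np.nan, np.nan, ['YY']],
}).set_index('Index')

result = unnest_column(Temp, 'va')
print(result)
```

On the temporary data this should give the same rows as `Temp_unnest`, with the index still named `Index` rather than `I`.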