Pandas: condition based on a list in a cell

Time: 2017-10-13 16:00:00

Tags: python pandas

The DataFrame looks like this (blank cells are '', and field and extra_dimensions are the columns):

field | extra_dimensions
------------------------
a     | 
b     | [abc, def]
c     | [ghi]

I have a list of required dimensions and a list of extra dimensions:

required_dimensions = [123, 456]
extra_dimensions = ['abc', 'def', 'ghi']

Desired output:

field | 123 | 456 | abc | def | ghi
-----------------------------------
a     | 1   | 1   | 0   | 0   | 0
b     | 1   | 1   | 1   | 1   | 0
c     | 1   | 1   | 0   | 0   | 1

My attempt:

columns = ['field', 'extra_dimensions'] + required_dimensions + extra_dimensions
df = df.reindex(columns=columns)
for i in required_dimensions:
    df[i].fillna('1', inplace=True)
for i in extra_dimensions:
    df[i][df['extra_dimensions'].str.contains(i)] = '1'

But I get:

ValueError: cannot index with vector containing NA / NaN values

I'd appreciate any feedback on my attempt or any ideas for a better approach. Thanks in advance!
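
The error most likely comes from the mask itself: `.str.contains` only understands string cells, so rows that hold a list come back as NaN, and a mask with NaN in it cannot be used for boolean indexing. Below is a minimal sketch of that behaviour and of a membership test that avoids it. It assumes the frame holds '' for blank cells and Python lists otherwise; the sample frame and the name `out` are reconstructions for illustration, not the asker's actual data.

```python
import pandas as pd

# Assumed reconstruction of the question's frame:
# '' for blank cells, Python lists everywhere else.
df = pd.DataFrame({
    "field": ["a", "b", "c"],
    "extra_dimensions": ["", ["abc", "def"], ["ghi"]],
})
required_dimensions = [123, 456]
extra_dimensions = ["abc", "def", "ghi"]

# .str.contains only handles string cells; the list cells are expected
# to come back as NaN, which is what breaks the boolean indexing.
mask = df["extra_dimensions"].str.contains("abc")
print(mask)

# Hypothetical fix: test membership cell by cell instead of using .str.
out = df[["field"]].copy()
for dim in required_dimensions:
    out[dim] = 1  # required dimensions are always present
for dim in extra_dimensions:
    # `dim in cell` is False for '' and a plain membership test for lists.
    out[dim] = df["extra_dimensions"].apply(lambda cell, dim=dim: int(dim in cell))

print(out)
```

Note that for a string cell `dim in cell` is a substring test, so this only behaves as intended while the non-list cells are empty strings.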

1 answer:

Answer 0 (score: 0)

Using get_dummies again...

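import pandas as pd

required_dimensions = ['123', '456']
df = pd.DataFrame({'field': list('abc'), 'extra_dimensions': [[], ['abc', 'def'], ['ghi']]})
# Spread each list into columns, stack to long form, then one-hot encode per field.
# groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0.
df = pd.get_dummies(df.set_index('field')['extra_dimensions'].apply(pd.Series).stack()).groupby(level=0).sum().reindex(df.field).fillna(0)
# Add the required dimensions as constant columns of 1.
d = dict.fromkeys(required_dimensions, 1)
df.assign(**d)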

Out[283]: 
       abc  def  ghi  123  456
field                         
a      0.0  0.0  0.0    1    1
b      1.0  1.0  0.0    1    1
c      0.0  0.0  1.0    1    1
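
The output above has float columns and puts the required dimensions last, while the question asks for integers with the required dimensions first. One possible variation is sketched below. It assumes every cell holds a list (empty for blanks), as in the sample frame of this answer, and swaps in `Series.str.join` plus `Series.str.get_dummies` for the `apply(pd.Series).stack()` step. The names `dummies`, `required`, and `result` are illustrative.

```python
import pandas as pd

required_dimensions = ["123", "456"]
extra_dimensions = ["abc", "def", "ghi"]

# Same sample frame as in the answer: every cell is a list, [] for blanks.
df = pd.DataFrame({
    "field": list("abc"),
    "extra_dimensions": [[], ["abc", "def"], ["ghi"]],
})

# Join each list into a '|'-separated string, then one-hot encode it.
# An empty list becomes '', which yields a row of zeros.
dummies = df.set_index("field")["extra_dimensions"].str.join("|").str.get_dummies(sep="|")

# Make sure every expected extra dimension has a column, in a fixed order.
dummies = dummies.reindex(columns=extra_dimensions, fill_value=0)

# Required dimensions are constant columns of 1, placed first.
required = pd.DataFrame(1, index=dummies.index, columns=required_dimensions)
result = pd.concat([required, dummies], axis=1)

print(result)
```

Because no NaN is introduced along the way, the columns stay integer-typed and no `fillna` is needed.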