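Let's say I have a specific column in my dataframe. Some fields contain only 1 value, while others contain as many as 10. I decided to split the column values on the ';' separator.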
data['golden_globes_nominee_categories'] = data['golden_globes_nominee_categories'].str.split(';')
After that I iterated over the rows:
for index, row in data.iterrows():
    print(row['golden_globes_nominee_categories'])
And got this:
['Best Original Song - Motion Picture ', ' Best Performance by an Actor in a Motion Picture - Comedy or Musical']
['Best Original Score - Motion Picture ', ' Best Performance by an Actress in a Motion Picture - Drama']
...
Then I looped over each element:
for index, row in data.iterrows():
    for x in row['golden_globes_nominee_categories']:
        ...
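But what I'm really interested in is how to create a column for each distinct value, holding a number (1 or 0) that shows whether that value is mentioned in the cell.
Basically, I want to do something like this: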
dataframe["time_sp_comp2"] = dataframe["time_spend_company"].apply(lambda x: 1 if x==2 else 0)
dataframe["time_sp_comp3"] = dataframe["time_spend_company"].apply(lambda x: 1 if x==3 else 0)
dataframe["time_sp_comp4"] = dataframe["time_spend_company"].apply(lambda x: 1 if x==4 else 0)
dataframe.drop('time_spend_company', axis=1, inplace=True)
Answer 0 (score: 1)
I think this is what you are after.
import pandas as pd

df = pd.DataFrame({'name': ['Jack', 'Jill', 'Chad'],
                   'tags': ['tall;rich;handsome',
                            'short;rich;pretty',
                            'tall']})
df
   name                tags
0  Jack  tall;rich;handsome
1  Jill   short;rich;pretty
2  Chad                tall
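(There may also be a way to do this with pd.get_dummies.)
result = pd.DataFrame([{k: 1 for k in t}
                       for t in df.tags.str.split(';')]).fillna(0).astype(int)
result = result.sort_index(axis=1)  # alphabetical column order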
result
   handsome  pretty  rich  short  tall
0         1       0     1      0     1
1         0       1     1      1     0
2         0       0     0      0     1
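As for the pd.get_dummies idea: pandas also has a string-accessor variant, Series.str.get_dummies, which splits on a separator and builds the 0/1 columns in one call. A minimal sketch on the same toy frame (the variable name dummies is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jack', 'Jill', 'Chad'],
                   'tags': ['tall;rich;handsome',
                            'short;rich;pretty',
                            'tall']})

# Split each cell on ';' and create one indicator column per distinct tag.
dummies = df['tags'].str.get_dummies(sep=';')
print(dummies)
```

This should give the same indicator table as the dict-based version, so either one can be passed to pd.concat.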
pd.concat([df['name'], result], axis=1)
   name  handsome  pretty  rich  short  tall
0  Jack         1       0     1      0     1
1  Jill         0       1     1      1     0
2  Chad         0       0     0      0     1
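To apply the same idea to the column from the question, note that the split values there carry stray spaces around the ';' (for example ' Best Performance by an Actor ...'), so it is probably worth stripping them before building the columns. A rough sketch, assuming the raw cells look like the two rows printed in the question and that the column has no missing values (the name indicators is made up for illustration):

```python
import pandas as pd

# Two sample rows rebuilt from the output shown in the question (assumed raw form).
data = pd.DataFrame({
    'golden_globes_nominee_categories': [
        'Best Original Song - Motion Picture ; Best Performance by an Actor '
        'in a Motion Picture - Comedy or Musical',
        'Best Original Score - Motion Picture ; Best Performance by an Actress '
        'in a Motion Picture - Drama',
    ]
})

# Split on ';', strip surrounding whitespace, and mark each category with 1.
split_cats = data['golden_globes_nominee_categories'].str.split(';')
indicators = pd.DataFrame(
    [{cat.strip(): 1 for cat in cats} for cats in split_cats]
).fillna(0).astype(int)

# Replace the original text column with the 0/1 columns.
data = pd.concat(
    [data.drop(columns='golden_globes_nominee_categories'), indicators], axis=1
)
print(data)
```

Without the strip(), 'Best Original Song - Motion Picture ' and 'Best Original Song - Motion Picture' would end up as two different columns.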