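I know this question has been asked many times, but before you mark it as a duplicate: none of the answers I found seemed to work. I have a dataframe of the form
category | description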
------------------------------
puppy dog$pup
crappy cat$pet
squeeky animal
fluffy dog$pet
I want to split the description column into multiple rows on the $ symbol and get something like this:
category | description
------------------------------
puppy dog
puppy pup
crappy cat
crappy pet
squeeky animal
fluffy dog
fluffy pet
Sorry for the silly example, but I hope it illustrates the problem. The last thing I tried was:
new_df = pd.concat([pd.Series(row['category'], row['description'].split('$'))
for _, row in old_df.iterrows()]).reset_index()
But this returns:
AttributeError: 'float' object has no attribute 'split'.
Answer 0 (score: 3)
I think the problem is caused by missing values, so it is better to use Series.str.split and then DataFrame.explode to create the new rows (requires pandas 0.25+):
df['description'] = df['description'].str.split('$')
df = df.explode('description')
print (df)
category description
0 puppy dog
0 puppy pup
1 crappy cat
1 crappy pet
2 squeeky animal
3 fluffy dog
3 fluffy pet
4 another val NaN
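For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The sample frame is an assumption reconstructed from the question plus one row with a missing description. A missing value is stored as NaN, which is a float, and that is why calling .split on it row by row raises the AttributeError, whereas Series.str.split simply passes NaN through.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the question's rows plus one row whose
# description is missing (NaN is a float, so NaN.split(...) fails).
df = pd.DataFrame({
    "category": ["puppy", "crappy", "squeeky", "fluffy", "another val"],
    "description": ["dog$pup", "cat$pet", "animal", "dog$pet", np.nan],
})

out = (
    # .str.split turns each string into a list and leaves NaN as NaN
    df.assign(description=df["description"].str.split("$"))
      # one output row per list element; NaN rows are kept as-is
      .explode("description")
      # optional: replace the repeated index (0, 0, 1, 1, ...) with 0..n-1
      .reset_index(drop=True)
)
print(out)
```

If the rows without a description are not wanted, appending .dropna(subset=["description"]) before the reset_index call should remove them.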
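Answer 1 (score: 1)
For pandas before 0.25, I think one approach here is to use apply to split the one column into two columns, and then use melt to reshape the data into the desired structure.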
import pandas as pd
data = [{ "category": "puppy", "description": "dog$pup"},
{ "category": "crappy", "description": "cat$pet"},
{ "category": "squeeky", "description": "animal"},
{ "category": "fluffy", "description": "dog$pet"},
]
data_df = pd.DataFrame(data)
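# Pad each split result with None so rows without "$" still yield two parts, then keep the first two
data_df["one"], data_df["two"] = zip(*[r[0:2] for r in data_df['description'].apply(lambda x: x.split("$") + [None])])
# melt puts the split pieces in the "value" column (renamed here to "description"); drop the None padding
melted = data_df[['category', 'one', 'two']].melt(id_vars="category", value_name="description")
melted.dropna(subset=["description"]).sort_values(by=["category", "variable"])[['category', 'description']]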
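A different pre-0.25 route, not the one used in this answer, is Series.str.split with expand=True followed by stack. The sketch below is only an illustration under assumed sample data (the question's rows plus one missing description). It is not limited to two pieces per row and it tolerates NaN, because it never calls .split on individual Python values.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the question's rows plus one missing description.
df = pd.DataFrame({
    "category": ["puppy", "crappy", "squeeky", "fluffy", "another val"],
    "description": ["dog$pup", "cat$pet", "animal", "dog$pet", np.nan],
})

pieces = (
    df["description"]
      .str.split("$", expand=True)      # one column per piece, padded with missing values
      .stack()                          # wide to long: one entry per (row, piece)
      .dropna()                         # drop padding explicitly, whatever stack's default is
      .reset_index(level=1, drop=True)  # keep only the original row label
      .rename("description")
)

# Left join on the index: each original row is repeated once per piece,
# and a row with no pieces keeps a single NaN description.
out = df.drop(columns="description").join(pieces).reset_index(drop=True)
print(out)
```

Using how="inner" in the join would instead drop the rows that have no description at all.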