Split pandas rows into multiple rows based on the $ symbol

Time: 2019-10-09 09:26:36

Tags: pandas dataframe pandas-groupby

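I know this question has been asked many times, but before you mark it as a duplicate: none of the answers I found seem to work. I have a DataFrame of the form: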
   category     |     description
   ------------------------------
    puppy              dog$pup
    crappy             cat$pet
    squeeky            animal
    fluffy             dog$pet

I want to split the description column into multiple rows on the $ symbol and get something like this:

   category     |     description
   ------------------------------
    puppy              dog
    puppy              pup
    crappy             cat
    crappy             pet
    squeeky            animal
    fluffy             dog
    fluffy             pet

Sorry for the silly example, but I hope it illustrates the problem. The last thing I tried was:

new_df = pd.concat([pd.Series(row['category'], row['description'].split('$'))              
                    for _, row in old_df.iterrows()]).reset_index()

But this raises:

AttributeError: 'float' object has no attribute 'split'.

2 answers:

Answer 0: (score: 3)

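I think the problem is missing values: a missing description is stored as NaN, which is a float and has no split method. It is better to use Series.str.split, which leaves missing values as NaN, and then DataFrame.explode to create the new rows (requires pandas 0.25+):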

df['description'] = df['description'].str.split('$')
df = df.explode('description')
print (df)
      category description
0        puppy         dog
0        puppy         pup
1       crappy         cat
1       crappy         pet
2      squeeky      animal
3       fluffy         dog
3       fluffy         pet
4  another val         NaN
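
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame, including the "another val" row with a missing description, is an assumption reconstructed from the output above, and the final reset_index(drop=True) is an optional extra that replaces the repeated index labels with a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question. The last row (missing description)
# is an assumption based on the output printed in this answer.
df = pd.DataFrame({
    "category": ["puppy", "crappy", "squeeky", "fluffy", "another val"],
    "description": ["dog$pup", "cat$pet", "animal", "dog$pet", np.nan],
})

exploded = (
    df.assign(description=df["description"].str.split("$"))  # lists; NaN stays NaN
      .explode("description")                                # one row per list element
      .reset_index(drop=True)                                # optional: fresh index
)
print(exploded)
```

The row with the missing description survives as a single row whose description is still NaN, so nothing is silently dropped.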

Answer 1: (score: 1)

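For pandas versions before 0.25, I think one approach is to use apply to split the one column into two, then use melt to reshape the data into the desired structure.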

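import pandas as pd
data = [{ "category": "puppy", "description": "dog$pup"},
 { "category": "crappy", "description": "cat$pet"},
 { "category": "squeeky", "description": "animal"},
 { "category": "fluffy", "description": "dog$pet"},
]

data_df = pd.DataFrame(data)
# Split into at most two tokens, padding with None for single-token rows.
# Assumes no missing descriptions; x.split fails on NaN.
data_df["one"], data_df["two"] = zip(*[r[0:2] for r in data_df['description'].apply(lambda x: x.split("$") + [None])])

# Melt the token columns into rows, drop the None padding, and keep the
# token text ("value"), not the source column name ("variable").
result = (data_df[['category', 'one', 'two']]
          .melt(id_vars="category")
          .dropna(subset=["value"])
          .sort_values(by=["category", "variable"])[['category', 'value']]
          .rename(columns={"value": "description"}))
print(result)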
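
The two-column approach above is limited to two tokens per description. As a hedged alternative for older pandas (this is a different technique, not the one in this answer), the sketch below uses Series.str.split with expand=True followed by stack, which handles any number of tokens and tolerates missing values. The sample data, including the "another val" row, is assumed for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the question's rows plus one missing description.
df = pd.DataFrame({
    "category": ["puppy", "crappy", "squeeky", "fluffy", "another val"],
    "description": ["dog$pup", "cat$pet", "animal", "dog$pet", np.nan],
})

parts = (
    df["description"]
      .str.split("$", expand=True)       # one column per token, padded with missing values
      .stack()                           # wide to long: (row label, token position) index
      .dropna()                          # drop padding explicitly, whatever stack's default
      .reset_index(level=1, drop=True)   # keep only the original row label
      .rename("description")
)

# Left join on the index repeats each category once per token; rows with
# no tokens at all keep a NaN description.
result = df.drop(columns="description").join(parts).reset_index(drop=True)
print(result)
```

The explicit dropna is there because the handling of missing values in stack differs between pandas versions, so the sketch does not rely on the default.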
