How to generate every ordering (permutation) of keywords in pandas

Posted: 2019-01-28 07:31:14

Tags: pandas dataframe

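I want to expand the rows so that "bale cale" and "cale bale" are treated as the same keyword. In other words, every ordering of the words in each Keyword string should appear as its own row, with the category columns copied over.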

This is my dataset:

Keyword         Category_1 Category_2 Category_3
ale bale cale   bale       cale       cale
bale cale       cale       cale       ale

This is what I want:

Keyword         Category_1 Category_2 Category_3
ale bale cale   bale       cale       cale
ale cale bale   bale       cale       cale
bale ale cale   bale       cale       cale
bale cale ale   bale       cale       cale
cale ale bale   bale       cale       cale
cale bale ale   bale       cale       cale
bale cale       cale       cale       ale
cale bale       cale       cale       ale
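The desired output is exactly every ordering of the words in each Keyword, which is what the standard library's itertools.permutations produces. A minimal sketch of that building block for a single string, where keyword_orderings is a hypothetical helper name:

```python
from itertools import permutations
from math import factorial


def keyword_orderings(keyword):
    """Return every ordering of the space-separated words in `keyword` (hypothetical helper)."""
    words = keyword.split()
    # permutations() yields tuples in order of the input positions
    return [" ".join(p) for p in permutations(words)]


for kw in ["ale bale cale", "bale cale"]:
    orderings = keyword_orderings(kw)
    # n distinct words give n! orderings: 3! = 6 and 2! = 2 here
    print(len(orderings), factorial(len(kw.split())), orderings)
```

Because permutations() follows the positions of the input words, the orderings come out in the same sequence as the rows in the desired output.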

1 answer:

Answer 0 (score: 2)

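Use itertools.permutations on the split values inside a list comprehension, join each permutation back into a space-separated string, and keep the original index value next to it to build a helper DataFrame, df1. Finally, join it to the original DataFrame: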

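import pandas as pd
from itertools import permutations

# one (keyword_string, original_index) tuple per ordering of each row's words
L = [(' '.join(y), k) for k, v in df['Keyword'].items() for y in permutations(v.split())]
# helper DataFrame indexed by the original row label, so it can be joined back later
df1 = pd.DataFrame(L, columns=['Keyword','idx']).set_index('idx')
print(df1)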
           Keyword
idx               
0    ale bale cale
0    ale cale bale
0    bale ale cale
0    bale cale ale
0    cale ale bale
0    cale bale ale
1        bale cale
1        cale bale
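One caveat with permutations: it treats equal words at different positions as distinct, so a Keyword with a repeated word (for example "bale bale cale", which is not in the sample data) would produce duplicate strings. A minimal sketch, assuming you want each distinct ordering only once, that dedupes while keeping first-seen order; unique_orderings is a hypothetical helper, not part of the answer above:

```python
from itertools import permutations


def unique_orderings(keyword):
    """Distinct orderings of the words in `keyword`, in first-seen order (hypothetical helper)."""
    # permutations() would give 6 strings for "bale bale cale" but only 3 are unique;
    # dict.fromkeys drops repeats while preserving insertion order
    return list(dict.fromkeys(" ".join(p) for p in permutations(keyword.split())))


print(unique_orderings("bale bale cale"))
```

The same expression can replace `' '.join(y) for y in permutations(v.split())` inside the list comprehension if duplicates matter for your data.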

An alternative way to build df1:

vals, idx = list(zip(*L))
df1 = pd.DataFrame({'Keyword':vals}, index=idx).rename_axis('idx')

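With either version of df1, join it to the remaining columns of the original DataFrame on the index:

df = df1.join(df.drop('Keyword', axis=1), on='idx').reset_index(drop=True)
print(df)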
         Keyword Category_1 Category_2 Category_3
0  ale bale cale       bale       cale       cale
1  ale cale bale       bale       cale       cale
2  bale ale cale       bale       cale       cale
3  bale cale ale       bale       cale       cale
4  cale ale bale       bale       cale       cale
5  cale bale ale       bale       cale       cale
6      bale cale       cale       cale        ale
7      cale bale       cale       cale        ale
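A different route, not the answerer's method: pandas 0.25 and later provide DataFrame.explode, which turns a column of lists into one row per list element and repeats the other columns, so no helper DataFrame or join is needed. A minimal sketch, rebuilding the question's sample data and using a hypothetical helper name expand_keyword_orderings:

```python
from itertools import permutations

import pandas as pd

# sample data from the question
df = pd.DataFrame({
    "Keyword": ["ale bale cale", "bale cale"],
    "Category_1": ["bale", "cale"],
    "Category_2": ["cale", "cale"],
    "Category_3": ["cale", "ale"],
})


def expand_keyword_orderings(frame, col="Keyword"):
    """One row per word ordering of `col`, other columns repeated (hypothetical helper)."""
    out = frame.copy()
    # replace each keyword string with the list of all its word orderings
    out[col] = out[col].map(lambda s: [" ".join(p) for p in permutations(s.split())])
    # explode repeats the original index label for each list element; then renumber 0..n-1
    return out.explode(col).reset_index(drop=True)


result = expand_keyword_orderings(df)
print(result)
```

explode keeps the column order and repeats the original index labels until reset_index is called, which plays the same role as the idx column in the join-based answer.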