Testing alternative values in a column with pandas .isin() (python)

Posted: 2019-11-28 10:12:14

Tags: python pandas dataframe

Consider two DataFrames:

import pandas as pd

df1 = pd.DataFrame(['apple and banana are sweet fruits','how fresh is the banana','cherry from japan'],columns=['fruits_names'])
df2 = pd.DataFrame([['apple','red'],['banana','yellow'],['cherry','black']],columns=['fruits','colors'])

Then, with this code:

colors = []
for f in df1.fruits_names.str.split().apply(set):   # split each sentence into words and turn them into a set

    color = [df2[df2['fruits'].isin(f)]['colors']]  # colors of the fruits found in that set, wrapped in a list
    colors.append(color)

I can easily insert the colors into df1:

df1['color'] = colors

output:
                    fruits_names            color
0  apple and banana are sweet fruits  [[red, yellow]]
1            how fresh is the banana       [[yellow]]
2                  cherry from japan        [[black]]

The problem arises when the 'fruits' column holds alternative values, for example:

df2 = pd.DataFrame([[['green apple|opal apple'],'red'],[['banana|cavendish banana'],'yellow'],['cherry','black']],columns=['fruits','colors'])

How can I keep this code working?

My last attempt was to create a new column containing the individual fruit names:

df2['Types'] = df2['fruits'].str.split('|')

and then use .apply(tuple) here:

color = [df2[df2['Types'].apply(tuple).isin(f)]['colors']]

but nothing matches.
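To see why neither version matches, recall that isin() tests each cell for equality against the given values; it does no substring or partial matching. The sketch below uses made-up sample Series (not from the question) to show that a two-word name like "green apple" can never equal one of the split words, and that a tuple of alternatives is compared as one single value.

```python
import pandas as pd

# Words from one sentence, built the same way as in the question (split + set)
words = set("green apple and banana are sweet fruits".split())

# isin() compares each whole cell with the members of `words` for equality
single = pd.Series(["apple", "banana", "green apple"])
single_hits = single.isin(words).tolist()
print(single_hits)  # "green apple" is two words, so it is not in the set

# A tuple of alternatives is one value, and it never equals a single word
alts = pd.Series([("green apple", "opal apple"), ("banana", "cavendish banana")])
alt_hits = alts.isin(words).tolist()
print(alt_hits)
```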

2 answers:

Answer 0 (score: 1)

I think you need the following. Starting from this df1:


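print(df1)

                              fruits_names
0  green apple and banana are sweet fruits
1                  how fresh is the banana
2         cherry and opal apple from japan

use split and df.explode() (this assumes the fruits values are plain strings such as 'green apple|opal apple', not one-element lists):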

df2["fruits"] = df2["fruits"].apply(lambda x: x.split("|"))

df2 = df2.explode("fruits")

print(df2)

Output:

             fruits  colors
0       green apple     red
0        opal apple     red
1            banana  yellow
1  cavendish banana  yellow
2            cherry   black

Convert it to a dict:

d = {i:j for i,j in zip(df2["fruits"].values, df2["colors"].values)}

Create the column based on the condition:

df1["colors"] = [[v for k,v in d.items() if k in x] for x in df1["fruits_names"]]

print(df1)
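A self-contained sketch that puts the steps of this answer together, assuming the fruits column holds plain "a|b" strings. The names exploded and colors_for are illustrative, and removing duplicate colors with dict.fromkeys is an addition not in the answer: without it, a sentence containing "cavendish banana" would get "yellow" twice, because "banana" is also a substring of it.

```python
import pandas as pd

# Sample data from this answer; the fruits column holds plain "a|b" strings
df1 = pd.DataFrame(['green apple and banana are sweet fruits',
                    'how fresh is the banana',
                    'cherry and opal apple from japan'], columns=['fruits_names'])
df2 = pd.DataFrame([['green apple|opal apple', 'red'],
                    ['banana|cavendish banana', 'yellow'],
                    ['cherry', 'black']], columns=['fruits', 'colors'])

# One row per alternative name, then a name -> color lookup
exploded = df2.assign(fruits=df2['fruits'].str.split('|')).explode('fruits')
d = dict(zip(exploded['fruits'], exploded['colors']))

def colors_for(text):
    # Plain substring test, as in the answer; dict.fromkeys keeps the first
    # occurrence of each color and drops repeats while preserving order
    return list(dict.fromkeys(v for k, v in d.items() if k in text))

df1['colors'] = df1['fruits_names'].apply(colors_for)
print(df1)
```

Matching is still by plain substring, so a short name can also match inside a longer word.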

Answer 1 (score: 1)

import pandas as pd

df1 = pd.DataFrame(['green apple and banana are sweet fruits','how fresh is the banana','cherry from japan'],columns=['fruits_names'])
df2 = pd.DataFrame([['green apple|opal apple','red'],['banana|cavendish banana','yellow'],['cherry','black']],columns=['fruits','colors'])
# split the alternatives into a list of fruit names per row
df2['sep_colors'] = df2['fruits'].str.split(pat='|')

# map each color to its list of fruit names
dic = dict(zip(df2['colors'].tolist(),df2['sep_colors'].tolist()))

final = []
for row in range(len(df1.fruits_names)):
    list1 = []
    for key, value in dic.items():
        for item in value:
            # substring test against the sentence in column 0 of this row
            if item in df1.iloc[row, 0]:
                list1.append(key)
    final.append(list1)

df1['colors'] = final
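An alternative, not taken from either answer: build one regular expression from all alternative names and let str.findall pick them out of each sentence. Sorting the names longest first makes "cavendish banana" win over "banana", and the \b word boundaries stop "cherry" from matching inside a longer word such as "cherrywood". The names lookup and pattern are illustrative, and the sketch assumes every fruit name begins and ends with a letter or digit, which \b needs.

```python
import re
import pandas as pd

df1 = pd.DataFrame(['green apple and banana are sweet fruits',
                    'how fresh is the banana',
                    'cherry from japan'], columns=['fruits_names'])
df2 = pd.DataFrame([['green apple|opal apple', 'red'],
                    ['banana|cavendish banana', 'yellow'],
                    ['cherry', 'black']], columns=['fruits', 'colors'])

# name -> color for every alternative name
lookup = {name: color
          for names, color in zip(df2['fruits'].str.split('|'), df2['colors'])
          for name in names}

# Longest names first so the alternation prefers "cavendish banana" over "banana";
# the non-capturing group makes findall return the whole matched name
pattern = r'\b(?:' + '|'.join(
    re.escape(n) for n in sorted(lookup, key=len, reverse=True)) + r')\b'

# Matched names per row, mapped to colors with duplicates removed
matches = df1['fruits_names'].str.findall(pattern)
df1['colors'] = matches.apply(
    lambda names: list(dict.fromkeys(lookup[n] for n in names)))
print(df1)
```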