I have two DataFrames with the following data:
df1:
====
id name age likes
--- ----- ---- -----
0 A 21 rose
1 B 22 apple
2 C 30 grapes
4 D 21 lily
df2:
====
category Fruit Flower
--------- ------- -------
orange 1 0
apple 1 0
rose 0 1
lily 0 1
grapes 1 0
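I want to add another column to df1 that contains the word "Fruit" or "Flower", depending on how that entry is one-hot encoded in df2. I am looking for a pure pandas / numpy implementation.
Any help would be appreciated.
Thanks!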
Answer 0: (score: 2)
You can use apply():
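# Build one label per df2 category, then look it up by df1.likes (not by row position).
df1['type_string'] = df1['likes'].map(df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1))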
Here is a running example:
import pandas as pd
from io import StringIO
df1 = pd.read_csv(StringIO(
"""
0 A 21 rose
1 B 22 apple
2 C 30 grapes
4 D 21 lily
"""), sep=r'\s+', header=None)
df2 = pd.read_csv(StringIO(
"""
orange 1 0
apple 1 0
rose 0 1
lily 0 1
grapes 1 0
"""), sep=r'\s+', header=None)
df1.columns = ['id', 'name', 'age', 'likes']
df2.columns = ['category', 'Fruit', 'Flower']
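# Label each df2 category, then match on df1.likes so rows are paired by name, not by position.
labels = df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1)
df1['category'] = df1['likes'].map(labels)
Input
id name age likes
0 0 A 21 rose
1 1 B 22 apple
2 2 C 30 grapes
3 4 D 21 lily
Output
id name age likes category
0 0 A 21 rose Flower
1 1 B 22 apple Fruit
2 2 C 30 grapes Fruit
3 4 D 21 lily Flower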
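As an alternative to the row-wise apply() above, the lookup can also be vectorized. The sketch below is only one possible approach: idxmax(axis=1) returns, for each row of the one-hot table, the name of the column holding the 1, and map() then looks up each df1.likes value by label. It assumes every row of df2 has exactly one 1; the name lookup and the new column name type are illustrative, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2, 4],
    "name": ["A", "B", "C", "D"],
    "age": [21, 22, 30, 21],
    "likes": ["rose", "apple", "grapes", "lily"],
})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

# For each category, the name of the column that holds the 1
# (assumes a valid one-hot encoding: exactly one 1 per row).
lookup = df2.set_index("category")[["Fruit", "Flower"]].idxmax(axis=1)

# Match by label, so the row order and row count of df2 do not matter.
df1["type"] = df1["likes"].map(lookup)
print(df1)
```

Any value of likes that does not appear in df2 would come out as NaN here, which makes missing categories easy to spot.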
Answer 1: (score: 0)
IIUC, you can use .apply with axis=1 or axis="columns", which applies the function to each row.
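# A left merge keeps every row of df1, in its original order.
df3 = df1.merge(df2, how='left', left_on='likes', right_on='category')
# List your one-hot columns here.
categories_col = ['Fruit', 'Flower']
def get_category(x):
    # Return the first one-hot column set to 1 (None if no column is set).
    for category in categories_col:
        if x[category] == 1:
            return category
# merge() resets the index, so assign by position rather than by index label.
df1["new"] = df3.apply(get_category, axis=1).to_numpy()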
print(df1)
id name age likes new
0 0 A 21 rose Flower
1 1 B 22 apple Fruit
2 2 C 30 grapes Fruit
3 4 D 21 lily Flower
However, make sure the columns listed in categories_col really are one-hot encoded.
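One way to check that requirement before decoding is sketched below. It assumes "one-hot" means every listed column holds only 0 or 1 and each row has exactly one 1; the helper name non_one_hot_rows is hypothetical and not part of pandas.

```python
import pandas as pd

df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})
categories_col = ["Fruit", "Flower"]


def non_one_hot_rows(frame, cols):
    """Return the rows of `frame` whose `cols` are not a valid one-hot encoding."""
    only_binary = frame[cols].isin([0, 1]).all(axis=1)  # every value is 0 or 1
    exactly_one = frame[cols].sum(axis=1) == 1          # exactly one flag is set
    return frame[~(only_binary & exactly_one)]


bad = non_one_hot_rows(df2, categories_col)
print(bad.empty)
```

If bad is not empty, the rows it contains are the ones for which a first-match decoder would return None or an arbitrary one of several set columns.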
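Answer 2: (score: 0)
The catch is that the two tables have different numbers of rows. If df2 contains more categories than appear in df1, the examples above may not work either.
Here is a working example: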
df1 = pd.DataFrame([['orange',12],['rose',3],['apple',44],['grapes',1]], columns = ['name', 'age'])
df1
name age
0 orange 12
1 rose 3
2 apple 44
3 grapes 1
df2 = pd.DataFrame([['orange',1],['rose',0],['apple',1],['grapes',1],['daffodils',0],['berries',1]], columns = ['cat', 'Fruit'])
df2
cat Fruit
0 orange 1
1 rose 0
2 apple 1
3 grapes 1
4 daffodils 0
5 berries 1
A one-liner: a list comprehension with a conditional expression, run over an on-the-fly left merge of df1 and df2 on the key df1.name = df2.cat:
df1['flag'] = ['Fruit' if i == 1 else 'Flower' for i in df1.merge(df2,how='left',left_on='name', right_on='cat').Fruit]
df1
Output
name age flag
0 orange 12 Fruit
1 rose 3 Flower
2 apple 44 Fruit
3 grapes 1 Fruit
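One thing to watch with the one-liner above: after a left merge, a name that is missing from df2 has NaN in the Fruit column, and the else branch then labels it 'Flower'. The sketch below shows one possible way to keep such rows distinct using Series.map and numpy.select. The extra 'stone' row and the 'Unknown' label are hypothetical, added only to illustrate the unmatched case.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["orange", 12], ["rose", 3], ["apple", 44], ["grapes", 1], ["stone", 7]],
    columns=["name", "age"],
)  # 'stone' is a hypothetical name with no entry in df2
df2 = pd.DataFrame(
    [["orange", 1], ["rose", 0], ["apple", 1], ["grapes", 1], ["daffodils", 0], ["berries", 1]],
    columns=["cat", "Fruit"],
)

# Look up the Fruit flag by name; names missing from df2 become NaN.
fruit = df1["name"].map(df2.set_index("cat")["Fruit"])

# NaN matches neither condition, so it falls through to the default label.
df1["flag"] = np.select([fruit == 1, fruit == 0], ["Fruit", "Flower"], default="Unknown")
print(df1)
```

This assumes the cat values in df2 are unique, since map() needs a unique index to look values up.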