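I want to create a column in my dataframe that checks whether the value in one column is the dictionary value for the key held in another column, like this: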
In [3]:
import pandas as pd
df = pd.DataFrame({'Model': ['Corolla', 'Civic', 'Accord', 'F-150'],
                   'Make': ['Toyota', 'Honda', 'Toyota', 'Ford']})
dic = {'Prius':'Toyota', 'Corolla':'Toyota', 'Civic':'Honda',
       'Accord':'Honda', 'Odyssey':'Honda', 'F-150':'Ford',
       'F-250':'Ford', 'F-350':'Ford'}
df
Out [3]:
Model Make
0 Corolla Toyota
1 Civic Honda
2 Accord Toyota
3 F-150 Ford
After applying a function (or whatever works), I'd like to see:
Out [10]:
Model Make match
0 Corolla Toyota TRUE
1 Civic Honda TRUE
2 Accord Toyota FALSE
3 F-150 Ford TRUE
Thanks in advance!
Edit: I tried writing a function that takes a tuple of the two columns, but I don't think I'm passing the arguments correctly:
def is_match(make, model):
    try:
        has_item = dic[make] == model
    except KeyError:
        has_item = False
    return(has_item)
df[['Model', 'Make']].apply(is_match)
results in:
TypeError: ("is_match() missing 1 required positional
argument: 'model'", 'occurred at index Model')
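The error comes from DataFrame.apply's default axis=0: the function is called once per column with a single Series argument, so model is never supplied. The attempt also looks up dic[make], although dic is keyed by model. A minimal sketch of a corrected version, assuming the same df and dic as in the question, passes one row at a time with axis=1 and unpacks the two fields in a lambda:

```python
import pandas as pd

df = pd.DataFrame({'Model': ['Corolla', 'Civic', 'Accord', 'F-150'],
                   'Make': ['Toyota', 'Honda', 'Toyota', 'Ford']})
dic = {'Prius': 'Toyota', 'Corolla': 'Toyota', 'Civic': 'Honda',
       'Accord': 'Honda', 'Odyssey': 'Honda', 'F-150': 'Ford',
       'F-250': 'Ford', 'F-350': 'Ford'}

def is_match(make, model):
    # dic maps model -> make, so the lookup key is the model
    try:
        return dic[model] == make
    except KeyError:
        # a model missing from dic counts as a non-match
        return False

# axis=1 calls the lambda once per row; each row is a Series indexed by column name
df['match'] = df.apply(lambda row: is_match(row['Make'], row['Model']), axis=1)
print(df)
```

This works, but it runs a Python function per row, so the vectorized answers below are usually preferable on larger frames.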
Answer 0 (score: 5)
You can use map:
df.assign(match=df.Model.map(dic).eq(df.Make))
Out[129]:
Make Model match
0 Toyota Corolla True
1 Honda Civic True
2 Toyota Accord False
3 Ford F-150 True
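Series.map with a dict looks up each Model and yields NaN where the key is missing; eq then compares element-wise, and NaN never compares equal, so unknown models come out False without any try/except. A small sketch of the intermediate step; the extra 'Camry' row is a hypothetical addition, not part of the question's data:

```python
import pandas as pd

dic = {'Prius': 'Toyota', 'Corolla': 'Toyota', 'Civic': 'Honda',
       'Accord': 'Honda', 'Odyssey': 'Honda', 'F-150': 'Ford',
       'F-250': 'Ford', 'F-350': 'Ford'}
# 'Camry' is a hypothetical model that is not a key in dic
df = pd.DataFrame({'Model': ['Corolla', 'Civic', 'Accord', 'F-150', 'Camry'],
                   'Make': ['Toyota', 'Honda', 'Toyota', 'Ford', 'Toyota']})

looked_up = df.Model.map(dic)   # expected make for each model; NaN for 'Camry'
match = looked_up.eq(df.Make)   # element-wise comparison; NaN compares as False
print(df.assign(match=match))
```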
Answer 1 (score: 3)
Using a list comprehension:
df.assign(match=[dic.get(md, '') == mk for mk, md in df[['Make', 'Model']].values])
Make Model match
0 Toyota Corolla True
1 Honda Civic True
2 Toyota Accord False
3 Ford F-150 True
Using dict.items and in:
items = dic.items()
df.assign(match=[t[::-1] in items for t in map(tuple, df[['Make', 'Model']].values)])
Make Model match
0 Toyota Corolla True
1 Honda Civic True
2 Toyota Accord False
3 Ford F-150 True
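The in test works because a dict's items() view is set-like: (key, value) in dic.items() looks up the key and then compares the stored value, so it does not scan every pair. Rows taken in (Make, Model) order are therefore reversed with t[::-1] to match the (Model, Make) = (key, value) order of the items. A short sketch using the question's dic; 'Camry' is a hypothetical model:

```python
dic = {'Prius': 'Toyota', 'Corolla': 'Toyota', 'Civic': 'Honda',
       'Accord': 'Honda', 'Odyssey': 'Honda', 'F-150': 'Ford',
       'F-250': 'Ford', 'F-350': 'Ford'}
items = dic.items()

# a row taken as (Make, Model) must be reversed to (Model, Make) = (key, value)
row = ('Toyota', 'Corolla')
print(row[::-1] in items)              # True
print(('Accord', 'Toyota') in items)   # False: Accord maps to Honda
print(('Camry', 'Toyota') in items)    # False: 'Camry' is not a key
```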
Using isin:
df.assign(match=pd.Series(list(map(tuple, df[['Make', 'Model']].values[:, ::-1])), index=df.index).isin(dic.items()))
Make Model match
0 Toyota Corolla True
1 Honda Civic True
2 Toyota Accord False
3 Ford F-150 True
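@wen's method is an order of magnitude faster!
import numpy as np
# fixed-width fields: a Make longer than 6 or a Model longer than 7 characters would be silently truncated
dtype = [('Make', '<U6'), ('Model', '<U7')]
a = np.array([tuple(r) for r in df[['Make', 'Model']].values], dtype)
b = np.array(list(dic.items()), dtype[::-1])
df.assign(match=np.isin(a, b))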
Make Model match
0 Toyota Corolla True
1 Honda Civic True
2 Toyota Accord False
3 Ford F-150 True
import numpy as np
import pandas as pd
from timeit import timeit

def wen(df, dic):
    return df.assign(match=df.Model.map(dic).eq(df.Make))

def maxu(df, dic):
    return df.assign(match=df[['Make', 'Model']].sum(axis=1).isin(set([v+k for k, v in dic.items()])))

def pir1(df, dic):
    return df.assign(match=[dic.get(md, '') == mk for mk, md in df[['Make', 'Model']].values])

def pir2(df, dic):
    items = dic.items()
    return df.assign(match=[t[::-1] in items for t in map(tuple, df[['Make', 'Model']].values)])

def pir3(df, dic):
    return df.assign(match=pd.Series(list(map(tuple, df[['Make', 'Model']].values[:, ::-1])), index=df.index).isin(dic.items()))

def pir4(df, dic):
    dtype = [('Make', '<U6'), ('Model', '<U7')]
    a = np.array([tuple(r) for r in df[['Make', 'Model']].values], dtype)
    b = np.array(list(dic.items()), dtype[::-1])
    return df.assign(match=np.isin(a, b))

# rows: how many copies of df are stacked; columns: one per method
res = pd.DataFrame(
    np.nan, [10, 30, 100, 300, 1000, 3000, 10000, 30000],
    'wen maxu pir1 pir2 pir3 pir4'.split()
)

for i in res.index:
    m = dict(dic.items())
    d = pd.concat([df] * i, ignore_index=True)
    for j in res.columns:
        stmt = f'{j}(d, m)'
        setp = f'from __main__ import {j}, m, d'
        # total seconds for 200 calls
        res.at[i, j] = timeit(stmt, setp, number=200)

res.plot(loglog=True)  # requires matplotlib
Answer 2 (score: 2)
Yet another option:
In [38]: df['match'] = df[['Make','Model']] \
.sum(axis=1) \
.isin(set([v+k for k,v in dic.items()]))
In [39]: df
Out[39]:
Make Model match
0 Toyota Corolla True
1 Honda Civic True
2 Toyota Accord False
3 Ford F-150 True
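One caveat with joining the two strings: different (Make, Model) pairs can produce the same concatenation, so a mismatch could be reported as True. The pair below is made up purely to show the collision, and the concatenation is written with + rather than sum(axis=1). Joining with a separator that never occurs in either column (an assumption about the data) avoids it:

```python
import pandas as pd

# hypothetical data built to collide: 'X' + 'AB' == 'XA' + 'B'
dic = {'AB': 'X'}
df = pd.DataFrame({'Make': ['XA', 'X'], 'Model': ['B', 'AB']})

# plain concatenation, as in the answer above
plain = (df['Make'] + df['Model']).isin({v + k for k, v in dic.items()})

sep = '|'  # assumed never to appear in Make or Model
safe = (df['Make'] + sep + df['Model']).isin({v + sep + k for k, v in dic.items()})

print(plain.tolist())  # [True, True]
print(safe.tolist())   # [False, True]
```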