Hi, I have 2 very large CSV files
df1
x y z keywords
a b c [apple,iphone,watch,newdevice]
e w q NaN
w r t [pixel,google]
s t q [india,computer]
d j o [google,apple]
df2
name stockcode
apple.inc appl
lg.inc weew
htc.inc rrr
google.com ggle
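Now I need to check whether any of the keyword values in df1 match a name in df2. If they do, I need to add the matching details (the stock code) to df1; otherwise the new column should be filled with null.
I need to do this in Python, please help me.
Sample output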
x y z keywords stockcode
a b c [apple,iphone,watch,newdevice] aapl
e w q NaN null
w r t [pixel,google,] ggle
s t q [india,computer] null
d j o [google,apple] aapl,ggle
I have written the code below, but it only compares one keyword and returns one stock code. If two keywords match names in df2, I need both stock codes.
df1['stockcode'] = np.nan
# mapping data
for indexKW, valueKW in df1['keywords'].items():
    for innerVal in valueKW.split():
        for indexName, valueName in df2['name'].items():
            for outerVal in valueName.split():
                if outerVal.lower() == innerVal.lower():
                    # each match overwrites the previous one, so only the last code is kept
                    df1.loc[indexKW, 'stockcode'] = df2.loc[indexName, 'stockcode']
Output of the above program
x y z keywords stockcode
a b c [apple,iphone,watch,newdevice] aapl
e w q NaN null
w r t [pixel,google,] ggle
s t q [india,computer] null
d j o [google,apple] ggle
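In the last row, two keywords match names in df2, but I only get the stock code for the keyword google. I also need the stock code for apple, as in the sample output.
Expected output: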
x y z keywords stockcode
a b c [apple,iphone,watch,newdevice] aapl
e w q NaN null
w r t [pixel,google,] ggle
s t q [india,computer] null
d j o [google,apple] aapl,ggle
Please help me.
Answer 0: (score: 2)
You can convert df2 into a lookup dictionary and then map it onto df1 ;)
import numpy as np
import pandas as pd

data1 = {'x': 'a,e,w'.split(','),
         'keywords': ['apple,iphone,watch,newdevice'.split(','),
                      np.nan,
                      'pixel,google'.split(',')]}
data2 = {'name': 'apple lg htc google'.split(),
         'stockcode': 'appl weew rrr ggle'.split()}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# lookup dictionary: name -> stockcode
mapper = df2.set_index('name')['stockcode'].to_dict()
# collect every matching code per row; a NaN keywords cell gives an empty list
df1['stockcode'] = df1['keywords'].apply(
    lambda kws: [mapper[k] for k in kws if k in mapper] if isinstance(kws, list) else [])
# join all matches (not just the first) so a row with two hits keeps both codes
df1['stockcode'] = df1['stockcode'].apply(lambda x: ','.join(x) if x else np.nan)
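Note that the question's df2 stores names such as apple.inc and google.com, while the keywords are bare words like apple, so the dictionary keys probably need normalising before the lookup. The following is only a sketch under assumptions: it treats the text before the first dot as the key, assumes the keywords column already holds Python lists, and uses a hypothetical helper name codes_for that is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'x': list('aewsd'),
    'keywords': [['apple', 'iphone', 'watch', 'newdevice'],
                 np.nan,
                 ['pixel', 'google'],
                 ['india', 'computer'],
                 ['google', 'apple']],
})
df2 = pd.DataFrame({
    'name': ['apple.inc', 'lg.inc', 'htc.inc', 'google.com'],
    'stockcode': ['appl', 'weew', 'rrr', 'ggle'],
})

# Assumption: the text before the first '.' is what the keywords refer to.
keys = df2['name'].str.split('.').str[0].str.lower()
mapper = dict(zip(keys, df2['stockcode']))

def codes_for(keywords):
    """Return all matching stock codes joined by ',', or NaN if none match."""
    if not isinstance(keywords, list):  # NaN cell: nothing to look up
        return np.nan
    found = [mapper[k.lower()] for k in keywords if k.lower() in mapper]
    return ','.join(found) if found else np.nan

df1['stockcode'] = df1['keywords'].apply(codes_for)
print(df1)
```

The codes come out in keyword order, so the last row yields ggle,appl rather than appl,ggle; sort the list before joining if a fixed order matters.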
Answer 1: (score: 0)
You can combine apply, map, and join:
# index df2 by name so that df2['stockcode'] can act as a lookup Series
df2.set_index('name', inplace=True)
df1.apply(lambda x: pd.Series(x['keywords']).map(df2['stockcode']).dropna().values, axis=1)
0 [appl]
1 []
2 [ggle]
3 []
4 [ggle, appl]
dtype: object
Or:
df1.apply(lambda x: ','.join(pd.Series(x['keywords']).map(df2['stockcode']).dropna()), axis=1)
0 appl
1
2 ggle
3
4 ggle,appl
dtype: object
Or:
df1.apply(lambda x: ','.join(pd.Series(x['keywords']).map(df2['stockcode']).dropna()), axis=1)\
   .replace('', 'null')
0 appl
1 null
2 ggle
3 null
4 ggle,appl
dtype: object
To store the result in df1:
df1['stockcode'] = (
    df1.apply(lambda x: ','.join(pd.Series(x['keywords'])
                                 .map(df2['stockcode']).dropna()), axis=1)
       .replace('', 'null')
)
print(df1)
x y z keywords stockcode
0 a b c [apple, iphone, watch, newdevice] appl
1 e w q NaN null
2 w r t [pixel, google] ggle
3 s t q [india, computer] null
4 d j o [google, apple] ggle,appl
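Because the files are described as very large, a row-wise apply that builds a new Series per row may be slow. One possible alternative is to explode the keyword lists, map once, and group back by the original row. This is a sketch rather than a benchmarked solution; it assumes pandas 0.25 or later (for Series.explode), that the keywords column holds Python lists, and that the names in df2 already match the keywords (apple rather than apple.inc). The variable names lookup, exploded, matched, and joined are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'x': list('aewsd'),
    'keywords': [['apple', 'iphone', 'watch', 'newdevice'],
                 np.nan,
                 ['pixel', 'google'],
                 ['india', 'computer'],
                 ['google', 'apple']],
})
df2 = pd.DataFrame({
    'name': ['apple', 'lg', 'htc', 'google'],
    'stockcode': ['appl', 'weew', 'rrr', 'ggle'],
})

# Lookup Series: name -> stockcode (names must be unique).
lookup = df2.set_index('name')['stockcode']

# One row per keyword; each keeps the index label of its original row.
exploded = df1['keywords'].explode()
# Map every keyword to its code; unmatched keywords and NaN cells become NaN.
matched = exploded.map(lookup).dropna()
# Collapse back to one comma-separated string per original row.
joined = matched.groupby(level=0).agg(','.join)
# Rows without any match are missing from `joined`; reindex fills them with NaN.
df1['stockcode'] = joined.reindex(df1.index)
print(df1)
```

Appending .fillna('null') to the last assignment would reproduce the literal null strings shown in the question's expected output.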