I have a DataFrame structured as follows:
Title; Total Visits; Rank
The dog; 8 ; 4
The cat; 9 ; 4
The dog cat; 10 ; 3
A second DataFrame contains:
Keyword; Rank
snail ; 5
dog ; 1
cat ; 2
What I want to achieve is:
Title; Total Visits; Rank ; Keywords ; Score
The dog; 8 ; 4 ; dog ; 1
The cat; 9 ; 4 ; cat ; 2
The dog cat; 10 ; 3 ; dog,cat ; 1.5
I used the following reference, but for some reason
df['Tweet'].map(lambda x: tuple(re.findall(r'({})'.format('|'.join(w.values)), x)))
returns null. Any help would be appreciated.
Answer 0 (score: 1)
You can use:
#create list of all words
wants = df2.Keyword.tolist()
#dict for mapping keyword -> rank
d = df2.set_index('Keyword')['Rank'].to_dict()
#split all values by whitespaces, create series
s = df1.Title.str.split(expand=True).stack()
#filter by list wants
s = s[s.isin(wants)]
print (s)
0  1    dog
1  1    cat
2  1    dog
   2    cat
dtype: object
#create new columns
df1['Keywords'] = s.groupby(level=0).apply(','.join)
df1['Score'] = s.map(d).groupby(level=0).mean()
print (df1)
         Title  Total Visits  Rank  Keywords  Score
0      The dog             8     4       dog    1.0
1      The cat             9     4       cat    2.0
2  The dog cat            10     3   dog,cat    1.5
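As a self-contained check of the stack-based approach above, here is a minimal sketch that rebuilds the sample frames from the question and adds one hypothetical title ("The bird") that contains no keyword. The extra row and its values are assumptions for illustration only. Because the assignments align on the index, a title without any keyword should end up with NaN in both new columns.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical title with no keyword.
df1 = pd.DataFrame({
    "Title": ["The dog", "The cat", "The dog cat", "The bird"],
    "Total Visits": [8, 9, 10, 7],
    "Rank": [4, 4, 3, 5],
})
df2 = pd.DataFrame({"Keyword": ["snail", "dog", "cat"], "Rank": [5, 1, 2]})

wants = df2["Keyword"].tolist()
d = df2.set_index("Keyword")["Rank"].to_dict()

# One entry per (title index, word position), keeping only wanted words.
s = df1["Title"].str.split(expand=True).stack()
s = s[s.isin(wants)]

# Assignment aligns on the index, so the row without keywords gets NaN.
df1["Keywords"] = s.groupby(level=0).apply(",".join)
df1["Score"] = s.map(d).groupby(level=0).mean()
print(df1)
```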
Another solution using list operations:
wants = df2.Keyword.tolist()
d = df2.set_index('Keyword')['Rank'].to_dict()
#create list from each value
df1['Keywords'] = df1.Title.str.split()
#remove unnecessary words
df1['Keywords'] = df1.Keywords.apply(lambda x: [item for item in x if item in wants])
#map each word to its rank
df1['Score'] = df1.Keywords.apply(lambda x: [d[item] for item in x])
#join the matched words into a comma-separated string
df1['Keywords'] = df1.Keywords.apply(','.join)
#mean of the ranks
df1['Score'] = df1.Score.apply(lambda l: sum(l) / float(len(l)))
print (df1)
         Title  Total Visits  Rank  Keywords  Score
0      The dog             8     4       dog    1.0
1      The cat             9     4       cat    2.0
2  The dog cat            10     3   dog,cat    1.5
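One caveat with the list-based version: if a title contains none of the keywords, the list of ranks is empty and the mean divides by zero. The following is a minimal sketch of one way to guard against that. The helper names `match` and `mean_or_nan` and the title "The bird" are hypothetical, introduced only for this example.

```python
import pandas as pd

# Second title is a hypothetical row with no matching keyword.
df1 = pd.DataFrame({"Title": ["The dog cat", "The bird"]})
df2 = pd.DataFrame({"Keyword": ["snail", "dog", "cat"], "Rank": [5, 1, 2]})

d = df2.set_index("Keyword")["Rank"].to_dict()

def match(title):
    """Return the keywords found in a title, in title order."""
    return [word for word in title.split() if word in d]

def mean_or_nan(values):
    """Average a list of ranks; an empty list gives NaN instead of raising."""
    return sum(values) / len(values) if values else float("nan")

matched = df1["Title"].apply(match)
df1["Keywords"] = matched.apply(",".join)
df1["Score"] = matched.apply(lambda words: mean_or_nan([d[w] for w in words]))
print(df1)
```

With this guard, a title without keywords gets an empty string in Keywords and NaN in Score.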
Timings:
In [96]: %timeit (a(df11, df22))
100 loops, best of 3: 3.71 ms per loop
In [97]: %timeit (b(df1, df2))
100 loops, best of 3: 2.55 ms per loop
Test code:
df11 = df1.copy()
df22 = df2.copy()

def a(df1, df2):
    wants = df2.Keyword.tolist()
    d = df2.set_index('Keyword')['Rank'].to_dict()
    s = df1.Title.str.split(expand=True).stack()
    s = s[s.isin(wants)]
    df1['Keywords'] = s.groupby(level=0).apply(','.join)
    df1['Score'] = s.map(d).groupby(level=0).mean()
    return (df1)

def b(df1,df2):
    wants = df2.Keyword.tolist()
    d = df2.set_index('Keyword')['Rank'].to_dict()
    df1['Keywords'] = df1.Title.str.split()
    df1['Keywords'] = df1.Keywords.apply(lambda x: [item for item in x if item in wants])
    df1['Score'] = df1.Keywords.apply(lambda x: [d[item] for item in x])
    df1['Keywords'] = df1.Keywords.apply(','.join)
    df1['Score'] = df1.Score.apply(lambda l: sum(l) / float(len(l)))
    return (df1)

print (a(df11, df22))
print (b(df1, df2))
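The `%timeit` magic only works in IPython. Outside it, a rough equivalent can be sketched with the standard `timeit` module, shown here for the list-based function only. The repeat count is an arbitrary choice, and each run receives a fresh copy of the frame, so the measured time also includes the cost of the copy.

```python
import timeit
import pandas as pd

df1 = pd.DataFrame({
    "Title": ["The dog", "The cat", "The dog cat"],
    "Total Visits": [8, 9, 10],
    "Rank": [4, 4, 3],
})
df2 = pd.DataFrame({"Keyword": ["snail", "dog", "cat"], "Rank": [5, 1, 2]})

def b(df1, df2):
    # List-based approach from the answer, unchanged in logic.
    wants = df2.Keyword.tolist()
    d = df2.set_index('Keyword')['Rank'].to_dict()
    df1['Keywords'] = df1.Title.str.split()
    df1['Keywords'] = df1.Keywords.apply(lambda x: [item for item in x if item in wants])
    df1['Score'] = df1.Keywords.apply(lambda x: [d[item] for item in x])
    df1['Keywords'] = df1.Keywords.apply(','.join)
    df1['Score'] = df1.Score.apply(lambda l: sum(l) / float(len(l)))
    return df1

runs = 100  # arbitrary repeat count chosen for this sketch
# Each run gets a fresh copy, so the input frame is never mutated.
elapsed = timeit.timeit(lambda: b(df1.copy(), df2), number=runs)
print("{:.3f} ms per call".format(elapsed / runs * 1e3))
```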
Edit by comment:
If Keywords contains entries with more than one word, you can apply a list comprehension:
print (df1)
         Title  Total Visits  Rank
0      The dog             8     4
1      The cat             9     4
2  The dog cat            10     3
print (df2)
   Keyword  Rank
0    snail     5
1      dog     1
2      cat     2
3  The dog     8
4  the Dog     1
5  The Dog     3
wants = df2.Keyword.tolist()
print (wants)
['snail', 'dog', 'cat', 'The dog', 'the Dog', 'The Dog']
d = df2.set_index('Keyword')['Rank'].to_dict()
df1['Keywords'] = df1.Title.apply(lambda x: [item for item in wants if item in x])
df1['Score'] = df1.Keywords.apply(lambda x: [d[item] for item in x])
df1['Keywords'] = df1.Keywords.apply(','.join)
df1['Score'] = df1.Score.apply(lambda l: sum(l) / float(len(l)))
print (df1)
         Title  Total Visits  Rank         Keywords     Score
0      The dog             8     4      dog,The dog  4.500000
1      The cat             9     4              cat  2.000000
2  The dog cat            10     3  dog,cat,The dog  3.666667
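Note that `item in x` is a plain substring test, so a keyword such as `cat` would also match a title like "The category". If that is not wanted, one possible variant is to test each keyword as a whole word or phrase with a regular expression. This is a sketch under that assumption, not part of the original answer; the helper `whole_word_matches` and the title "The category" are hypothetical.

```python
import re
import pandas as pd

# Second title is a hypothetical row that contains "cat" only as a substring.
df1 = pd.DataFrame({"Title": ["The dog cat", "The category"]})
df2 = pd.DataFrame({"Keyword": ["dog", "cat", "The dog"], "Rank": [1, 2, 8]})

d = df2.set_index("Keyword")["Rank"].to_dict()

# One compiled pattern per keyword; re.escape guards against regex metacharacters.
patterns = {kw: re.compile(r"\b" + re.escape(kw) + r"\b") for kw in d}

def whole_word_matches(title):
    """Keywords that occur in the title as whole words or phrases."""
    return [kw for kw, pat in patterns.items() if pat.search(title)]

matched = df1["Title"].apply(whole_word_matches)
df1["Keywords"] = matched.apply(",".join)
df1["Score"] = matched.apply(
    lambda kws: sum(d[k] for k in kws) / len(kws) if kws else float("nan")
)
print(df1)
```

Each keyword is searched independently, so overlapping keywords such as `dog` and `The dog` can both match the same title, as in the substring version. Matching stays case-sensitive; passing `re.IGNORECASE` to `re.compile` would relax that.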