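I need to get a label for each row of df A, based on which substring (keyword) from a column of df B that row's string contains.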
Question: Is there a way to do this without using a loop?
DataFrame A:
original string:
1. test1(arizona)
2. NJtest2
DataFrame B:
keyword Label
1. test1 First Cycle Test
2. test2 Second Cycle Test
Expected output:
Original Target
1. test1(arizona) First Cycle Test
2. NJtest2 Second Cycle Test
Answer 0 (score: 1)
Use str.extract + merge:
df1
Col
0 test1(arizona)
1 NJtest2
df2
keyword Label
0 test1 First Cycle Test
1 test2 Second Cycle Test
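import re

# re.escape makes keywords with regex metacharacters match literally
p = '(?P<Key>.*(?P<keyword>{}).*)'.format('|'.join(map(re.escape, df2.keyword)))
df1.Col.str.extract(p, expand=True)\
   .merge(df2).drop(columns='keyword')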
Key Label
0 test1(arizona) First Cycle Test
1 NJtest2 Second Cycle Test
The regex pattern extracts the keyword along with the full string, which makes the merge painless.
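A self-contained version of this answer could look like the following. It is a sketch that rebuilds the two frames from the question (the column name Col follows the answer) and assumes a recent pandas; re.escape is an addition, not part of the original answer, so that keywords containing characters such as "(" or "." are matched literally.

```python
import re

import pandas as pd

# Frames rebuilt from the question; 'Col' is the column name used in the answer
df1 = pd.DataFrame({'Col': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2'],
                    'Label': ['First Cycle Test', 'Second Cycle Test']})

# 'Key' captures the whole string, 'keyword' captures the keyword found inside it
p = '(?P<Key>.*(?P<keyword>{}).*)'.format('|'.join(map(re.escape, df2.keyword)))

# extract -> one row per string with columns Key and keyword,
# then an inner merge on 'keyword' pulls in the matching Label
result = (df1['Col'].str.extract(p, expand=True)
          .merge(df2, on='keyword')
          .drop(columns='keyword'))
print(result)
```

Strings that contain none of the keywords come back from str.extract as NaN and are dropped by the default inner merge; passing how='left' to merge would keep them with a missing Label.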
Answer 1 (score: 1)
Use fuzzywuzzy + apply.
Answer 2 (score: 0)
import pandas as pd

df1 = pd.DataFrame({'original string': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2', 'test3'],
                    'label': ['First Cycle Test', 'Second Cycle Test', 'Third Cycle Test']})

def func(x):
    # Return the first original string that contains keyword x, else "None"
    find = [i for i in df1['original string'].tolist() if x in i]
    if find:
        return find[0]
    else:
        return "None"

df2.keyword = df2.keyword.apply(func)
df2 = df2.rename(columns=dict(keyword='Original', label='Target'))
Returns:
Original Target
0 test1(arizona) First Cycle Test
1 NJtest2 Second Cycle Test
2 None Third Cycle Test
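Answer 2 still loops through apply, and for each keyword it keeps only the first string that contains it. A different sketch, not taken from any of the answers here, is to cross-join the two frames and keep the pairs where the keyword is a substring of the string. It assumes pandas 1.2 or later (for how='cross') and reuses this answer's column names. The substring test itself is still a Python-level comprehension, since, as far as I know, Series.str.contains takes a single pattern rather than one pattern per row.

```python
import pandas as pd

df1 = pd.DataFrame({'original string': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2', 'test3'],
                    'label': ['First Cycle Test', 'Second Cycle Test', 'Third Cycle Test']})

# Every string paired with every keyword (len(df1) * len(df2) rows)
pairs = df1.merge(df2, how='cross')

# True where the keyword occurs inside the original string
mask = [k in s for s, k in zip(pairs['original string'], pairs['keyword'])]

out = (pairs.loc[mask]
       .rename(columns={'original string': 'Original', 'label': 'Target'})
       [['Original', 'Target']]
       .reset_index(drop=True))
print(out)
```

Unlike Answer 2, keywords that match nothing (test3 here) simply disappear, and a string containing several keywords produces one row per match. The cross join grows as len(df1) * len(df2), so it suits small keyword tables.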