Question

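I need to derive a label for each string in df A, based on which keyword from a column of df B appears in it as a substring.

Problem

Is there a way to do this without using a loop?

DataFrame A: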

original string:

1. test1(arizona)     
2. NJtest2

DataFrame B:

keyword          Label

1. test1            First Cycle Test
2. test2            Second Cycle Test

Output:

Original         Target

1. test1(arizona)  First Cycle Test
2. NJtest2         Second Cycle Test

Answer 1

Use str.extract + merge:

df1
              Col
0  test1(arizona)
1         NJtest2

df2
  keyword              Label
0   test1   First Cycle Test
1   test2  Second Cycle Test

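# Named groups: Key captures the whole string, keyword captures the df2 keyword found in it
p = '(?P<Key>.*(?P<keyword>{}).*)'.format('|'.join(df2.keyword))

# Merge on the shared 'keyword' column, then drop it from the result
df1.Col.str.extract(p, expand=True)\
            .merge(df2).drop(columns='keyword')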

              Key              Label
0  test1(arizona)   First Cycle Test
1         NJtest2  Second Cycle Test

The regex pattern extracts the keyword along with the full string, which makes the merge painless.
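
For reference, here is a self-contained sketch of the same extract-then-merge idea. It is a variant, not the code above: the `re.escape` call, the `how='left'` merge, and the extra `'unrelated'` row are assumptions added for illustration, so that keywords containing regex metacharacters are matched literally and rows without any keyword are kept with a missing label.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Col': ['test1(arizona)', 'NJtest2', 'unrelated']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2'],
                    'Label': ['First Cycle Test', 'Second Cycle Test']})

# Escape each keyword so characters such as '(' or '.' are matched literally.
alternation = '|'.join(re.escape(k) for k in df2['keyword'])

# Pull out the first keyword found in each string (NaN when none is present).
df1['keyword'] = df1['Col'].str.extract('({})'.format(alternation), expand=False)

# A left merge keeps every row of df1, including rows without a keyword.
result = df1.merge(df2, on='keyword', how='left').drop(columns='keyword')
print(result)
```

Unlike the pattern above, this variant extracts only the keyword and keeps the original string in `Col`, so unmatched rows are not lost.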

Answer 2

apply + fuzzywuzzy
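
One way the apply + fuzzy-matching idea could look is sketched below. It uses the standard library's `difflib` as a stand-in for fuzzywuzzy, which is an assumption: the similarity scores differ from fuzzywuzzy's, and `closest_label` is a hypothetical helper name, not something from the answer.

```python
import difflib
import pandas as pd

df1 = pd.DataFrame({'Col': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2'],
                    'Label': ['First Cycle Test', 'Second Cycle Test']})

keywords = df2['keyword'].tolist()
label_by_keyword = dict(zip(df2['keyword'], df2['Label']))

def closest_label(text):
    # Hypothetical helper: pick the keyword most similar to the text.
    # cutoff=0.0 always returns the closest keyword; raise it to reject weak matches.
    matches = difflib.get_close_matches(text, keywords, n=1, cutoff=0.0)
    return label_by_keyword[matches[0]] if matches else None

df1['Label'] = df1['Col'].apply(closest_label)
print(df1)
```

Note that `apply` still calls the helper once per row, so this is a convenience rather than a truly vectorized solution.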

Answer 3

I'm a bit confused here. Do you just want to replace the keyword column with the original string? Then this should be enough:

import pandas as pd

df1 = pd.DataFrame({'original string': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2', 'test3'],
                    'label': ['First Cycle Test', 'Second Cycle Test', 'Third Cycle Test']})

def func(x):
    # Collect every original string that contains the keyword x
    find = [i for i in df1['original string'].tolist() if x in i]
    if find:
        return find[0]
    else:
        # No match: return the literal string "None"
        return "None"

df2.keyword = df2.keyword.apply(func)

df2 = df2.rename(columns=dict(keyword='Original', label='Target'))

Returns:

         Original             Target
0  test1(arizona)   First Cycle Test
1         NJtest2  Second Cycle Test
2            None   Third Cycle Test

Derive a name based on a substring

3 answers: