Deriving labels based on substrings

Date: 2017-11-20 22:10:47

Tags: python pandas dataframe

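I need to look up a label from dataframe B for each string in dataframe A, based on which keyword (substring) from B's keyword column that string contains.

Question

Is there a way to do this without using a loop?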

Dataframe A:

original string
test1(arizona)
NJtest2

Dataframe B:

keyword    Label
test1      First Cycle Test
test2      Second Cycle Test

Desired output:

Original          Target
test1(arizona)    First Cycle Test
NJtest2           Second Cycle Test

3 answers:

Answer 0 (score: 1)

Use str.extract + merge:

df1
              Col
0  test1(arizona)
1         NJtest2

df2
  keyword              Label
0   test1   First Cycle Test
1   test2  Second Cycle Test
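import re

# One alternation of all keywords; re.escape guards against regex metacharacters in them
p = '(?P<Key>.*(?P<keyword>{}).*)'.format('|'.join(map(re.escape, df2.keyword)))

# Key = full string, keyword = matched keyword; merge on 'keyword' attaches the Label
df1.Col.str.extract(p, expand=True)\
            .merge(df2).drop(columns='keyword')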

              Key              Label
0  test1(arizona)   First Cycle Test
1         NJtest2  Second Cycle Test

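The regex pattern extracts the matched keyword together with the full string, so the result can be merged directly with df2 on the keyword column. That makes the merge painless.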
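For reference, here is a self-contained version of the extract + merge approach. It rebuilds df1 and df2 from the displays above (the column names Col, keyword and Label are taken from the answer), and the use of re.escape and `drop(columns=...)` is an adjustment for safety and for current pandas, not part of the original answer.

```python
import re
import pandas as pd

# Example data mirroring the displays in the answer
df1 = pd.DataFrame({'Col': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2'],
                    'Label': ['First Cycle Test', 'Second Cycle Test']})

# Key captures the whole string; keyword captures whichever keyword matched
alternation = '|'.join(map(re.escape, df2['keyword']))
p = '(?P<Key>.*(?P<keyword>{}).*)'.format(alternation)

# extract yields one row per df1 row with columns Key and keyword;
# an inner merge on the shared 'keyword' column attaches the Label
# (rows whose string matches no keyword get NaN and are dropped by the merge)
result = (df1['Col'].str.extract(p, expand=True)
          .merge(df2, on='keyword')
          .drop(columns='keyword'))
print(result)
```

The named groups become the column names of the extracted frame, which is why the merge needs no explicit renaming.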

Answer 1 (score: 1)

Use fuzzywuzzy + apply.
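The answer gives no code, so here is a minimal sketch of the "fuzzy matching + apply" idea. To keep it dependency-free it swaps in the standard library's difflib.SequenceMatcher in place of fuzzywuzzy; that substitution and the helper name best_keyword are assumptions for illustration. Each original string gets the label of its most similar keyword.

```python
import difflib
import pandas as pd

df1 = pd.DataFrame({'original string': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2'],
                    'Label': ['First Cycle Test', 'Second Cycle Test']})

# keyword -> label lookup
labels = dict(zip(df2['keyword'], df2['Label']))

def best_keyword(s):
    # Hypothetical helper: score every keyword against s with difflib's
    # similarity ratio (a stand-in for a fuzzywuzzy scorer) and keep the best
    return max(labels, key=lambda k: difflib.SequenceMatcher(None, s, k).ratio())

out = pd.DataFrame({
    'Original': df1['original string'],
    'Target': df1['original string'].apply(best_keyword).map(labels),
})
print(out)
```

Fuzzy scoring always returns some keyword, even for strings that contain none of them, so a similarity threshold may be needed on real data.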

Answer 2 (score: 0)

I'm a bit confused here. Do you just want to replace the keyword column with the original string? Then this should be enough:

import pandas as pd

df1 = pd.DataFrame({'original string': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2', 'test3'],
                    'label': ['First Cycle Test', 'Second Cycle Test', 'Third Cycle Test']})

def func(x):
    # Return the first original string containing keyword x, or the string "None" if none does
    find = [i for i in df1['original string'].tolist() if x in i]
    if find:
        return find[0]
    else:
        return "None"

# Replace each keyword with the original string that contains it
df2['keyword'] = df2['keyword'].apply(func)

df2 = df2.rename(columns=dict(keyword='Original', label='Target'))

Returns:

         Original             Target
0  test1(arizona)   First Cycle Test
1         NJtest2  Second Cycle Test
2            None   Third Cycle Test
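Answer 2 builds the result from df2's side, so it produces one row per keyword, including the unmatched test3. If you want one row per original string, as in the question's desired output, a minimal sketch applying the same `in` substring test from df1's side could look like this. The helper find_label is hypothetical, and like the answer it still loops in Python under the hood.

```python
import pandas as pd

df1 = pd.DataFrame({'original string': ['test1(arizona)', 'NJtest2']})
df2 = pd.DataFrame({'keyword': ['test1', 'test2', 'test3'],
                    'label': ['First Cycle Test', 'Second Cycle Test', 'Third Cycle Test']})

def find_label(s):
    # Return the label of the first keyword contained in s, or None if none matches
    for kw, label in zip(df2['keyword'], df2['label']):
        if kw in s:
            return label
    return None

out = pd.DataFrame({
    'Original': df1['original string'],
    'Target': df1['original string'].apply(find_label),
})
print(out)
```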