Expanding abbreviations in a DataFrame column based on another column of the same DataFrame

Time: 2021-07-10 20:05:47

Tags: python pandas nlp pandas-groupby text-classification

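I have two columns in a DataFrame: one holds the class and the other a description. The descriptions contain abbreviations, and I want to expand them based on the class value. I have a dictionary keyed by class, where each value is another dictionary mapping abbreviations to their full forms, because the meaning of an abbreviation varies by class. For example, IT may mean either Information Transfer or Information Technology, depending on the class label.

I tried groupby, but I could not get the result back into the original DataFrame. Any help is much appreciated. Thanks.

This is what I tried: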

grouped = df.groupby('class')
for n,j in grouped:
    j['description'].str.split().apply(lambda x: ' '.join([abb[n].get(e, e) for e in x]))


2 Answers:

Answer 0 (score: 1)

Input data:

import pandas as pd

# Outer keys are class labels; inner dicts map each abbreviation to its full form.
abb = {'IT':{'SQL':'Structured Query Language', 'BLAH': 'blah blah'}, 'Sales':{'SQL':'Sales Qualified Lead'}}

data = [{'class':'IT', 'description':'SQL developer'},
        {'class':'IT', 'description':'SQL developer BLAH'},
        {'class':'Sales', 'description':'senior SQL'}]
df = pd.DataFrame(data)

   class         description
0     IT       SQL developer
1     IT  SQL developer BLAH
2  Sales          senior SQL

Code:

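# Build one alternation pattern per class and swap each match for its expansion.
# regex=True is needed for a callable replacement (the default became False in pandas 2.0).
# Note: reset_index(drop=True) assumes df is already ordered by class with a default index.
df['description'] = (df.groupby('class', as_index=False)
                     .apply(lambda x: x['description'].str.replace('|'.join(abb[x.name].keys()),
                                                                   lambda m: abb[x.name][m.group(0)],
                                                                   regex=True
                                                                  )
                           ).reset_index(drop=True)
                    )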

Output:

   class                                    description
0     IT            Structured Query Language developer
1     IT  Structured Query Language developer blah blah
2  Sales                    senior Sales Qualified Lead
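
The result above is realigned with `reset_index(drop=True)`, which matches `df` only when the rows are already ordered by class and carry a default index. The following is a minimal sketch of an index-preserving variant, not the answerer's code: the helper name `expand_group` and the interleaved sample rows are assumptions made for illustration. It also escapes the keys and adds word boundaries so that an abbreviation is not matched inside a longer word.

```python
import re
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language', 'BLAH': 'blah blah'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}

# Hypothetical sample: rows deliberately interleaved by class.
df = pd.DataFrame([
    {'class': 'IT', 'description': 'SQL developer'},
    {'class': 'Sales', 'description': 'senior SQL'},
    {'class': 'IT', 'description': 'SQL developer BLAH'},
])

def expand_group(descriptions, mapping):
    """Expand whole-word abbreviations in one class's descriptions."""
    if not mapping:
        # Nothing to expand for this class; return the text unchanged.
        return descriptions
    pattern = r'\b(?:' + '|'.join(re.escape(k) for k in mapping) + r')\b'
    return descriptions.str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)

expanded = df['description'].copy()
for cls, group in df.groupby('class'):
    # Each group keeps its original row labels, so .loc writes back in place.
    expanded.loc[group.index] = expand_group(group['description'], abb.get(cls, {}))
df['description'] = expanded
```

Writing back through `group.index` is also one way to finish the loop from the question, since each group retains the row labels of the original DataFrame.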

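Answer 1 (score: 0)

Here is a working example that takes each row as input, looks up the row's class value in the dictionary, and replaces the abbreviations in the description string with the corresponding values from that dictionary: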

import pandas as pd

abb = {'IT':{'SQL':'Structured Query Language'},'Sales':{'SQL':'Sales Qualified Lead'}}

data = [{'class':'IT', 'description':'SQL developer'},{'class':'Sales', 'description':'senior SQL'}]
df = pd.DataFrame(data)

def replace_strings(row):
    text = row['description']
    for key, value in abb[row['class']].items():
        text = text.replace(key, value)
    return text

df['description'] = df.apply(replace_strings, axis=1)

Output:

   class                          description
0     IT  Structured Query Language developer
1  Sales          senior Sales Qualified Lead
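
One caveat with plain `str.replace` is that it substitutes substrings, so a key such as `SQL` would also be expanded inside a longer word like `MySQL`. A minimal sketch of a token-wise lookup, closer to the `split`/`get` approach in the question, is shown below. The helper name `expand_tokens` and the third sample row are assumptions added for illustration.

```python
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}

# Hypothetical sample; the last row checks that 'MySQL' is left alone.
df = pd.DataFrame([
    {'class': 'IT', 'description': 'SQL developer'},
    {'class': 'Sales', 'description': 'senior SQL'},
    {'class': 'IT', 'description': 'MySQL admin'},
])

def expand_tokens(text, mapping):
    """Replace each whitespace-separated token that is a key in mapping."""
    return ' '.join(mapping.get(tok, tok) for tok in text.split())

# Pair each description with its class and look up that class's dictionary;
# classes missing from abb fall back to an empty mapping (no expansion).
df['description'] = [
    expand_tokens(text, abb.get(cls, {}))
    for cls, text in zip(df['class'], df['description'])
]
```

This trades flexibility for safety: tokens with attached punctuation (for example `SQL,`) are not matched, and runs of whitespace are collapsed to single spaces.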