Question

I have a DataFrame and I want to uppercase only specific parts of a string, wrapping them in underscores.

|         TYPE       |  NAME  |
|--------------------|--------|
| Contract Employee  | John   |
| Full Time Employee | Carol  |
| Temporary Employee | Kyle   |

I want to uppercase the words "Contract" and "Temporary" and add an underscore before and after each:

|         TYPE         |  NAME  |
|----------------------|--------|
| _CONTRACT_ Employee  | John   |
| Full Time Employee   | Carol  |
| _TEMPORARY_ Employee | Kyle   |

I tried using str.upper(), but that uppercases the whole cell, and I only want to change those specific words.

Edit: In case it matters, the words are sometimes not capitalized. Often the value appears as temporary employee rather than Temporary Employee.

Answer 1

Here is one option using re.sub:

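import re

def type_to_upper(match):
    # Uppercase the captured word (group 1); everything else is left as-is
    return match.group(1).upper()

text = "Contract Employee"
output = re.sub(r'\b(Contract|Temporary)\b', type_to_upper, text)
# output == "CONTRACT Employee"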
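To show what the pattern does and does not match, here is the same function run over a few sample strings. The sample values are illustrative assumptions, not data from the question: the \b word boundaries leave "Contractor" alone, and because the pattern is case-sensitive, the lowercase "contract" is not changed either.

```python
import re

def type_to_upper(match):
    # Uppercase the captured word only; no underscores are added here
    return match.group(1).upper()

samples = ["Contract Employee", "Contractor Employee", "contract employee"]
outputs = [re.sub(r'\b(Contract|Temporary)\b', type_to_upper, s) for s in samples]
print(outputs)
# ['CONTRACT Employee', 'Contractor Employee', 'contract employee']
```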

Edit:

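Here is the same approach applied in pandas. It also addresses the latest edit, where the words to replace may be either capitalized or lowercase: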

Test DataFrame:

                 TYPE   NAME
0   Contract Employee   John
1  Full Time Employee  Carol
2  Temporary Employee   Kyle
3   contract employee   John
4  Full Time employee  Carol
5  temporary employee   Kyle

Solution:

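def type_to_upper(match):
    # Wrap the matched word in underscores and uppercase it
    return '_{}_'.format(match.group(1).upper())

# regex=True is required for a callable replacement (pandas >= 2.0 defaults to regex=False)
df.TYPE = df.TYPE.str.replace(r'\b([Cc]ontract|[Tt]emporary)\b', type_to_upper, regex=True)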

Result:

df 
                   TYPE   NAME
0   _CONTRACT_ Employee   John
1    Full Time Employee  Carol
2  _TEMPORARY_ Employee   Kyle
3   _CONTRACT_ employee   John
4    Full Time employee  Carol
5  _TEMPORARY_ employee   Kyle

Note that this only covers exactly the two cases defined in the OP's request. For a fully case-insensitive match, it is even simpler:

df.TYPE = df.TYPE.str.replace(r'\b(contract|temporary)\b', type_to_upper, case=False, regex=True)
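A self-contained version of the case-insensitive solution, rebuilding the test DataFrame shown above, might look like the sketch below. It assumes a recent pandas: since pandas 2.0, Series.str.replace defaults to regex=False, and a callable replacement only works with regex=True, so the flag is passed explicitly (older versions accept it too).

```python
import pandas as pd

def type_to_upper(match):
    # Wrap the matched word in underscores and uppercase it
    return '_{}_'.format(match.group(1).upper())

df = pd.DataFrame({
    "TYPE": ["Contract Employee", "Full Time Employee", "Temporary Employee",
             "contract employee", "Full Time employee", "temporary employee"],
    "NAME": ["John", "Carol", "Kyle", "John", "Carol", "Kyle"],
})

# case=False compiles the pattern with re.IGNORECASE;
# regex=True is needed because the replacement is a callable
df["TYPE"] = df["TYPE"].str.replace(
    r'\b(contract|temporary)\b', type_to_upper, case=False, regex=True
)
print(df)
```

The original casing of the surrounding text ("Employee" vs "employee") is preserved, because only the matched group is rewritten.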

Answer 2

Modifying the DataFrame contents (no regex or anything else needed):

l=['Contract','Temporary']
df['TYPE']=df['TYPE'].apply(lambda x: ' '.join(['_'+i.upper()+'_' if i in l else i for i in x.split()]))

This is just split and join inside apply.

Now:

print(df)

gives:

                   TYPE   NAME
0   _CONTRACT_ Employee   John
1    Full Time Employee  Carol
2  _TEMPORARY_ Employee   Kyle
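The membership test above (i in l) is case-sensitive, so it will not catch "temporary employee" from the question's edit. A minimal variant, still without regex, compares each word in lowercase against a set. The helper name wrap_words and the sample rows are illustrative assumptions. Note that split() with no arguments collapses runs of whitespace, and a word with attached punctuation (e.g. "Contract,") will not match.

```python
import pandas as pd

# Lowercase targets so "Contract" and "contract" both match
targets = {"contract", "temporary"}

def wrap_words(text):
    # Split on whitespace, wrap matching words, then rejoin with single spaces
    return " ".join(
        "_" + word.upper() + "_" if word.lower() in targets else word
        for word in text.split()
    )

df = pd.DataFrame({
    "TYPE": ["Contract Employee", "temporary employee", "Full Time Employee"],
    "NAME": ["John", "Kyle", "Carol"],
})
df["TYPE"] = df["TYPE"].apply(wrap_words)
print(df)
```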

Answer 3

An easy way is to use replace with a dictionary.

Please refer to the pandas docs for Series.replace.

df["TYPE"] = df["TYPE"].replace({'Contract': '_CONTRACT_', 'Temporary': '_TEMPORARY_'}, regex=True)

Reproducing it:

>>> df
                 TYPE   Name
0   Contract Employee   John
1  Full Time Employee  Carol
2  Temporary Employee   Kyle

>>> df["TYPE"] = df["TYPE"].replace({'Contract': '_CONTRACT_', 'Temporary': '_TEMPORARY_'}, regex=True)
>>> df
                   TYPE   Name
0   _CONTRACT_ Employee   John
1    Full Time Employee  Carol
2  _TEMPORARY_ Employee   Kyle
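With regex=True the dictionary keys are treated as regular expressions, so plain 'Contract' also rewrites substrings such as "Contractor" and ignores lowercase "contract". A sketch of a stricter mapping, assuming the same Series.replace call, adds \b word boundaries and the inline (?i) flag from Python's re module to each key. The sample values are illustrative.

```python
import pandas as pd

s = pd.Series(["Contract Employee", "temporary employee", "Contractor Employee"])

# Keys are regex patterns: \b prevents partial matches such as "Contractor",
# and the leading inline (?i) flag makes each pattern case-insensitive
mapping = {
    r'(?i)\bcontract\b': '_CONTRACT_',
    r'(?i)\btemporary\b': '_TEMPORARY_',
}
result = s.replace(mapping, regex=True)
print(result.tolist())
```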

Answer 4

U9 beat me to it using a lambda and split() on the input, but here is my version:

def match_and_upper(word):
    matches = ["Contract", "Temporary"]
    if word in matches:
        return word.upper()
    return word

text = "Contract Employee"  # avoid shadowing the built-in input()
output = " ".join(map(match_and_upper, text.split()))
# Result: CONTRACT Employee

Answer 5

Answering part of my own question here. Using the regex that @Tim Biegeleisen provided, I did a string replace on the column:

df["TYPE"] = df["TYPE"].str.replace(r'\b(Contract)\b', '_CONTRACT_', regex=True)
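The line above only handles "Contract", and a fixed replacement string cannot uppercase a capture group. One way to cover both words with the same style of call is to loop over them and run one regex replace per word. This is a sketch assuming pandas 2.0 or later, where regex=True must be passed explicitly for the pattern to be treated as a regex; case=False also picks up the lowercase variants from the question's edit. The sample rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "TYPE": ["Contract Employee", "Full Time Employee", "Temporary Employee",
             "temporary employee"],
    "NAME": ["John", "Carol", "Kyle", "Kyle"],
})

# One regex replace per word: the replacement is built in Python,
# since a plain replacement string cannot uppercase a capture group
for word in ["Contract", "Temporary"]:
    df["TYPE"] = df["TYPE"].str.replace(
        rf'\b{word}\b', f'_{word.upper()}_', case=False, regex=True
    )
print(df["TYPE"].tolist())
```

For more than a handful of words, a single pattern with a callable replacement (as in Answer 1) avoids one pass over the column per word.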

Convert a specific part of a string to uppercase?
