Question

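I need to get some name formats to match so that I can merge on them later in my script. My column 'Name' is imported from a csv and contains names like the following: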

Antonio Brown

LeSean McCoy

Le'Veon Bell

For my script, I would like to take the first letter of the first name and combine it with the last name:

A.Brown

L.McCoy

L.Bell

Here is what I have right now, which returns NaN every time:

ff['AbbrName'] = ff['Name'].str.extract('([A-Z]\s[a-zA-Z]+)', expand=True)

Thanks!

Answer 1

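Another option is the str.replace method with the pattern ^([A-Z]).*?([a-zA-Z]+)$. ^([A-Z]) captures the first letter at the start of the string; ([a-zA-Z]+)$ matches the last word; the name is then rebuilt by placing a . between the first and second captured groups: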

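# regex=True is required on pandas 2.0+, where str.replace treats the pattern literally by default
df['Name'].str.replace(r'^([A-Z]).*?([a-zA-Z]+)$', r'\1.\2', regex=True)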
#0    A.Brown
#1    L.McCoy
#2     L.Bell
#Name: Name, dtype: object
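
To see what each capture group picks up, the same pattern can be tried on plain strings with Python's re module. This is only a minimal sketch; the helper name abbreviate_name is hypothetical and not part of the answer above.

```python
import re

# Same pattern as above: group 1 is the leading capital letter,
# group 2 is the final run of letters before the end of the string.
PATTERN = re.compile(r'^([A-Z]).*?([a-zA-Z]+)$')


def abbreviate_name(name):
    """Return 'F.Last' for a full name; input is returned unchanged if the pattern does not match."""
    return PATTERN.sub(r'\1.\2', name)


names = ['Antonio Brown', 'LeSean McCoy', "Le'Veon Bell"]
abbreviated = [abbreviate_name(n) for n in names]
print(abbreviated)  # ['A.Brown', 'L.McCoy', 'L.Bell']
```

Because the pattern is anchored at both ends, a name that does not start with a capital letter is left as it is rather than being partly rewritten.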

Answer 2

What if you simply apply() a function that splits on the first space and takes the first character of the first word:

import pandas as pd


def abbreviate(row):
    first_word, rest = row['Name'].split(" ", 1)
    return first_word[0] + ". " + rest


df = pd.DataFrame({'Name': ['Antonio Brown', 'LeSean McCoy', "Le'Veon Bell"]})
df['AbbrName'] = df.apply(abbreviate, axis=1)
print(df)

Prints:

            Name  AbbrName
0  Antonio Brown  A. Brown
1   LeSean McCoy  L. McCoy
2   Le'Veon Bell   L. Bell

Answer 3

This should be simple enough to do, even without regex. Use a combination of string splitting and concatenation.

df.Name.str[0] + '.' + df.Name.str.split().str[-1]

0    A.Brown
1    L.McCoy
2     L.Bell
Name: Name, dtype: object

If there is a possibility of the Name column having leading spaces, replace df.Name.str[0] with df.Name.str.strip().str[0].

Caveat: Each entry in the column must contain at least two names.
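
If single-word entries or leading spaces are possible, one way to guard against both is sketched below. The sample data (including the one-word name 'Pele') and the choice to leave one-word names unabbreviated are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample: one normal name, one with leading spaces, one single-word name.
df = pd.DataFrame({'Name': ['Antonio Brown', '  LeSean McCoy', 'Pele']})

stripped = df['Name'].str.strip()
parts = stripped.str.split()          # list of words per row
initial = parts.str[0].str[0]         # first letter of the first word
last = parts.str[-1]                  # last word
has_two = parts.str.len() >= 2        # True where there are at least two words

# Abbreviate only rows with two or more words; keep the stripped name otherwise.
df['AbbrName'] = (initial + '.' + last).where(has_two, stripped)
print(df['AbbrName'].tolist())  # ['A.Brown', 'L.McCoy', 'Pele']
```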

Answer 4

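You are getting NaN because your regular expression does not match the names.

Instead, I would try the following:

# Split each name on spaces, then join the first initial to the second word
parts = ff['Name'].str.split(' ')
ff['AbbrName'] = parts.str[0].str[0] + '.' + parts.str[1]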
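
The reason the pattern from the question fails can be checked on plain strings with the re module: [A-Z]\s requires a capital letter immediately followed by whitespace, and none of the sample names contain that. A small sketch, using the names from the question:

```python
import re

original = re.compile(r'([A-Z]\s[a-zA-Z]+)')  # pattern from the question
names = ['Antonio Brown', 'LeSean McCoy', "Le'Veon Bell"]

# No capital letter sits directly before the space in these names, so nothing matches.
matches = [original.search(n) for n in names]
print(matches)  # [None, None, None]

# The pattern only matches when a single capital already precedes the space.
already_short = original.search('A Brown').group(1)
print(already_short)  # A Brown
```

str.extract returns NaN for every row where the pattern finds no match, which is why the whole column comes back as NaN.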
