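How do I extract a specific value from a column and add it as a new column in Python / pandas?

Time: 2019-03-19 10:48:06

Tags: python pandas

I have a DataFrame with a column data1 that holds free-form information, and I want to add a column data2 that contains only the name found in data1: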

       data1                                         data2
0      info  name: Michael Jackson      New York     Michael Jackson
1      info 12 name: Michael Jordan III Los Angeles  Michael Jordan III 

Do you know how I can do this?

1 Answer:

Answer 0 (score: 0)

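Without a clear delimiter this is not trivial: the names themselves contain spaces, they vary in length (two words, three words), and the trailing column can also contain several words separated by spaces.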
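To make the ambiguity concrete, here is a small sketch in plain Python. The two strings are copied from the question (the exact number of spaces is a guess from the printed output), and `split_tail` is a hypothetical helper name, not something from the original post. It shows that splitting on every space loses the name/city boundary, and that splitting only on wide gaps works for the first row but not the second.

```python
import re

# Sample rows copied from the question; inner spacing is an assumption.
rows = [
    "info  name: Michael Jackson      New York",
    "info 12 name: Michael Jordan III Los Angeles",
]

def split_tail(row):
    """Return the text after 'name: ' split two ways (hypothetical helper)."""
    tail = row.split(": ")[-1]          # everything after "name: "
    by_word = tail.split()              # split on any whitespace
    by_gap = re.split(r"\s{2,}", tail)  # split only on runs of 2+ spaces
    return by_word, by_gap

for row in rows:
    by_word, by_gap = split_tail(row)
    print(by_word)
    print(by_gap)
```

In the second row the name and the city are separated by a single space, so neither split can tell where the name ends.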

Splitting the string gives a partial solution:

df['data2'] = df['data1'].str.split(': ').str[-1]

>>> print(df)

                                          data1                           data2
0     info  name: Michael Jackson      New York   Michael Jackson      New York
1  info 12 name: Michael Jordan III Los Angeles  Michael Jordan III Los Angeles
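For anyone who wants to reproduce this, the frame can be rebuilt roughly as follows. This is only a sketch: the question does not show how `df` was created, so the constructor call and the exact spacing inside the strings are assumptions based on the printed output.

```python
import pandas as pd

# Rebuild the sample frame; string contents are taken from the question,
# the exact number of inner spaces is an assumption.
df = pd.DataFrame({
    "data1": [
        "info  name: Michael Jackson      New York",
        "info 12 name: Michael Jordan III Los Angeles",
    ]
})

# Keep only the text after the last ": " in each row
df["data2"] = df["data1"].str.split(": ").str[-1]

print(df["data2"].tolist())
```

The city is still attached to each value at this point, which is why this is only a partial solution.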

If you have a list of the "cities", a complete solution is possible:

import re

def replace(string, substitutions):
    """Replaces multiple substrings in a string."""
    substrings = sorted(substitutions, key=len, reverse=True)
    regex = re.compile('|'.join(map(re.escape, substrings)))
    return regex.sub(lambda match: substitutions[match.group(0)], string)

# List of cities to remove from strings
cities = ['New York', 'Los Angeles']
# Dictionary matching each city with the empty string
substitutions = {city:'' for city in cities}

# Splitting to create new column as above
df['data2'] = df['data1'].str.split(': ').str[-1]
# Applying replacements to new column
df['data2'] = df['data2'].map(lambda x: replace(x, substitutions).strip())

>>> print(df)

                                          data1               data2
0     info  name: Michael Jackson      New York     Michael Jackson
1  info 12 name: Michael Jordan III Los Angeles  Michael Jordan III

This uses carlsmith's replace function.
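As an alternative sketch (not part of the answer above), the same city list can drive a single regular expression passed to `Series.str.extract`, which captures the name directly instead of splitting and then deleting the city. The names `city_alt` and `pattern` are made up for this example, and the sample frame is rebuilt under the same spacing assumption as in the question's output.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "data1": [
        "info  name: Michael Jackson      New York",
        "info 12 name: Michael Jordan III Los Angeles",
    ]
})

# List of known cities, as in the answer above
cities = ["New York", "Los Angeles"]

# Alternation of the escaped city names, e.g. "New York|Los Angeles"
city_alt = "|".join(map(re.escape, cities))

# Capture (lazily) whatever sits between "name:" and a known city
# that ends the string; surrounding whitespace is left out of the group.
pattern = rf"name:\s*(.*?)\s*(?:{city_alt})\s*$"

# expand=False returns a Series because the pattern has one capture group
df["data2"] = df["data1"].str.extract(pattern, expand=False)

print(df["data2"].tolist())
```

With this approach, a row whose city is not in the list gets NaN rather than a name with the city still attached, so missing cities are easy to spot.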