Question

I need to remove whitespace from a column of a pandas DataFrame. My data looks like this:

industry            magazine
Home                "Goodhousekeeping.com"; "Prevention.com";
Fashion             "Cosmopolitan"; " Elle"; "Vogue"
Fashion             " Vogue"; "Elle"

Here is my code:

# split the magazine column values into lists and store them in a new column
df['magazine_list'] = df['magazine'].str.split(';')

# strip the leading whitespace from the strings
df.magazine_list = df.magazine_list.str.lstrip()

This returns all NaN. I also tried:

df.magazine = df.magazine.str.lstrip()

This didn't remove the whitespace either.
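Why both attempts behave this way can be sketched as follows, assuming the column holds the raw strings exactly as displayed above (the DataFrame construction is a reconstruction, not the asker's actual loading code). After str.split every cell is a Python list, and the .str methods return NaN for elements that are not strings. On the unsplit strings, lstrip only looks at the very start of each value, which here is always a double quote, so there is nothing for it to remove.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: the cells hold
# these raw strings, quotes and semicolons included)
df = pd.DataFrame({
    "industry": ["Home", "Fashion", "Fashion"],
    "magazine": [
        '"Goodhousekeeping.com"; "Prevention.com";',
        '"Cosmopolitan"; " Elle"; "Vogue"',
        '" Vogue"; "Elle"',
    ],
})

# Attempt 1: after split, every cell is a Python list, not a string
df["magazine_list"] = df["magazine"].str.split(";")
# .str methods return NaN for elements that are not strings (here: lists)
stripped_lists = df["magazine_list"].str.lstrip()
print(stripped_lists.isna().all())  # True

# Attempt 2: lstrip only removes whitespace at the very start of each value.
# Every value here starts with '"', so nothing is removed, and the spaces
# after ';' and inside the quotes are never touched
stripped_raw = df["magazine"].str.lstrip()
print((stripped_raw == df["magazine"]).all())  # True
```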

Answer 1

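Use a list comprehension that applies strip('" ') to each piece after splitting. Also strip the whole value with strip(';" ') before splitting, so that the trailing ;, the spaces and the " characters are removed: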

f = lambda x: [y.strip('" ') for y in x.strip(';" ').split(';')]
df['magazine_list'] = df['magazine'].apply(f)
print(df)
  industry                                 magazine  \
0     Home  Goodhousekeeping.com; "Prevention.com";   
1  Fashion           Cosmopolitan; " Elle"; "Vogue"   
2  Fashion                             Vogue; "Elle   

                            magazine_list  
0  [Goodhousekeeping.com, Prevention.com]  
1             [Cosmopolitan, Elle, Vogue]  
2                           [Vogue, Elle]
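A self-contained version of this approach, as a minimal sketch: the sample DataFrame is reconstructed from the question, and the lambda is rewritten as a named function (split_magazines, a hypothetical name) so each step can be commented. Because strip only trims characters at the ends of each piece, a name with an inner space such as "Good Housekeeping" keeps that space.

```python
import pandas as pd

# Sample data reconstructed from the question (raw string form is an assumption)
df = pd.DataFrame({
    "industry": ["Home", "Fashion", "Fashion"],
    "magazine": [
        '"Goodhousekeeping.com"; "Prevention.com";',
        '"Cosmopolitan"; " Elle"; "Vogue"',
        '" Vogue"; "Elle"',
    ],
})


def split_magazines(value):
    # Trim ';', '"' and spaces from both ends of the whole value first,
    # so the trailing ';' does not produce an empty last element
    trimmed = value.strip(';" ')
    # Split on ';' and trim quotes and spaces from both ends of each piece
    return [part.strip('" ') for part in trimmed.split(';')]


df['magazine_list'] = df['magazine'].apply(split_magazines)
print(df['magazine_list'].tolist())
# [['Goodhousekeeping.com', 'Prevention.com'], ['Cosmopolitan', 'Elle', 'Vogue'], ['Vogue', 'Elle']]
```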

Answer 2

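Jezrael provided a good solution. It is also useful to know that pandas has string accessor methods (.str) that can do this kind of operation without a list comprehension. List comprehensions are usually faster, but depending on the use case, the built-in pandas methods may be more readable or easier to write.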

df['magazine'] = (
    df['magazine']
    .str.replace(' ', '', regex=False)
    .str.replace('"', '', regex=False)
    .str.strip(';')
    .str.split(';')
)

Output

  industry                                magazine
0     Home  [Goodhousekeeping.com, Prevention.com]
1  Fashion             [Cosmopolitan, Elle, Vogue]
2  Fashion                           [Vogue, Elle]
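One thing to keep in mind with this chain: replace(' ', '') removes every space, including spaces inside a magazine name. The sketch below runs the chain on the question's data plus a hypothetical fourth row, and compares it with a variant that only removes whitespace around the separators by splitting on the regular expression \s*;\s*. The variant is an alternative, not part of the original answer, and assumes pandas 1.4 or later for the regex argument of str.split.

```python
import pandas as pd

# Sample data reconstructed from the question, plus a hypothetical fourth row
# whose magazine name contains an inner space
magazines = pd.Series([
    '"Goodhousekeeping.com"; "Prevention.com";',
    '"Cosmopolitan"; " Elle"; "Vogue"',
    '" Vogue"; "Elle"',
    '"Good Housekeeping"; "Elle";',
])

# The chain from the answer: removes every space, including the inner one
remove_all_spaces = (
    magazines
    .str.replace(' ', '', regex=False)
    .str.replace('"', '', regex=False)
    .str.strip(';')
    .str.split(';')
)

# Variant: drop the quotes, trim ';' and spaces from both ends, then split on
# ';' together with any whitespace around it, so inner spaces survive
keep_inner_spaces = (
    magazines
    .str.replace('"', '', regex=False)
    .str.strip('; ')
    .str.split(r'\s*;\s*', regex=True)
)

print(remove_all_spaces.tolist()[-1])  # ['GoodHousekeeping', 'Elle']
print(keep_inner_spaces.tolist()[-1])  # ['Good Housekeeping', 'Elle']
```

For the three rows from the question, both versions produce the same lists; they only differ when a name contains a space.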

How to remove whitespace from strings in a pandas column
