Question

To clean the values of a multitype DataFrame in Python / pandas, I want to trim the strings. I am currently doing it in two instructions:

import pandas as pd

df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])

df.replace(r'^\s+', '', regex=True, inplace=True) # leading whitespace
df.replace(r'\s+$', '', regex=True, inplace=True) # trailing whitespace

df.values

This is quite slow. What could I improve?

Answer 1

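You can use DataFrame.select_dtypes to select the string columns and then apply str.strip to them.

Note: the values must not be of types such as dict or list, because columns holding them also have the dtype object.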

df_obj = df.select_dtypes(['object'])
print (df_obj)

       0
0    a  
1    c  

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)

   0   1
0  a  10
1  c   5

But if there are only a few columns, use str.strip on each of them directly:

df[0] = df[0].str.strip()
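
The note about dicts and lists matters because the .str accessor returns NaN for values that are not strings. As a minimal sketch of one way around that, assuming a hypothetical helper name strip_object_columns and a made-up frame with a list in one cell, the object columns can be mapped element by element so that only real str values are touched:

```python
import pandas as pd


def strip_object_columns(df):
    """Return a copy of df with whitespace stripped from str values in object columns."""
    out = df.copy()
    # select_dtypes picks the object-dtype columns, which may hold more than strings
    obj_cols = out.select_dtypes(include=["object"]).columns
    for col in obj_cols:
        # Only touch real str values; lists, dicts, None, etc. pass through unchanged
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


# Hypothetical data: the "name" column mixes a string and a list
df = pd.DataFrame({"name": ["  a  ", ["x", "y"]], "count": [10, 5]})
cleaned = strip_object_columns(df)
print(cleaned)
```

This is slower than the vectorized str.strip on pure string columns, so it is probably only worth using when object columns really do contain mixed types.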

Answer 2

Money Shot

Here is a compact version that uses applymap with a simple lambda expression that calls strip only when the value is of string type:

df.applymap(lambda x: x.strip() if type(x) is str else x)

Complete Example

A more complete example:

import pandas as pd


def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if type(x) is str else x
    return df.applymap(trim_strings)


# simple example of trimming whitespace from data elements
df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
df = trim_all_columns(df)
print(df)


>>>
   0   1
0  a  10
1  c   5

Working Example

Here is a working example hosted by trinket: https://trinket.io/python3/e720bdf701
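
On recent pandas releases, DataFrame.applymap is deprecated in favor of DataFrame.map (available from pandas 2.1). The following is a sketch of the same idea that works on either side of that change; the function name trim_all_values is made up for illustration:

```python
import pandas as pd


def trim_all_values(df):
    """Strip whitespace from every str value, using whichever element-wise method exists."""
    trim = lambda x: x.strip() if isinstance(x, str) else x
    # DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
    if hasattr(pd.DataFrame, "map"):
        return df.map(trim)
    return df.applymap(trim)


df = pd.DataFrame([["  a  ", 10], ["  c  ", 5]])
trimmed = trim_all_values(df)
print(trimmed)
```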

Answer 3

If you really want to use a regex, then

>>> df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
   0   1
0  a  10
1  c   5

But it should be faster to do it like this:

>>> df[0] = df[0].str.strip()
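
The speed claim is easy to check on your own data. Below is a rough timing sketch using timeit; the frame size, column names, and repeat count are arbitrary assumptions, and the numbers will vary by machine and pandas version:

```python
import timeit

import pandas as pd

# Hypothetical test data: 10,000 padded strings plus a numeric column
df = pd.DataFrame({"text": ["  a  ", "  c  "] * 5000, "n": range(10000)})


def with_regex():
    # One regex pass over the whole frame
    return df.replace(r"^\s+|\s+$", "", regex=True)


def with_strip():
    # Vectorized strip on the string column only
    out = df.copy()
    out["text"] = out["text"].str.strip()
    return out


print("same result:", with_regex().equals(with_strip()))
print("regex seconds:", timeit.timeit(with_regex, number=5))
print("strip seconds:", timeit.timeit(with_strip, number=5))
```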

Answer 4

You could try:

df[0] = df[0].str.strip()

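or, more specifically, for all string columns:

non_numeric_columns = list(set(df.columns)-set(df._get_numeric_data().columns))
# apply passes each column as a Series, so strip through the .str accessor
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x: x.str.strip())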

Answer 5

You can use the apply function of the Series object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df[0][0]
'  a  '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'

Note the use of strip rather than a regex, which is much faster.

Another option: use the apply function of the DataFrame object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)

   0   1
0  a  10
1  c   5
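
One caveat with the Series.apply call above: calling x.strip() on every element raises an error as soon as the column contains a missing value, whereas the .str accessor passes missing values through. A small sketch with a made-up Series:

```python
import pandas as pd

# Hypothetical column with a missing value in the middle
s = pd.Series(["  a  ", None, "  c  "])

apply_failed = False
try:
    s.apply(lambda x: x.strip())
except AttributeError as exc:
    # The missing value has no strip method
    apply_failed = True
    print("apply failed:", exc)

# The .str accessor leaves the missing value in place instead of raising
stripped = s.str.strip()
print(stripped.tolist())
```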

Answer 6

def trim(x):
    # Strip leading and trailing whitespace from object-dtype (string) columns only
    if x.dtype == object:
        x = x.str.strip()
    return x

df = df.apply(trim)

Answer 7

How about this (for string columns):

df[col] = df[col].str.replace(" ","")

Never fails.
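
Keep in mind that replacing " " behaves differently from strip when a value contains spaces in the middle. A short comparison on a made-up Series:

```python
import pandas as pd

s = pd.Series(["  New York  ", "  a  "])

# str.replace(" ", "") deletes every space, including the one inside "New York"
print(s.str.replace(" ", "", regex=False).tolist())  # ['NewYork', 'a']

# str.strip() only removes leading and trailing whitespace
print(s.str.strip().tolist())                        # ['New York', 'a']
```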

Answer 8

# First inspect the dtypes of the dataframe
df.dtypes

# Then strip whitespace from the object-dtype (string) columns only
df.apply(lambda x: x.str.strip() if x.dtype == object else x)

Strip / trim all strings of a dataframe
