Question

I'm having trouble converting a column to lowercase. It's not as simple as using:

df['my_col'] = df['my_col'].str.lower()

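because I'm iterating over many DataFrames, and some (but not all) of them have both strings and integers in the column of interest. Applied as above, this makes the lower function throw an exception: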

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

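I don't want to force the type to be string. Instead, I want to check whether each value is a string: if it is, convert it to lowercase, and if it isn't, leave it as it is. I thought this would work: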

df = df.apply(lambda x: x.lower() if(isinstance(x, str)) else x)

But it doesn't work... I'm probably overlooking something obvious, but I can't see what it is!

My data looks like this:

                          OS    Count
0          Microsoft Windows     3
1                   Mac OS X     4
2                      Linux     234
3    Don't have a preference     0
4  I prefer Windows and Unix     3
5                       Unix     2
6                        VMS     1
7         DOS or ZX Spectrum     2

Answer 1

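The test in your lambda function isn't quite right, but you weren't far off. DataFrame.apply passes each column to the lambda as a whole Series, not as individual values, so isinstance(x, str) is never true. Check the column's dtype instead: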

df.apply(lambda x: x.str.lower() if(x.dtype == 'object') else x)
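To see why the isinstance test never matches, a quick check shows what DataFrame.apply hands to the function. This is a sketch on a small made-up frame, not data from the question:

```python
import pandas as pd

# Small illustrative frame (made-up values): one text column, one numeric column
df = pd.DataFrame({"OS": ["Linux", "Unix"], "Count": [234, 2]})

# DataFrame.apply calls the function once per column (axis=0 by default)
# and passes that whole column as a Series
kinds = df.apply(lambda x: type(x).__name__)
print(kinds.to_dict())  # {'OS': 'Series', 'Count': 'Series'}

# Hence isinstance(x, str) is False for every column, and the question's
# lambda returns each column unchanged
is_str = df.apply(lambda x: isinstance(x, str))
print(is_str.any())  # False
```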

Using your DataFrame, with the output:

import pandas as pd

df = pd.DataFrame({
    "OS": ["Microsoft Windows", "Mac OS X", "Linux", "Don't have a preference",
           "I prefer Windows and Unix", "Unix", "VMS", "DOS or ZX Spectrum"],
    "Count": [3, 4, 234, 0, 3, 2, 1, 2],
})
# Lowercase object (text) columns only; numeric columns are returned unchanged
df = df.apply(lambda x: x.str.lower() if x.dtype == 'object' else x)
df

    OS                          Count
0   microsoft windows           3
1   mac os x                    4
2   linux                       234
3   don't have a preference     0
4   i prefer windows and unix   3
5   unix                        2
6   vms                         1
7   dos or zx spectrum          2
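If you want exactly what the question describes (lowercase only the values that are strings and keep every other value, including integers inside a mixed column, as it is), one option is to test each value with isinstance via Series.map. A minimal sketch; the helper name lower_strings_only and the sample data are hypothetical, not taken from the answers:

```python
import pandas as pd

def lower_strings_only(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with str values lowercased and all other values kept as-is."""
    out = df.copy()
    for col in out.columns:
        # Numeric columns cannot contain strings, so skip them
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # Element-wise: lowercase strings, pass everything else through untouched
        out[col] = out[col].map(lambda v: v.lower() if isinstance(v, str) else v)
    return out

# Hypothetical mixed column: strings and an integer in the same column
df = pd.DataFrame({"OS": ["Linux", 42, "VMS"], "Count": [234, 1, 1]})
result = lower_strings_only(df)
print(result["OS"].tolist())  # ['linux', 42, 'vms']
```

Unlike .str.lower(), this leaves the integer 42 as an int instead of turning it into NaN, at the cost of a Python-level call per value.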

Answer 2

What dtype are these columns? object? If so, convert them to str first:

df['my_col'] = df.my_col.astype(str).str.lower()

MCVE:

In [1120]: df
Out[1120]: 
   Col1
0   VIM
1   Foo
2  test
3     1
4     2
5     3
6   4.5
7   OSX

In [1121]: df.astype(str).Col1.str.lower()
Out[1121]: 
0     vim
1     foo
2    test
3       1
4       2
5       3
6     4.5
7     osx
Name: Col1, dtype: object

In [1118]: df.astype(str).Col1.str.lower().dtype
Out[1118]: dtype('O')
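Keep in mind that astype(str) converts every value to a string, so numbers come back as text such as '1' rather than as integers, which is the type change the question hoped to avoid. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(["VIM", 1, 4.5], name="Col1")  # mixed object column (illustrative)
lowered = s.astype(str).str.lower()
print(lowered.tolist())  # ['vim', '1', '4.5']

# Every element is now a str, including the former numbers
all_str = all(isinstance(v, str) for v in lowered)
print(all_str)  # True
```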

If you want to do arithmetic on these rows, you probably shouldn't mix str and numeric types.

However, if that is indeed what you need, you can cast to numbers with pd.to_numeric(..., errors='coerce'):

In [1123]: pd.to_numeric(df.Col1, errors='coerce')
Out[1123]: 
0    NaN
1    NaN
2    NaN
3    1.0
4    2.0
5    3.0
6    4.5
7    NaN
Name: Col1, dtype: float64

You can work with the NaNs, but note that the dtype is now float64.

Answer 3

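From the two answers above, I think this is safer:

df_lower = df.apply(lambda x: x.astype(str).str.lower() if x.dtype == 'object' else x)

Note the astype(str). If your string column contains only numbers in some rows, df.apply(lambda x: x.str.lower() if x.dtype == 'object' else x) turns those rows into NaN. The astype(str) version may be a bit slower, but it keeps the number-only rows (as strings) instead of converting them to NaN.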
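The difference between the two variants can be seen on a single mixed column. A sketch with made-up values:

```python
import pandas as pd

s = pd.Series(["Unix", 42])  # object column holding a string and an int

plain = s.str.lower()             # non-str elements become NaN
safe = s.astype(str).str.lower()  # every element is stringified first

print(plain.tolist())  # ['unix', nan]
print(safe.tolist())   # ['unix', '42']
```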

Answer 4

This also works and is quite readable:

for column in df.select_dtypes("object").columns:
    df[column] = df[column].str.lower()

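A possible drawback is the explicit for loop over the subset of columns. Also note that, as with the apply version, any non-string values inside an object column become NaN.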
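Since the question iterates over many DataFrames, the loop above can be wrapped in a function and applied to each frame. A sketch assuming the frames sit in a list; the names frames and lowercase_object_columns are hypothetical. It works on a copy so the originals stay untouched, and it behaves like the loop above, so it still produces NaN for non-string values in an object column:

```python
import pandas as pd

def lowercase_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the select_dtypes loop to a copy of df and return the copy."""
    out = df.copy()
    for column in out.select_dtypes("object").columns:
        out[column] = out[column].str.lower()
    return out

# Hypothetical frames; dtype=object is set explicitly so the example behaves
# the same whichever string dtype your pandas version infers for text
frames = [
    pd.DataFrame({"OS": pd.Series(["Linux", "VMS"], dtype=object), "Count": [234, 1]}),
    pd.DataFrame({"Count": [3, 4]}),  # no object column: returned unchanged
]
cleaned = [lowercase_object_columns(f) for f in frames]
print(cleaned[0]["OS"].tolist())  # ['linux', 'vms']
```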
