Question

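I have a DataFrame I'm working with in which missing values are marked with a period ("."), and I'm trying to replace the missing data with "Not_Given". However, some other columns contain "." as part of longer strings. I set up a small test DataFrame to try out the replacement methods below: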

import pandas as pd

test_df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"], "b": ["1.0", "2.0", "3.0", "4.0", "5.0"], "c": ["a", "b", "c", ".", "a.b"]})
test_df

It prints the following DataFrame:

I wrote the following code to try to replace the single "." value (index 3 of column c):

for col in ["a", "b", "c"]:
    test_df[col] = test_df[col].str.replace(".", "Not_Given")

test_df

This returns the output:

Evidently, this replaces every "." wherever it appears in the DataFrame, so the value 1.0 becomes 1Not_Given0.
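A minimal sketch that reproduces the problem, with regex=False passed explicitly so the behaviour does not depend on the pandas version: `Series.str.replace` substitutes every occurrence of the substring inside each cell, rather than checking whether the whole cell equals ".". The name `replaced` is just an illustrative copy of the frame.

```python
import pandas as pd

test_df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"],
                        "b": ["1.0", "2.0", "3.0", "4.0", "5.0"],
                        "c": ["a", "b", "c", ".", "a.b"]})

# str.replace works on substrings: every "." inside a cell is replaced,
# not only the cells whose entire value is "."
replaced = test_df.copy()
for col in ["a", "b", "c"]:
    replaced[col] = replaced[col].str.replace(".", "Not_Given", regex=False)

print(replaced)
```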

I also tried the following code:

for col in ["a", "b", "c"]:
    test_df[col] = test_df[col].str.replace("\.{1,1}", "Not_Given")

but the output is still the same as above.
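A short illustration of why the second attempt changes nothing, using Python's standard re module (chosen here instead of pandas so the result does not depend on the pandas version's regex default): the quantifier {1,1} only means "exactly one dot", which is the same as a plain \. and still matches a dot anywhere inside a longer string. Only anchoring the pattern restricts it to values that are nothing but a dot.

```python
import re

values = ["a", ".", "a.b", "1.0"]

# r"\.{1,1}" means "exactly one literal dot", i.e. the same as r"\.";
# without anchors it still matches a dot anywhere inside the string.
unanchored = [re.sub(r"\.{1,1}", "Not_Given", v) for v in values]

# Anchoring with ^ and $ restricts the match to strings that are only a dot.
anchored = [re.sub(r"^\.$", "Not_Given", v) for v in values]

print(unanchored)  # ['a', 'Not_Given', 'aNot_Givenb', '1Not_Given0']
print(anchored)    # ['a', 'Not_Given', 'a.b', '1.0']
```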

Is there a way to replace a value only when it is exactly "." with no other characters?

Answer 1

Try the pandas replace method, which by default compares whole cell values rather than substrings:

test_df.replace({'.': 'Not_Given'})

Result:

   a    b          c
0  1  1.0          a
1  2  2.0          b
2  3  3.0          c
3  4  4.0  Not_Given
4  5  5.0        a.b
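A minimal sketch of the same call on the question's frame, checking that cells which merely contain a dot are left alone. Note that replace is not in place by default, so the result is assigned back.

```python
import pandas as pd

test_df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"],
                        "b": ["1.0", "2.0", "3.0", "4.0", "5.0"],
                        "c": ["a", "b", "c", ".", "a.b"]})

# With regex=False (the default), DataFrame.replace matches entire cell
# values, so "1.0" and "a.b" stay unchanged and only the lone "." is replaced.
# replace returns a new DataFrame; assign it back to keep the result.
test_df = test_df.replace({".": "Not_Given"})
print(test_df)
```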

Answer 2

Perhaps a simple expression such as

^\s*\.\s*$

would work fine here.

I've also added \s* in case there are spaces before or after the ".".

Test

import pandas as pd

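test_df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"],
                        "b": ["1.0", "2.0", "3.0", "4.0", "5.0"],
                        "c": ["a", "b", "c", ".", "a.b"]})

# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
test_df['c'] = test_df['c'].str.replace(r'^\s*\.\s*$', 'Not_Given', regex=True)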
print(test_df)

Output

   a    b          c
0  1  1.0          a
1  2  2.0          b
2  3  3.0          c
3  4  4.0  Not_Given
4  5  5.0        a.b

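If you want to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you're interested, you can also watch the matching steps or modify them in this debugger link. The debugger shows how a RegEx engine might consume some sample input strings step by step and carry out the matching.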
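To see what the optional \s* adds, here is a small sketch on a hypothetical Series s that also contains a padded " . " value. regex=True is passed explicitly because since pandas 2.0 str.replace treats the pattern as a literal string by default.

```python
import pandas as pd

s = pd.Series(["a", ".", " . ", "a.b", "1.0"])

# ^ and $ anchor the match to the whole value; \s* tolerates surrounding spaces,
# so " . " is treated as missing while "a.b" and "1.0" are untouched.
out = s.str.replace(r"^\s*\.\s*$", "Not_Given", regex=True)
print(out.tolist())  # ['a', 'Not_Given', 'Not_Given', 'a.b', '1.0']
```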

Answer 3

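You can use the anchored pattern "^\.$", which matches only a value consisting of a single dot (regex=True is needed in pandas >= 2.0):

test_df[col].str.replace(r"^\.$", "Not_Given", regex=True)

or simply use a boolean mask with .loc:

test_df.loc[test_df[col] == '.', col] = "Not_Given"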

import pandas as pd

test_df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"], "b": ["1.0", "2.0", "3.0", "4.0", "5.0"], "c": ["a", "b", "c", ".", "a.b"]})

for col in ["a", "b", "c"]:
    # alternative: test_df[col] = test_df[col].str.replace(r"^\.$", "Not_Given", regex=True)
    # .loc avoids chained assignment, which may not write back to test_df
    test_df.loc[test_df[col] == '.', col] = "Not_Given"
print(test_df)
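A quick check, assuming pandas 2.x or later, that the regex variant and the .loc mask variant in this answer agree on the question's data. make_df is a hypothetical helper that rebuilds the frame so each variant starts from the same input.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the question's test frame.
    return pd.DataFrame({"a": ["1", "2", "3", "4", "5"],
                         "b": ["1.0", "2.0", "3.0", "4.0", "5.0"],
                         "c": ["a", "b", "c", ".", "a.b"]})


# Variant 1: anchored regex, applied column by column.
via_regex = make_df()
for col in via_regex.columns:
    via_regex[col] = via_regex[col].str.replace(r"^\.$", "Not_Given", regex=True)

# Variant 2: exact-equality boolean mask with .loc (no regex involved).
via_mask = make_df()
for col in via_mask.columns:
    via_mask.loc[via_mask[col] == ".", col] = "Not_Given"

print(via_regex.equals(via_mask))  # True
```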

Answer 4

df.loc[df['c'] == '.', 'c'] = 'Not_Given'
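A sketch on the question's data contrasting a bare row mask with a .loc assignment that also names the column. The row mask overwrites every column of the matching row, while .loc with a column label leaves columns a and b untouched.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"],
                   "b": ["1.0", "2.0", "3.0", "4.0", "5.0"],
                   "c": ["a", "b", "c", ".", "a.b"]})

# A row mask alone (df[mask] = ...) overwrites every column of the matching rows.
whole_row = df.copy()
whole_row[whole_row["c"] == "."] = "Not_Given"

# Passing the column label to .loc limits the assignment to column "c".
one_cell = df.copy()
one_cell.loc[one_cell["c"] == ".", "c"] = "Not_Given"

print(whole_row.loc[3].tolist())  # ['Not_Given', 'Not_Given', 'Not_Given']
print(one_cell.loc[3].tolist())   # ['4', '4.0', 'Not_Given']
```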

Answer 5

Here are a few different idiomatic solutions (each line below is an independent alternative):

import numpy as np
import pandas as pd

df[df.eq('.')] = np.nan

# DataFrame.map takes a function (pandas >= 2.1; called applymap before)
df = df.map(lambda v: np.nan if v == '.' else v)

df = df.replace(to_replace='.', value=np.nan)

df = df.replace({'.': np.nan})
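The NaN-based variants pair naturally with pandas' missing-data tools. A minimal sketch, assuming the question's frame, that turns the lone "." into np.nan (np.NaN was removed in NumPy 2.0) and then, if the question's "Not_Given" label is still wanted, fills it back in with fillna.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "3", "4", "5"],
                   "b": ["1.0", "2.0", "3.0", "4.0", "5.0"],
                   "c": ["a", "b", "c", ".", "a.b"]})

# Mark only the whole-value "." as a real missing value.
as_nan = df.replace({".": np.nan})

# Missing values can now be located with isna() or filled with a label.
print(as_nan["c"].isna().tolist())  # [False, False, False, True, False]
filled = as_nan.fillna("Not_Given")
print(filled.loc[3, "c"], filled.loc[4, "c"])  # Not_Given a.b
```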

Replace only when "." is the sole value in a DataFrame column cell in Python
