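Replace the contents of all cells that match a condition

Time: 2018-12-31 03:56:56

Tags: python pandas dataframe

When the column names are unknown, how can I loop over an entire DataFrame and clear the data in every cell that contains a specific string?

Here is what I have so far: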

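for i in df.columns:
    # na=False treats NaN cells as non-matches; note that this drops whole rows, not single cells
    df2 = df[~df[i].str.contains('found', na=False)]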

My data:

  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
0           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
1             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
2         ├─BROKEN─ http://www.broken.com/     2 links found. 0 excluded. 0 broken.         ├─BROKEN─ http://www.broken.com/
3                                      NaN                                      NaN            ├───OK─── http://www.set.com/
4                                      NaN                                      NaN            ├───OK─── http://www.one.com/

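If a cell contains a string (for example "found"), how can I delete the entire contents of that cell? I want to remove everything in the cell, including the text before and after the string.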

3 Answers:

Answer 0 (score: 3)

You can use applymap here:

{{1}}
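
The snippet for this answer did not survive in this copy. As a minimal sketch of what an applymap-based approach could look like, assuming the goal is to turn any string cell containing "found" into NaN (the helper name `blank_if_found` and the small sample frame are made up for illustration and are not the answerer's code):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, loosely modelled on the question.
df = pd.DataFrame({
    "site_a": ["OK http://www.this.com/", "BROKEN http://www.broken.com/", np.nan],
    "site_b": ["OK http://www.this.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
})


def blank_if_found(cell):
    """Return NaN for string cells containing 'found'; leave everything else alone."""
    if isinstance(cell, str) and "found" in cell:
        return np.nan
    return cell


# Newer pandas releases expose the element-wise method as DataFrame.map;
# older ones only have DataFrame.applymap, so pick whichever exists.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
cleaned = elementwise(blank_if_found)
print(cleaned)
```

Because the function is applied cell by cell, it does not need to know the column names, and the `isinstance` check keeps existing NaN values from raising an error.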

Answer 1 (score: 1)

Since you want to check for a string/value and act on the whole DataFrame, the DataFrame.replace method fits the bill here.

Example DataFrame:

>>> df
      a
0  foo1
1  foo2
2   bar
3   bar
4   bar

Replace bar with an empty string, or with NaN if you prefer:

>>> df.replace("bar", "", regex=True)
      a
0  foo1
1  foo2
2
3
4

Or replace bar with NaN:

>>> df.replace("bar", np.nan, regex=True)
 # df.replace("bar", np.nan, regex=True, inplace=True)
      a
0  foo1
1  foo2
2   NaN
3   NaN
4   NaN

If you want to modify the actual DataFrame rather than get a new one back, you can optionally use inplace=True.
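
One detail worth checking against the question: when the replacement is a string, a regex replace substitutes only the matched text, so the rest of the cell stays. A minimal sketch, assuming the whole cell should be cleared, is to make the pattern cover the entire cell (the sample frame and the pattern `^.*found.*$` are illustrative choices, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame for illustration.
df = pd.DataFrame({"a": ["2 links found. 0 broken.", "OK http://www.is.com/"]})

# A bare pattern with a string replacement only removes the matched substring.
partial = df.replace("found", "", regex=True)

# Anchoring the pattern over the whole cell clears everything in a matching cell.
whole = df.replace(r"^.*found.*$", "", regex=True)

# The same anchored pattern can map matching cells to NaN instead.
as_nan = df.replace(r"^.*found.*$", np.nan, regex=True)

print(partial, whole, as_nan, sep="\n\n")
```

Since `replace` runs over every column, this works without knowing the column names.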

Simulating the given example:

>>> df
                                                                                                                         col1
0  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1            ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
3          ├─BROKEN─ http://www.broken.com/     2 links found. 0 excluded. 0 broken.         ├─BROKEN─ http://www.broken.com/
4                                       NaN                                      NaN            ├───OK─── http://www.set.com/
5                                       NaN                                      NaN            ├───OK─── http://www.one.com/

Result with str.contains:

>>> df[~df["col1"].str.contains("found")]
                                                                                                                         col1
0  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1            ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
4                                       NaN                                      NaN            ├───OK─── http://www.set.com/
5                                       NaN                                      NaN            ├───OK─── http://www.one.com/

Alternatively, make sure the values are of string type before applying the operation:

>>> df[~df["col1"].astype(str).str.contains("found")]
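
The filters above drop whole rows from a single named column. If the aim is instead to blank individual cells across columns whose names are not known in advance, one possible sketch is to build a boolean mask column by column and pass it to `DataFrame.mask` (the sample frame and the variable names `hits` and `cleaned` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a pre-existing NaN.
df = pd.DataFrame({
    "x": ["OK http://www.this.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
    "y": ["OK http://www.is.com/", "BROKEN http://www.broken.com/", "OK http://www.one.com/"],
})

# For every column, cast to str first so NaN or numeric cells do not break .str,
# then test for the literal substring (regex=False avoids regex interpretation).
hits = df.apply(lambda col: col.astype(str).str.contains("found", regex=False))

# DataFrame.mask replaces cells where the mask is True (with NaN by default).
cleaned = df.mask(hits)
print(cleaned)
```

Note that `astype(str)` turns NaN into the text "nan", which is harmless here but would matter if the search string could appear inside "nan".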

Answer 2 (score: 0)

You can use df.replace({'test': np.nan}) to replace values with NaN; it should replace all instances.
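
As a hedged illustration of how this behaves: without regex=True, `replace` compares whole cell values, so only cells exactly equal to 'test' are affected. A small sketch with made-up data, showing the exact-match form next to a regex form that also catches cells merely containing the string:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one exact match, one cell that only contains the word.
df = pd.DataFrame({"a": ["test", "a test here", "other"]})

# Dict form without regex: only the cell equal to "test" becomes NaN.
exact = df.replace({"test": np.nan})

# Regex form spanning the whole cell: any cell containing "test" becomes NaN.
containing = df.replace(r"^.*test.*$", np.nan, regex=True)

print(exact, containing, sep="\n\n")
```

For the question as asked (cells that contain "found" somewhere inside longer text), the regex form is probably the one that applies.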