Question

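When the column names are unknown, how can I iterate over an entire DataFrame to delete the data in cells that contain a specific string?

This is what I have so far: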

for col in df.columns:
    # drops whole rows where this column contains 'found', rather than clearing the cell
    df2 = df[~df[col].str.contains('found', na=False)]

My data:

  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
0           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
1             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
2         ├─BROKEN─ http://www.broken.com/     2 links found. 0 excluded. 0 broken.         ├─BROKEN─ http://www.broken.com/
3                                      NaN                                      NaN            ├───OK─── http://www.set.com/
4                                      NaN                                      NaN            ├───OK─── http://www.one.com/

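If a cell contains a string (for example 'found'), how do I delete the entire contents of that cell? I want to remove everything in the cell, including the text before and after the string.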

Answer 1

You can use applymap here:

{{1}}
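The snippet this answer pointed to is not preserved in the text, so what follows is only a sketch of how an element-wise approach might look, not the answerer's actual code. The column names "a", "b", "c" and the helper clear_if_found are hypothetical. Note that pandas 2.1 and later expose the same operation as DataFrame.map and deprecate applymap, so the sketch picks whichever is available.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data; "a", "b", "c" stand in for the real headers.
df = pd.DataFrame({
    "a": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/", np.nan],
    "b": ["├───OK─── http://www.this.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
    "c": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/", "├───OK─── http://www.set.com/"],
})

def clear_if_found(cell, needle="found"):
    """Return NaN for string cells containing `needle`; pass everything else through."""
    if isinstance(cell, str) and needle in cell:
        return np.nan
    return cell

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
cleaned = elementwise(clear_if_found)
print(cleaned)
```

Because the function is applied to every cell, no column names need to be known in advance; the isinstance check keeps existing NaN cells from raising an error.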

Answer 2

Since you are looking for a string/value to check against and want to act on the entire DataFrame, the DataFrame.replace method fits the bill here.

Example DataFrame:

>>> df
      a
0  foo1
1  foo2
2   bar
3   bar
4   bar

Replace bar with a blank, or with NaN if you prefer:

>>> df.replace("bar", "", regex=True)
      a
0  foo1
1  foo2
2
3
4

Or replace bar with NaN:

>>> df.replace("bar", np.nan, regex=True)
 # df.replace("bar", np.nan, regex=True, inplace=True)
      a
0  foo1
1  foo2
2   NaN
3   NaN
4   NaN

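If you want the replacement applied to the DataFrame itself rather than returned as a new one, you can optionally pass inplace=True, as in the commented-out line above.

Simulating the example given: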
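One detail worth spelling out: in the example above each matching cell is exactly "bar", so replacing the match empties the cell. When the search string is only part of a longer cell and the replacement is a string, regex=True substitutes just the matched part. The sketch below, using a hypothetical two-column frame with made-up column names "a" and "b", widens the pattern so it spans the whole cell.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "a" and "b" are stand-in column names.
df = pd.DataFrame({
    "a": ["├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", np.nan],
    "b": ["├───OK─── http://www.is.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
})

# String replacement: only the matched substring "found" is removed.
partial = df.replace("found", "", regex=True)

# Pattern anchored over the whole cell: the entire cell becomes blank ...
blanked = df.replace(r"^.*found.*$", "", regex=True)

# ... or NaN, if missing values are preferred.
as_nan = df.replace(r"^.*found.*$", np.nan, regex=True)

print(partial, blanked, as_nan, sep="\n\n")
```

None of these calls name a column, so they apply to every column of the DataFrame regardless of its headers.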

>>> df
                                                                                                                         col1
0  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1            ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
3          ├─BROKEN─ http://www.broken.com/     2 links found. 0 excluded. 0 broken.         ├─BROKEN─ http://www.broken.com/
4                                       NaN                                      NaN            ├───OK─── http://www.set.com/
5                                       NaN                                      NaN            ├───OK─── http://www.one.com/

Result with str.contains:

>>> df[~df["col1"].str.contains("found")]
                                                                                                                         col1
0  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1            ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
4                                       NaN                                      NaN            ├───OK─── http://www.set.com/
5                                       NaN                                      NaN            ├───OK─── http://www.one.com/

Alternatively, make sure the values are of string type before applying the operation:

>>> df[~df["col1"].astype(str).str.contains("found")]
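The str.contains filters above drop whole rows and need a column name. As a sketch of how the same test might be applied to every column without knowing the names, and clear only the matching cells, a boolean mask can be built column by column and passed to DataFrame.mask. The frame and its column names "a", "b", "c" below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "a", "b", "c" are stand-in column names.
df = pd.DataFrame({
    "a": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/", np.nan],
    "b": ["├───OK─── http://www.this.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
    "c": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/", "├───OK─── http://www.set.com/"],
})

# Build a True/False frame, one column at a time; astype(str) guards against
# non-string cells and na=False treats any remaining missing value as "no match".
contains_found = df.apply(
    lambda col: col.astype(str).str.contains("found", regex=False, na=False)
)

# mask() sets cells to NaN wherever the condition is True and keeps the rest.
cleaned = df.mask(contains_found)
print(cleaned)
```

All rows are kept; only the single cell holding the summary line is cleared.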

Answer 3

You can use df.replace({'test': np.nan}) to replace the value with NaN; this should replace all instances.
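As a small illustration of what "all instances" covers: without regex=True, replace matches cells whose value equals the key exactly, so a cell that merely contains 'test' is left alone. The data and column name "a" below are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one exact match, one cell that only contains the string.
df = pd.DataFrame({"a": ["test", "test123", "other"]})

# Dict form: keys are the values to look for, dict values are the replacements.
out = df.replace({"test": np.nan})
print(out)
```

For cells that contain the string among other text, as in the question, a regex-based replace or an element-wise function is the closer fit.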

Replace the contents of all cells that match a condition
