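How do I iterate over an entire DataFrame, when the column names are unknown, to delete the data in cells that contain a specific string?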
Here is what I have so far:
for i in df.columns:
    # keeps only the rows whose value in column i does not contain 'found'
    df2 = df[~df[i].str.contains('found')]
My data:
Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
0 ├───OK─── http://www.this.com/ ├───OK─── http://www.this.com/ ├───OK─── http://www.this.com/
1 ├───OK─── http://www.is.com/ ├───OK─── http://www.is.com/ ├───OK─── http://www.is.com/
2 ├─BROKEN─ http://www.broken.com/ 2 links found. 0 excluded. 0 broken. ├─BROKEN─ http://www.broken.com/
3 NaN NaN ├───OK─── http://www.set.com/
4 NaN NaN ├───OK─── http://www.one.com/
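If a cell contains a string (e.g. "found"), how do I delete the entire contents of that cell? I want to delete everything in the cell, including the text before and after the string.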
Answer 0 (score: 3)
You can use applymap here:
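A minimal sketch of how applymap could be used for this, assuming the goal is to blank any string cell that contains "found" and leave NaN and all other cells untouched. The column names "bar" and "boo" are shortened stand-ins for the question's "Getting links from: ..." headers, and blank_if_contains is a hypothetical helper. pandas 2.1 renamed DataFrame.applymap to DataFrame.map (applymap is deprecated), so the sketch uses whichever one exists.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data (column names shortened).
df = pd.DataFrame(
    {
        "bar": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/", np.nan],
        "boo": ["├───OK─── http://www.this.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
    },
    dtype=object,
)


def blank_if_contains(value, needle="found"):
    # Hypothetical helper: return '' for string cells containing needle;
    # NaN and every other value are returned unchanged.
    if isinstance(value, str) and needle in value:
        return ""
    return value


# DataFrame.map is the pandas >= 2.1 name; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    cleaned = df.map(blank_if_contains)
else:
    cleaned = df.applymap(blank_if_contains)

print(cleaned)
```

Because the function is applied to every cell, no column names are needed, and rows are never dropped.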
Answer 1 (score: 1)
Since you are looking for a single string/value to check for and act on across the whole DataFrame, the DataFrame.replace method is a good fit here.
Example DataFrame:
>>> df
a
0 foo1
1 foo2
2 bar
3 bar
4 bar
Replace bar with an empty string (or, if you prefer, with NaN values):
>>> df.replace("bar", "", regex=True)
a
0 foo1
1 foo2
2
3
4
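One thing to watch with a string replacement value: regex=True substitutes only the part of each cell that matches the pattern. In the example above every matching cell is exactly "bar", so the whole cell empties, but on the question's data replacing "found" with "" would leave "2 links . 0 excluded. 0 broken." behind. A sketch on a small hypothetical frame built from the question's values, contrasting a bare pattern with one that spans the whole cell:

```python
import pandas as pd

# Hypothetical two-column frame using values from the question.
df = pd.DataFrame(
    {
        "bar": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/"],
        "boo": ["├───OK─── http://www.this.com/", "2 links found. 0 excluded. 0 broken."],
    },
    dtype=object,
)

# With a string replacement, regex=True only substitutes the matched text,
# so the rest of the cell survives.
partial = df.replace("found", "", regex=True)
print(partial.loc[1, "boo"])  # 2 links . 0 excluded. 0 broken.

# Anchoring the pattern to cover the whole cell empties it completely.
whole = df.replace(r"^.*found.*$", "", regex=True)
print(repr(whole.loc[1, "boo"]))  # ''
```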
Or replace bar with NaN:
>>> df.replace("bar", np.nan, regex=True)
# df.replace("bar", np.nan, regex=True, inplace=True)
a
0 foo1
1 foo2
2 NaN
3 NaN
4 NaN
If you want to modify the actual DataFrame, pass inplace=True, as in the commented-out line above.
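When the replacement value is not a string (NaN here), pandas replaces the whole cell wherever the pattern is found, so the bare word "found" is enough. A sketch on a hypothetical three-column frame (names shortened from the question's headers), showing that replace() covers every column without naming any, and that inplace=True changes df itself and returns None:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the replace calls below never mention column names.
df = pd.DataFrame(
    {
        "bar": ["├───OK─── http://www.this.com/", np.nan],
        "boo": ["2 links found. 0 excluded. 0 broken.", np.nan],
        "foo": ["├───OK─── http://www.this.com/", "├───OK─── http://www.set.com/"],
    },
    dtype=object,
)

# Non-string value + regex=True: any cell in which "found" is found becomes NaN.
result = df.replace("found", np.nan, regex=True)

# inplace=True modifies df directly; the call itself returns None.
returned = df.replace("found", np.nan, regex=True, inplace=True)
```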
Simulating the example you gave:
>>> df
col1
0 Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1 ├───OK─── http://www.this.com/ ├───OK─── http://www.this.com/ ├───OK─── http://www.this.com/
2 ├───OK─── http://www.is.com/ ├───OK─── http://www.is.com/ ├───OK─── http://www.is.com/
3 ├─BROKEN─ http://www.broken.com/ 2 links found. 0 excluded. 0 broken. ├─BROKEN─ http://www.broken.com/
4 NaN NaN ├───OK─── http://www.set.com/
5 NaN NaN ├───OK─── http://www.one.com/
The result with str.contains:
>>> df[~df["col1"].str.contains("found")]
col1
0 Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1 ├───OK─── http://www.this.com/ ├───OK─── http://www.this.com/ ├───OK─── http://www.this.com/
2 ├───OK─── http://www.is.com/ ├───OK─── http://www.is.com/ ├───OK─── http://www.is.com/
4 NaN NaN ├───OK─── http://www.set.com/
5 NaN NaN ├───OK─── http://www.one.com/
Following the OP's approach, make sure the values are of string type so the .str operation can be applied:
>>> df[~df["col1"].astype(str).str.contains("found")]
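Note that df[~...str.contains(...)] filters rows: every row with a match disappears, together with the other columns' cells in that row. To clear only the matching cells across all columns without knowing their names, one option (a sketch on a hypothetical frame, not the answerer's code) is to build a cell-wise boolean mask and pass it to DataFrame.mask:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with shortened column names.
df = pd.DataFrame(
    {
        "bar": ["├───OK─── http://www.this.com/", "├─BROKEN─ http://www.broken.com/", np.nan],
        "boo": ["├───OK─── http://www.this.com/", "2 links found. 0 excluded. 0 broken.", np.nan],
    },
    dtype=object,
)

# Boolean mask with the same shape as df: True where a cell contains "found".
# astype(str) turns NaN into the string "nan", so missing cells simply test False.
found = df.apply(lambda col: col.astype(str).str.contains("found", regex=False))

# mask() puts "" in the flagged cells and keeps every row and column.
cleaned = df.mask(found, "")
print(cleaned)
```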
Answer 2 (score: 0)
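You can use df.replace({'test': np.nan}) to replace values with NaN. This replaces every cell whose entire value is 'test'; to also catch cells that merely contain 'test', add regex=True.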
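Without regex=True, the dict form matches whole values only, so a cell like "2 links found. 0 excluded. 0 broken." is left alone. A sketch contrasting that with the regex= dict form from the pandas documentation, on a hypothetical single-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame.
df = pd.DataFrame(
    {"a": ["found", "2 links found. 0 excluded. 0 broken.", "ok"]},
    dtype=object,
)

# Plain dict: only cells whose entire value equals the key are replaced.
exact = df.replace({"found": np.nan})

# Dict passed via regex=: keys are patterns searched inside each cell, and with
# a non-string value (NaN) the whole matching cell is replaced.
by_pattern = df.replace(regex={"found": np.nan})
```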