Question

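I have a large data file in which some columns contain URL links. For example, the "Photo" column holds links as values: https://cdn.sofifa.org/players/4/19/193080.png

Other columns contain similar links. I want to find every column that contains links and drop it.

I tried this code: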

fb.str.contains('https') # fb = My DataFrame

But it raises the following error:

'DataFrame' object has no attribute 'str'

Answer 1

.str is an accessor on pd.Series, not on pd.DataFrame. You can use .apply to check each column, then keep only the columns that do not contain https:

In [91]: df
Out[91]:
   a      b  c
0  1  https  4
1  2         5
2  3         6

In [92]: df.loc[:, ~df.apply(lambda x: x.astype(str).str.contains('https').any())]
Out[92]:
   a  c
0  1  4
1  2  5
2  3  6
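
The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name drop_url_columns, the default pattern r"https?://" and the sample frame are hypothetical and not from the question; only the Photo URL is taken from it. Every column is converted to text before matching, as in the session above.

```python
import pandas as pd


def drop_url_columns(df: pd.DataFrame, pattern: str = r"https?://") -> pd.DataFrame:
    """Return a copy of df without the columns that hold at least one URL."""
    # One boolean per column: True when any cell, viewed as text, matches the pattern.
    has_url = df.apply(
        lambda col: col.astype(str).str.contains(pattern, regex=True).any()
    ).astype(bool)
    # Negate the whole mask at once and keep the columns that are left.
    return df.loc[:, ~has_url].copy()


# Hypothetical sample data; only the Photo URL comes from the question.
fb = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Photo": ["https://cdn.sofifa.org/players/4/19/193080.png", ""],
    "Age": [31, 33],
})

cleaned = drop_url_columns(fb)
print(list(cleaned.columns))  # ['Name', 'Age']
```

Matching on "https?://" rather than the bare string 'https' avoids dropping a column just because some free-text cell happens to mention the word https.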

Answer 2

.str has to be used on a specific DataFrame column or Series that contains strings, not on the whole DataFrame. Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'site': ['https://www.google.com', 'https://www.facebook.com', 'Reddit', 'Youtube'],
    'monthly_visitors': np.random.randint(1000, 10000, 4)
})

# the '~' operator negates the mask, keeping everything that DOESN'T meet the condition
df.loc[~df['site'].str.contains('https'), :]
# the line above keeps only the rows whose site name does not contain 'https'

Find and drop columns that contain a specific string