Question

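I joined two dataframes: one containing annual dates and another containing monthly dates created with a date range.
After the join there were some duplicate date values, to which I assigned the suffix '_dup'.
Now, how do I delete the rows that contain the '_dup' value? My dataframe looks like this:

[image: screenshot of the dataframe]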

I am currently using the following code to delete the date rows that contain '_dup':

for i in range (117):
    if df5.iloc[i,0].str.contains ('_dup'):
        del df5.loc[i,0]

I get the following error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-171-ae80d413249e> in <module>()
      1 for i in range (117):
----> 2     if df5.iloc[i,0].str.contains ('_dup'):
      3         del df5.loc[i,0]

AttributeError: 'str' object has no attribute 'str'

I also tried the following code:

df5[~df5.index.str.contains("_dup")]

which produces the following error:

AttributeError: Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')

Answer 1

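Your problem is that df5.iloc[i,0] accesses a single str data point in the column, so you cannot apply the .str accessor to it again. Instead, you can apply str.contains directly to the entire column like this: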

df = df.loc[~df["col_name"].str.contains("dup")]
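As a minimal sketch of that column-wide filter, assuming the dates are stored as strings: the column names "date" and "value" and the sample rows below are hypothetical and only stand in for the asker's data.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "date": ["2010-01-01", "2010-02-01", "2010-01-01_dup",
             "2011-01-01", "2011-01-01_dup"],
    "value": [1, 2, 3, 4, 5],
})

# Boolean mask: True where the string contains the literal text "_dup".
# regex=False treats the pattern as plain text, not a regular expression.
is_dup = df["date"].str.contains("_dup", regex=False)

# Keep only the rows where the mask is False, then renumber the index.
cleaned = df.loc[~is_dup].reset_index(drop=True)
print(cleaned)
```

The mask is built once for the whole column, so no explicit loop over row positions is needed.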

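However, str.contains will not work if the column contains mixed data types. In that case, you need to convert the type first (df["col_name"] = df["col_name"].astype(str)). Alternatively, if the duplicated values are the only data points of string type, you can filter by type like this: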

df.loc[~df["col_name"].apply(lambda x: isinstance(x, str))]
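The two options for a mixed-type column can be compared side by side in a small sketch. It assumes a column that holds pandas Timestamp objects for the real dates and plain strings for the '_dup' entries; the frame `mixed` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical mixed-type column: Timestamps for real dates,
# plain strings for the entries that carry the '_dup' suffix.
mixed = pd.DataFrame({
    "date": [pd.Timestamp("2010-01-01"), "2010-01-01_dup",
             pd.Timestamp("2011-01-01"), "2011-01-01_dup"],
    "value": [10, 11, 20, 21],
})

# Option A: convert every value to text first, then filter on the substring.
as_text = mixed["date"].astype(str)
option_a = mixed.loc[~as_text.str.contains("_dup", regex=False)]

# Option B: drop every row whose value is a Python str.
# This only works if the duplicates are the only string values in the column.
is_string = mixed["date"].apply(lambda x: isinstance(x, str))
option_b = mixed.loc[~is_string]

print(option_a)
print(option_b)
```

Option A filters on a temporary text copy and leaves the original column untouched, so the surviving rows keep their Timestamp values; assigning the converted column back, as in the answer, would turn the dates into strings as well.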
