Question

I am reading values from a text file and trying to find the index of a substring, as shown below:

df=pd.read_csv('break_sent.txt', index_col=False,encoding='utf-8',delimiter="\n",names=['sent'])
#print(df[:50])
#df.index = list(df.index)
df1= df[40:50]
print(len(df))
print(df1.index)
print("-------------------------------------------")
for i,row in df1.iterrows():
    string = row['sent']
    #print("string",string)
    d = df1[df1.sent.str.match(string)] # if the result includes more than 1 value then we know that substring and its matching parent string are present, then I will eliminate the substring from the dataframe
    if len(d.index > 2):
        index_val = df.index(string)
        df.drop(df.index(string),inpace=True)
        df.reset_index(level=None, drop=True, inplace=True)

When I run this code, I get the following error:

Traceback (most recent call last):
  File "process.py", line 15, in <module>
    index_val = df.index(string)
TypeError: 'RangeIndex' object is not callable

I tried converting the RangeIndex to a list:

df.index = list(df.index)

But then I get an error saying that Int64Index is not callable. How can I get the index of the string?

Answer 1

Try changing

df.drop(df.index(string),inpace=True)

to

df.drop(index=string, inplace=True)
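
For context, `df.index` is an attribute holding an Index object, not a method, which is why calling it raises the TypeError. `drop` also works on index labels rather than on cell values, so a value such as `string` generally has to be turned into labels first. A minimal sketch, assuming a small made-up DataFrame in place of break_sent.txt (the data and the names `labels` and `result` are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data standing in for the contents of break_sent.txt
df = pd.DataFrame({"sent": ["the cat sat", "the cat", "a dog barked"]})

# df.index is an Index object, so calling it like a function fails
try:
    df.index("the cat")
    call_failed = False
except TypeError:
    call_failed = True

# Look up the index labels of the rows whose value equals the string,
# then hand those labels to drop
labels = df.index[df["sent"] == "the cat"]
result = df.drop(index=labels).reset_index(drop=True)
print(result)
```

Here the lookup uses exact equality; a pattern-based mask such as `str.match` can be substituted in the same position.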

Answer 2

You need to run df.index on the data frame, not on the search string. So:

# the boolean mask and the index must come from the same frame
matched_rows = df1.index[df1.sent.str.match(string)]

will give you the index labels of the rows that match the string. You should then be able to pass that output to df.drop:

if len(matched_rows) > 2:
  df.drop(matched_rows, inplace=True)
  df.reset_index(level=None, drop=True, inplace=True)

I may not have grasped the exact details of what you are trying to do, but hopefully this points you in the right direction.
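
The approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: the sentences are made up, the threshold of more than one match follows the comment in the question's code, and `re.escape` is added because `str.match` treats its argument as a regular expression anchored at the start of each value. The names `to_drop` and `cleaned` are hypothetical.

```python
import re
import pandas as pd

# Hypothetical sentences; in the question these come from break_sent.txt
df = pd.DataFrame(
    {"sent": ["I like tea", "I like tea and coffee", "Hello (world)", "Goodbye"]}
)

to_drop = []
for label, sentence in df["sent"].items():
    # re.escape keeps characters such as "(" from being read as regex syntax
    mask = df["sent"].str.match(re.escape(sentence))
    # A sentence always matches itself, so more than one hit means
    # some other row starts with the same text
    if mask.sum() > 1:
        to_drop.append(label)

# Drop all collected labels in one step, then renumber the rows
cleaned = df.drop(index=to_drop).reset_index(drop=True)
print(cleaned["sent"].tolist())
```

Collecting the labels first and dropping once avoids modifying the frame while iterating over it. Exact duplicates would all be flagged by this check, so they may need separate handling.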

RangeIndex object is not callable
