Question

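I'm fairly new to pandas. I have a dataset of about 250,000 rows stored in JSON. One of my columns holds a very long, possibly unique string in each cell, and I have to filter it before the data is usable. Each value is accessed and filtered correctly (meaning the correct value ends up in my working variable), but when it comes to the assignment df.iloc[x]['notes'], the values are not written back to the DataFrame. I have read about the chained indexing and assignment problem in pandas, and I thought using .iloc would avoid it, but it isn't working for me.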

Here is an example.

Suppose this is my DataFrame and some filtering code:

import pandas as pd 

#Listing the things I want to filter out
greeting = ['Hello,', 'Hi']
goodbye = ['Thank you', 'Goodbye']

df = pd.DataFrame({'ID':[123, 456, 789], 'Group':['A', 'B', 'C'],\
'notes':['Hello, this is John', 'Thank you for your help',\
'This is a message.']})

#Doing the actual filtering
for x in range(0, len(df['notes'])):

    note = df.iloc[x]['notes']

    for y in greeting:
        if y in note:
            note = note.replace(y, '')

    for z in goodbye:
        if z in note:
            note = note.replace(z, '')

    #The variable note is correctly filtered here,
    #but then it doesn't assign and leaves the df unchanged
    #at the previous index, so the error is probably beyond this point

    df.iloc[x]['notes'] = note
df.to_json('final_data.json', orient = 'records')

Another thing I tried in place of .iloc was df.at[x, 'notes'] = note, but that seems to have the same problem.

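So in the final output, instead of getting something like:

[{'ID': 1, 'Group': 'A', 'notes': 'this is John'} ..etc.]

I get:

[{'ID': 1, 'Group': 'A', 'notes': 'Hello, this is John'} ..etc.] (completely unchanged)

What is going on here? Is there some unpredictable assignment behavior that I'm not accounting for?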

Answer 1

Why not just do this?

df['notes'] = df['notes'].str.replace('|'.join(greeting + goodbye), '', regex=True)

Now:

df.to_json('final_data.json', orient = 'records')

will give you the desired JSON file.

Its contents:

[{"Group":"A","ID":123,"notes":" this is John"},{"Group":"B","ID":456,"notes":" for your help"},{"Group":"C","ID":789,"notes":"This is a message."}]
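One caveat about the one-liner above: the joined string is treated as a regular expression, so a phrase containing characters such as `.` or `?` would be interpreted as a pattern rather than literal text. A minimal sketch of a more defensive variant, assuming the phrases should always match literally (the `pattern` variable is introduced here for illustration only):

```python
import re
import pandas as pd

greeting = ['Hello,', 'Hi']
goodbye = ['Thank you', 'Goodbye']

df = pd.DataFrame({'ID': [123, 456, 789], 'Group': ['A', 'B', 'C'],
                   'notes': ['Hello, this is John', 'Thank you for your help',
                             'This is a message.']})

# Escape each phrase so regex metacharacters are matched literally,
# then join the phrases into one alternation pattern.
pattern = '|'.join(re.escape(p) for p in greeting + goodbye)

# regex=True is passed explicitly because the default has differed
# between pandas versions.
df['notes'] = df['notes'].str.replace(pattern, '', regex=True)

print(df['notes'].tolist())
```

Because this runs once over the whole column instead of once per row, it should also scale better to the 250,000 rows mentioned in the question.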

Answer 2

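Use the code below.

The variable idx is the index label of the DataFrame df, and you can pass it to .loc for indexing. The variable row is a Series containing the data of one row.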

for idx, row in df.iterrows():

    note = row['notes']

    for y in greeting:
        if y in note:
            note = note.replace(y, '')

    for z in goodbye:
        if z in note:
            note = note.replace(z, '')

    df.loc[idx, 'notes'] = note
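As for why the original loop had no effect, the likely cause is chained indexing: df.iloc[x] first builds a new Series for the row (the columns here have mixed dtypes, so it is a copy), and ['notes'] = note then writes into that temporary object rather than into df. The sketch below, using a small made-up DataFrame, contrasts that with single-step indexing:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'ID': [123, 456],
                   'notes': ['Hello, this is John', 'Hi there']})

# Chained indexing: the write goes to a temporary row copy, so df is unchanged.
# pandas normally warns about this; the warning is silenced only for the demo.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.iloc[0]['notes'] = 'changed'
after_chained = df.loc[0, 'notes']

# Single-step positional indexing: row and column go in one iloc call.
df.iloc[0, df.columns.get_loc('notes')] = 'changed via iloc'

# Single-step label-based scalar access.
df.at[1, 'notes'] = 'changed via at'

print(after_chained)
print(df['notes'].tolist())
```

With a default 0..n-1 index, df.at[x, 'notes'] = note would normally be expected to write through as well; why it appeared not to in the question cannot be determined from the post.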

df.iloc not assigning values in a for loop? (pandas)
