Question

I have a DataFrame that looks like this:

print(df)
 Text     
 0|This is a text 
 1|This is also text

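What I want: I would like to run a for loop over the "Text" column of the DataFrame and then use the derived information to create a new column, like this: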

   Text             | Derived_text 
 0|This is a text   | Something
 1|This is also text| Something

Code: I wrote the following code (I am using spaCy, by the way):

for i in df['Text'].tolist():
    doc = nlp(i)
    resolved = [(doc._.coref_resolved) for docs in doc.ents]
    df = df.append(pd.Series(resolved), ignore_index=True)

Problem: the appended Series ends up misplaced/misaligned, so the result looks like this:

  Text              | Derived_text 
 0|This is a text   | NaN
 1|This is also text| NaN
 2|NaN              | Something
 3|NaN              | Something

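I also tried saving the results to a list, but the list does not contain the NaN values that can come up while the derivation for loop runs. I need to keep the NaN values so that I can match the original text to the derived text by index position.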

Answer 1

It looks like you want to add a column, which can be done with the pandas concat method and its axis parameter, like this: pd.concat([df, new_columns], axis=1).
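As a minimal sketch of that concat call: the derived values below are made up for illustration, and `derived` is a hypothetical name for whatever Series holds your results. Because `axis=1` aligns on the index, a missing value stays in its own row instead of shifting the others.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["This is a text", "This is also text"]})

# Hypothetical derived values, one per row; None marks a row with no result.
derived = pd.Series(["Something", None], name="Derived_text")

# axis=1 aligns on the index and places the Series next to the existing columns
# instead of adding new rows at the bottom.
result = pd.concat([df, derived], axis=1)
print(result)
```

The Series must share the DataFrame's index for the rows to line up; here both use the default 0, 1 index.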

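However, I don't think you should use a for loop when working with pandas. What you should probably do instead is use pandas's apply function, which looks like this: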

import pandas as pd

# define your DataFrame (the dict maps each column name to its values)
df = pd.DataFrame({'a': range(6), 'b': range(1, 7)})

# create the new column from one of them
df['a_squared'] = df['a'].apply(lambda x: x ** 2)

You may also want to look into lambda expressions.
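Applied to the Text column from the question, the same idea might look like the sketch below. The `derive` function is a hypothetical stand-in for the spaCy step (something like `nlp(text)._.coref_resolved`), and the empty third row is added only to show a missing result. Since apply returns exactly one value per row, a row with no result keeps its position.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["This is a text", "This is also text", ""]})

def derive(text):
    """Hypothetical stand-in for the spaCy processing of one row."""
    if not text:
        # Returning None leaves a missing value in this row's slot.
        return None
    return text.upper()

# One result per row, assigned back under the same index as the original text.
df["Derived_text"] = df["Text"].apply(derive)
print(df)
```

A named function is used here instead of a lambda because the per-row logic needs more than one expression.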

Also, take a look at this Stack Overflow question.

Hope this helps! Happy coding!
