Question

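Thanks, everyone. I have learned a lot from the Stack Overflow community. I could not find an answer to this question anywhere, so I would appreciate your help.

I have created a small database (899 * 10) of scientific papers. I want to write a script that prints each title and abstract and then lets me make a (human, non-automated) decision on whether or not to include the paper in a systematic review.

I have got close with the script below, which lets me update the "decision" column for each paper and then save and exit, so I do not have to get through everything in one sitting.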

import pandas as pd
import numpy as np

print "Please type the path of the database you would like to assess"
path = raw_input('>>> ')

data = pd.read_csv(path)

if 'jim_decision' not in data:
        data['jim_decision'] = pd.Series(np.nan)


def decision_maker(dataframe):
        current_row = 0
        while True:
                if pd.isnull(dataframe['jim_decision'][current_row]):
                        print "\n\n Title:\n\n %s \n Abstract:\n %s\n\n\n" % (dataframe.Title[current_row], dataframe.Abstract[current_row])
                        decision = raw_input("From the title and abstract, should this article be included for review of full manuscript?\n\nType 'Y' or 'N', or 'Save' to exit: ")     

                        if decision == 'Save':  
                                dataframe.to_csv(path)
                                print "Your changes have been saved"
                                break
                        else: 
                                dataframe['jim_decision'][current_row] = decision
                                current_row += 1
decision_maker(data)

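However, for some reason, every time I run it I get an extra column named "Unnamed: [X]", containing only index numbers, added before the first pandas column. I cannot work out where it comes from, how to get rid of it, or (I think) whether there is a risk of it contaminating the data.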
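
A short round trip reproduces the symptom described above. This is only a sketch: the two-row frame and the `round_trip` helper are hypothetical stand-ins for the real database and script, and an in-memory buffer replaces the CSV file. By default `DataFrame.to_csv` also writes the row index as a first column with an empty header, and `read_csv` labels such a column `Unnamed: 0`, so each save and reload cycle through `to_csv(path)` would add one more index column.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real 899 * 10 database.
data = pd.DataFrame({"Title": ["Paper A", "Paper B"],
                     "Abstract": ["First abstract", "Second abstract"]})


def round_trip(frame, **to_csv_kwargs):
    """Write a frame to CSV text and read it straight back (hypothetical helper)."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


# Default behaviour: the row index is written too and comes back as a column.
once = round_trip(data)
print(list(once.columns))   # ['Unnamed: 0', 'Title', 'Abstract']

# A second save and reload adds a second "Unnamed" column.
twice = round_trip(once)
print(list(twice.columns))

# Writing without the index keeps the columns unchanged.
clean = round_trip(data, index=False)
print(list(clean.columns))  # ['Title', 'Abstract']

# Dropping the stray columns from a file that already has them.
cleaned = twice.loc[:, ~twice.columns.str.startswith("Unnamed")]
print(list(cleaned.columns))  # ['Title', 'Abstract']
```

If this is indeed the cause here, passing `index=False` in the script's `dataframe.to_csv(path)` call should stop new columns from appearing. The extra columns hold only old row numbers, so the other columns should be unaffected and dropping them should be safe.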

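I am very new to all of this, so I am sure it is not very pretty or Pythonic, but I am just trying to learn to use Python / pandas to make my research life easier... Any input would be much appreciated!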
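
One possible tidier shape for the review loop, offered as a sketch rather than a definitive rewrite. It is written for Python 3, and the `review` function, its `ask` parameter, and the sample rows are hypothetical names introduced for illustration. It visits only rows that have no decision yet, which also sidesteps a likely problem in the script above: `current_row` advances only for undecided rows, so a resumed file whose first row is already decided would seem to loop forever. Assignments go through `.loc` instead of chained indexing, and the save step omits the index.

```python
import io

import numpy as np
import pandas as pd


def review(frame, ask, column="jim_decision"):
    """Record a decision for each undecided row.

    Returns False if the reviewer typed 'Save' to stop early, True otherwise.
    """
    if column not in frame:
        frame[column] = np.nan
    # Use object dtype so 'Y' / 'N' strings can sit alongside missing values.
    frame[column] = frame[column].astype(object)
    undecided = frame[frame[column].isnull()].index
    for row in undecided:
        print("\nTitle:\n%s\n\nAbstract:\n%s\n"
              % (frame.loc[row, "Title"], frame.loc[row, "Abstract"]))
        decision = ask("Include? Type 'Y' or 'N', or 'Save' to exit: ")
        if decision == "Save":
            return False
        frame.loc[row, column] = decision
    return True


# Hypothetical sample rows and scripted answers; a real session would pass
# the built-in `input` as `ask` and load the frame with pd.read_csv(path).
data = pd.DataFrame({"Title": ["Paper A", "Paper B", "Paper C"],
                     "Abstract": ["First", "Second", "Third"]})
answers = iter(["Y", "N", "Save"])
finished = review(data, lambda prompt: next(answers))

# Save without the row index; a real script would call
# data.to_csv(path, index=False) instead of writing to a buffer.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
print(buffer.getvalue().splitlines()[0])  # Title,Abstract,jim_decision
```

Passing the prompt function in as `ask` is a design choice made here so that the loop can be exercised with scripted answers; calling `input` directly inside the function would work just as well for interactive use.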

How can I avoid generating unwanted blank columns when looping through a pandas DataFrame?

0 answers: