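I have two kinds of files, Excel and CSV, from which I read data with two permanent columns, Question and Answer, and two optional columns, Word and Replacement, which may or may not be present.
I use separate functions to read the CSV and Excel files, and the one to call is chosen by the file extension.
Is there a way to read the optional columns (Word and Replacement) when they exist and skip them when they do not? See the function definitions below: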
1) For CSV files:
    import string
    import pandas

    def read_csv_file(path):
        quesData = []
        ansData = []
        asciiIgnoreQues = []
        qWithoutPunctuation = []
        colnames = ['Question', 'Answer']
        data = pandas.read_csv(path, names=colnames)
        quesData = data.Question.tolist()
        ansData = data.Answer.tolist()
        # Strip punctuation from every question
        qWithoutPunctuation = [''.join(c for c in s if c not in string.punctuation) for s in quesData]
        # Drop any non-ASCII characters
        for x in qWithoutPunctuation:
            asciiIgnoreQues.append(x.encode('ascii', 'ignore'))
        return asciiIgnoreQues, ansData, quesData
2) The function for reading Excel data:
    import string
    from xlrd import open_workbook

    def read_excel_file(path):
        book = open_workbook(path)
        sheet = book.sheet_by_index(0)
        quesData = []
        ansData = []
        asciiIgnoreQues = []
        qWithoutPunctuation = []
        # Row 0 is the header, so start at row 1
        for row in range(1, sheet.nrows):
            quesData.append(sheet.cell(row, 0).value)
            ansData.append(sheet.cell(row, 1).value)
        # Strip punctuation from every question
        qWithoutPunctuation = [''.join(c for c in s if c not in string.punctuation) for s in quesData]
        # Drop any non-ASCII characters
        for x in qWithoutPunctuation:
            asciiIgnoreQues.append(x.encode('ascii', 'ignore'))
        return asciiIgnoreQues, ansData, quesData
Answer 0 (score: 0)
I'm not entirely sure what you are trying to achieve, but the pandas way to read and transform the data is as follows:
    import string
    import pandas as pd

    def read_file(path, typ):
        if typ == "excel":
            # 0 is the default; old pandas versions spelled this `sheetname`
            df = pd.read_excel(path, sheet_name=0)
        else:  # Assuming "csv". You can make it explicit
            df = pd.read_csv(path)
        qWithoutPunctuation = df["Question"].apply(lambda s: ''.join(c for c in s if c not in string.punctuation))
        df["asciiIgnoreQues"] = qWithoutPunctuation.apply(lambda x: x.encode('ascii', 'ignore'))
        return df

    # Call it like this:
    read_file("file1.csv", "csv")
    read_file("file2.xls", "excel")
    read_file("file2.xlsx", "excel")
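Since the question says the reader is picked by file extension, the type argument could also be derived from the path. This is only a sketch: the helper name `infer_type` and the set of extensions are assumptions, not part of the original code.

```python
import os

# Hypothetical set of Excel extensions; extend it if other formats are in use.
EXCEL_EXTENSIONS = {".xls", ".xlsx"}

def infer_type(path):
    """Return "excel" or "csv" based on the file extension (case-insensitive)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in EXCEL_EXTENSIONS:
        return "excel"
    if ext == ".csv":
        return "csv"
    raise ValueError("Unsupported file extension: %r" % ext)
```

The result can then be passed as the second argument of the reading function, so callers only supply the path.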
This returns a DataFrame with the columns ["Question", "Answer", "asciiIgnoreQues"] if your data does not contain the Word and Replacement columns, and ["Question", "Word", "Replacement", "Answer", "asciiIgnoreQues"] if it does.
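To branch on whether the optional columns are present, one option is to check `df.columns` after loading. The sketch below uses small in-memory CSV strings as made-up sample data, and the helper name `optional_columns_present` is hypothetical.

```python
import io
import pandas as pd

# The two optional column names from the question
OPTIONAL_COLUMNS = ["Word", "Replacement"]

def optional_columns_present(df):
    """Return the optional columns that actually exist in the DataFrame."""
    return [c for c in OPTIONAL_COLUMNS if c in df.columns]

# Made-up sample data standing in for real files
with_extra = pd.read_csv(io.StringIO("Question,Answer,Word,Replacement\nHi?,Hello,Hi,Hey\n"))
without_extra = pd.read_csv(io.StringIO("Question,Answer\nHi?,Hello\n"))

print(optional_columns_present(with_extra))     # ['Word', 'Replacement']
print(optional_columns_present(without_extra))  # []
```

An empty list means only Question and Answer were found, so any code that uses Word and Replacement can be skipped for that file.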
Note that I used apply, which lets you run a function element-wise over an entire Series.
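As a small illustration of element-wise `apply`, here is the punctuation-stripping step on its own, run against a made-up Series of questions:

```python
import string
import pandas as pd

# Made-up sample questions
questions = pd.Series(["What's up?", "Hello, world!"])

# The lambda receives one string at a time and returns it without punctuation
stripped = questions.apply(lambda s: "".join(c for c in s if c not in string.punctuation))

print(stripped.tolist())  # ['Whats up', 'Hello world']
```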