Question

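I am using xlrd to try to read and manipulate the string text stored in the cells of my Excel document. I've posted my code below, along with the text that is returned when I print a particular column.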

import xlrd
data = xlrd.open_workbook('data.xls')
sheetname = data.sheet_names()
employees = data.sheet_by_index(0)

print(employees.col(2))

>>>[text:u'employee_first', text:u'\u201cRichard\u201d', text:u'\u201cCatesby\u201d', text:u'\u201cBrian\u201d']

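My goal is to create a dict, or otherwise work with the Excel document's contents as Python strings. I want some functions in my program to manipulate the data locally and then, at a later point (outside the scope of this question), write the result to a second Excel file.

How do I get rid of this extra information?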

Answer 1

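employees.col(2) is a list of xlrd.sheet.Cell instances, which is why each entry prints as text:u'...'. To get all the values from the column (rather than Cell objects), you can use the col_values method: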

values = employees.col_values(2)

You could also do this (my original suggestion):

values = [c.value for c in employees.col(2)]

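but it is less efficient than using col_values.

\u201c and \u201d are the Unicode left and right double quotation marks, respectively. If you want to get rid of them, you can use the lstrip and rstrip string methods. For example, something like this: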

values = [c.value.lstrip(u'\u201c').rstrip(u'\u201d') for c in employees.col(2)]
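The one-liner above assumes every cell in the column holds text; a numeric cell comes back from xlrd as a float, which has no lstrip method. A minimal sketch of a more defensive variant follows, assuming Python 3. The helper name strip_curly_quotes and the sample list are hypothetical stand-ins for the result of employees.col_values(2), with one numeric value added for illustration. Note that it uses strip, which removes either quote character from both ends, rather than the separate lstrip/rstrip calls.

```python
def strip_curly_quotes(value):
    """Strip leading/trailing curly double quotes from text; return other types unchanged."""
    if isinstance(value, str):
        return value.strip(u'\u201c\u201d')
    return value


# Hypothetical stand-in for employees.col_values(2), plus one numeric cell.
raw_values = [
    u'employee_first',
    u'\u201cRichard\u201d',
    u'\u201cCatesby\u201d',
    u'\u201cBrian\u201d',
    42.0,
]

cleaned = [strip_curly_quotes(v) for v in raw_values]
print(cleaned)
```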

Answer 2

If you are only interested in the values of the cells, then you should do:

values = sheet.col_values(colx=2)

rather than:

cells = sheet.col(colx=2)
values = [c.value for c in cells]

because it is more concise and more efficient (Cell objects are constructed on the fly as and when they are requested).
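Since the question mentions wanting a dict, here is a rough sketch of how column value lists might be turned into one, keyed by each column's header cell. It assumes Python 3 and that the first row holds headers. The helper columns_to_dict and the sample data (including the employee_id column) are hypothetical; with xlrd the input could plausibly be built as [sheet.col_values(i) for i in range(sheet.ncols)].

```python
def columns_to_dict(columns):
    """Map each column's first entry (its header) to the remaining entries."""
    return {col[0]: col[1:] for col in columns if col}


# Hypothetical stand-in data, one list per column, header first.
columns = [
    [u'employee_id', 1.0, 2.0, 3.0],
    [u'employee_first', u'Richard', u'Catesby', u'Brian'],
]

table = columns_to_dict(columns)
print(table[u'employee_first'])
```

Keeping the data as plain lists and dicts like this means the later processing functions do not need to know anything about xlrd.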
