Question

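This code is supposed to read a string value from an Excel file, but the value is not recognized as a string. How can I get query as a string? Calling str() on the value does not seem to work.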

def main():
    file_location = "/Users/ronald/Desktop/Twitter/TwitterData.xlsx" 
    workbook = xlrd.open_workbook(file_location) #open work book
    worksheet = workbook.sheet_by_index(0)
    num_rows = worksheet.nrows - 1
    num_cells = worksheet.ncols - 1
    curr_row = 0
    curr_cell = 3
    count = 0
    string = 'tweet'
    tweets = []
    while curr_row < num_rows:
        curr_row += 1
        tweet = worksheet.cell_value(curr_row, curr_cell)
        tweet.encode('ascii', 'ignore')
        #print tweet
        query = str(tweet)
        if (isinstance(query, str)):
            print "it is a string"
        else:
            print "it is not a string"

This is the error I keep getting:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 102-104: ordinal not in range(128)

Answer 1

Python has two different types that represent strings in different ways.

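str or bytes: This is the default in Python 2 (hence str); in Python 3 it is called bytes. It represents a string as a sequence of bytes, which does not work well for Unicode, because a character is not necessarily one byte as it is in ASCII and some other encodings.
unicode or str: This is the default in Python 3. Unicode handles accented and international characters, so it is what you want, especially when working with something like Twitter. In Python 2, this is also why some strings carry a small u'' prefix.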
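As a minimal sketch of the difference between the two types, the snippet below encodes a short text string and compares lengths. The sample value and variable names are made up for illustration, and the code is written to run on both Python 2 and Python 3.

```python
# A text string with one non-ASCII character (the u'' prefix is valid in Python 2 and 3.3+).
text = u"caf\u00e9"

# encode() returns a new byte string; it does not modify `text` in place,
# so the result has to be assigned to a name to be used.
utf8_bytes = text.encode("utf-8")
ascii_bytes = text.encode("ascii", "ignore")  # silently drops what ASCII cannot represent

print(len(text))        # number of characters
print(len(utf8_bytes))  # number of bytes; the accented character needs two in UTF-8
print(ascii_bytes == b"caf")
print(utf8_bytes.decode("utf-8") == text)
```

Because encode() returns its result rather than changing the original, a bare call such as tweet.encode('ascii', 'ignore') on a line of its own has no lasting effect. In Python 2, calling str() on a unicode value that contains non-ASCII characters then attempts an implicit ASCII encode, which is one way to end up with a UnicodeEncodeError like the one in the question.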

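Your "is this a string?" test consists of isinstance(s, str), which checks only the first type and ignores the other. Instead, you can test against basestring, as in isinstance(s, basestring), because it is the parent of both str and unicode. That correctly answers the question "is this a string?" in Python 2, and it is why you got a misleading result.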

Note that basestring does not exist if you migrate to Python 3. This is a Python 2-only test.
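If the same check has to run on both Python versions, one possible approach is to pick the base type at import time. This is only a sketch: string_types and is_string are hypothetical names chosen for this example, not part of the standard library.

```python
import sys

# basestring exists only in Python 2; in Python 3 every text string is a str.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # this branch is only evaluated on Python 2


def is_string(value):
    """Return True for text strings (and, on Python 2, byte strings as well)."""
    return isinstance(value, string_types)


print(is_string(u"caf\u00e9"))  # a text string
print(is_string(42))            # not a string
```

Note that on Python 3 this sketch deliberately does not count bytes objects as strings, since bytes and str are unrelated types there.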
