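I am using a web-scraping script (found on GitHub) and writing the results to a .csv file. Some of the results (user reviews) are written in Japanese or Russian, so I want to write Unicode to the .csv file.
When I use the csv module the code runs fine, but it does not write Unicode to the csv.
This is the part of the code I use for scraping: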
import csv
import time

import requests
from lxml import html

with open(datafile, 'w', newline='', encoding='utf8') as csvfile:
    # Comma-delimited output
    datawriter = csv.writer(csvfile, delimiter=',')
    print('Processing..')
    for i in range(1, pages + 1):
        # Sleep if throttle enabled
        if throttle:
            time.sleep(sleepTime)
        page = requests.get(reviewPage + '&page=' + str(i))
        tree = html.fromstring(page.content)
        # Each item below scrapes a page's review titles, bodies, ratings and languages.
        titles = tree.xpath('//a[@class="review-title-link"]')
        bodies = tree.xpath('//div[@class="review-body"]')
        ratings = tree.xpath('//div[@data-status]')
        langs = tree.xpath("//h3[starts-with(@class, 'review-title')]")
        dates = tree.xpath("//time[@datetime]")
        for idx, e in enumerate(bodies):
            # Title of comment
            title = titles[idx].text_content()
            # Body of comment
            body = e.text_content().strip()
            # The rating is the 5th from last element
            rating = ratings[idx].get('data-status').split(' ')[-5]
            # Language is 2nd element of h3 tag
            lang = langs[idx].get('class').split(' ')[1]
            # Date
            date = dates[idx].get("datetime").split('T')[0]
            datawriter.writerow([title, body, rating, lang, date])
    print('Processed ' + str(ratingCount) + '/' + str(ratingCount) + ' ratings.. Finished!')
I tried import unicodecsv as csv, but that raised a TypeError:
TypeError Traceback (most recent call last)
<ipython-input-4-2db937260285> in <module>()
44 date = dates[idx].get("datetime").split('T')[0]
45
---> 46 datawriter.writerow([title,body,rating,lang,date])
47 print('Processed ' + str(ratingCount) + '/' + str(ratingCount) + ' ratings.. Finished!')
~\lib\site-packages\unicodecsv\py3.py in writerow(self, row)
26
27 def writerow(self, row):
---> 28 return self.writer.writerow(row)
29
30 def writerows(self, rows):
C:\Users\Ebel\Anaconda3\lib\site-packages\unicodecsv\py3.py in write(self, string)
13
14 def write(self, string):
---> 15 return self.binary.write(string.encode(self.encoding, self.errors))
16
17
TypeError: write() argument must be str, not bytes
I would like to fix this. Thanks in advance!
Answer 0 (score: 0)
Because unicodecsv writes bytes rather than strings, you need to open() your file in binary mode. Note that binary mode does not take an encoding, so you must remove the encoding argument (and the newline argument, which binary mode rejects as well).

with open(datafile, 'w', newline='', encoding='utf8') as csvfile:

then becomes:

with open(datafile, 'wb') as csvfile:

The b in 'wb' means that you want to write bytes rather than strings.
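To illustrate why the file mode matters here, the following is a minimal sketch using only the standard library. The in-memory streams stand in for real files and the sample string is made up for illustration; it does not use unicodecsv itself, it only imitates the step where already-encoded bytes are handed to the file object.

```python
import io

sample = "日本語"  # hypothetical review text, 9 bytes in UTF-8

# Text-mode stream: behaves like open(..., 'w', encoding='utf8').
# It expects str and does the encoding itself.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", newline="")

try:
    # Handing over pre-encoded bytes fails, as in the traceback above.
    text_stream.write(sample.encode("utf-8"))
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

# Binary stream: behaves like open(..., 'wb') and accepts the bytes.
binary_stream = io.BytesIO()
written = binary_stream.write(sample.encode("utf-8"))

print(type(text_mode_error).__name__, written)
```

The text-mode stream rejects the bytes, while the binary stream stores them unchanged, which is the distinction between 'w' and 'wb'.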
Answer 1 (score: 0)
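Posting my comment as an answer.
Your with statement is correct for Python 3, and unicodecsv is only needed on Python 2. Just import csv (use the built-in module). On Windows, use encoding='utf-8-sig'. Without the BOM signature, Windows Notepad will not display the UTF-8 file correctly, and Excel will not read it correctly either.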
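As a rough sketch of this suggestion, the snippet below writes a few rows with the built-in csv module and encoding='utf-8-sig', then reads the file back. The rows, the column names, and the temporary file path are made up for illustration and are not taken from the scraper.

```python
import csv
import os
import tempfile

# Hypothetical sample rows in the same column order as the scraper.
rows = [
    ["title", "body", "rating", "lang", "date"],
    ["良い", "とても便利です", "5", "ja", "2018-01-01"],
    ["Отлично", "Очень удобно", "4", "ru", "2018-01-02"],
]

path = os.path.join(tempfile.mkdtemp(), "reviews.csv")

# Text mode with utf-8-sig: Python encodes the strings and prepends the BOM.
with open(path, "w", newline="", encoding="utf-8-sig") as csvfile:
    csv.writer(csvfile, delimiter=",").writerows(rows)

# The file starts with the UTF-8 byte order mark.
with open(path, "rb") as raw:
    starts_with_bom = raw.read(3) == b"\xef\xbb\xbf"

# Reading with utf-8-sig strips the BOM again, so the rows round-trip.
with open(path, newline="", encoding="utf-8-sig") as csvfile:
    round_trip = list(csv.reader(csvfile))

print(starts_with_bom, round_trip == rows)
```

Reading the file back with plain 'utf-8' instead would leave the BOM character at the start of the first field, so it is worth using the same encoding name on both sides.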