I am using Scrapy and writing the data fetched from web pages into a CSV file.
My pipeline code:
import csv

def __init__(self):
    # Python 3: open in text mode; newline='' lets the csv module handle line endings
    self.file_name = csv.writer(open('example.csv', 'w', newline='', encoding='utf-8'))
    self.file_name.writerow(['Title', 'Release Date', 'Director'])

def process_item(self, item, spider):
    # Python 3 strings are already Unicode, so no .encode('utf-8') is needed
    self.file_name.writerow([item['Title'],
                             item['Release Date'],
                             item['Director'],
                             ])
    return item
My output in the CSV file looks like this:
Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....
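But is it possible to write Title and its values in one column, Release Date and its values in the next column, and Director and its values in the column after that (since CSV is comma-separated values), so that the CSV file looks like the format below?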
Title,                                      Release Date, Director
And Now For Something Completely Different, 1971,         Ian MacNaughton
Monty Python And The Holy Grail,            1975,         Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,               1979,         Terry Jones
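Any help would be greatly appreciated. Thanks in advance.
Answer 0 (score: 1)
TSV (tab-separated values) might get you what you want, but it often looks ugly when the rows have very different lengths.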
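For reference, switching to tab-separated output only requires a different delimiter in `csv.writer`. The following is a minimal sketch, assuming Python 3; the helper name `to_tsv` is hypothetical and the rows are the sample data from the question.

```python
import csv
import io

# Sample rows taken from the question; in a real pipeline they would
# come from the scraped items.
rows = [
    ["Title", "Release Date", "Director"],
    ["And Now For Something Completely Different", "1971", "Ian MacNaughton"],
    ["Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"],
    ["Monty Python's Life Of Brian", "1979", "Terry Jones"],
]

def to_tsv(rows):
    """Return the rows as tab-separated text (hypothetical helper)."""
    buffer = io.StringIO()
    # delimiter="\t" turns the CSV writer into a TSV writer.
    writer = csv.writer(buffer, delimiter="\t")
    writer.writerows(rows)
    return buffer.getvalue()

print(to_tsv(rows))
```

Tabs only line up visually while every value in a column is shorter than one tab stop, which is why long titles next to short ones tend to break the alignment.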
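You can easily write some code to produce a padded table like the one in the question. The trick is that you need to have all the rows before you output anything, so that you can compute the column widths.
You can find plenty of code snippets for this on the internet; here is one I used before.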
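The two-pass idea described above can be sketched as follows. This is an illustration rather than the snippet the answer links to; the function name `format_padded` is hypothetical, and the code assumes Python 3 and rows that all have the same number of fields.

```python
# Sample rows taken from the question.
rows = [
    ["Title", "Release Date", "Director"],
    ["And Now For Something Completely Different", "1971", "Ian MacNaughton"],
    ["Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"],
    ["Monty Python's Life Of Brian", "1979", "Terry Jones"],
]

def format_padded(rows):
    """Return rows as text with comma-terminated, space-padded columns."""
    rows = [[str(cell) for cell in row] for row in rows]
    # Pass 1: the width of each column is the length of its longest value.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        # Pass 2: every field but the last gets its comma, then padding up
        # to the column width plus two (one for the comma, one for a space).
        head = [(cell + ",").ljust(width + 2)
                for cell, width in zip(row[:-1], widths)]
        lines.append("".join(head) + row[-1])
    return "\n".join(lines)

print(format_padded(rows))
```

Because the widths depend on every row, nothing can be written until the last row is known, which is the constraint the answer points out.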
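Answer 1 (score: 1)
Update: refactored the code to:
- use a generator function, as suggested by @madjar, and
- fit more closely the code snippet provided by the OP.
I am trying an alternative using texttable. It produces output identical to the format requested in the question. This output can be written to a CSV file (the records need massaging to suit the CSV dialect, and I could not find a way to keep using csv.writer and still get the padded spaces in each field).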
Title,                                      Release Date, Director
And Now For Something Completely Different, 1971,         Ian MacNaughton
Monty Python And The Holy Grail,            1975,         Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,               1979,         Terry Jones
Here is a sketch of the code needed to produce the result above:
from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generates the data for use in the texttable function.

def process_item(item):
    # This massages each record in preparation for writing to csv:
    # a trailing comma is appended to every field except the last.
    # (Python 3 strings are already Unicode, so no .encode('utf-8').)
    item['Title'] = item['Title'] + ','
    item['Release Date'] = item['Release Date'] + ','
    return item

def initialise_dataset():
    data = [{'Title': 'Title',
             'Release Date': 'Release Date',
             'Director': 'Director'
             },  # first item holds the table header
            {'Title': 'And Now For Something Completely Different',
             'Release Date': '1971',
             'Director': 'Ian MacNaughton'
             },
            {'Title': 'Monty Python And The Holy Grail',
             'Release Date': '1975',
             'Director': 'Terry Gilliam and Terry Jones'
             },
            {'Title': "Monty Python's Life Of Brian",
             'Release Date': '1979',
             'Director': 'Terry Jones'
             }
            ]
    data = [process_item(item) for item in data]
    return data

def records(data):
    # Generator: yields one row (a list of three fields) per record
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director']]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table; max_width=0 disables line wrapping
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    # A list is passed so that add_rows can take the header from the first row
    table.add_rows(list(records(data)))
    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1]  # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print(table)
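To connect this back to the question, the buffering could live inside the Scrapy item pipeline itself. The following is only a sketch under stated assumptions: the class name `PaddedCsvPipeline` and its `path` argument are hypothetical, it pads with plain string methods instead of texttable, and it relies on the `open_spider`, `process_item` and `close_spider` hooks that Scrapy calls on item pipelines.

```python
class PaddedCsvPipeline:
    """Hypothetical item pipeline: buffers rows, writes them padded at the end."""

    fields = ["Title", "Release Date", "Director"]

    def __init__(self, path="example.csv"):
        self.path = path
        self.rows = []

    def open_spider(self, spider):
        # Start with the header row so it is measured like any other row.
        self.rows = [list(self.fields)]

    def process_item(self, item, spider):
        # Only collect here; nothing can be written until all widths are known.
        self.rows.append([str(item[name]) for name in self.fields])
        return item

    def close_spider(self, spider):
        # Column width = length of the longest value seen in that column.
        widths = [max(len(cell) for cell in column)
                  for column in zip(*self.rows)]
        with open(self.path, "w", encoding="utf-8") as f:
            for row in self.rows:
                # Comma first, then padding, for every field but the last.
                head = [(cell + ",").ljust(width + 2)
                        for cell, width in zip(row[:-1], widths)]
                f.write("".join(head) + row[-1] + "\n")
```

A file written this way pads after each comma, so it should still be readable with `csv.reader(f, skipinitialspace=True)`, which ignores whitespace immediately following the delimiter. The trade-off is that all rows are held in memory until the spider closes.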