The output I want:
|John Snow | Male | 36 | New York|
What I get:
|('John Snow','Male','36','New York')|
Here is my code:
import requests
from bs4 import BeautifulSoup
import csv
url = "http://www.yellowpages.com/search?search_terms=coffe&geo_location_terms=Los+Angeles%2C+CA"
r = requests.get(url)
soup = BeautifulSoup(r.content)
links = soup.find_all("a")
g_data = soup.find_all("div",{"class":"info"})
list_info = []
# code hidden here
y = business_name,addressRegion,addressLocality,postalCode
list_info.append(y)
print list_info
resultFile = open("output.csv",'wb')
writer = csv.writer(resultFile)
for item in list_info:
    writer.writerow([item])
How can I fix this?
Answer 0: (score: 0)
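Your definition of y produces a tuple. You then append y to list_info, which creates a list of tuples.
When you write the results to the file, you iterate over the elements of that list, so inside the for loop item is a tuple.
Before handing that tuple to the csv.writer object, you wrap it in a single-element list. Because csv.writer does not recursively extract elements from nested input, it writes a one-field row to the file, and that field is the tuple implicitly converted to a string. Here is a simplified example that shows the problem: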
business_name = 'cokolwiek'
addressRegion = 'CA'
addressLocality = 'San Francisco'
postalCode = '12345'
y = business_name,addressRegion,addressLocality,postalCode
print(y)
# ('cokolwiek', 'CA', 'San Francisco', '12345')
print([y])
# [('cokolwiek', 'CA', 'San Francisco', '12345')]
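What you need to do is remove the square brackets around item. csv.writer.writerow handles tuples just fine, so you can write them directly. Alternatively, you can use the list function to convert the tuple to a list.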
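import csv

business_name = 'cokolwiek'
addressRegion = 'CA'
addressLocality = 'San Francisco'
postalCode = '12345'
y = business_name,addressRegion,addressLocality,postalCode
list_info = []
list_info.append(y)
list_info.append(y)
# 'wb' is the Python 2 mode; on Python 3 use open("output.csv", "w", newline="")
resultFile = open("output.csv",'wb')
writer = csv.writer(resultFile, delimiter="|")
# Correct: pass the tuple itself, one field per element
for item in list_info:
    writer.writerow(item)
# Wrong: the tuple wrapped in a list becomes a single field
for item in list_info:
    writer.writerow([item])
# Also correct: convert the tuple to a list first
for item in list_info:
    writer.writerow(list(item))
resultFile.close()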
$ cat output.csv
cokolwiek|CA|San Francisco|12345
cokolwiek|CA|San Francisco|12345
('cokolwiek', 'CA', 'San Francisco', '12345')
('cokolwiek', 'CA', 'San Francisco', '12345')
cokolwiek|CA|San Francisco|12345
cokolwiek|CA|San Francisco|12345
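The same fix carries over to Python 3. The following is a minimal sketch, not the asker's actual script: it assumes an in-memory io.StringIO buffer in place of output.csv, and the helper rows_to_text and the sample row are hypothetical, taken from the desired output in the question.

```python
import csv
import io

# Hypothetical sample row, taken from the desired output in the question.
list_info = [("John Snow", "Male", "36", "New York")]

def rows_to_text(rows, wrap_in_list=False):
    """Write rows with '|' as the delimiter and return the resulting text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    for item in rows:
        # Wrapping the tuple in a list turns the whole tuple into one field.
        writer.writerow([item] if wrap_in_list else item)
    return buffer.getvalue()

wrong = rows_to_text(list_info, wrap_in_list=True)
right = rows_to_text(list_info)
print(wrong, end="")  # the tuple's string form, as a single field
print(right, end="")  # one field per tuple element
```

When writing to a real file on Python 3, open it with open("output.csv", "w", newline="") so the csv module controls the line endings itself.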
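Answer 1: (score: 0)
In this TXR solution we define, from scratch, a data structure for representing the HTML document object model (DOM), an HTML parser, and some functions for querying the DOM by tag, class and so on. These are used to produce a comma-separated list of businesses and their addresses. No domain-specific libraries are used, apart from the built-in function for decoding HTML escapes into text. The CSV fields are also escaped: if a field contains a comma or a double quote, it is wrapped in quotes, and any double quotes inside it are doubled.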
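The escaping rule described above can be sketched in Python for comparison. This is an illustration under stated assumptions, not a translation of the TXR code: the function name csv_escape mirrors the one in the script, the sample row is made up, and only the two cases named in the text (commas and double quotes) are handled, so fields containing line breaks are out of scope.

```python
def csv_escape(field):
    """Quote a field if it contains a comma or a double quote.

    Double quotes inside a quoted field are doubled.
    """
    if "," in field or '"' in field:
        return '"' + field.replace('"', '""') + '"'
    return field

# Made-up row for illustration only.
row = ['Joe\'s "Best" Coffee', "Los Angeles, CA", "90021"]
line = ",".join(csv_escape(f) for f in row)
print(line)
```

In everyday Python code the csv module applies this kind of quoting for you; the hand-written version is shown only to make the rule concrete.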
First, a sample run on data fetched with curl:
$ txr yp.txr data
Drnk Coffe + Tea,West Hollywood,CA,90069
International Coffe and Tea,Torrance,CA,90501
Birdcage Coffe House,Long Beach,CA,90802
International Coffe and Tea,Woodland Hills,CA,91367
International Coffe and Tea,Valencia,CA,91355
Toast Bakery Cafe,Los Angeles,CA,90048
[ .. snip ... ]
Chavez Cafe,Los Angeles,CA,90021
Standard Coffee Service Company,N/A,N/A,N/A
Contents of yp.txr:
@;;
@;; Mini HTML parser
@;;

@;; Lisp object model

@(do
   (defstruct doc-node nil)
   (defstruct text-node doc-node (content ""))
   (defstruct tag-node doc-node tag (attrs nil) (children nil)))

@;; Parse tag attribs

@(define attribs (list))@\
  @(local attr val)@\
  @(coll :gap 0 :vars (attr val))@\
    @(cases) @{attr /[^\s'"=\/>]+/}@/\s*=\s*/"@val"@\
    @(or) @{attr /[^\s'"=\/>]+/}@/\s*=\s*/'@val'@\
    @(or) @{attr /[^\s'"=\/>]+/}@(bind val "")@\
    @(end)@\
  @(until)@/>|\/>/@\
  @(end)@\
  @(bind list @(mapcar (op cons (intern (trim-str @1)) (html-decode @2)) attr val))@\
@(end)

@;; Parse HTML: output is a Lisp nested list in which elements
@;; look like (symbol attribs nested-content ...)
@;; Attribs look like ((sym1 "value") (sym2 "value") ...)

@(define html (html))@\
  @(local nest attr tag text)@\
  @(cases)@\
    <!--@(skip)>@\
  @(or)@\
    <@{tag /[\w\d]+/}@(attribs attr)>@\
      @(coll :vars (nest))@\
        @(cases)@(html nest)@\
        @(or)@{text /[^<]+/}@\
          @(bind nest @(new text-node content (html-decode text)))@\
        @(end)@\
      @(last)</@tag>@\
      @(end)@\
    @(bind html @(new tag-node tag (intern tag) attrs attr children nest))@\
  @(or)@\
    <@{tag /[\w\d]+/}@(attribs attr)/>@\
    @(bind html @(new tag-node tag (intern tag) attrs attr))@\
  @(end)@\
@(end)

@;;
@;; Helpful lisp functions for querying DOM
@;;

@(do
   ;; Collect document nodes satisfying function
   (defun get-by (html selector-fun)
     (typecase html
       (tag-node (let ((others (get-by html.children selector-fun)))
                   (if [selector-fun html]
                     (cons html others)
                     others)))
       (list [mappend (op get-by @1 selector-fun) html])))

   ;; Collect nodes from list satisfying function
   (defun filter-by (html selector-fun)
     (keep-if [orf (notf (op typep @1 'tag-node)) selector-fun] html))

   ;; tag selector fun
   (defun select-tag (tag)
     (op eq @1.tag tag))

   ;; class selector
   (defun select-class (class-sym class)
     (lambda (html)
       (iflet ((class-attr (cdr (assql class-sym html.attrs))))
         (let ((classes (split-str class-attr " ")))
           (memqual class classes)))))

   (defun get-by-tag (html tag)
     (get-by html (select-tag tag)))

   (defun filter-by-tag (html tag)
     (filter-by html (select-tag tag)))

   (defun get-by-class (html class)
     (get-by html (select-class 'class class)))

   (defun get-by-class-like-attrib (html sym val)
     (get-by html (select-class sym val)))

   (defun filter-by-class (html class)
     (filter-by html (select-class 'class class)))

   (defun filter-by-class-like-attrib (html sym val)
     (filter-by html (select-class sym val))))

@(skip)
// – End comScore Tag
@(freeform "")
@(coll)@(html doc)@(end)
@(bind divs @(filter-by-class (get-by-tag doc 'div) "info"))
@(do
   (defun get-content (div sym class)
     (let* ((a (get-by-class-like-attrib div sym class)))
       (if (and a [a 0].children)
         [[a 0].children 0].content "N/A")))

   (defun csv-escape (str)
     (if (match-regex str #/.*[",]/)
       `"@(regsub #/"/ "\"" str)"`
       str))

   (each ((div divs))
     (let ((bn (get-content div 'class "business-name"))
           (al (get-content div 'itemprop "addressLocality"))
           (ar (get-content div 'itemprop "addressRegion"))
           (pc (get-content div 'itemprop "postalCode")))
       (set al (regsub #/,\s*/ "" al))
       (put-line (cat-str [mapcar csv-escape (list bn al ar pc)] ",")))))