When writing a list to CSV, I get all the items in a single cell

Asked: 2015-12-27 08:15:03

Tags: python csv

The output I want:

|John Snow | Male | 36 | New York|

What I get:

|('John Snow','Male','36','New York')|

Here is my code:

import requests
from bs4 import BeautifulSoup
import csv

url = "http://www.yellowpages.com/search?search_terms=coffe&geo_location_terms=Los+Angeles%2C+CA"
r = requests.get(url)

soup = BeautifulSoup(r.content)

links = soup.find_all("a")

g_data = soup.find_all("div",{"class":"info"})

list_info = []

# code hidden here

    y = business_name,addressRegion,addressLocality,postalCode
list_info.append(y)
print list_info
resultFile = open("output.csv",'wb')
writer = csv.writer(resultFile)
for item in list_info:
    writer.writerow([item])

How can I fix this?

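2 Answers:

Answer 0: (score: 0)

Your definition of y produces a tuple.

You then append y to list_info, which creates a list of tuples.

Then, when you write the results to the file, you iterate over the elements of that list. Inside the for loop, item is a tuple.

Before you write that tuple (that is, before you pass it to the csv.writer object), you wrap it in a one-element list. Because csv.writer does not recursively extract elements from the input data structure, it writes a single-field row to the file. That field is the tuple implicitly converted to a string. Here is a simplified example that shows the problem: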

business_name = 'cokolwiek'
addressRegion = 'CA'
addressLocality = 'San Francisco'
postalCode = '12345'
y = business_name,addressRegion,addressLocality,postalCode
print(y)
# ('cokolwiek', 'CA', 'San Francisco', '12345')
print([y])
# [('cokolwiek', 'CA', 'San Francisco', '12345')]

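All you have to do is remove the square brackets around item: csv.writer.writerow handles tuples just fine, so you can write them directly. Alternatively, you can convert the tuple to a list with the list function.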

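import csv

business_name = 'cokolwiek'
addressRegion = 'CA'
addressLocality = 'San Francisco'
postalCode = '12345'
y = business_name,addressRegion,addressLocality,postalCode
list_info = []
list_info.append(y)
list_info.append(y)
# 'wb' is the Python 2 idiom; on Python 3 use open("output.csv", 'w', newline='')
resultFile = open("output.csv",'wb')
writer = csv.writer(resultFile, delimiter="|")
# correct: each element of the tuple becomes its own field
for item in list_info:
   writer.writerow(item)
# wrong: the whole tuple becomes a single field
for item in list_info:
   writer.writerow([item])
# also correct: convert the tuple to a list first
for item in list_info:
   writer.writerow(list(item))
resultFile.close()

$ cat output.csv 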
cokolwiek|CA|San Francisco|12345
cokolwiek|CA|San Francisco|12345
('cokolwiek', 'CA', 'San Francisco', '12345')
('cokolwiek', 'CA', 'San Francisco', '12345')
cokolwiek|CA|San Francisco|12345
cokolwiek|CA|San Francisco|12345
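
For readers on Python 3, the same comparison can be sketched against an in-memory buffer. This is only an illustration, not the asker's actual script: the helper write_rows and the sample rows are made up for the example, and the default comma delimiter is used instead of "|".

```python
import csv
import io

# Sample rows (illustrative data): each item is a tuple, as in the question.
list_info = [
    ("John Snow", "Male", "36", "New York"),
    ("cokolwiek", "CA", "San Francisco", "12345"),
]

def write_rows(rows, wrap=False):
    """Return the CSV text for rows; wrap=True reproduces the bug."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for item in rows:
        # [item] is a one-element row whose only field is str(item)
        writer.writerow([item] if wrap else item)
    return buf.getvalue()

good = write_rows(list_info)
bad = write_rows(list_info, wrap=True)
print(good)
print(bad)
```

When writing to a real file on Python 3, open it with open("output.csv", "w", newline="") as the csv module documentation recommends; writer.writerows(list_info) writes all tuples in one call.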

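Answer 1: (score: 0)

In this TXR solution, we define from scratch a data structure for representing an HTML document object model (DOM), an HTML parser, and some functions for querying the DOM for tags, classes and so on. These are used to produce a comma-separated list of businesses and their addresses. No domain-specific libraries are used, other than a built-in function for filtering HTML escapes to text. The CSV fields are also escaped: if they contain commas or double quotes, they are enclosed in quotes, and the double quotes themselves are doubled.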
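
The escaping rule just described can be sketched in Python as follows. The names csv_escape and csv_line and the sample row are hypothetical, chosen for this illustration only; the last lines cross-check the result against the standard csv module for this particular row.

```python
import csv
import io

def csv_escape(field):
    """Quote a field only if it contains a comma or a double quote."""
    if "," in field or '"' in field:
        # Double every embedded quote, then wrap the field in quotes.
        return '"' + field.replace('"', '""') + '"'
    return field

def csv_line(fields):
    """Join already-escaped fields with commas."""
    return ",".join(csv_escape(f) for f in fields)

# Hypothetical row exercising both special cases.
row = ["Drnk Coffe + Tea", 'The "Best" Cafe', "Los Angeles, CA", "90069"]
line = csv_line(row)
print(line)

# Cross-check against the standard library writer for the same row.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
assert buf.getvalue() == line + "\n"
```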

First, an abridged sample run on a copy of the data captured with curl:
$ txr yp.txr  data
Drnk Coffe + Tea,West Hollywood,CA,90069
International Coffe and Tea,Torrance,CA,90501
Birdcage Coffe House,Long Beach,CA,90802
International Coffe and Tea,Woodland Hills,CA,91367
International Coffe and Tea,Valencia,CA,91355
Toast Bakery Cafe,Los Angeles,CA,90048
[ .. snip ... ]
Chavez Cafe,Los Angeles,CA,90021
Standard Coffee Service Company,N/A,N/A,N/A

Contents of yp.txr:

@;;
@;; Mini HTML parser
@;;
@;; Lisp object model
@(do
   (defstruct doc-node nil)

   (defstruct text-node doc-node
     (content ""))

   (defstruct tag-node doc-node
     tag (attrs nil) (children nil)))
@;; Parse tag attribs
@(define attribs (list))@\
  @(local attr val)@\
  @(coll :gap 0 :vars (attr val))@\
    @(cases) @{attr /[^\s'"=\/>]+/}@/\s*=\s*/"@val"@\
    @(or) @{attr /[^\s'"=\/>]+/}@/\s*=\s*/'@val'@\
    @(or) @{attr /[^\s'"=\/>]+/}@(bind val "")@\
    @(end)@\
  @(until)@/>|\/>/@\
  @(end)@\
  @(bind list @(mapcar (op cons (intern (trim-str @1)) (html-decode @2))
                       attr val))@\
@(end)
@;; Parse HTML: output is a Lisp nested list in which elements
@;; look like (symbol attribs nested-content ...)
@;; Attribs look like ((sym1 "value") (sym2 "value") ...)
@(define html (html))@\
  @(local nest attr tag text)@\
  @(cases)@\
    <!--@(skip)>@\
  @(or)@\
    <@{tag /[\w\d]+/}@(attribs attr)>@\
    @(coll :vars (nest))@\
       @(cases)@(html nest)@\
       @(or)@{text /[^<]+/}@\
            @(bind nest @(new text-node content (html-decode text)))@\
       @(end)@\
    @(last)</@tag>@\
    @(end)@\
    @(bind html @(new tag-node tag (intern tag) attrs attr children nest))@\
  @(or)@\
    <@{tag /[\w\d]+/}@(attribs attr)/>@\
    @(bind html @(new tag-node tag (intern tag) attrs attr))@\
  @(end)@\
@(end)
@;;
@;; Helpful lisp functions for querying DOM
@;;
@(do
   ;; Collect document nodes satisfying function
   (defun get-by (html selector-fun)
     (typecase html
       (tag-node
         (let ((others (get-by html.children selector-fun)))
           (if [selector-fun html]
             (cons html others)
             others)))
       (list
         [mappend (op get-by @1 selector-fun) html])))

   ;; Collect nodes from list satisfying function
   (defun filter-by (html selector-fun)
     (keep-if [orf (notf (op typep @1 'tag-node)) selector-fun] html))

   ;; tag selector fun
   (defun select-tag (tag)
     (op eq @1.tag tag))

   ;; class selector
   (defun select-class (class-sym class)
     (lambda (html)
       (iflet ((class-attr (cdr (assql class-sym html.attrs))))
         (let ((classes (split-str class-attr " ")))
           (memqual class classes)))))

   (defun get-by-tag (html tag)
     (get-by html (select-tag tag)))

   (defun filter-by-tag (html tag)
     (filter-by html (select-tag tag)))

   (defun get-by-class (html class)
     (get-by html (select-class 'class class)))

   (defun get-by-class-like-attrib (html sym val)
     (get-by html (select-class sym val)))

   (defun filter-by-class (html class)
     (filter-by html (select-class 'class class)))

   (defun filter-by-class-like-attrib (html sym val)
     (filter-by html (select-class sym val))))
@(skip)
// – End comScore Tag
@(freeform "")
@(coll)@(html doc)@(end)
@(bind divs @(filter-by-class (get-by-tag doc 'div) "info"))
@(do
   (defun get-content (div sym class)
     (let* ((a (get-by-class-like-attrib div sym class)))
       (if (and a [a 0].children)
         [[a 0].children 0].content
         "N/A")))

   (defun csv-escape (str)
     (if (match-regex str #/.*[",]/)
       `"@(regsub #/"/ "\"\"" str)"`
       str))

   (each ((div divs))
     (let ((bn (get-content div 'class "business-name"))
           (al (get-content div 'itemprop "addressLocality"))
           (ar (get-content div 'itemprop "addressRegion"))
           (pc (get-content div 'itemprop "postalCode")))
       (set al (regsub #/,\s*/ "" al))
       (put-line (cat-str [mapcar csv-escape (list bn al ar pc)] ",")))))
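
For comparison, here is a rough Python sketch of the same idea (a small node tree, a recursive query, and a class-like selector) built on the standard html.parser module. It is not a translation of the TXR code: the names TagNode, TreeBuilder, get_by, has_token and get_content are made up for this sketch, get_by looks only at descendants, and the SAMPLE markup is an assumed shape inferred from the selectors above, not the real page.

```python
import re
from html.parser import HTMLParser

class TagNode:
    """An element: tag name, attribute dict, and children (TagNode or str)."""
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []

class TreeBuilder(HTMLParser):
    """Builds a tree of TagNode objects; text is stored as plain str."""
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = TagNode("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = TagNode(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data)

def get_by(node, pred):
    """Collect all descendant elements of node for which pred is true."""
    found = []
    for child in node.children:
        if isinstance(child, TagNode):
            if pred(child):
                found.append(child)
            found.extend(get_by(child, pred))
    return found

def has_token(attr, value):
    """Selector: attribute attr, split on whitespace, contains value."""
    return lambda node: value in (node.attrs.get(attr) or "").split()

def get_content(div, attr, value):
    """Text of the first matching element's first child, or "N/A"."""
    hits = get_by(div, has_token(attr, value))
    if hits and hits[0].children and isinstance(hits[0].children[0], str):
        return hits[0].children[0]
    return "N/A"

# Assumed markup for illustration only.
SAMPLE = """
<div class="info"><a class="business-name" href="#">Chavez Cafe</a>
<span itemprop="addressLocality">Los Angeles, </span>
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">90021</span></div>
<div class="info"><a class="business-name" href="#">Standard Coffee Service Company</a></div>
"""

builder = TreeBuilder()
builder.feed(SAMPLE)
builder.close()

rows = []
for div in get_by(builder.root, lambda n: n.tag == "div" and has_token("class", "info")(n)):
    name = get_content(div, "class", "business-name")
    locality = re.sub(r",\s*", "", get_content(div, "itemprop", "addressLocality"))
    region = get_content(div, "itemprop", "addressRegion")
    postal = get_content(div, "itemprop", "postalCode")
    rows.append((name, locality, region, postal))
print(rows)
```

Each tuple in rows can then be passed directly to csv.writer.writerow, which takes care of the quoting.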