Unable to add a new dataset to a new column

Posted: 2017-07-26 18:01:21

Tags: python excel csv

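Python & coding noobie here (3 days in), so please go easy on me!

I have a script that scrapes a website, and I want to put the extracted data into two separate columns of a .csv sheet.

The scrape script works fine and gives me a list of dates. I now want to put that list into 2 separate columns of the .csv sheet, but the script I made just takes the 2 lists and puts them into a single column.

I don't want to simply duplicate the column, because later (once I've worked out what I want to achieve) it will be a separate list, so it will be two independent columns with unique data.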

Thanks in advance!

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pandas

colnames = ['id']
data1 = pandas.read_excel('ch.xlsx', names=colnames, dtype=str)
names = data1.id.tolist()

def make_soup (url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage,"html.parser")
    return soupdata

chid = names

chdatasaved=""
chdatasaved2=""

for numb in chid:
    soup = make_soup ("https://beta.companieshouse.gov.uk/company/" + numb + "/filing-history")
    for record in soup.findAll("tr", limit=2):
        chdata=""
        for data in record.findAll("td", limit=1):
            chdata = chdata+","+data.text
        if len(chdata) !=0:
            chdatasaved = chdatasaved + chdata[0:]

filename = "date.csv"
f = open (filename, "w")
headers = "date, detail"
f.write (headers)
f.write (chdatasaved + chdatasaved + "\n")
f.close ()

nanojohn - your solution works, but now each character is on its own row (see below):

2
1
J
u
n
2
0
1
7

Could this be the problem?

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pandas

colnames = ['id']
data1 = pandas.read_excel('ch.xlsx', names=colnames, dtype=str)
names = data1.id.tolist()

def make_soup (url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage,"html.parser")
    return soupdata

chid = names

chdatasaved = []

for numb in chid:
    soup = make_soup ("https://beta.companieshouse.gov.uk/company/" + numb + "/filing-history")
    for record in soup.findAll("tr", limit=2):
        chdata=""
        for data in record.findAll("td", limit=1):
            chdata = chdata+","+data.text
        if len(chdata) !=0:
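            # list.append() modifies the list in place and returns None,
            # so its result must not be assigned back to chdatasaved
            chdatasaved.append(chdata[0:])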

filename = "date.csv"
headers = "date, detail"
with open(filename,'w') as f:
    f.write(headers + '\n')
    for curDate in chdatasaved:
        f.write(curDate + "," + curDate + "\n")

1 answer:

Answer 0 (score: 0)

Updated answer

You can keep the characters on the same line by doing the following:

with open(filename, 'w') as f:
    f.write(headers + '\n')
    f.write(chdatasaved + "," + chdatasaved + "\n")

This works because chdatasaved is a string, so it is written out in one go rather than looped over character by character.
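A small sketch of why the per-character rows appear, assuming `chdatasaved` is still the single string built by the first script (the sample value `"21 Jun 2017"` is hypothetical). Looping over a string yields one character per iteration, whereas writing the whole string once produces a single line. `io.StringIO` stands in for the real file so nothing is written to disk.

```python
import io

# Hypothetical stand-in for chdatasaved from the first script:
# a single string, not a list of dates.
chdatasaved = "21 Jun 2017"

# Looping over a string yields one character per iteration,
# so every character becomes its own row.
buf = io.StringIO()
for cur_date in chdatasaved:
    buf.write(cur_date + "," + cur_date + "\n")
per_char_rows = buf.getvalue().splitlines()

# Writing the whole string once keeps it on a single row.
buf = io.StringIO()
buf.write(chdatasaved + "," + chdatasaved + "\n")
single_rows = buf.getvalue().splitlines()

print(len(per_char_rows), len(single_rows))  # prints: 11 1
```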

Previous answer

You can easily loop over all the elements in the list and write them out like this:

for curDate in chdatasaved:
    f.write(curDate + "," + curDate + "\n")
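If the goal from the question is two independent lists written side by side as two columns, a minimal sketch with the standard `csv` module could pair them with `zip`. The list names `dates` and `details`, their values, and the helper `write_two_columns` are hypothetical placeholders, not part of the original answer.

```python
import csv

# Hypothetical placeholders for the two scraped lists.
dates = ["21 Jun 2017", "14 Mar 2017"]
details = ["Confirmation statement", "Accounts"]

def write_two_columns(filename, col1, col2, headers=("date", "detail")):
    """Write two lists side by side as two CSV columns."""
    # newline="" is what the csv docs recommend; it avoids blank lines on Windows
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        # zip pairs the i-th date with the i-th detail to form one row
        writer.writerows(zip(col1, col2))

write_two_columns("date.csv", dates, details)
```

Note that `zip` stops at the shorter list, so both lists should have the same length. `csv.writer` also quotes any value that itself contains a comma, which hand-built `","` joins do not.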

Note that you also need to add the newline separator "\n" to the f.write(headers) line.
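To illustrate the missing newline, this sketch renders the header plus one hypothetical data row into an in-memory buffer with and without the trailing `"\n"`. Without it, the first data row is glued onto the header line. The `render` helper and `sample_rows` are illustrative names only.

```python
import io

headers = "date, detail"
sample_rows = ["21 Jun 2017,21 Jun 2017"]  # hypothetical data row

def render(header_end):
    """Write the header followed by the rows into a string buffer."""
    buf = io.StringIO()
    buf.write(headers + header_end)
    for row in sample_rows:
        buf.write(row + "\n")
    return buf.getvalue()

# Without "\n" the first data row continues on the header line.
print(render("").splitlines())
# With "\n" the header and the data row are separate lines.
print(render("\n").splitlines())
```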

In general, it is better to read/write files using a with statement, as shown below, since it closes the file for you automatically:

with open(filename,'w') as f:
    f.write(headers + '\n')
    for curDate in chdatasaved:
        f.write(curDate + "," + curDate + "\n")
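Since the script already imports pandas, an alternative (not part of the original answer) is to build a two-column `DataFrame` and let `DataFrame.to_csv` write the file. Again, `dates` and `details` are hypothetical sample lists standing in for the scraped data.

```python
import pandas as pd

# Hypothetical placeholders for the two scraped lists.
dates = ["21 Jun 2017", "14 Mar 2017"]
details = ["Confirmation statement", "Accounts"]

# Each dict key becomes a column header and each list fills that column.
df = pd.DataFrame({"date": dates, "detail": details})

# index=False leaves out pandas' row-number column.
df.to_csv("date.csv", index=False)
```

Unlike `zip`, `pd.DataFrame` raises an error if the two lists have different lengths, which can be a useful early check.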