Python: write the names and values as separate columns of a table

Time: 2015-09-08 07:01:31

Tags: python web-scraping web-crawler

My code

using (HttpClient client = new HttpClient())
{
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/xml"));//set how to get data
    using (var content = new MultipartFormDataContent())//post by content type multipart/form-data
    {
        NameValueCollection dataCollection;//the datas you want to post
        HashSet<string> filePaths;//the files you want to post
        var formDatas = this.GetFormDataByteArrayContent(dataCollection);//get collection
        var files = this.GetFileByteArrayContent(filePaths);//get collection
        Action<List<ByteArrayContent>> act = (dataContents) =>
        {//declare an action
            foreach (var byteArrayContent in dataContents)
            {
                content.Add(byteArrayContent);
            }
        };
        act(formDatas);//process act
        act(files);//process act
        try
        {
            var result = client.PostAsync(this.txtUrl.Text, content).Result;//post your request
        }
        catch (Exception ex)
        {
            //error
        }
    }
}

My output in the CSV:

Reynolds American Inc. Consolidated-Tomoka Land Co. British American Tobacco

8.30%7.50%7.10%6.60%6.40%5.90%5.30%4.80%4.70%4.10%

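The desired output has the same layout as on the website, with names and values in separate columns.

Reference: http://www.wintergreenfund.com/reports/top-ten/

Also, Unicode is not working. I need help.

My new code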

from lxml import html
import requests
import csv
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')


# example site
page = requests.get('http://www.wintergreenfund.com/reports/top-ten/')
tree = html.fromstring(page.text)
#This will create a list of services:

tname = tree.xpath('//*[@id="colLeft"]//table//tr/td[1]/text()')
tvalue = tree.xpath('//table//tr/td[2]/text()')



print tname
print tvalue

print 'Input the csv file'
csvfile = raw_input("> ")

res = tname,tvalue


#Assuming res is a list of lists
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(res)

I am getting [' '] in the output, and also empty [].

1 answer:

Answer 0: (score: 1)

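First, if you want to combine the two lists index by index, you should use zip(). Currently you are creating a tuple that contains the two lists (res = tname, tvalue) and writing it to the CSV as is, so each list ends up as a single row.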
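The difference between the tuple and zip() can be seen in a small sketch. The sample lists below are illustrative: the names and percentages come from the output shown in the question, but which value belongs to which name is an assumption, and `to_csv_text` is a hypothetical helper that writes to an in-memory buffer instead of a file.

```python
import csv
import io

# Illustrative stand-ins for the scraped lists (pairing is assumed).
tname = ["Reynolds American Inc.", "Consolidated-Tomoka Land Co.", "British American Tobacco"]
tvalue = ["8.30%", "7.50%", "7.10%"]


def to_csv_text(rows):
    """Hypothetical helper: render rows with csv.writer and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# A tuple of two lists: each list becomes one long row.
as_tuple = to_csv_text((tname, tvalue))

# zip() pairs the items index by index: one (name, value) row per entry.
as_pairs = to_csv_text(zip(tname, tvalue))

print(as_tuple)
print(as_pairs)
```

With the tuple, all names land on the first line and all values on the second; with zip(), each line holds one name and its value.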

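Also, you should first use XPath to get each row of the table, and then use XPath on each row to get the td elements you need, rather than running two separate XPath queries as you do now.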

Example:

from lxml import html
import requests
import csv

page = requests.get('http://www.wintergreenfund.com/reports/top-ten/')
tree = html.fromstring(page.text)

csvrows = []
for rows in tree.xpath('//*[@id="colLeft"]//table//tr'):
    row1text = rows.xpath('./td[1]/text()')
    row2text = rows.xpath('./td[2]/text()')
    if row1text and row2text:
        csvrows.append([row1text[0],row2text[0]])
print(csvrows)
print('Input the csv file')
csvfile = input("> ")
with open(csvfile, "w", newline="", encoding="utf-8") as output:  # Python 3: write non-ASCII text as UTF-8
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['Name','Value']) #substitute as appropriate.
    writer.writerows(csvrows)
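To try the row-by-row idea without fetching the live page, here is a minimal offline sketch. It swaps lxml for the standard library's xml.etree.ElementTree and uses a small hand-written table, so the markup, the blank row, and the name/value pairing are all assumptions rather than the real page's content. It also strips whitespace, so cells like ' ' and empty cells are skipped instead of appearing as [' '] or [].

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written table; the real page's markup may differ.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Reynolds American Inc.</td><td>8.30%</td></tr>
  <tr><td> </td><td></td></tr>
  <tr><td>British American Tobacco</td><td>7.10%</td></tr>
</table>
"""

table = ET.fromstring(SAMPLE.strip())

csvrows = []
for row in table.findall(".//tr"):
    cells = row.findall("td")
    if len(cells) < 2:
        continue  # the header row uses <th>, so it has no <td> cells
    # .text is None for an empty element, so fall back to "" before stripping.
    name = (cells[0].text or "").strip()
    value = (cells[1].text or "").strip()
    if name and value:  # skip rows whose cells are blank
        csvrows.append([name, value])

print(csvrows)
```

Because both cells are read from the same row, a blank or missing cell drops that row only and cannot shift the remaining names against the values, which is the risk when two independent queries are combined afterwards.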