How to read links together with their IDs in Python

Date: 2018-11-27 05:13:05

Tags: python python-3.x pandas beautifulsoup

I am trying to write a CSV file that contains the URL and ID taken from an input file, but I can't work out how to do it.

I have a CSV file in the following format:

ID              Links
P51800010436    https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTcxNzkmRGl2aXNpb249NiZVc2VySUQ9MzQ5MjAmUm9sZUlEPTEmQXBwSUQ9NzUzNjYmQWN0aW9uPVNFQVJDSCZDaGFyYWN0ZXJEPTI2JkV4dEFwcElEPQ%3d%3d
P51800001202    https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTMxOTcmRGl2aXNpb249NiZVc2VySUQ9MjU5MjQmUm9sZUlEPTEmQXBwSUQ9MjM3MzQmQWN0aW9uPVNFQVJDSCZDaGFyYWN0ZXJEPTk3JkV4dEFwcElEPQ%3d%3d
P51800000150    https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTY1NSZEaXZpc2lvbj02JlVzZXJJRD03MjU3JlJvbGVJRD0xJkFwcElEPTExOTY2JkFjdGlvbj1TRUFSQ0gmQ2hhcmFjdGVyRD04MSZFeHRBcHBJRD0%3d
P51800001785    https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTU2NjUmRGl2aXNpb249NiZVc2VySUQ9MjgxODEmUm9sZUlEPTEmQXBwSUQ9MjY4NjcmQWN0aW9uPVNFQVJDSCZDaGFyYWN0ZXJEPTIxJkV4dEFwcElEPQ%3d%3d

The script I have tried:

from datetime import datetime
start_time = datetime.now()

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
import re

import csv

link = []
rera_id = []

with open('D:/TF_Vishnu/link_with_rera_id.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')

    for row in reader:
        rera_id.append(row[0])
        link.append(row[1])

for index, rera_id, url in enumerate(rera_id, link):

    df_url = pd.read_csv(pd.compat.StringIO(url), header=None)

    df_rera_id = pd.read_csv(pd.compat.StringIO(rera_id), header=None)

    html=requests.get(url).content

    soup=BeautifulSoup(html, 'lxml')

    if (soup.find(text="Other Than Individual") == "Other Than Individual"): 

        print ("Processing Other Than Individual Link.......")

        table = soup.find_all("table",{"class":"table table-bordered table-responsive table-striped"})[1]

        df_2 = pd.concat([df_rera_id, df_url, df, df_1], axis=1)

        df_2.to_csv('D:/scrape_data/test.csv', index=False, header=False, mode='a')

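I want to write the CSV file with pandas so that the first column is rera_id, the second is the link, the third is the scraped data, and so on.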

Please help and advise. Apologies for any mistakes.

The error I get:

TypeError: 'list' object cannot be interpreted as an integer

1 Answer:

Answer 0: (score: 1)

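The problem is in your use of the built-in enumerate. Its second (optional) argument is not treated as another iterable; it is the initial value of the counter (index in your case), which is why it has to be an integer. You would be better off enumerating the reader directly: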

with open('D:/TF_Vishnu/link_with_rera_id.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for index, (rera_id, url) in enumerate(reader):
        # Your code below
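
If you would rather keep the two separate lists from the question, a minimal sketch of the same idea is to pair them with zip and enumerate the pairs. The list names and the example.com URLs below are placeholders for illustration, not values from your file:

```python
# Placeholder data standing in for the two lists built from the CSV.
rera_ids = ["P51800010436", "P51800001202"]
links = ["https://example.com/a", "https://example.com/b"]

# The second argument of enumerate is the counter's start value, so an
# integer is fine here: the counter begins at 1 instead of 0.
for index, rera_id in enumerate(rera_ids, 1):
    print(index, rera_id)

# Passing a list in that position reproduces the error from the question.
try:
    enumerate(rera_ids, links)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# zip pairs the lists element by element; enumerate then numbers the pairs.
pairs = [
    (index, rera_id, url)
    for index, (rera_id, url) in enumerate(zip(rera_ids, links))
]
print(pairs)
```

Note that zip stops at the shorter list, so a row with a missing link would be dropped silently rather than raising an error.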

Hope that helps!