How to convert extracted web data into a CSV file in Python

Time: 2018-10-05 15:35:58

Tags: python

First, I use Python to extract data from several pages of a website:

import urllib
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import traceback
pages = [str(i) for i in range(1,6)]

for page in pages:
    # Fetch one page of the list
    url1 = requests.get("https://www.top500.org/list/2018/06/?page=" + page)
    # Parse the response body with BeautifulSoup
    soup = BeautifulSoup(url1.content, 'html.parser')
    # Fallback text for a cell whose text contains special characters
    repString = "HLRS- Hochstleistungsrechenzentrum Stuttgart"
    # Walk every table row on the page
    for record in soup.findAll('tr'):
        tbltxt = ""
        for data in record.findAll('td'):
            try:
                tbltxt = tbltxt + data.text + ","
            except Exception:
                # Use the fallback text if the cell text cannot be appended
                tbltxt = tbltxt + repString + ","

        print(tbltxt)
        print()

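Later, I wanted to write this data to a CSV file. To do that, I tried inserting the following code inside the for loops, but I ran into an error: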

            Rank = entry.Rank.text
            Rank = Rank.replace(",", "|")
            Site = entry.Site.text
            Site = Site.replace(",", "|")
            System = entry.System.text
            System = System.replace(",", "|")
            Cores = entry.Cores.text
            Cores = Cores.replace(",", "|")
            Rmax (TFlops/s) = entry.Rmax (TFlops/s).text
            Rmax (TFlops/s) = Rmax (TFlops/s).replace(",","|")
            Rpeak (TFlops/s) = entry.Rpeak (TFlops/s).text
            Rpeak (TFlops/s) = Rpeaks (TFlops/s).replace(",","|")
            Power (kW) = entry.Power (kW).text
            Power (kW) = Power (kW).replace(",","|")
            f1.write(Rank + "," + Site + "," + System + "," + Cores + "," + Rmax (TFlops/s) + "," + Rpeak (TFlops/s) + ","+ Power (kW) + "\n")

It fails with this error:

SyntaxError: can't assign to function call (<ipython-input-22-043a1b549895>, line 25)
  File "<ipython-input-22-043a1b549895>", line 25
    Rmax (TFlops/s) = entry.Rmax (TFlops/s).text
SyntaxError: can't assign to function call

Can anyone help me fix this?

1 answer:

Answer 0: (score: 0)

Change it to something like

Rmax_TFlops_per_s = entry.Rmax(TFlops/s).text

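The problem is that you are trying to assign a value to a value (a function call). Python parses Rmax (TFlops/s) as a call to Rmax with the argument TFlops/s, and a call cannot appear on the left-hand side of =. Variable names may contain only letters, digits, and underscores. Note that the right-hand side, entry.Rmax(TFlops/s), is valid syntax but will still fail at run time unless TFlops and s are defined names and entry really has a callable Rmax attribute.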
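
To see the difference without running the scraper, the two forms can be passed to the built-in compile(), which only parses the source. This is a minimal sketch: the snippet strings and the helper name parses are made up for illustration and are not part of the question's code.

```python
# Hypothetical one-line snippets; they are only compiled, never executed.
bad = 'Rmax (TFlops/s) = "1.0"'   # left-hand side is parsed as a function call
good = 'rmax_tflops = "1.0"'      # left-hand side is a plain identifier


def parses(source):
    """Return True if Python can compile the source, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print("bad parses: ", parses(bad))
print("good parses:", parses(good))
```

Only the second form parses, which is why renaming the variables on the left-hand side makes the SyntaxError go away.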

All of these lines have the same problem:

Rmax (TFlops/s) = entry.Rmax (TFlops/s).text
Rmax (TFlops/s) = Rmax (TFlops/s).replace(",","|")
Rpeak (TFlops/s) = entry.Rpeak (TFlops/s).text
Rpeak (TFlops/s) = Rpeaks (TFlops/s).replace(",","|")
Power (kW) = entry.Power (kW).text
Power (kW) = Power (kW).replace(",","|")

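Take Rmax (TFlops/s), for example. Keep in mind that the '(' and ')' here are not treated as ordinary string characters; Python reads them as call syntax.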
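
If the goal is only to have those labels as CSV column headers, they can stay as ordinary strings and the standard csv module can do the writing. The sketch below assumes each table row has already been collected as a list of cell strings (for example, one data.text per td); the function name rows_to_csv and the sample row are placeholders, not real TOP500 data. csv.writer quotes any field that contains a comma, so replacing "," with "|" should not be necessary.

```python
import csv
import io

# Column labels are plain strings, so parentheses and slashes are fine here.
HEADER = ["Rank", "Site", "System", "Cores",
          "Rmax (TFlops/s)", "Rpeak (TFlops/s)", "Power (kW)"]


def rows_to_csv(rows, out):
    """Write the header and then each row (a list of cell strings) to out."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for cells in rows:
        # Strip surrounding whitespace left over from the HTML table cells.
        writer.writerow([cell.strip() for cell in cells])


# Placeholder row for illustration only; the values are invented.
sample_rows = [
    ["1", "Example Lab, Country", " Example System ", "1,000", "10.0", "20.0", "5"],
]

buffer = io.StringIO()  # stands in for a real file in this sketch
rows_to_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, pass a file object opened with open("top500.csv", "w", newline="", encoding="utf-8"), where the file name is only an example.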