Question

I'm trying to write my output to a csv file. I've tried both pandas and csv, but all I get is an empty csv file. What am I missing?

import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv
r = requests.get('https://superstats.dk/program?aar=2018%2F2019')
bs=BeautifulSoup(r.content, "lxml")

table_div=bs.find(id="content")
rows = table_div.find_all('tr')
for row in rows:
    cols=row.find_all('td')
    cols=[x.text.strip() for x in cols]
    print (cols)

df= pd.DataFrame(cols, columns=['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer'])
df.to_csv('Superliga.csv', index=False, encoding='utf-8')

I want the output of print(cols) to end up in the csv file.

Answer 1

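I'm not sure how exactly you used the csv library, but with the writerow(row_values) method of a csv writer object (csvwriter.writerow(row_values)) you can easily write your data to a csv file row by row.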
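A minimal, self-contained sketch of that row-by-row pattern, assuming the cells have already been extracted as lists of strings. The rows below are made-up placeholders, not data from the site, and an in-memory io.StringIO buffer stands in for the real file so the example runs without network access:

```python
import csv
import io

# Hypothetical stand-ins for the scraped cells (placeholder values, not real match data)
header = ['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer']
rows = [
    ['Fre', '01-01', 'Team A-Team B', '1-0', '1000', 'Ref X'],
    [],  # a row without <td> cells comes out of the scraper as an empty list
    ['Lør', '02-01', 'Team C-Team D', '2-2', '2000', 'Ref Y'],
]

buffer = io.StringIO()  # in-memory stand-in for open('sample.csv', 'w', newline='')
# csv.writer ends rows with '\r\n' by default; '\n' is used here only for readable output
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(header)  # header first
for cols in rows:
    if cols:  # skip empty rows
        writer.writerow(cols)  # then one call per data row

print(buffer.getvalue())
```

Each writerow call emits exactly one line, so the file grows as the loop runs and nothing has to be collected first.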

As for pandas, the problem in your code lies in this line:

df= pd.DataFrame(cols, columns=['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer'])

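The main culprit is the 'cols' variable. Suppose the table on the web page has n rows. Your 'cols' variable is overwritten with the current row's data on every iteration, so after the nth iteration it contains only the data of the nth row, and on the page you linked the nth row is most likely empty. That is why the DataFrame is empty: you initialized it with an empty list. (Even if that last row were not empty, pandas treats a flat list of strings as a single column rather than a single row, so it would not match the six column names either.) It is better to append the contents of 'cols' to a list after each iteration. This new variable, a collection of lists (call it 'row_data'), can then be used to create the pandas DataFrame.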
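The overwriting behaviour can be reproduced without any scraping. In this sketch the per-row cell lists are hypothetical placeholders (in the real code each would come from [x.text.strip() for x in row.find_all('td')]), and the last one is deliberately empty to mirror the suspected empty final row:

```python
# Hypothetical per-row cell lists; the last one is empty on purpose.
scraped_rows = [
    ['Fre', '01-01', 'Team A-Team B', '1-0', '1000', 'Ref X'],
    ['Lør', '02-01', 'Team C-Team D', '2-2', '2000', 'Ref Y'],
    [],
]

# Pattern from the question: 'cols' is reassigned on every iteration
for cols in scraped_rows:
    pass
last_cols = cols
print(last_cols)  # [] (only the final, empty row is left)

# Fix: collect every non-empty row in a separate list
row_data = []
for cols in scraped_rows:
    if len(cols) > 0:
        row_data.append(cols)
print(len(row_data))  # 2
```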

To put this into code, see below:

Using the csv library

import requests
from bs4 import BeautifulSoup
#import pandas as pd
import csv
r = requests.get('https://superstats.dk/program?aar=2018%2F2019')
bs=BeautifulSoup(r.content, "lxml")
#Open the csv file in a with-block so it is flushed and closed automatically;
#newline='' stops the csv module from writing blank lines between rows on Windows
with open('sample.csv', 'w', newline='', encoding='utf-8') as f:
    #Create a csv.writer object for the file you wish to write in
    writer = csv.writer(f)
    #Use writerow method of the object to write your first row/header contents
    writer.writerow(['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer'])

    table_div=bs.find(id="content")
    rows = table_div.find_all('tr')
    for row in rows:
        cols=row.find_all('td')
        cols=[x.text.strip() for x in cols]
        print (cols)
        #Some rows contain no <td> cells, therefore avoid writing them to the file
        if len(cols)>0:
            #Append all the incoming data row by row
            writer.writerow(cols)

#df= pd.DataFrame(cols, columns=['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer'])
#df.to_csv('Superliga.csv', index=none, encoding='utf-8')

Using the pandas library

import requests
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get('https://superstats.dk/program?aar=2018%2F2019')
bs=BeautifulSoup(r.content, "lxml")

#Create an empty list, and use it to append data row by row
row_data = []

table_div=bs.find(id="content")
rows = table_div.find_all('tr')
for row in rows:
    cols=row.find_all('td')
    cols=[x.text.strip() for x in cols]
    #Some rows contain no <td> cells, therefore avoid adding them to 'row_data'
    if len(cols)>0:
        #Keep at most the first six cells so no row has more cells than there are column names
        row_data.append(cols[0:6])

#Create a dataframe in one shot using the 'row_data' variable
df = pd.DataFrame(row_data, columns=['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer'])
print(df)
#index=False leaves the DataFrame's row index out of the file
df.to_csv('Superliga.csv', index=False, encoding='utf-8')
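The difference between passing the final 'cols' and passing 'row_data' to pd.DataFrame can be checked offline. This sketch uses made-up rows, each with a hypothetical seventh cell, to show how slicing with cols[0:6] keeps every row as wide as the six column names. Calling to_csv without a path makes pandas return the CSV text as a string instead of writing a file:

```python
import pandas as pd

columns = ['Dag', 'Dato', 'Hold', 'Resultat', 'Tilskuere', 'Dommer']

# A flat list (like the final 'cols') becomes ONE column with one value per row,
# not one row, so it cannot be matched against six column names.
flat = ['Fre', '01-01', 'Team A-Team B', '1-0', '1000', 'Ref X']
print(pd.DataFrame(flat).shape)  # (6, 1)

# A list of lists (like 'row_data') becomes one DataFrame row per inner list.
raw_rows = [
    ['Fre', '01-01', 'Team A-Team B', '1-0', '1000', 'Ref X', 'extra'],  # hypothetical extra cell
    ['Lør', '02-01', 'Team C-Team D', '2-2', '2000', 'Ref Y', 'extra'],
]
row_data = [cols[0:6] for cols in raw_rows]  # trim every row to the six named columns
df = pd.DataFrame(row_data, columns=columns)
print(df.shape)  # (2, 6)

# With no path argument, to_csv returns the CSV text; index=False omits the row index
csv_text = df.to_csv(index=False)
print(csv_text)
```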

Unable to write scraped results to a csv file