Can't write to a CSV file in Python

Date: 2019-10-23 21:43:52

Tags: python pandas beautifulsoup

I'm trying to write scraped data to a CSV via a pandas DataFrame, but the CSV is empty even after the program finishes. The headers are written first, but they get overwritten once the DataFrame writes take effect. Here is the code:

from bs4 import BeautifulSoup
import requests
import re as resju
import csv
import pandas as pd
re = requests.get('https://www.farfeshplus.com/Video.asp?ZoneID=297')

soup = BeautifulSoup(re.content, 'html.parser')

links = soup.findAll('a', {'class': 'opacityit'})
links_with_text = [a['href'] for a in links]

headers = ['Name', 'LINK']
# output file; change the path as desired (default: the working directory)
file = open('data123.csv', 'w', encoding="utf-8")
writer = csv.writer(file)
writer.writerow(headers)

for i in links_with_text:
    new_re = requests.get(i)
    new_soup = BeautifulSoup(new_re.content, 'html.parser')
    m = new_soup.select_one('h1 div')
    Name = m.text

    print(Name)

    n = new_soup.select_one('iframe')
    ni = n['src']

    iframe = requests.get(ni)
    i_soup = BeautifulSoup(iframe.content, 'html.parser')

    d_script = i_soup.select_one('body > script')
    d_link = d_script.text

    mp4 = resju.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
    final_link = mp4.findall(d_link)[0]
    print(final_link)

    df = pd.DataFrame(zip(Name, final_link))

    df.to_csv(file, header=None, index=False)

file.close()

df.head() returns:

 0  1
0  ل  h
1  ي  t
2  ل  t
3  ى  p
4     s
   0  1
0  ل  h
1  ي  t
2  ل  t
3  ى  p
4     s

Any suggestions?

1 Answer:

Answer 0: (score: 0)

It looks like you're mixing several libraries to write the CSV. pandas handles all of this fine on its own, so there's no need for Python's built-in csv module.

I've modified your code below; it collects everything into one complete DataFrame and writes it out as a CSV in a single call.

You can also pass header=None to leave the columns unnamed, so they are referenced by index number instead.
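As an aside, the one-character-per-row output in your df.head() comes from zipping two plain strings: zip iterates them character by character. A minimal sketch with placeholder values (the URL here is hypothetical) shows the difference:

```python
import pandas as pd

# Hypothetical single-iteration values, standing in for the scraped ones.
Name = "ليلى"
final_link = "https://example.com/video.mp4"  # placeholder URL

# zip() over two strings pairs them character by character,
# which is why df.head() showed one letter per row, e.g. ('ل', 'h').
df_wrong = pd.DataFrame(zip(Name, final_link))

# Wrapping the scalars in lists gives one row per page instead.
df_right = pd.DataFrame(zip([Name], [final_link]))
print(df_right)
```

Accumulating all names and links in lists first, as below, avoids the issue entirely.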

from bs4 import BeautifulSoup
import requests
import re as resju
#import csv
import pandas as pd
re = requests.get('https://www.farfeshplus.com/Video.asp?ZoneID=297')

soup = BeautifulSoup(re.content, 'html.parser')

links = soup.findAll('a', {'class': 'opacityit'})
links_with_text = [a['href'] for a in links]

names_ = [] # global list to hold all iterable variables from your loops
final_links_ = []

for i in links_with_text:
    new_re = requests.get(i)
    new_soup = BeautifulSoup(new_re.content, 'html.parser')
    m = new_soup.select_one('h1 div')
    Name = m.text
    names_.append(Name) # append to global list (note: the variable is Name, capitalized)


    print(Name)

    n = new_soup.select_one('iframe')
    ni = n['src']

    iframe = requests.get(ni)
    i_soup = BeautifulSoup(iframe.content, 'html.parser')

    d_script = i_soup.select_one('body > script')
    d_link = d_script.text

    mp4 = resju.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
    final_link = mp4.findall(d_link)[0]
    print(final_link)
    final_links_.append(final_link) # append to global list.


df = pd.DataFrame(zip(names_, final_links_)) # use global lists.
df.columns = ['Name', 'LINK']

df.to_csv('data123.csv', index=False) # pandas handles opening/closing the file itself
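If you'd rather keep writing inside the loop as your original code did, another option is to write the header once and then append each row with mode='a'. A sketch with placeholder data standing in for the scraped values:

```python
import pandas as pd

# Placeholder rows; in your script these would come from each loop iteration.
rows = [("ليلى", "https://example.com/a.mp4"),
        ("مثال", "https://example.com/b.mp4")]

# Write the header row once, creating/truncating the file.
pd.DataFrame(columns=['Name', 'LINK']).to_csv('data123.csv', index=False)

# Append one data row per iteration; header=False avoids repeating the header.
for name, link in rows:
    pd.DataFrame([[name, link]]).to_csv('data123.csv', mode='a',
                                        header=False, index=False)
```

This way nothing is overwritten, because only the first to_csv call opens the file in write mode.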