I know I can do this:
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
But how do I do this with a variable? For example:
from selenium import webdriver
from bs4 import BeautifulSoup
import re
driver = webdriver.Chrome()
driver.get('https://www.kohls.com/catalog/mens-button-down-shirts-tops-clothing.jsp?CN=Gender:Mens+Silhouette:Button-Down%20Shirts+Category:Tops+Department:Clothing&cc=mens-TN3.0-S-buttondownshirts&kls_sbp=43160314801019132980443403449632772558&PPP=120&WS=0')
products = []
hyperlinks = []
reviewCounts = []
starRatings = []
pageCounter = 0
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
maxPageCount = int(html_soup.find('a', class_ = 'totalPageNum').text)+1
prod_containers = html_soup.find_all('li', class_ = 'products_grid')
counterProduct = 0
while (pageCounter < maxPageCount):
    for product in prod_containers:
        # If the product has review count, then extract:
        if product.find('span', class_ = 'prod_ratingCount') is not None:
            # The product name
            name = product.find('div', class_ = 'prod_nameBlock')
            name = re.sub(r"\s+", " ", name.text)
            products.append(name)
            # The product hyperlink
            hyperlink = product.find('span', class_ = 'prod_ratingCount')
            hyperlink = hyperlink.a
            hyperlink = hyperlink.get('href')
            hyperlinks.append(hyperlink)
            # The product review count
            reviewCount = product.find('span', class_ = 'prod_ratingCount').a.text
            reviewCounts.append(reviewCount)
            # The product overall star ratings
            starRating = product.find('span', class_ = 'prod_ratingCount')
            starRating = starRating.a
            starRating = starRating.get('alt')
            starRatings.append(starRating)
    driver.find_element_by_xpath('//*[@id="page-navigation-top"]/a[2]').click()
    counterProduct +=1
    print(counterProduct)
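Answer 0 (score: 0)
You can handle this in Python itself, without resorting to shell magic commands. I suggest the pathlib module for a more modern approach. For what you are doing, it would be: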
import pathlib
csv_files = pathlib.Path('/path/to/actual/files')
for csv_file in csv_files.glob('*.csv'):
    csv_file.unlink()
Use the .glob() method to select only the files you want, then .unlink() to delete them (similar to os.remove()).
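A minimal sketch of the same idea wrapped in a function, so it can be tried on a scratch directory first. The name delete_matching and its parameters are assumptions made for illustration; only Path.glob() and Path.unlink() come from the answer above.

```python
import pathlib


def delete_matching(directory, pattern="*.csv"):
    """Delete files in `directory` whose names match `pattern`.

    Hypothetical helper (not part of the original answer). Returns the
    names of the files that were removed, in sorted order.
    """
    removed = []
    # sorted() materializes the matches before anything is deleted
    for path in sorted(pathlib.Path(directory).glob(pattern)):
        if path.is_file():  # skip directories that happen to match
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling delete_matching('/path/to/actual/files') would mirror the loop above; returning the names makes it easy to log what was deleted.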
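Avoid using "file" as a variable name: in Python 2 it shadows the built-in file type. (It is not a reserved word, and in Python 3 it is no longer a built-in.)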
Answer 1 (score: 0)
rm can delete several files in a single call:
In [80]: !touch a.t1 b.t1 c.t1
In [81]: !ls *.t1
a.t1 b.t1 c.t1
In [82]: !rm -r a.t1 b.t1 c.t1
In [83]: !ls *.t1
ls: cannot access '*.t1': No such file or directory
If the starting point is a list of file names:
In [116]: alist = ['a.t1', 'b.t1', 'c.t1']
In [117]: astr = ' '.join(alist) # make a string
In [118]: !echo $astr # variable substitution as in BASH
a.t1 b.t1 c.t1
In [119]: !touch $astr # make 3 files
In [120]: ls *.t1
a.t1 b.t1 c.t1
In [121]: !rm -r $astr # remove them
In [122]: ls *.t1
ls: cannot access '*.t1': No such file or directory
Python's own OS functions are probably the better choice, but if you know the shell well enough, you can do much the same thing with %magics.
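For comparison, here is a rough sketch of the pure-Python route for a list of file names, using os.remove(). The helper name remove_files and its return value are assumptions for illustration, not something from the answer.

```python
import os


def remove_files(names):
    """Remove each file in `names`.

    Hypothetical helper: returns the names that could not be removed
    because they did not exist, instead of stopping at the first error.
    """
    missing = []
    for name in names:
        try:
            os.remove(name)
        except FileNotFoundError:
            missing.append(name)
    return missing
```

With alist = ['a.t1', 'b.t1', 'c.t1'], remove_files(alist) would play the role of !rm $astr without going through the shell.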
To use a "magic" inside a Python expression, I have to call the underlying function rather than the "!" or "%" syntax, e.g.
import IPython
for txt in ['a.t1','b.t1','c.t1']:
    IPython.utils.process.getoutput('touch %s' % txt)
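The getoutput function is what %sx (which underlies !!) uses, and it in turn calls subprocess.Popen. But if you are going to all that work, you might as well use the os functions that Python itself provides.
File names may need quoting so that the shell does not raise a syntax error.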
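One way to do that quoting, sketched under the assumption of a POSIX shell, is the standard library's shlex.quote(). The helper name build_touch_command is hypothetical.

```python
import shlex


def build_touch_command(names):
    """Return a shell command string that touches every name in `names`.

    Hypothetical helper: each name goes through shlex.quote() so that
    spaces and shell metacharacters are treated as part of the file name.
    """
    return "touch " + " ".join(shlex.quote(name) for name in names)


print(build_touch_command(["a.t1", "my file.t1"]))
# prints: touch a.t1 'my file.t1'
```

The resulting string could then be handed to getoutput. Passing a list of arguments to subprocess.run() instead avoids the shell, and with it the need for quoting.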