Question

I know I can do this:

html_soup = BeautifulSoup(driver.page_source, 'html.parser')

But how would I do this with a variable? For example:

from selenium import webdriver
from bs4 import BeautifulSoup
import re

driver = webdriver.Chrome()
driver.get('https://www.kohls.com/catalog/mens-button-down-shirts-tops-clothing.jsp?CN=Gender:Mens+Silhouette:Button-Down%20Shirts+Category:Tops+Department:Clothing&cc=mens-TN3.0-S-buttondownshirts&kls_sbp=43160314801019132980443403449632772558&PPP=120&WS=0')

products = []
hyperlinks = []
reviewCounts = []
starRatings = []

pageCounter = 0
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
maxPageCount = int(html_soup.find('a', class_ = 'totalPageNum').text)+1
prod_containers = html_soup.find_all('li', class_ = 'products_grid')
counterProduct = 0
while (pageCounter < maxPageCount):
    for product in prod_containers:
        # If the product has review count, then extract:
        if product.find('span', class_ = 'prod_ratingCount') is not None:
            # The product name
            name = product.find('div', class_ = 'prod_nameBlock')
            name = re.sub(r"\s+", " ", name.text)
            products.append(name)

            # The product hyperlink
            hyperlink = product.find('span', class_ = 'prod_ratingCount')
            hyperlink = hyperlink.a
            hyperlink = hyperlink.get('href')
            hyperlinks.append(hyperlink)

            # The product review count
            reviewCount = product.find('span', class_ = 'prod_ratingCount').a.text
            reviewCounts.append(reviewCount)

            # The product overall star ratings
            starRating = product.find('span', class_ = 'prod_ratingCount')
            starRating = starRating.a
            starRating = starRating.get('alt')
            starRatings.append(starRating) 

    driver.find_element_by_xpath('//*[@id="page-navigation-top"]/a[2]').click()
    counterProduct +=1
    pageCounter += 1  # advance the loop counter; without this the while loop never ends
    print(counterProduct)

Answer 1

You can handle this in Python without using magic shell commands. I'd recommend the pathlib module for a more modern approach. For what you're doing, it would be:

import pathlib
csv_files = pathlib.Path('/path/to/actual/files')
for csv_file in csv_files.glob('*.csv'):
    csv_file.unlink()

Use the .glob() method to select only the files you want, then .unlink() to delete them (similar to os.remove()).

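Avoid using file as a variable name. It is not a reserved word, but it was a built-in type in Python 2, so reusing it can confuse readers.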
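
If the file names are already held in a Python variable rather than matched by a pattern, the same pathlib approach applies. The following is only a minimal sketch: the helper name delete_files and its base_dir parameter are illustrative assumptions, not part of the original answer.

```python
from pathlib import Path


def delete_files(names, base_dir="."):
    """Delete each named file under base_dir and return the names actually removed.

    delete_files is a hypothetical helper used only for illustration.
    """
    removed = []
    for name in names:
        path = Path(base_dir) / name
        # Only touch regular files; missing names are skipped silently.
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

Returning the list of removed names makes it easy to see which entries in the variable did not correspond to a file.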

Answer 2

rm can remove multiple files per call:

In [80]: !touch a.t1 b.t1 c.t1
In [81]: !ls *.t1
a.t1  b.t1  c.t1
In [82]: !rm -r a.t1 b.t1 c.t1
In [83]: !ls *.t1
ls: cannot access '*.t1': No such file or directory

If the starting point is a list of file names:

In [116]: alist = ['a.t1', 'b.t1', 'c.t1']
In [117]: astr = ' '.join(alist)            # make a string
In [118]: !echo $astr                       # variable substitution as in BASH
a.t1 b.t1 c.t1
In [119]: !touch $astr                    # make 3 files
In [120]: ls *.t1
a.t1  b.t1  c.t1
In [121]: !rm -r $astr                    # remove them
In [122]: ls *.t1
ls: cannot access '*.t1': No such file or directory

Using Python's own OS functions is probably better, but if you understand the shell well enough, you can do a lot of the same things with the % magics.
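
For comparison, the "Python's own OS functions" route for a list like alist might look roughly like this. It is a sketch only; remove_all is a hypothetical helper name, and ignoring missing files is an assumption about the desired behaviour.

```python
import os


def remove_all(filenames):
    """Remove every file in filenames and return how many were deleted.

    remove_all is a hypothetical helper; names that do not exist are skipped.
    """
    deleted = 0
    for filename in filenames:
        try:
            os.remove(filename)
            deleted += 1
        except FileNotFoundError:
            # Already gone; skip it, similar in spirit to `rm -f`.
            pass
    return deleted
```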

To use the "magic" inside a Python expression, I have to call the underlying function rather than use the "!" or "%" syntax, e.g.

import IPython
for txt in ['a.t1','b.t1','c.t1']:
    IPython.utils.process.getoutput('touch %s'%txt)

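The getoutput function is what %sx (which underlies !!) uses, and it is built on subprocess.Popen. But if you are going to all that work, you might as well use the os functions that Python itself provides.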

File names may need added quoting so that the shell does not report a syntax error.
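
One way to add that quoting from Python is the standard library's shlex module. This is a sketch under assumptions: build_rm_command is a hypothetical helper, shlex.join requires Python 3.8 or later, and the quoting it produces targets POSIX shells.

```python
import shlex


def build_rm_command(filenames):
    """Return an rm command line with each file name quoted for a POSIX shell.

    build_rm_command is a hypothetical helper used only for illustration.
    """
    # "--" tells rm that everything after it is a file name, not an option.
    return "rm -- " + shlex.join(filenames)


print(build_rm_command(["a.t1", "b c.t1"]))  # prints: rm -- a.t1 'b c.t1'
```

In a notebook, a string quoted this way could be passed to the shell with the same $ variable substitution shown in the session above.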

How to !rm python_var (in a Jupyter notebook)
