How to !rm python_var (in a Jupyter notebook)

Time: 2018-12-28 23:35:55

Tags: python bash jupyter rm

I know I can do:

html_soup = BeautifulSoup(driver.page_source, 'html.parser')

But how do I do this with a variable? For example:

from selenium import webdriver
from bs4 import BeautifulSoup
import re

driver = webdriver.Chrome()
driver.get('https://www.kohls.com/catalog/mens-button-down-shirts-tops-clothing.jsp?CN=Gender:Mens+Silhouette:Button-Down%20Shirts+Category:Tops+Department:Clothing&cc=mens-TN3.0-S-buttondownshirts&kls_sbp=43160314801019132980443403449632772558&PPP=120&WS=0')

products = []
hyperlinks = []
reviewCounts = []
starRatings = []

pageCounter = 0
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
maxPageCount = int(html_soup.find('a', class_ = 'totalPageNum').text)+1
prod_containers = html_soup.find_all('li', class_ = 'products_grid')
counterProduct = 0
while (pageCounter < maxPageCount):
    for product in prod_containers:
        # If the product has review count, then extract:
        if product.find('span', class_ = 'prod_ratingCount') is not None:
            # The product name
            name = product.find('div', class_ = 'prod_nameBlock')
            name = re.sub(r"\s+", " ", name.text)
            products.append(name)

            # The product hyperlink
            hyperlink = product.find('span', class_ = 'prod_ratingCount')
            hyperlink = hyperlink.a
            hyperlink = hyperlink.get('href')
            hyperlinks.append(hyperlink)

            # The product review count
            reviewCount = product.find('span', class_ = 'prod_ratingCount').a.text
            reviewCounts.append(reviewCount)

            # The product overall star ratings
            starRating = product.find('span', class_ = 'prod_ratingCount')
            starRating = starRating.a
            starRating = starRating.get('alt')
            starRatings.append(starRating) 

    driver.find_element_by_xpath('//*[@id="page-navigation-top"]/a[2]').click()
    counterProduct +=1
    print(counterProduct)

2 answers:

Answer 0: (score: 0)

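You can handle this in Python without resorting to shell magic commands. I would suggest the pathlib module as a more modern approach. For what you are doing, it would be: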

import pathlib
csv_files = pathlib.Path('/path/to/actual/files')
for csv_file in csv_files.glob('*.csv'):
    csv_file.unlink()

The .glob() method selects only the files you want to work with, and .unlink() then deletes them (similar to os.remove()).
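If the files to delete are already held in a Python variable rather than matched by a pattern, the same .unlink() call can be applied to each name. A minimal sketch, assuming a hypothetical helper named remove_files and assuming that a missing file should be skipped rather than treated as an error:

```python
import pathlib


def remove_files(names, base_dir="."):
    """Delete each named file under base_dir; return the names actually removed."""
    removed = []
    for name in names:
        path = pathlib.Path(base_dir) / name
        try:
            path.unlink()  # same effect as os.remove(path)
        except FileNotFoundError:
            continue  # assumption: silently skip files that do not exist
        removed.append(name)
    return removed
```

Calling remove_files(['a.t1', 'b.t1']) would then delete those two files from the current directory and report which ones were actually present.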

Avoid using file as a variable name. It is not a reserved word, but in Python 2 it shadows the built-in file type.

Answer 1: (score: 0)

rm can delete several files in a single call:

In [80]: !touch a.t1 b.t1 c.t1
In [81]: !ls *.t1
a.t1  b.t1  c.t1
In [82]: !rm -r a.t1 b.t1 c.t1
In [83]: !ls *.t1
ls: cannot access '*.t1': No such file or directory

If the starting point is a list of file names:

In [116]: alist = ['a.t1', 'b.t1', 'c.t1']
In [117]: astr = ' '.join(alist)            # make a string
In [118]: !echo $astr                       # variable substitution as in BASH
a.t1 b.t1 c.t1
In [119]: !touch $astr                    # make 3 files
In [120]: ls *.t1
a.t1  b.t1  c.t1
In [121]: !rm -r $astr                    # remove them
In [122]: ls *.t1
ls: cannot access '*.t1': No such file or directory

Using Python's own os functions is probably better, but if you know the shell well enough, you can do many of the same things with %magics.


To use a "magic" inside a Python expression, I have to call the underlying function rather than use the "!" or "%" syntax, e.g.

import IPython
for txt in ['a.t1','b.t1','c.t1']:
    IPython.utils.process.getoutput('touch %s'%txt)

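getoutput is the function used by %sx (which underlies !!), and it in turn uses subprocess.Popen. But if you are going to do all that work, you might as well use the os functions that Python itself provides.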
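If calling rm itself is still preferred, the shell can be skipped entirely. The sketch below passes the file names to rm as an argument list through subprocess.run, so no string joining or quoting is needed. It assumes a POSIX system with rm on the PATH, and the helper name rm_files is hypothetical.

```python
import subprocess


def rm_files(names):
    """Run rm once for all names without a shell; raise if rm exits non-zero."""
    if not names:
        return  # rm with no operands would fail, so do nothing
    # "--" marks the end of options, so a name starting with "-" is treated as a file
    subprocess.run(["rm", "--", *names], check=True)
```

Because each name is passed as its own argument, a name containing spaces is treated as one file.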


File names may need an added layer of quoting so that the shell does not raise a syntax error.
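One way to add that quoting, sketched here with the standard-library shlex.quote, is to quote each name before joining them into a single string. The file names are made up for illustration.

```python
import shlex

alist = ["a.t1", "my file.t1", "c.t1"]  # hypothetical names; one contains a space

# Quote each name so the shell sees it as a single argument
astr = " ".join(shlex.quote(name) for name in alist)
print(astr)

# In an IPython/Jupyter cell the string can then be substituted as before:
#   !rm $astr
```

Names made up only of safe characters are left unchanged, so the resulting string stays readable.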
