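I am regularly sent a text file in which one column contains HTML content. I would like to run BeautifulSoup on that column, but I can find very few resources on how to do it.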
sample.csv:
id web-scraper-order html_content
0 15636 <div class="product-details detail-row"><div c...
1 15619 <div class="product-details detail-row"><div c...
2 15656 <div class="product-details detail-column"><di...
Desired output:
id web-scraper-order html_content html_content2
0 15636 <div class="product-details detail-row"><div c... ['EF1744','Grey Three/Off White/Gold Metallic','$120','2019-06-22']
1 15619 <div class="product-details detail-row"><div c... ['...','...','...','...']
2 15656 <div class="product-details detail-column"><di... ['...','...','...','...']
Contents of html_content:
<div class="product-details detail-row"><div class="detail"><div class="pinfo-container"><span class="icon"></span><span class="title">Style</span></div><span> EF1744 </span></div><div class="detail"><div class="pinfo-container"><span class="icon"></span><span class="title">Colorway</span></div><span> Grey Three/Off White/Gold Metallic </span></div><div class="detail"><div class="pinfo-container"><span class="icon"></span><span class="title">Retail Price</span></div><span> $120 </span></div><div class="detail"><div class="pinfo-container"><span class="icon"></span><span class="title">Release Date</span></div><span> 2019-06-22 </span></div></div>
The text I want from each row looks like this:
['EF1744','Grey Three/Off White/Gold Metallic','$120','2019-06-22']
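I found a similar post here, but since I need to target a specific column, it does not seem to apply to my case.
I made an attempt of my own, but had no luck...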
import numpy as np
import pandas as pd
import re
from bs4 import BeautifulSoup
d = pd.read_csv("sample.csv")
df = pd.DataFrame(d,columns=['web-scraper-order','html_content'])
soup = BeautifulSoup(df['html_content'],'xml')
style = [item.text.strip() for item in soup.find_all('div', class_='detail')]
Answer 0 (score: 0)
You can use a CSS selector that selects every <span> tag that is a direct child of any tag with class detail:
.detail > span
Prints:
<span>
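A minimal sketch of how the .detail > span selector could be applied to each row of the html_content column, assuming the column holds HTML strings like the sample in the question. The helper name extract_details and the output column html_content2 are illustrative choices (the latter taken from the desired output in the question), and the DataFrame is built inline here rather than read from sample.csv so that the snippet is self-contained.

```python
import pandas as pd
from bs4 import BeautifulSoup

def extract_details(html):
    """Return the stripped text of every <span> that is a direct child of a .detail element."""
    # Assumption: empty cells arrive as NaN (a float), so anything that is not a string yields [].
    if not isinstance(html, str):
        return []
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.select(".detail > span")]

# The sample HTML from the question, split over several literals for readability.
sample_html = (
    '<div class="product-details detail-row">'
    '<div class="detail"><div class="pinfo-container"><span class="icon"></span>'
    '<span class="title">Style</span></div><span> EF1744 </span></div>'
    '<div class="detail"><div class="pinfo-container"><span class="icon"></span>'
    '<span class="title">Colorway</span></div>'
    '<span> Grey Three/Off White/Gold Metallic </span></div>'
    '<div class="detail"><div class="pinfo-container"><span class="icon"></span>'
    '<span class="title">Retail Price</span></div><span> $120 </span></div>'
    '<div class="detail"><div class="pinfo-container"><span class="icon"></span>'
    '<span class="title">Release Date</span></div><span> 2019-06-22 </span></div>'
    '</div>'
)

# Stand-in for pd.read_csv("sample.csv"); only one row is reproduced here.
df = pd.DataFrame({"web-scraper-order": [15636], "html_content": [sample_html]})

# Parse each cell separately; BeautifulSoup expects one document, not a whole Series.
df["html_content2"] = df["html_content"].apply(extract_details)
print(df["html_content2"].iloc[0])
```

The attempt in the question passes the entire column to BeautifulSoup at once; parsing one cell at a time through apply keeps each row's values together. The title and icon spans are skipped because they are children of pinfo-container, not direct children of detail.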