I have an HTML document stored in an object named doc:
> doc
<!DOCTYPE html>
<h1>Hello</h1>
<br>
<p>I am an html file</p>
<script myscript1 src="https://website.com/javascripts.js" type="text/javascript"></script>
<p>I am a paragraph</p>
<script myscript2 src="https://website2.com/function.js" type="text/javascript"></script>
My goal is to write an R function that removes from doc the line containing the script myscript1:
<script myscript1 src="https://website.com/javascripts.js" type="text/javascript"></script>
I tried the following code, but it does not work:
remove <- "<script myscript1 src="https://website.com/javascripts.js" type="text/javascript"></script>"
doc <- doc[!grepl(paste(remove), doc),]
Note: after removing myscript1, I still need to extract some elements from the document using XPath.
Can you help me? Thanks.
Answer 0 (score: 1)
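One approach is to first convert the HTML document into a character vector in R and work on that. To do so, write the externalptr object (blob) out as a text HTML file, then read it back with the base function readLines. Consider: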
class MyTime:
    """Create some time."""

    def __init__(self, hrs=0, mins=0, sec=0):
        """Normalize the input to total seconds, then split it into h/m/s."""
        totalsecs = hrs * 3600 + mins * 60 + sec
        self.hours = totalsecs // 3600
        leftoversecs = totalsecs % 3600
        self.minutes = leftoversecs // 60
        self.seconds = leftoversecs % 60

    def __str__(self):
        return '{0}:{1}:{2}'.format(self.hours, self.minutes, self.seconds)

    def to_seconds(self):
        # Convert the stored time to a total number of seconds
        return (self.hours * 3600) + (self.minutes * 60) + self.seconds


def between(t1, t2, x):
    # True when x falls in the half-open interval [t1, t2)
    t1seconds = t1.to_seconds()
    t2seconds = t2.to_seconds()
    xseconds = x.to_seconds()
    if t1seconds <= xseconds < t2seconds:
        return True
    return False


currentTime = MyTime(0, 0, 0)
doneTime = MyTime(10, 3, 4)
x = MyTime(2, 0, 0)
print(between(currentTime, doneTime, x))
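The filtering step described in this answer (get the document as a character vector, then drop the unwanted line) might look roughly like the base R sketch below. It is only an illustration under stated assumptions: doc_lines is a hypothetical stand-in for whatever readLines returns after the document has been written to disk, and remove_script_lines is a made-up helper name. The strings use single quotes so the double quotes inside the tags do not end the string early, which is one reason the attempt in the question fails to parse.

```r
# Hypothetical stand-in for the lines read back with readLines();
# the content mirrors the document shown in the question.
doc_lines <- c(
  '<!DOCTYPE html>',
  '<h1>Hello</h1>',
  '<br>',
  '<p>I am an html file</p>',
  '<script myscript1 src="https://website.com/javascripts.js" type="text/javascript"></script>',
  '<p>I am a paragraph</p>',
  '<script myscript2 src="https://website2.com/function.js" type="text/javascript"></script>'
)

# remove_script_lines is a hypothetical helper, not part of any package.
# fixed = TRUE makes grepl treat the marker as a literal string, not a regex,
# so characters such as "." or "/" in the marker need no escaping.
remove_script_lines <- function(lines, marker) {
  lines[!grepl(marker, lines, fixed = TRUE)]
}

# Drop every line that contains the myscript1 script tag.
cleaned <- remove_script_lines(doc_lines, "<script myscript1 ")
print(cleaned)
```

Because the result is a plain character vector, it is indexed with a single subscript (no trailing comma as in the question's attempt). To run XPath queries afterwards, the cleaned lines could be collapsed with paste(cleaned, collapse = "\n") and parsed again, for example with xml2::read_html, assuming that package is the one in use.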