Question

I often have to deal with files like this:

.
.
<pattern A>
.
.
<pattern B>
.. <pattern B1>
..
.. <pattern B2>
..
.. <pattern B3>
<pattern B>
.
.
<pattern A>
<pattern B>
.
.

I often find that I want to focus on everything between/outside of the <pattern A> lines, or to focus on

<pattern B>
.. <pattern B1>
..
.. <pattern B2>
..
.. <pattern B3>
<pattern B>

while ignoring one particular <pattern B> throughout the whole file.

Is it possible to do this with sed?

Concrete examples

1.

From the file

<html>
<div>
1st div
</div>
<div>
2nd div
</div>
..

<div>
10th div
</div>
</html>

how can I extract

<div>
3rd div
.
.
7th div
</div>

2.

From the file

<html>
.
.
<ol> # the first <ol> in the whole file
.
.
</ol> # the last </ol> in the whole file
.

how can I extract

<ol> # the first <ol> in the whole file
.
.
</ol> # the last </ol> in the whole file

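What I have tried

My current solution is very ugly and unreliable. I just delete all the newlines so that the whole file becomes a single line, and then apply a lot of ugly sed magic. Luckily, in my case I can usually put the newlines back in afterwards, but this is definitely not the right way to do it.

Please let me know if I should provide further information. I know this is a vague question, but that is exactly what I am after: can sed detect patterns across a whole file like this? Thanks for your help!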

Answer 1

This might work for you (GNU sed):

sed -nE '/<div>/{H;:a;n;H;/<\/div>/!ba;x;s/^/x/;/^x{3,7}\n/{H;s/^[^\n]*\n//p;g;s///;s/\n.*//;x;s///;b};s/\n.*//;x}' file

This prints only the 3rd through 7th divs in the file. It uses the first line of the hold space as a counter: each time it encounters a div in the file, it appends the div to the hold space, increments the counter, and decides there and then whether to print the div. The same mechanism can be adapted to print all of the divs.
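
For anyone who finds the one-liner hard to read, here is a rough sketch of the same hold-space counter written out step by step, plus one possible approach to the second example (first <ol> through last </ol>), which the answer does not cover. The function names print_divs and first_to_last_ol and the sample file are made up for illustration. The second function is not the counter mechanism at all: it swaps in a plain address range combined with tac, and it assumes GNU sed and GNU coreutils. Both work on whole lines only.

```shell
# Hypothetical helper: print divs number $1 through $2 of file $3.
# Same idea as the one-liner: the first line of the hold space is a unary counter.
print_divs() {
  sed -nE "
    /<div>/{
      # Collect the opening line and every line up to the closing tag in the hold space.
      H
      :a
      n
      H
      /<\/div>/!ba
      # Swap, so the pattern space holds: counter, newline, block.
      x
      # Increment the counter by prepending one x.
      s/^/x/
      /^x{$1,$2}\n/{
        # Counter is in range: save counter and block, then print the block alone.
        H
        s/^[^\n]*\n//p
        # Restore from the hold space and reduce it to the bare counter.
        g
        s/^[^\n]*\n//
        s/\n.*//
        x
        b
      }
      # Counter is out of range: keep only the counter for the next div.
      s/\n.*//
      x
    }
  " "$3"
}

# Hypothetical helper: print from the first <ol> line to the last </ol> line.
# Range from the first match to end of file, reverse, repeat for the closing tag, reverse back.
first_to_last_ol() {
  sed -n '/<ol>/,$p' "$1" | tac | sed -n '/<\/ol>/,$p' | tac
}

# Made-up sample input for trying both helpers.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<html>
<div>
1st div
</div>
<ol>
<li>a</li>
</ol>
<div>
2nd div
</div>
<ol>
<li>b</li>
</ol>
<div>
3rd div
</div>
<div>
4th div
</div>
</html>
EOF

print_divs 2 3 "$sample"
first_to_last_ol "$sample"
```

Note that print_divs emits each selected div as its own block and skips anything lying between two divs, which is what the one-liner does as well.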
