I am trying to subset a large data frame (cyclone tracks in the Northern Hemisphere) on multiple conditions. The data are below:
centro <- read.table("https://forms.naturwissenschaften.ch/imilast/_ERAinterim_1.5_1979_MTEX/ERAinterim_1.5_NH_M02_19790101_20121231_MTEX.txt?_ga=2.18919096.1825595846.1546710263-1112023567.1546710263", sep="", fill = TRUE, nrows = 500,
header = FALSE, skip = 2) # read here only the first 500 rows
centro <- na.omit(centro)
colnames(centro) <- c("Code","CycloneNo","StepNo","DateI10","Year","Month","Day","Time","LongE","LatN","Intensity1","Intensity2","Intensity3")
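I want to subset only the cyclones (identified by the unique column CycloneNo) that form inside a spatial box (e.g. between -4 and 40 E longitude and 32 to 45 N latitude), that is, whose position lies in the box when the column StepNo == 1. Filtering rows by the box alone is easy: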
centro_subs <- centro[centro$LongE>=-4 & centro$LongE <= 40 & centro$LatN>= 32 & centro$LatN <= 45,]
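However, I only want to keep the cyclones that form inside this box (when StepNo == 1), and for those cyclones I also want to keep the rest of the track, even where it lies outside the box.
I tried to do this: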
df_s <- centro[1,]
df_s[1,] <- NA # create an empty dataframe to be filled in the iteration
for (i in 1:length(unique(centro$CycloneNo))){
print(i)
a <- centro[centro$LongE[centro$StepNo==1]>= -4 &
centro$LongE[centro$StepNo==1] <= 40 &
centro$LatN[centro$StepNo==1]>= 32 & centro$LatN
<=45[centro$StepNo==1],]
df_s <- rbind(a, df_s)
}
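However, the final result is an empty data frame full of NAs. I know this is hard to describe here. I feel I am fairly close, but I am also exhausted from trying to find new approaches.
Answer 0 (score: 1)
I don't think you want a loop. I'm sure this is not the most elegant way, but I think it works.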
step1s <- subset(centro, StepNo == 1) # only take step 1 of all cyclones
keeps <- step1s$CycloneNo[step1s$LongE>=-4 & step1s$LongE <= 40 & step1s$LatN>= 32 & step1s$LatN <= 45] # find cyclone numbers for cyclones meeting the condition
centro_sub <- centro[centro$CycloneNo %in% keeps, ] # keep all steps of cyclones meeting the conditions
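To check the logic without downloading the file, here is a minimal sketch on a made-up data frame. The `toy` data and its values are hypothetical (they are not taken from the real track file); only the column names follow the question.

```r
# Hypothetical toy data: three cyclones with a few steps each
toy <- data.frame(
  CycloneNo = c(1, 1, 1, 2, 2, 3, 3),
  StepNo    = c(1, 2, 3, 1, 2, 1, 2),
  LongE     = c(10, 45, 50, -10, 5, 39, 41),
  LatN      = c(35, 40, 50, 40, 41, 44, 46)
)

# Step 1 of every cyclone, i.e. the genesis position
step1s <- subset(toy, StepNo == 1)

# TRUE where the genesis position lies inside the box
in_box <- step1s$LongE >= -4 & step1s$LongE <= 40 &
          step1s$LatN >= 32 & step1s$LatN <= 45

# Cyclone numbers whose genesis is inside the box
keeps <- step1s$CycloneNo[in_box]

# All steps of those cyclones, including steps outside the box
toy_sub <- toy[toy$CycloneNo %in% keeps, ]
print(toy_sub)
```

In this toy set, cyclone 2 is dropped even though its second step is inside the box, because its first step is not; cyclones 1 and 3 are kept with their full tracks.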
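Answer 1 (score: 1)
Joseph provided a good answer. Alternatively, the same thing can be done with data.table, which may offer better readability at the cost of some speed.
library(data.table)
centro <- data.table(centro)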
centro[CycloneNo %in% CycloneNo[StepNo == 1 &
LongE %between% c(-4,40) &
LatN %between% c(32,45)]]
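The same one-liner can be tried on a small made-up table. This is only a sketch: the `toy` data are hypothetical, and cyclone 3 is placed exactly on the box corner to illustrate that `%between%` includes both bounds by default.

```r
library(data.table)

# Hypothetical toy data: cyclone 3 forms exactly on the box corner (40 E, 45 N)
toy <- data.table(
  CycloneNo = c(1, 1, 1, 2, 2, 3, 3),
  StepNo    = c(1, 2, 3, 1, 2, 1, 2),
  LongE     = c(10, 45, 50, -10, 5, 40, 41),
  LatN      = c(35, 40, 50, 40, 41, 45, 46)
)

# Inner expression: cyclone numbers whose first step is inside the box.
# Outer expression: keep every row belonging to one of those cyclones.
toy_sub <- toy[CycloneNo %in% CycloneNo[StepNo == 1 &
                                        LongE %between% c(-4, 40) &
                                        LatN %between% c(32, 45)]]
print(toy_sub)
```

If a strict inequality is needed at the box edges, `between()` can be called directly with `incbounds = FALSE` instead of the `%between%` operator.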