For XY coordinate pairs, use data.table to count rows; write rows with the complete set to DT1 and partial sets to DT2

Time: 2019-01-24 02:21:53

Tags: r datatable count

I have a very large Landsat dataset, so I am working with a (comparatively) tiny test dataset: the partial dataset below, with a reduced number of columns.

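To remove the rows that are not land, I am trying to delete every XY coordinate pair that does not appear in all time steps (year, month, day).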

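I have already removed data based on the Normalised Difference Water Index (NDWI) <= 0. However, compared with the satellite map, the plot shows that many of the points are water. These points are not plotted at every time step.

[Figure: plot of all XY coordinate pairs for 60 time steps; dense black = land, otherwise water]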
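For reference, one common form of NDWI uses the green and near-infrared (NIR) reflectances. The question does not say which NDWI variant or which Landsat bands were used, so treat this as an assumption rather than the asker's exact index:

```latex
\mathrm{NDWI} = \frac{\rho_{\mathrm{Green}} - \rho_{\mathrm{NIR}}}{\rho_{\mathrm{Green}} + \rho_{\mathrm{NIR}}}, \qquad -1 \le \mathrm{NDWI} \le 1
```

With this form, open water usually gives positive values and land negative ones, but a single threshold at 0 is only an approximate water/land split, so some misclassified points are to be expected.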

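So, if I count the number of rows each XY coordinate pair has, I can write the rows whose pair has the total number of time steps to one file and the rows whose pair does not to another file, and then plot the two files to check.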
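The count-then-split plan can be sketched with data.table by adding a per-pair row count as a column and then subsetting on it. This is a minimal sketch on hypothetical toy data (three time steps, columns named `XLon`/`YLat` as in the code below); `nSteps`, `complete` and `partial` are illustrative names only:

```r
library(data.table)

# Hypothetical toy data: pair (1, 10) appears at all 3 time steps,
# pair (2, 20) at only 2 of them
dt <- data.table(
  XLon = c(1, 2, 1, 2, 1),
  YLat = c(10, 20, 10, 20, 10),
  Red  = c(100, 110, 101, 111, 102)
)
totalTimesteps <- 3

# Add, by reference, the number of rows each XLon/YLat pair has
dt[, nSteps := .N, by = .(XLon, YLat)]

# Split into the two tables that would go to separate files
complete <- dt[nSteps == totalTimesteps]  # pairs present at every time step
partial  <- dt[nSteps != totalTimesteps]  # pairs missing some time steps
```

Each table could then be written out with `fwrite()`. Keeping the count as a column makes it easy to inspect how many time steps each partial pair is missing.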

Code so far:

# Clean AllDatatable where the total number of timesteps for each XLon YLat
# combination does not equal 321 (10 for test data).
# AllDatatable should represent land only.

library(data.table)
library(plyr)
AllDatatable <- mCTestData #insert partial test data here
countXY <- count(AllDatatable, c("V2","V1"))
totalTimesteps <- 10 # edit to 321 for full data set

# below is not working - need nifty data.table expression or loop to cycle through countXY?
if (countXY = totalTimesteps) {
  AllDatatableFile1 <- AllDatatable[, .(XLon,YLat,Year,Month,Day,Hour,Minute,Second,Red)]
} else {
  AllDatatableFile2 <- AllDatatable[, .(XLon,YLat,Year,Month,Day,Hour,Minute,Second,Red)]
}
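The `if` above cannot work for two reasons: `=` inside an `if (...)` condition is a syntax error in R (comparison is `==`), and `if` evaluates a single TRUE/FALSE, whereas here a decision is needed for every row (since R 4.2 a condition of length greater than one is an error). A minimal base-R sketch of a vectorised alternative, assuming `countXY` has the `V2`, `V1`, `freq` layout shown below; `aggregate()` stands in for `plyr::count()` so the example has no plyr dependency, and the toy data and the names `fullPairs`/`isFull` are hypothetical:

```r
# Hypothetical toy data: pairs (10, 1) and (20, 2) appear 3 times, (30, 3) once
AllDatatable <- data.frame(
  V1  = c(1, 2, 3, 1, 2, 1, 2),
  V2  = c(10, 20, 30, 10, 20, 10, 20),
  Red = c(100, 110, 120, 101, 111, 102, 112)
)
totalTimesteps <- 3

# Same shape as plyr::count(AllDatatable, c("V2", "V1")): one row per pair plus freq
countXY <- aggregate(freq ~ V2 + V1,
                     data = transform(AllDatatable, freq = 1L), FUN = sum)

# Pairs that appear at every time step
fullPairs <- countXY[countXY$freq == totalTimesteps, c("V2", "V1")]

# One TRUE/FALSE per row: is this row's (V2, V1) pair a complete pair?
rowKey  <- paste(AllDatatable$V2, AllDatatable$V1)
fullKey <- paste(fullPairs$V2, fullPairs$V1)
isFull  <- rowKey %in% fullKey

AllDatatableFile1 <- AllDatatable[isFull, ]   # complete sets
AllDatatableFile2 <- AllDatatable[!isFull, ]  # partial sets
```

The key idea is replacing the single `if` with a logical vector that has one element per row and using it to subset.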

countXY produces:

         V2       V1 freq
1  -2309088 -1605138    6
2  -2308838 -1572413   10
3  -2308763 -1572238   10
4  -2307988 -1598338   10
5  -2306488 -1573838   10
6  -2305138 -1594663    9
7  -2304788 -1573213    9
8  -2304763 -1572988    9
9  -2303863 -1572163    9
10 -2287413 -1567888   10

So file 1 should have 50 rows (the 5 pairs with freq = 10, each contributing 10 rows) and file 2 should have 42 rows.
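The expected row counts follow directly from the `freq` column above; a quick check in R, with the values copied from that output:

```r
# freq column copied from the countXY output above
freq <- c(6, 10, 10, 10, 10, 9, 9, 9, 9, 10)
totalTimesteps <- 10

nFullPairs <- sum(freq == totalTimesteps)          # pairs with the complete set
rowsFile1  <- sum(freq[freq == totalTimesteps])    # rows expected in file 1
rowsFile2  <- sum(freq[freq != totalTimesteps])    # rows expected in file 2
rowsTotal  <- sum(freq)                            # all rows in the test data
```

This gives 5 complete pairs, 50 rows for file 1, 42 rows for file 2, and 92 rows in total, matching the 92 rows of the test dataset.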

The partial test dataset (not all columns of values are present) is:

structure(list(V1 = c(-1605137.5, -1572412.5, -1572237.5, -1598337.5, 
-1573837.5, -1594662.5, -1573212.5, -1572162.5, -1567887.5, -1605137.5, 
-1572412.5, -1572237.5, -1598337.5, -1573837.5, -1594662.5, -1573212.5, 
-1572987.5, -1572162.5, -1567887.5, -1572412.5, -1572237.5, -1598337.5, 
-1573837.5, -1594662.5, -1573212.5, -1572987.5, -1572162.5, -1567887.5, 
-1572412.5, -1572237.5, -1598337.5, -1573837.5, -1594662.5, -1573212.5, 
-1572987.5, -1572162.5, -1567887.5, -1572412.5, -1572237.5, -1598337.5, 
-1573837.5, -1594662.5, -1573212.5, -1572987.5, -1572162.5, -1567887.5, 
-1572412.5, -1572237.5, -1598337.5, -1573837.5, -1573212.5, -1572987.5, 
-1572162.5, -1567887.5, -1605137.5, -1572412.5, -1572237.5, -1598337.5, 
-1573837.5, -1594662.5, -1573212.5, -1572987.5, -1572162.5, -1567887.5, 
-1605137.5, -1572412.5, -1572237.5, -1598337.5, -1573837.5, -1594662.5, 
-1573212.5, -1572987.5, -1572162.5, -1567887.5, -1605137.5, -1572412.5, 
-1572237.5, -1598337.5, -1573837.5, -1594662.5, -1573212.5, -1572987.5, 
-1572162.5, -1567887.5, -1605137.5, -1572412.5, -1572237.5, -1598337.5, 
-1573837.5, -1594662.5, -1572987.5, -1567887.5), V2 = c(-2309087.5, 
-2308837.5, -2308762.5, -2307987.5, -2306487.5, -2305137.5, -2304787.5, 
-2303862.5, -2287412.5, -2309087.5, -2308837.5, -2308762.5, -2307987.5, 
-2306487.5, -2305137.5, -2304787.5, -2304762.5, -2303862.5, -2287412.5, 
-2308837.5, -2308762.5, -2307987.5, -2306487.5, -2305137.5, -2304787.5, 
-2304762.5, -2303862.5, -2287412.5, -2308837.5, -2308762.5, -2307987.5, 
-2306487.5, -2305137.5, -2304787.5, -2304762.5, -2303862.5, -2287412.5, 
-2308837.5, -2308762.5, -2307987.5, -2306487.5, -2305137.5, -2304787.5, 
-2304762.5, -2303862.5, -2287412.5, -2308837.5, -2308762.5, -2307987.5, 
-2306487.5, -2304787.5, -2304762.5, -2303862.5, -2287412.5, -2309087.5, 
-2308837.5, -2308762.5, -2307987.5, -2306487.5, -2305137.5, -2304787.5, 
-2304762.5, -2303862.5, -2287412.5, -2309087.5, -2308837.5, -2308762.5, 
-2307987.5, -2306487.5, -2305137.5, -2304787.5, -2304762.5, -2303862.5, 
-2287412.5, -2309087.5, -2308837.5, -2308762.5, -2307987.5, -2306487.5, 
-2305137.5, -2304787.5, -2304762.5, -2303862.5, -2287412.5, -2309087.5, 
-2308837.5, -2308762.5, -2307987.5, -2306487.5, -2305137.5, -2304762.5, 
-2287412.5), V3 = c(1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 
1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 
1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 
1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 
1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 1987L, 
1987L, 1987L, 1987L, 1987L, 1988L, 1988L, 1988L, 1988L, 1988L, 
1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 
1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 
1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 
1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 1988L, 
1988L, 1988L, 1988L, 1988L, 1988L), V4 = c(9L, 9L, 9L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), V5 = c(11L, 
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 27L, 27L, 27L, 27L, 27L, 
27L, 27L, 27L, 27L, 27L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 
29L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 16L, 16L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 
17L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 18L, 18L, 18L, 18L, 
18L, 18L, 18L, 18L, 18L, 18L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L), V6 = c(1439L, 
1218L, 1017L, 1279L, 993L, 1111L, 1046L, 1153L, 1330L, 1398L, 
1161L, 1058L, 1238L, 1035L, 1133L, 1115L, 1117L, 1180L, 1302L, 
1240L, 1114L, 1264L, 1100L, 1194L, 1143L, 1228L, 1225L, 1396L, 
1204L, 1052L, 1271L, 1090L, 1218L, 1131L, 1187L, 1263L, 1388L, 
1214L, 1076L, 1226L, 1128L, 1202L, 1173L, 1198L, 1196L, 1404L, 
1249L, 1044L, 1268L, 1059L, 1108L, 1210L, 1161L, 1358L, 1314L, 
1215L, 1074L, 1337L, 1035L, 1103L, 1087L, 1174L, 1235L, 1417L, 
1372L, 1239L, 1113L, 1341L, 1069L, 1089L, 1094L, 1172L, 1153L, 
1347L, 1192L, 1093L, 962L, 1233L, 997L, 1020L, 1021L, 1128L, 
1164L, 1177L, 1220L, 1106L, 909L, 1224L, 1025L, 1063L, 1010L, 
1005L)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6"), row.names = c(NA, 
-92L), class = c("data.table", "data.frame"))

I am testing on Windows 10, and running RStudio on Google Cloud Platform (GCP) to do this on the full dataset.

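Aside: this is about my third month of using R, GCP, RStudio, etc. I am doing this as a volunteer trying to save the rock art on the Dampier Archipelago, Australia. Any help is greatly appreciated.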

1 answer:

Answer 0 (score: 1)

Thanks to @SymbolixAU.

The code that processes the 256,177,9339 rows of data is:

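library(data.table)

# Count rows per XLon/YLat pair, keep pairs with all 321 time steps, then inner-join
# back to AllDatatable to get those pairs' rows (the result also carries the count N)
AllDataDT1 <- AllDatatable[, .N, by = .(XLon,YLat)][ N == 321 ][ AllDatatable , on = c("XLon","YLat") , nomatch = 0 ]
fwrite(AllDataDT1, file="AllDataWith321Timesteps.csv",append=FALSE)

# Same, but keep pairs with any other number of time steps (partial sets)
AllDataDT2 <- AllDatatable[, .N, by = .(XLon,YLat)][ N != 321 ][ AllDatatable , on = c("XLon","YLat") , nomatch = 0 ]
fwrite(AllDataDT2, file="AllDataNot321Timesteps.csv",append=FALSE)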
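On billions of rows, the per-pair count could be computed once and reused: a semi-join keeps the rows whose pair is complete, and an anti-join (`X[!Y, on = ...]`) keeps everything else. This is a variation on the answer, not the answer's own code; it is a minimal sketch on hypothetical toy data with `totalTimesteps = 2`, and `fullPairs` is an illustrative name:

```r
library(data.table)

# Hypothetical toy data in the answer's column layout
AllDatatable <- data.table(
  XLon = c(1, 2, 3, 1, 2),
  YLat = c(10, 20, 30, 10, 20),
  Year = c(1987L, 1987L, 1987L, 1988L, 1988L)
)
totalTimesteps <- 2

# Count rows per pair once; keep only the pairs that have the complete set
fullPairs <- AllDatatable[, .N, by = .(XLon, YLat)][N == totalTimesteps, .(XLon, YLat)]

# Semi-join: rows whose pair is in fullPairs (complete sets)
AllDataDT1 <- AllDatatable[fullPairs, on = c("XLon", "YLat"), nomatch = 0]

# Anti-join: rows whose pair is not in fullPairs (partial sets)
AllDataDT2 <- AllDatatable[!fullPairs, on = c("XLon", "YLat")]

# Every input row should land in exactly one of the two outputs
stopifnot(nrow(AllDataDT1) + nrow(AllDataDT2) == nrow(AllDatatable))
```

Unlike the answer, the outputs here carry no `N` column, and the semi-join result is ordered by `fullPairs` rather than by `AllDatatable`; the anti-join keeps the original row order.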