How to import a mixed tab- and comma-delimited ASCII file in R

Asked: 2014-06-09 17:48:27

Tags: r

I have an ASCII file containing a set of MODIS data, with a series of pixel values for each acquisition date. The data are formatted as follows:

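  • The header lines are whitespace-separated (a label followed by its value).
  • The pixel values start after the header lines and are comma-separated.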

A sample of the data for two dates is shown below:

----------------------------------------------------------------------------
MODIS HDF Tile                     MOD13Q1.A2003273.h11v03.005.2008260032604.hdf
Scientific Data Set (Band)         250m_16_days_EVI
Number of Values Passing QA Filter 81 of 81
Applying the Scale of .0001        MEAN: 0.24070987654321, STD-DEV: 0.0257345931611507 
Unscaled                           MEAN: 2407.0987654321, STD-DEV: 257.345931611507


2213,2160,2206,2408,2369,2362,2423,2466,2318,2160,2429,2316,2260,2362,2431,2172,2021,2254,2424,2391,2427,2331,1934,2220,2235,2254,2186,2325,2046,1956,2273,2220,2235,2257,2425,2534,2141,2288,2273,2263,2436,2568,2603,2470,2561,2288,2369,2628,2725,2730,2603,2704,2744,2732,2624,2606,2694,2730,2718,2765,2771,2732,2771,2726,2694,2637,2699,2806,2712,2384,1904,1982,2747,2788,2610,2647,2408,2096,1946,1858,1791


----------------------------------------------------------------------------
MODIS HDF Tile                      MOD13Q1.A2003289.h11v03.005.2008263131227.hdf
Scientific Data Set (Band)          250m_16_days_EVI
Number of Values Passing QA Filter  81 of 81
Applying the Scale of .0001         MEAN: 0.261756790123457, STD-DEV: 0.0232843291670261 
Unscaled                            MEAN: 2617.56790123457, STD-DEV: 232.843291670261


2074,2323,2382,2574,2614,2661,2631,2599,2525,2399,2548,2545,2541,2599,2415,2428,2417,2518,2549,2471,2539,2520,2407,2358,2426,2461,2575,2427,2412,2518,2500,2394,2509,2567,2569,2648,2414,2573,2498,2626,2509,2708,2694,2654,2702,2536,2750,2804,2917,2926,2942,2938,2844,2839,2863,2985,3006,2991,2997,2937,2830,2838,2607,3101,3093,3085,2950,2881,2608,2570,2499,2233,2912,2833,2819,2348,2426,2541,2243,2239,2071

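A typical ASCII file holds about 900 dates, i.e. 900 "tiles" of information, each formatted exactly as shown above. Every date has the same number of pixels, namely 81 values.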
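As a rough sketch of how that regularity could be checked before parsing, the snippet below counts the comma-separated values on each pixel line. The `sample_lines` vector is a shortened stand-in for the real file (three values per date instead of 81, and only two of the header lines), and `n_expected` is a hypothetical name for the expected pixel count. It also assumes that pixel lines contain nothing but digits and commas.

```r
# Shortened stand-in for the real file: 3 pixel values per date instead of 81.
sample_lines <- c(
  "MODIS HDF Tile                     MOD13Q1.A2003273.h11v03.005.2008260032604.hdf",
  "Scientific Data Set (Band)         250m_16_days_EVI",
  "",
  "2213,2160,2206",
  "",
  "MODIS HDF Tile                      MOD13Q1.A2003289.h11v03.005.2008263131227.hdf",
  "Scientific Data Set (Band)          250m_16_days_EVI",
  "",
  "2074,2323,2382"
)

n_expected <- 3  # would be 81 for the real data

# Assumption: pixel lines consist only of digits and commas
# (optional trailing whitespace is tolerated).
is_pixel_line <- grepl("^[0-9]+(,[0-9]+)+\\s*$", sample_lines)

# Number of comma-separated values on each pixel line.
n_values <- lengths(strsplit(sample_lines[is_pixel_line], ",", fixed = TRUE))

# Number of dates, counted by their "MODIS HDF Tile" header line.
n_dates <- sum(grepl("^MODIS HDF Tile", sample_lines))

# Every date should contribute exactly one pixel line of the expected length.
stopifnot(n_dates == sum(is_pixel_line), all(n_values == n_expected))
```

For the real file, `sample_lines` would be replaced by the result of `readLines()` and `n_expected` set to 81.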

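What I would like is to read the file and, for each date, extract the "MODIS HDF Tile" name (e.g. MOD13Q1.A2003289.h11v03.005.2008263131227.hdf) together with each pixel value in its own column, like this: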

MODIS HDF Tile                                 Scientific Data Set (Band) V2    V3    V4     V5    V6    V7...
MOD13Q1.A2003273.h11v03.005.2008260032604.hdf  250m_16_days_EVI           2213  2160  2206   2408  2369 .......
MOD13Q1.A2003289.h11v03.005.2008263131227.hdf  250m_16_days_EVI           2074  2323  2382   2574  2614 .....

Any help is much appreciated!

1 answer:

Answer 0 (score: 0)

Maybe something like this could work:

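# Read the whole file as a character vector, one element per line
modis <- readLines("modis.txt")
# Indices of the "MODIS HDF Tile" line that starts each date block
headers <- grep("^MODIS", modis)

# Column labels, taken from the first block (text before the run of 2+ spaces)
headtiles <- sapply(strsplit(modis[headers[1]],"\\s{2,}"), '[',1 )
headbands <- sapply(strsplit(modis[headers[1]+1],"\\s{2,}"), '[',1 )

# Values for every block (text after the run of 2+ spaces); the band line
# is assumed to directly follow the tile line
tiles <- sapply(strsplit(modis[headers],"\\s{2,}"), '[',2 )
bands <- sapply(strsplit(modis[headers+1],"\\s{2,}"), '[',2 )

# Pixel lines are the ones containing at least five commas
pxlines <- grep("(,.*?){5,}", modis)
pixels <- do.call(rbind, lapply(strsplit(modis[pxlines], ","), as.numeric))

# One row per date: tile name, band name, then one column per pixel
dd<-data.frame(tiles, bands, pixels)
names(dd)<-c(headtiles , headbands , paste0("pixel", seq.int(ncol(pixels))))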

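Here we scan all the lines to find the tile header lines, and we assume that the line right after each one is the band line. We then pick out the pixel lines by looking for lines that contain many commas. Note that this makes a lot of assumptions about the data file, based on the limited sample you provided.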
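Since the approach depends on those assumptions, it may be worth checking them explicitly. The following is only a sketch: `parse_modis` is a hypothetical helper name, the pixel-line pattern is a stricter stand-in for the comma-counting regex (it assumes pixel lines contain only digits and commas), and `sample_lines` is a shortened, illustrative version of the file with three values per date.

```r
# Hypothetical helper: the same parsing steps, plus explicit assumption checks.
parse_modis <- function(lines) {
  headers <- grep("^MODIS HDF Tile", lines)

  # Assumption 1: the band line directly follows each tile line.
  stopifnot(all(grepl("^Scientific Data Set", lines[headers + 1])))

  # The value is the second piece after splitting on runs of 2+ spaces.
  tiles <- sapply(strsplit(lines[headers], "\\s{2,}"), `[`, 2)
  bands <- sapply(strsplit(lines[headers + 1], "\\s{2,}"), `[`, 2)

  # Stricter stand-in pattern: only digits and commas on a pixel line.
  pxlines <- grep("^[0-9]+(,[0-9]+)+\\s*$", lines)

  # Assumption 2: exactly one pixel line per tile, located after its own
  # header and before the next one.
  next_header <- c(headers[-1], length(lines) + 1)
  stopifnot(length(pxlines) == length(headers),
            all(pxlines > headers),
            all(pxlines < next_header))

  values <- strsplit(trimws(lines[pxlines]), ",", fixed = TRUE)

  # Assumption 3: every date has the same number of pixel values.
  stopifnot(length(unique(lengths(values))) == 1)

  pixels <- do.call(rbind, lapply(values, as.numeric))
  colnames(pixels) <- paste0("pixel", seq_len(ncol(pixels)))
  data.frame(tile = tiles, band = bands, pixels, stringsAsFactors = FALSE)
}

# Shortened, illustrative input: 3 pixel values per date instead of 81.
sample_lines <- c(
  "----------------------------------------------------------------------------",
  "MODIS HDF Tile                     MOD13Q1.A2003273.h11v03.005.2008260032604.hdf",
  "Scientific Data Set (Band)         250m_16_days_EVI",
  "Number of Values Passing QA Filter 81 of 81",
  "",
  "2213,2160,2206",
  "",
  "----------------------------------------------------------------------------",
  "MODIS HDF Tile                      MOD13Q1.A2003289.h11v03.005.2008263131227.hdf",
  "Scientific Data Set (Band)          250m_16_days_EVI",
  "Number of Values Passing QA Filter  81 of 81",
  "",
  "2074,2323,2382"
)

dd <- parse_modis(sample_lines)
print(dd)
```

With the real file, `parse_modis(readLines("modis.txt"))` would stop with an error as soon as one of the checks fails, rather than silently pairing a tile name with the wrong pixel row.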