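I have a directory of text files named using the following convention: "Location[A-Z]_House[0-15]_Day[0-15].txt", so an example is LA_H05_D14.txt (L = Location, H = House, D = Day). Is there a way to split the names so that they become a factor? More specifically, I want to use the letter [A-Z] after Location. For example, LB_H01_D01.txt is location "B", and all data belonging to location B should be labeled "B".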
I have imported all the data from the files into a single data frame:
library(dplyr)
# all .txt files in the working directory, including their paths
l <- list.files(pattern = "txt$", full.names = TRUE)
Df <- bind_rows(lapply(l, function(i) {
  temp <- read.table(i, stringsAsFactors = FALSE, sep = ";")
  setNames(temp, c("Date", "Time", "Timestamp", "PM2_5(ug/m3)", "AQI(US)", "AQI(CN)",
                   "PM10(ug/m3)", "Outdoor AQI(US)", "Outdoor AQI(CN)",
                   "Temperature(C)", "Temperature(F)", "Humidity(%RH)",
                   "CO2(ppm)", "VOC(ppb)"))
}), .id = "id")  # the list is unnamed, so "id" is the file's position (1, 2, ...)
The data look like this, including the "id" column:
head(Df)
id Date Time Timestamp PM2_5(ug/m3) AQI(US) AQI(CN) PM10(ug/m3) Outdoor AQI(US) Outdoor AQI(CN) Temperature(C) Temperature(F)
1 1 2017/10/17 20:31:38 1508272298 102.5 175 135 512 0 0 30 86.1
2 1 2017/10/17 20:31:48 1508272308 93.6 171 124 477 0 0 30 86.1
3 1 2017/10/17 20:31:58 1508272318 98.0 173 129 397 0 0 30 86.0
4 1 2017/10/17 20:32:08 1508272328 98.0 173 129 422 0 0 30 86.0
5 1 2017/10/17 20:32:18 1508272338 104.3 176 137 466 0 0 30 86.0
6 1 2017/10/17 20:32:28 1508272348 101.6 175 134 528 0 0 30 86.0
Humidity(%RH) CO2(ppm) VOC(ppb)
1 43 466 -1
2 43 467 -1
3 42 468 -1
4 42 469 -1
5 42 471 -1
6 42 471 -1
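The `id` column above holds only each file's position (1, 2, ...): `lapply()` returns an unnamed list, and `bind_rows()` falls back to a numeric sequence for `.id` when the list has no names. A minimal base-R sketch of keeping the file name instead, so it can be parsed later. It assumes a throwaway temporary directory containing two made-up, three-column files (`LA_H01_D01.txt`, `LB_H02_D03.txt`) in place of the real 14-column data; the helper `read_one()` is hypothetical.

```r
# Demo only: write two small ";"-separated files to a temporary directory
# (hypothetical contents, 3 columns instead of the real 14)
demo_dir <- file.path(tempdir(), "fileinfo_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("2017/10/17;20:31:38;102.5", "2017/10/17;20:31:48;93.6"),
           file.path(demo_dir, "LA_H01_D01.txt"))
writeLines("2017/10/18;08:00:00;55.0", file.path(demo_dir, "LB_H02_D03.txt"))

l <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and record its base name in a "file" column
read_one <- function(path) {
  temp <- read.table(path, sep = ";", stringsAsFactors = FALSE,
                     col.names = c("Date", "Time", "PM2_5"))
  temp$file <- basename(path)  # recycled to every row of this file
  temp
}

# Stack all files into one data frame
Df <- do.call(rbind, lapply(l, read_one))
print(Df)
```

With dplyr, a similar result should come from setting `names(lst) <- basename(l)` on the list before calling `bind_rows(lst, .id = "file")`, since `bind_rows()` takes the `.id` labels from the list names when they exist.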
Answer 0 (score: 2)
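Independently of the question about what the id column contains, you can extract the information from the file names with the following code: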
# you may use the original file names
filenames <- basename(l)
# or the content of the id column
filenames <- as.character(Df$id) # only if you have read the file names into Df
# for demonstration, a set of example file names
filenames <- c("LA_H01_D01.txt",
               "LA_H02_D02.txt",
               "LD_H01_D14.txt",
               "LD_H01_D15.txt")
filenames <- gsub("_H|_D", "_", filenames)      # "LA_H01_D01.txt" -> "LA_01_01.txt"
filenames <- gsub("\\.txt$|^L", "", filenames)  # "LA_01_01.txt" -> "A_01_01"
fileinfo <- as.data.frame(do.call(rbind, strsplit(filenames, "_")),
                          stringsAsFactors = FALSE)
colnames(fileinfo) <- c("Location", "House", "Day")
fileinfo$Location <- factor(fileinfo$Location)
fileinfo$House <- as.numeric(fileinfo$House)
fileinfo$Day <- as.numeric(fileinfo$Day)
fileinfo
#   Location House Day
# 1        A     1   1
# 2        A     2   2
# 3        D     1  14
# 4        D     1  15
# add the information to your Df as new columns
# (needs one file name per row of Df, e.g. taken from Df$id)
Df <- cbind(Df, fileinfo)
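As an alternative to the two `gsub()` calls plus `strsplit()`, a single regular expression with capture groups can pull out all three fields in one pass. This is a sketch using base R's `regexec()` and `regmatches()`, assuming every name follows the `L<letter>_H<digits>_D<digits>.txt` pattern exactly; it also turns `Location` into the factor the question asks for.

```r
# Example file names (the same ones used above)
filenames <- c("LA_H01_D01.txt", "LA_H02_D02.txt",
               "LD_H01_D14.txt", "LD_H01_D15.txt")
b <- basename(filenames)  # strip any directory part first

# One capture group each for location letter, house number and day number
pattern <- "^L([A-Z])_H([0-9]+)_D([0-9]+)\\.txt$"
parts <- regmatches(b, regexec(pattern, b))

# Each element is c(full match, location, house, day). A name that does not
# match gives character(0), which rbind() silently drops, so such names
# would need to be checked separately.
m <- do.call(rbind, parts)
fileinfo <- data.frame(
  Location = factor(m[, 2]),
  House    = as.integer(m[, 3]),
  Day      = as.integer(m[, 4])
)
print(fileinfo)
```

If only the location letter is needed, `factor(sub("^L([A-Z])_.*$", "\\1", basename(filenames)))` is enough.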
# the whole thing as a function used in your data import
add_fileinfo <- function(df, filename) {
  filename <- basename(filename)  # drop the directory part added by full.names = TRUE
  filename <- gsub("_H|_D", "_", filename)
  filename <- gsub("\\.txt$|^L", "", filename)
  fileinfo <- as.data.frame(do.call(rbind, strsplit(filename, "_")),
                            stringsAsFactors = FALSE)
  colnames(fileinfo) <- c("Location", "House", "Day")
  fileinfo$House <- as.numeric(fileinfo$House)
  fileinfo$Day <- as.numeric(fileinfo$Day)
  # repeat the single row of file information once per row of df
  cbind(df, fileinfo[rep(seq_len(nrow(fileinfo)), each = nrow(df)), ])
}
Df <- bind_rows(lapply(l, function(i) {
  temp <- read.table(i, stringsAsFactors = FALSE, sep = ";")
  temp <- setNames(temp, c("Date", "Time", "Timestamp", "PM2_5(ug/m3)", "AQI(US)", "AQI(CN)",
                           "PM10(ug/m3)", "Outdoor AQI(US)", "Outdoor AQI(CN)",
                           "Temperature(C)", "Temperature(F)", "Humidity(%RH)",
                           "CO2(ppm)", "VOC(ppb)"))
  add_fileinfo(temp, i)
}), .id = "id")
Df$Location <- factor(Df$Location)  # one level per location letter
Answer 1 (score: 1)
A (generic) solution like this should get you going:
mydata1 = read.csv(path1, header = TRUE)
mydata2 = read.csv(path2, header = TRUE)
Then merge them:
myfulldata = merge(mydata1, mydata2)
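As long as mydata1 and mydata2 share at least one column with the same name (which lets observations in mydata1 be matched to observations in mydata2), this works like a charm. It also takes only three lines.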
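A small self-contained illustration of the two-file case, using made-up data frames in place of `read.csv(path1)` and `read.csv(path2)`; here `Timestamp` plays the role of the common column.

```r
# Two toy data frames sharing the column "Timestamp" (made-up values)
mydata1 <- data.frame(Timestamp = c(1508272298, 1508272308, 1508272318),
                      PM2_5 = c(102.5, 93.6, 98.0))
mydata2 <- data.frame(Timestamp = c(1508272308, 1508272318, 1508272328),
                      CO2 = c(467, 468, 469))

# merge() joins on all columns with a common name (here only "Timestamp")
# and by default keeps only keys present in both data frames (inner join)
myfulldata <- merge(mydata1, mydata2)
print(myfulldata)
```

Only the two timestamps present in both inputs survive. Passing `all = TRUE` to `merge()` keeps unmatched rows as well, filled with `NA`.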
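What if I have 20 files containing data that I want to match observation by observation? Assuming they all share a column that allows merging, I would still need to read 20 files (20 lines of code) and merge() them two at a time, so merging the 20 data frames would take 19 merge statements like this: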
mytempdata = merge(mydata1, mydata2)
mytempdata = merge(mytempdata, mydata3)
# ... (and so on for each remaining data frame)
mytempdata = merge(mytempdata, mydata20)
This is tedious, and you may be looking for a simpler way. If so, I wrote a function called multmerge() that solves this problem. Here is the code that defines it:
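multmerge = function(mypath) {
  # read every file in the folder into a list of data frames
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x) read.csv(file = x, header = TRUE))
  # merge them pairwise: merge(merge(df1, df2), df3), ...
  Reduce(function(x, y) merge(x, y), datalist)
}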
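To see what the `Reduce()` line in `multmerge()` does without creating any files, here is a sketch with an in-memory list of made-up data frames standing in for `datalist`; all three share a hypothetical `Timestamp` column.

```r
# Three toy data frames sharing "Timestamp" (made-up values); in multmerge()
# this list would come from read.csv() on every file in the folder
datalist <- list(
  data.frame(Timestamp = 1:4, PM2_5 = c(102.5, 93.6, 98.0, 104.3)),
  data.frame(Timestamp = 2:5, CO2 = c(467, 468, 469, 471)),
  data.frame(Timestamp = 1:3, VOC = c(-1, -1, -1))
)

# Reduce() applies merge() from left to right:
# merge(merge(datalist[[1]], datalist[[2]]), datalist[[3]])
merged <- Reduce(function(x, y) merge(x, y), datalist)
print(merged)
```

Because each step is an inner join by default, a row survives only if its key occurs in every data frame (here only timestamps 2 and 3). `Reduce(function(x, y) merge(x, y, all = TRUE), datalist)` would keep all rows instead.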
Here is a good resource that can help you.