My CSV files are as follows:
Data1.csv
BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56
Data2.csv
AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany
The output I want:
AdvertiserName,BusinessNeedParent,BusinessNeedChild,State,City
worker,a1,b1,Calif,Los angeles
workplace,a1,b1,Calif,San Diego
platoon,a2,b2,Connec,Bridgeport
teracota,a3,b3,New York,Albany
So Identifier has to be matched against BusinessNeedNumber to generate a CSV file with the data above. My code so far looks like this:
record <- read.csv("Data2.csv", header = TRUE)                   # contains BusinessNeedNumber
businessneedinformation <- read.csv("Data1.csv", header = TRUE)  # contains Identifier
List <- NULL  # accumulates one block of rows per matched record
for (i in record$BusinessNeedNumber) {
  if (i %in% businessneedinformation$Identifier) {
    keyword <- "NA"
    idx <- which(businessneedinformation$Identifier == i)
    busparent <- businessneedinformation$BusinessNeedParent[idx]
    buschild <- businessneedinformation$BusinessNeedChild[idx]
    replacementbusparent <- gsub(pattern = ",", replacement = "", x = busparent)
    replacementbuschild <- gsub(pattern = ",", replacement = "", x = buschild)
    campname <- paste("cat", "|", "bus", "|", "en-us", "|", tolower(as.character(replacementbusparent[1])), "|", tolower(as.character(replacementbuschild[1])), sep = "")
    thislist <- data.frame(Keyword = keyword, BusinessNeedParent = busparent, BusinessNeedChild = buschild, Campaign = campname)
    List <- rbind(List, thislist)  # growing a data frame row by row is the slow part
  }
}
Because I am using a for loop, this is very slow and takes a long time for nearly 100,000 entries. I believe it could be done faster using indexing in R.
Answer 0 (score: 1)
> zz <- "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56"
> Data <- read.table(text=zz, header = TRUE,sep=',')
> Data
  BusinessNeedParent BusinessNeedChild Identifier
1                 a1                b1         45
2                 a2                b2         60
3                 a3                b3         56
> zz1 <- "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany"
> Data1 <- read.table(text=zz1, header = TRUE,sep=',')
> Data1
  AdvertiserName BusinessNeedNumber    State        City
1         worker                 45    Calif Los angeles
2      workplace                 45    Calif   San Diego
3        platoon                 60   Connec  Bridgeport
4       teracota                 56 New York      Albany
> m <- merge(Data,Data1,by.x="Identifier",by.y="BusinessNeedNumber")
> m[,c(4,2,3,5,6)]
  AdvertiserName BusinessNeedParent BusinessNeedChild    State        City
1         worker                 a1                b1    Calif Los angeles
2      workplace                 a1                b1    Calif   San Diego
3       teracota                 a3                b3 New York      Albany
4        platoon                 a2                b2   Connec  Bridgeport
> write.csv(m[, c(4, 2, 3, 5, 6)], file = "demoMerge.csv", row.names = FALSE)
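The positional index c(4,2,3,5,6) depends on the column order that merge happens to produce. As a minimal sketch of the same merge with the columns picked by name instead, assuming the sample data from the question (the names `wanted`, `out`, and `out_file` are hypothetical and used only for illustration):

```r
# Sample data from the question, inlined so the sketch is self-contained
Data <- read.csv(text = "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56", stringsAsFactors = FALSE)

Data1 <- read.csv(text = "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany", stringsAsFactors = FALSE)

# Join on the differently named key columns
m <- merge(Data, Data1, by.x = "Identifier", by.y = "BusinessNeedNumber")

# Select the output columns by name rather than by position
wanted <- c("AdvertiserName", "BusinessNeedParent", "BusinessNeedChild",
            "State", "City")
out <- m[, wanted]

# Hypothetical output path; row.names = FALSE keeps the row index out of the CSV
out_file <- tempfile(fileext = ".csv")
write.csv(out, file = out_file, row.names = FALSE)
```

Selecting by name keeps working if a column is later added to either input file. Note that merge sorts the result by the key by default, so the row order can differ from the order in Data2.csv.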
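Alternatively, you can use Reduce, where list_of_files is assumed to be a list of the two data frames, e.g. list(Data, Data1):
> m1 <- Reduce(function(old, new) { merge(old, new, by.x='Identifier', by.y='BusinessNeedNumber') }, list_of_files)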
> m1
  Identifier BusinessNeedParent BusinessNeedChild AdvertiserName    State        City
1         45                 a1                b1         worker    Calif Los angeles
2         45                 a1                b1      workplace    Calif   San Diego
3         56                 a3                b3       teracota New York      Albany
4         60                 a2                b2        platoon   Connec  Bridgeport
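Since the question asks about indexing specifically, a vectorized lookup with match() is another option. The following is only a sketch under the assumption that each Identifier appears once in the first file (match() returns the first hit only); the names `idx`, `result`, and `campaign` are hypothetical. It keeps the rows in the order of Data2.csv and builds the campaign string from the question's loop in one vectorized call.

```r
# Sample data from the question, inlined so the sketch is self-contained
Data <- read.csv(text = "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56", stringsAsFactors = FALSE)

Data1 <- read.csv(text = "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany", stringsAsFactors = FALSE)

# For every row of Data1, find the row of Data with the same key
# (NA where there is no match; assumes Identifier is unique in Data)
idx <- match(Data1$BusinessNeedNumber, Data$Identifier)

result <- data.frame(
  AdvertiserName     = Data1$AdvertiserName,
  BusinessNeedParent = Data$BusinessNeedParent[idx],
  BusinessNeedChild  = Data$BusinessNeedChild[idx],
  State              = Data1$State,
  City               = Data1$City,
  stringsAsFactors   = FALSE
)

# Drop rows whose BusinessNeedNumber has no matching Identifier
result <- result[!is.na(idx), ]

# Vectorized version of the campaign name built inside the question's loop
campaign <- paste0(
  "cat|bus|en-us|",
  tolower(gsub(",", "", result$BusinessNeedParent)), "|",
  tolower(gsub(",", "", result$BusinessNeedChild))
)
```

Because the lookup is a single vectorized call, there is no row-by-row rbind, which is what makes the loop slow on large inputs.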