How can I avoid a 'for loop' in R to process CSV files faster?

Asked: 2014-02-05 07:53:16

Tags: r csv indexing

My CSV files look like this:

Data1.csv

BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56

Data2.csv

AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany

The output I want:

 AdvertiserName,BusinessNeedParent,BusinessNeedChild,State,City
 worker,a1,b1,Calif,Los angeles
 workplace,a1,b1,Calif,San Diego
 platoon,a2,b2,Connec,Bridgeport
 teracota,a3,b3,New York,Albany

So Identifier has to be matched against BusinessNeedNumber to produce a CSV file with the data shown above. My code so far looks like this:

# Data2.csv holds the advertiser rows (BusinessNeedNumber);
# Data1.csv is the lookup table keyed by Identifier
record <- read.csv("Data2.csv",header=TRUE)
businessneedinformation <- read.csv("Data1.csv",header=TRUE)

List <- NULL  # grows by one block of rows per matched record (this is the slow part)
for(i in record$BusinessNeedNumber){
  if(i %in% businessneedinformation$Identifier){
    keyword <- "NA"
    idx <- which(businessneedinformation$Identifier==i)
    busparent <- businessneedinformation$BusinessNeedParent[idx]
    buschild <- businessneedinformation$BusinessNeedChild[idx]
    replacementbusparent <- gsub(pattern=",",replacement="",x=busparent)
    replacementbuschild <- gsub(pattern=",",replacement="",x=buschild)
    campname <- paste("cat","|","bus","|","en-us","|",tolower(as.character(replacementbusparent[1])),"|",tolower(as.character(replacementbuschild[1])),sep="")
    thislist <- data.frame(Keyword = keyword,BusinessNeedParent = busparent,BusinessNeedChild = buschild,Campaign=campname)
    # Append only on a match, so a stale thislist is never reused
    List <- rbind(List, thislist)
  }
}

With the for loop this is very slow: it takes a long time for nearly 100,000 entries. Can this be done faster using indexing in R?

1 answer:

Answer 0 (score: 1)

> zz <- "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56"
> Data <- read.table(text=zz, header = TRUE,sep=',')
> Data
  BusinessNeedParent BusinessNeedChild Identifier
1                 a1                b1         45
2                 a2                b2         60
3                 a3                b3         56
> zz1 <- "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany"
> Data1 <- read.table(text=zz1, header = TRUE,sep=',')
> Data1
  AdvertiserName BusinessNeedNumber    State        City
1         worker                 45    Calif Los angeles
2      workplace                 45    Calif   San Diego
3        platoon                 60   Connec  Bridgeport
4       teracota                 56 New York      Albany
> m <- merge(Data,Data1,by.x="Identifier",by.y="BusinessNeedNumber")
> m[,c(4,2,3,5,6)]
  AdvertiserName BusinessNeedParent BusinessNeedChild    State        City
1         worker                 a1                b1    Calif Los angeles
2      workplace                 a1                b1    Calif   San Diego
3       teracota                 a3                b3 New York      Albany
4        platoon                 a2                b2   Connec  Bridgeport
> write.csv(m[,c(4,2,3,5,6)], file = "demoMerge.csv", row.names = FALSE)  # write only the reordered columns
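
Since the question asks about indexing specifically, the same lookup can also be sketched with match(), which returns, for each BusinessNeedNumber, the row position of the matching Identifier. This is only a minimal sketch, assuming each Identifier occurs once in the lookup table; the names lookup, ads, idx and out are illustrative and do not come from the original post. Unlike merge(), it keeps the rows in the order of the advertiser table.

```r
# Lookup table (Data1.csv) and advertiser table (Data2.csv), inlined for the sketch
lookup <- read.csv(text = "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56", stringsAsFactors = FALSE)

ads <- read.csv(text = "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany", stringsAsFactors = FALSE)

# For every advertiser row, find the lookup row with the same key
# (NA where nothing matches). Assumes Identifier values are unique.
idx <- match(ads$BusinessNeedNumber, lookup$Identifier)

# Build the result in one step by indexing whole columns with idx
out <- data.frame(
  AdvertiserName     = ads$AdvertiserName,
  BusinessNeedParent = lookup$BusinessNeedParent[idx],
  BusinessNeedChild  = lookup$BusinessNeedChild[idx],
  State              = ads$State,
  City               = ads$City,
  stringsAsFactors   = FALSE
)

# Drop advertisers whose key was not found, mirroring merge()'s default inner join
out <- out[!is.na(idx), ]
print(out)
```

With real files, the two read.csv(text = ...) calls would be replaced by read.csv("Data1.csv") and read.csv("Data2.csv"), and out could be written with write.csv(out, "demoMerge.csv", row.names = FALSE).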

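Alternatively, you can use Reduce:

> list_of_files <- list(Data, Data1)  # assumed: the two data frames read above
> m1 <- Reduce(function(old, new) { merge(old, new, by.x='Identifier', by.y='BusinessNeedNumber') }, list_of_files)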
> m1
  Identifier BusinessNeedParent BusinessNeedChild AdvertiserName    State        City
1         45                 a1                b1         worker    Calif Los angeles
2         45                 a1                b1      workplace    Calif   San Diego
3         56                 a3                b3       teracota New York      Albany
4         60                 a2                b2        platoon   Connec  Bridgeport
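
The loop in the question also builds a Keyword and a Campaign value for every row, which the merge alone does not cover. A possible way to do that without a loop, sketched here under the assumption that the merged data frame looks like m above: gsub(), tolower() and paste() are vectorized, so they can be applied to whole columns at once. The data frames are rebuilt inline so the sketch runs on its own; parent and child are illustrative helper names.

```r
# Rebuild the two tables and the merged frame from the answer
Data <- read.csv(text = "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56", stringsAsFactors = FALSE)

Data1 <- read.csv(text = "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany", stringsAsFactors = FALSE)

m <- merge(Data, Data1, by.x = "Identifier", by.y = "BusinessNeedNumber")

# Strip commas and lower-case entire columns in one call each
parent <- tolower(gsub(",", "", m$BusinessNeedParent, fixed = TRUE))
child  <- tolower(gsub(",", "", m$BusinessNeedChild, fixed = TRUE))

m$Keyword  <- "NA"  # the literal string "NA", as in the question's loop
# sep = "|" yields the same "cat|bus|en-us|<parent>|<child>" string as the loop
m$Campaign <- paste("cat", "bus", "en-us", parent, child, sep = "|")

print(m[, c("Keyword", "BusinessNeedParent", "BusinessNeedChild", "Campaign")])
```

Because every step works on full columns, no data frame is grown row by row with rbind(), which is the main cost of the original loop.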