Iteratively perform calculations on multiple csv files

Time: 2016-08-27 03:29:35

Tags: r csv

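I have many csv files stored in one folder, i.e. file1.csv, file2.csv, file3.csv, etc. Each csv file contains the same set of measurements for one subject. The files look like this: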

ID     time    measurement1     measurement2    measurement3
 1     5       12               324             123
 1     6       123              654             45
 1     3       346              556             548

Another one looks like this:

ID    time    measurement1    measurement2    measurement3
 2     2       234             345            253
 2     8       35              998            316
 2     17      515             1005           323 
 2     50      156             155            616

And so on. In addition, I have a data frame of calculations that I want to perform for each subject (file), like this:

# df is the data frame read from one file
calc <- with(df, data.frame(mean1 = mean(measurement1), var1 = var(measurement1),
sd1 = sd(measurement1), mean2 = mean(measurement2), var2 = var(measurement2),
sd2 = sd(measurement2)))

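and so on. What I want to do is find a way to read each csv file in turn and perform these calculations for each subject. In the end, I'd like to export the results to a single csv file (so that all the information I need is in one place), or print them in the R console and copy them from there into a text or Excel file. I'm working in R. Can anyone help? Thanks!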

2 answers:

Answer 0 (score: 2)

Something like this:


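Option 1: Read in all the files, combine them into a single data frame, then summarize

With this method, all of the data files are loaded into an R list at once.

library(dplyr)

# Read every .csv file in the working directory into a named list,
# tagging each row with the file it came from
dat = sapply(list.files(pattern="\\.csv$"), function(file) {
  df = read.csv(file, stringsAsFactors=FALSE, header=TRUE)
  df$source = file
  df
}, simplify=FALSE)

# Stack the list into a single data frame
dat = bind_rows(dat)

Summarize by ID. In dplyr 1.0 and later, across() replaces the older summarise_each()/summarise_at() idiom with funs(), which is deprecated:

# Mean, variance and SD of every measurement column, per ID
dat.summary = dat %>%
  group_by(ID) %>%
  summarise(across(starts_with("measurement"),
                   list(mean = ~mean(.x, na.rm=TRUE),
                        var  = ~var(.x, na.rm=TRUE),
                        sd   = ~sd(.x, na.rm=TRUE))))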
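If dplyr isn't installed, Option 1 can be sketched in base R using aggregate() instead. This is a swapped-in technique, not the answer's dplyr code. The temporary folder and the two sample files (copied from the question's tables) are assumptions that stand in for your real data folder.

```r
# Assumption: a temporary folder holding the question's two sample files
# stands in for the real data folder.
data_dir <- file.path(tempdir(), "csv_demo_opt1")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ID = 1, time = c(5, 6, 3),
                     measurement1 = c(12, 123, 346),
                     measurement2 = c(324, 654, 556),
                     measurement3 = c(123, 45, 548)),
          file.path(data_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, time = c(2, 8, 17, 50),
                     measurement1 = c(234, 35, 515, 156),
                     measurement2 = c(345, 998, 1005, 155),
                     measurement3 = c(253, 316, 323, 616)),
          file.path(data_dir, "file2.csv"), row.names = FALSE)

# Read every csv file into a list, then stack the list into one data frame
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
dat <- do.call(rbind, lapply(files, read.csv))

# Summarise every measurement column by ID; FUN returns three values,
# so each measurement column comes back as a 3-column matrix
meas_cols <- grep("^measurement", names(dat), value = TRUE)
dat.summary <- aggregate(dat[meas_cols], by = list(ID = dat$ID),
                         FUN = function(x) c(mean = mean(x, na.rm = TRUE),
                                             var  = var(x, na.rm = TRUE),
                                             sd   = sd(x, na.rm = TRUE)))

# Flatten the matrix columns into ordinary columns (measurement1.mean, ...)
dat.summary <- do.call(data.frame, dat.summary)
print(dat.summary)
```

The result has one row per ID, with columns such as measurement1.mean, measurement1.var and measurement1.sd.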

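Option 2: Read and summarize each file individually, then bind the individual summaries into a single summary data frame

This way, only one data file is loaded into memory at a time.

# Assumes library(dplyr) is loaded, as in Option 1.
# lapply() always returns a list; plain sapply() would try to simplify
# the list of data frames and break bind_rows()
dat.summary = lapply(list.files(pattern="\\.csv$"), function(file) {
  df = read.csv(file, stringsAsFactors=FALSE, header=TRUE)

  # Summarise by ID
  df %>%
    group_by(ID) %>%
    summarise(across(starts_with("measurement"),
                     list(mean = ~mean(.x, na.rm=TRUE),
                          var  = ~var(.x, na.rm=TRUE),
                          sd   = ~sd(.x, na.rm=TRUE))))
})

# Stack the per-file summaries into one data frame
dat.summary = bind_rows(dat.summary)

Now save the summary:

# On a rerun, list.files(pattern="\\.csv$") will also pick up this file,
# so save it to another folder or delete it first
write.csv(dat.summary, "my_summary.csv", row.names=FALSE)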
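The question asks explicitly for an iterative solution that ends in a single csv file. Below is a base-R sketch of Option 2 written as a for loop. It reuses the column names from the question's calc data frame (mean1, var1, sd1, ...). The folder, the sample files and the output path are assumptions. The summary is written outside the data folder, so rerunning the loop does not read it back in as data.

```r
# Assumption: a temporary folder with the question's two sample files
data_dir <- file.path(tempdir(), "csv_demo_opt2")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ID = 1, time = c(5, 6, 3),
                     measurement1 = c(12, 123, 346),
                     measurement2 = c(324, 654, 556),
                     measurement3 = c(123, 45, 548)),
          file.path(data_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, time = c(2, 8, 17, 50),
                     measurement1 = c(234, 35, 515, 156),
                     measurement2 = c(345, 998, 1005, 155),
                     measurement3 = c(253, 316, 323, 616)),
          file.path(data_dir, "file2.csv"), row.names = FALSE)

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
results <- vector("list", length(files))

# Read one file at a time and keep only its one-row summary
for (i in seq_along(files)) {
  df <- read.csv(files[i])
  results[[i]] <- data.frame(
    file  = basename(files[i]),
    ID    = df$ID[1],  # each file holds a single subject
    mean1 = mean(df$measurement1), var1 = var(df$measurement1),
    sd1   = sd(df$measurement1),
    mean2 = mean(df$measurement2), var2 = var(df$measurement2),
    sd2   = sd(df$measurement2)
  )
}

# Bind the per-file rows and export them to one csv outside data_dir
calc <- do.call(rbind, results)
out_file <- file.path(tempdir(), "my_summary.csv")
write.csv(calc, out_file, row.names = FALSE)
print(calc)
```

Pre-allocating the results list and binding once at the end avoids growing a data frame inside the loop.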

Answer 1 (score: 2)

Alex, this is a multi-step process.

Here's what I would do:

Step 1: Read in all the files with the read.csv function.

csv1<-read.csv("file1.csv")
csv2<-read.csv("file2.csv")
csv3<-read.csv("file3.csv")

Step 2: Combine them into a single data frame.

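# Tag each file's rows so they can be told apart after combining
csv1$type<-"1"
csv2$type<-"2"
csv3$type<-"3"
# Stack the rows of all three data frames
csv<-rbind(csv1, csv2, csv3)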

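Make sure the column names match (rbind() matches data frame columns by name); otherwise, the last step above will throw an error.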
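Steps 1 and 2 can be done in a single loop, without one read.csv() call per file, so the same code works for any number of files. Below is a base-R sketch. The temporary folder and the sample files are assumptions. Using the file name as type, rather than "1", "2", "3", is an illustrative choice. The explicit check reports which file's columns differ before rbind() runs, instead of letting rbind() fail with a generic error.

```r
# Assumption: a temporary folder with the question's two sample files
data_dir <- file.path(tempdir(), "csv_demo_steps")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ID = 1, time = c(5, 6, 3),
                     measurement1 = c(12, 123, 346),
                     measurement2 = c(324, 654, 556),
                     measurement3 = c(123, 45, 548)),
          file.path(data_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, time = c(2, 8, 17, 50),
                     measurement1 = c(234, 35, 515, 156),
                     measurement2 = c(345, 998, 1005, 155),
                     measurement3 = c(253, 316, 323, 616)),
          file.path(data_dir, "file2.csv"), row.names = FALSE)

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Step 1: read each file and tag its rows with the file it came from
csv_list <- lapply(files, function(f) {
  df <- read.csv(f)
  df$type <- basename(f)
  df
})

# Every file must have the same set of column names as the first one,
# because rbind() matches data frame columns by name
same_cols <- vapply(csv_list, function(df) setequal(names(df), names(csv_list[[1]])),
                    logical(1))
if (!all(same_cols)) {
  stop("Column names differ in: ", paste(basename(files[!same_cols]), collapse = ", "))
}

# Step 2: stack everything into a single data frame
csv <- do.call(rbind, csv_list)
str(csv)
```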

Step 3:

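Look into how to compute summary statistics with dplyr. There are lots of examples on SO. I can only help further once I see that you've tried it yourself.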
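For Step 3, the answer points to dplyr. As a base-R alternative (a swapped-in technique, not what the answer recommends), the formula interface of aggregate() can produce per-file statistics from the combined data frame. The small csv data frame built inline below is an assumption that mimics the result of Step 2 on the question's sample files. The output file name is hypothetical.

```r
# Assumption: 'csv' mimics the combined data frame produced by Step 2
csv <- data.frame(
  ID   = rep(c(1, 2), times = c(3, 4)),
  time = c(5, 6, 3, 2, 8, 17, 50),
  measurement1 = c(12, 123, 346, 234, 35, 515, 156),
  measurement2 = c(324, 654, 556, 345, 998, 1005, 155),
  measurement3 = c(123, 45, 548, 253, 316, 323, 616),
  type = rep(c("1", "2"), times = c(3, 4))
)

# Mean, variance and SD of each measurement for every type (file)
stats <- function(x) c(mean = mean(x), var = var(x), sd = sd(x))
summary_by_type <- aggregate(cbind(measurement1, measurement2, measurement3) ~ type,
                             data = csv, FUN = stats)

# Flatten the matrix columns (measurement1.mean, ...) and export to one csv file
summary_by_type <- do.call(data.frame, summary_by_type)
write.csv(summary_by_type, file.path(tempdir(), "summary_by_type.csv"), row.names = FALSE)
print(summary_by_type)
```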

Hope this helps.