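Loop in R to append data frames that satisfy a regular expression?

Time: 2015-08-07 16:32:22

Tags: regex r dataframe rbind

I asked a similar question earlier, but phrased it confusingly. So now I am trying to do it in a more orderly way.

I am running a loop that imports up to 6 data frames for each of 650 ID variables. For each of the 650 cases I want to append these 6 data frames. I import the data like this: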

for(i in 1:650){
  try(part1 <-  read.csv(file = paste0("Twitter Scrapes/searchTwitter/09July/",MP.ID[i],".csv")))
  try(part2 <-  read.csv(file = paste0("Twitter Scrapes/userTimeline/08July/",MP.ID[i],".csv")))
  try(part3 <-  read.csv(file = paste0("Twitter Scrapes/userTimeline/16July/",MP.ID[i],".csv")))
  try(part4 <-  read.csv(file = paste0("Twitter Scrapes/searchTwitter/17July/",MP.ID[i],".csv")))
  try(part5 <-  read.csv(file = paste0("Twitter Scrapes/userTimeline/24July/",MP.ID[i],".csv")))
  try(part6 <-  read.csv(file = paste0("Twitter Scrapes/searchTwitter/24July/",MP.ID[i],".csv")))

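That all works fine. If any of the parts does not exist, the try() calls make sure the loop keeps running.

So in some cases not all 6 datasets exist. That means I cannot simply write, as the next line,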

combinedData <- rbind(part1, part2, part3, part4, part5, part6)

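because one of the elements might not exist, in which case the appended dataset cannot be produced. That is why I think it would be better to have the rbind command run on whatever data frames satisfy a regular expression (i.e. partX). That way, even if, say, part5 does not exist, it can simply append the other data frames that do exist and then move on to the next ID in the loop.

However, I don't know how to do this. It would be great if you could help me with this, and I apologise for the confusing question I posted earlier.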
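
One possible reading of the idea above, as a minimal sketch rather than a definitive solution: ls() accepts a pattern argument, so whichever objects named part1, part2, ... currently exist can be collected with mget() and handed to rbind through do.call(). The temporary folder and file names below are invented for illustration. Note that try(partN <- ...) leaves an older partN in place when the read fails, so the sketch removes leftovers before reading.

```r
# Hypothetical setup: a temporary folder holding only two of three expected files.
scrape_dir <- file.path(tempdir(), "part_demo")
dir.create(scrape_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1, text = "a"),
          file.path(scrape_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, text = "b"),
          file.path(scrape_dir, "part3.csv"), row.names = FALSE)

# Remove parts left over from an earlier iteration so they are not reused.
rm(list = ls(pattern = "^part[0-9]+$"))

# Same import style as in the question; part2.csv is missing, so that read fails.
# suppressWarnings() only hides the "cannot open file" warning of the failed read.
suppressWarnings({
  try(part1 <- read.csv(file.path(scrape_dir, "part1.csv")), silent = TRUE)
  try(part2 <- read.csv(file.path(scrape_dir, "part2.csv")), silent = TRUE)
  try(part3 <- read.csv(file.path(scrape_dir, "part3.csv")), silent = TRUE)
})

# Collect every existing object whose name matches the regular expression ...
existing <- mget(ls(pattern = "^part[0-9]+$"))

# ... and row-bind whatever was found (this yields NULL if nothing was read).
combinedData <- do.call(rbind, existing)
```

Inside the loop from the question, the rm() call would go at the top of each iteration and the mget()/do.call() lines at the bottom.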

1 answer:

Answer 0: (score: 0)

I would probably use the recursive argument of list.files and work with a list instead:

(lf <- list.files('~/desktop/test', recursive = TRUE, full.names = TRUE))
# [1] "/Users/rawr/desktop/test/feb/three.csv"
# [2] "/Users/rawr/desktop/test/jan/one.csv"
# [3] "/Users/rawr/desktop/test/jan/three.csv"
# [4] "/Users/rawr/desktop/test/jan/two.csv"
# [5] "/Users/rawr/desktop/test/jul/one.csv"
# [6] "/Users/rawr/desktop/test/jul/two.csv"

You can group the files by ID with grep:

id <- c('one','two','three')
for (ii in id) {
  print(lf[grepl(ii, lf)])
  cat('\n')
}
# [1] "/Users/rawr/desktop/test/jan/one.csv" "/Users/rawr/desktop/test/jul/one.csv"
#
# [1] "/Users/rawr/desktop/test/jan/two.csv" "/Users/rawr/desktop/test/jul/two.csv"
#
# [1] "/Users/rawr/desktop/test/feb/three.csv" "/Users/rawr/desktop/test/jan/three.csv"

With that in hand, you can read the files in with lapply, which gives one object containing all the data frames:

ll <- lapply(id, function(ii) {
  files <- lf[grepl(ii, lf)]
  setNames(lapply(files, function(x) read.csv(x, header = FALSE)), files)
})
setNames(ll, id)
# $one
# $one$`/Users/rawr/desktop/test/jan/one.csv`
#    V1
# 1 one
#
# $one$`/Users/rawr/desktop/test/jul/one.csv`
#    V1
# 1 one
# 2 one
# 3 one
#
#
# $two
# $two$`/Users/rawr/desktop/test/jan/two.csv`
#    V1
# 1 two
#
# $two$`/Users/rawr/desktop/test/jul/two.csv`
#    V1
# 1 two
#
#
# $three
# $three$`/Users/rawr/desktop/test/feb/three.csv`
#      V1
# 1 three
#
# $three$`/Users/rawr/desktop/test/jan/three.csv`
#      V1
# 1 three

Then rbind them, or do the rbind already in the previous step.
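
The code for the final rbind step did not survive on this page, so the following is only a sketch of one way it could look, not the answerer's original code. It assumes every file for an ID has the same columns, builds a small made-up folder under tempdir(), and matches on the file name with grepl so that an ID without any file simply yields NULL.

```r
# Hypothetical test folder: "one" has two files, "two" has one, "three" has none.
test_dir <- file.path(tempdir(), "rbind_demo")
dir.create(file.path(test_dir, "jan"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(test_dir, "jul"), recursive = TRUE, showWarnings = FALSE)
writeLines("one",           file.path(test_dir, "jan", "one.csv"))
writeLines(c("one", "one"), file.path(test_dir, "jul", "one.csv"))
writeLines("two",           file.path(test_dir, "jan", "two.csv"))

lf <- list.files(test_dir, recursive = TRUE, full.names = TRUE)
id <- c("one", "two", "three")

# For each ID: find its files, read them, and rbind them in the same step.
combined <- lapply(setNames(id, id), function(ii) {
  # Anchor the pattern on the file name so "one" cannot match another ID by accident.
  files <- lf[grepl(paste0("^", ii, "\\.csv$"), basename(lf))]
  if (length(files) == 0) return(NULL)  # no file for this ID: skip it
  do.call(rbind, lapply(files, read.csv, header = FALSE))
})

sapply(combined, NROW)  # number of rows gathered per ID
```

Anchoring the pattern matters with real IDs: a plain grepl(ii, lf) would also pick up any path that merely contains the ID as a substring.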