Question

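I have a large number of csv files, all with the same format. I need to loop over all of them, select the "Median" column (column 4), write it to a new file, and combine them all together.

They are formatted as follows.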

   Wind_Speed Average  Median Power_Curve Difference
1         0.0     NaN      NA           0        NaN
2         0.5     NaN      NA           0        NaN
3         1.0     NaN      NA           0        NaN
4         1.5     NaN      NA           0        NaN
5         2.0     NaN      NA           0        NaN
6         2.5   14.12   14.12          24       -9.9
7         3.0   31.02   31.51          48      -17.0
8         3.5   55.06   57.12          96      -40.9
9         4.0  106.70  109.89         192      -85.3
10        4.5  178.13  180.76         288     -109.9
11        5.0  277.68  278.57         408     -130.3
12        5.5  401.91  400.41         540     -138.1
13        6.0  568.38  569.73         696     -127.6
14        6.5  765.16  762.98         912     -146.8
15        7.0  999.09 1002.82        1104     -104.9
16        7.5 1222.77 1216.91        1332     -109.2
17        8.0 1460.55 1463.50        1524      -63.4
18        8.5 1601.32 1597.00        1656      -54.7
19        9.0 1658.94 1664.40        1680      -21.1
20        9.5 1662.15 1667.81        1692      -29.9
21       10.0 1661.49 1665.47        1692      -30.5
22       10.5 1659.75 1663.02        1692      -32.2
23       11.0 1660.59 1661.13        1692      -31.4
24       11.5 1660.18 1659.44        1692      -31.8
25       12.0 1662.33 1666.21        1692      -29.7
26       12.5 1661.55 1661.10        1692      -30.5
27       13.0 1667.06 1677.50        1692      -24.9
28       13.5 1660.06 1661.63        1692      -31.9
29       14.0 1671.95 1686.82        1692      -20.0
30       14.5 1675.67 1687.73        1692      -16.3
31       15.0 1672.57 1685.97        1692      -19.4
32       15.5 1666.96 1673.73        1692      -25.0
33       16.0 1670.11 1681.58        1692      -21.9
34       16.5 1669.24 1686.14        1692      -22.8
35       17.0 1669.85 1677.95        1692      -22.1
36       17.5 1656.20 1644.46        1692      -35.8
37       18.0 1687.57 1687.57        1692       -4.4
38       18.5 1691.64 1691.69        1692       -0.4
39       19.0 1681.02 1686.78        1692      -11.0
40       19.5 1689.79 1689.79        1692       -2.2
41       20.0     NaN      NA        1692        NaN

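Ideally, each new column in the new file would be named after the file it came from.

I know it works up to the point shown below, but I don't know how to write the column into the next column of a new table and then carry on with the next ii.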

files2 <- list.files(path="~/test2",pattern="*.csv", full.names=TRUE, recursive=FALSE)

for(ii in files2){   

titlename <- tools::file_path_sans_ext(basename(ii))

mydata2 <-read.csv(ii, header = T, stringsAsFactors=FALSE)
mydata2<- mydata2[,4]

???

}

Answer 1

setwd("~/test2") # set path to where the files are (path taken from the question; adjust as needed)
csv_files <- list.files(pattern = "\\.csv$") # list csv files in path
temp <- list() # set empty object
for (i in csv_files) {
  temp[[i]] <- read.csv(i)[[4]] # number 4 is the column you want to select, set to what you want
}
names(temp) <- stringr::str_remove(names(temp), "\\.csv$") # use this line if you want to remove .csv from the column names in the combined csv
write.csv(temp, "combined.csv", row.names = FALSE) # write the combined csv once, after the loop

This seems to work for me.
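To try the loop approach without touching real data, here is a minimal sketch that runs on two throwaway files. The folder `loop_demo`, the file names `site_a.csv` and `site_b.csv`, and the output name `combined_demo.csv` are made up for illustration. It also assumes the files were written with row names, which would explain why "Median" is column 4 even though it is the third named column in the printout.

```r
# Hypothetical demo data: two small files in a throwaway folder.
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(
  Wind_Speed  = c(2.5, 3.0, 3.5),
  Average     = c(14.12, 31.02, 55.06),
  Median      = c(14.12, 31.51, 57.12),
  Power_Curve = c(24, 48, 96)
)
# Row names are written as an unnamed first column, so Median ends up in column 4.
write.csv(demo, file.path(demo_dir, "site_a.csv"))
write.csv(transform(demo, Median = Median + 1), file.path(demo_dir, "site_b.csv"))

# Collect column 4 of every file, one list element per file.
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
medians <- list()
for (f in csv_files) {
  col_name <- tools::file_path_sans_ext(basename(f))
  medians[[col_name]] <- read.csv(f)[[4]]
}

# Combine once, after the loop, and write outside the input folder.
combined <- as.data.frame(medians)
out_file <- file.path(tempdir(), "combined_demo.csv")
write.csv(combined, out_file, row.names = FALSE)
print(combined)
```

Writing the result outside the input folder is a deliberate choice here: if the combined file sat next to the inputs, a second run would pick it up as one more csv to read.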

Answer 2

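An alternative using base R and lapply:

files <- list.files(path = "~/path", pattern = "\\.csv$")

A custom function that reads a csv, extracts the file name, and assigns it as the column name. (Pasting the path inside read.csv can sometimes lead to path errors in these loops.)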

read_files_assign_filename <- function(filename){
  item <- read.csv(paste("~/path", filename, sep = "/"), header = TRUE)[4] # keep column 4 as a one-column data frame
  colnames(item) <- substr(filename, 1, nchar(filename) - 4) # remove ".csv"
  item # return item
}

Wrap it in lapply and bind the results together into one table with cbind.

final_result <- do.call(cbind, lapply(files, read_files_assign_filename))

Hope this helps/works!
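The question also asks for the result to be written to a new file, which the lapply version above stops short of. A rough sketch of the full round trip follows, under a few assumptions: the folder `lapply_demo`, the files `turbine_1.csv` and `turbine_2.csv`, the helper `read_median`, and the output name `medians_combined.csv` are all hypothetical. It selects the column by its name, "Median", instead of by position, which may be safer if the column order ever changes.

```r
# Hypothetical demo data in a throwaway folder.
demo_dir <- file.path(tempdir(), "lapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(
  Wind_Speed = c(2.5, 3.0),
  Average    = c(14.12, 31.02),
  Median     = c(14.12, 31.51)
)
write.csv(demo, file.path(demo_dir, "turbine_1.csv"))
write.csv(demo * 2, file.path(demo_dir, "turbine_2.csv"))

# Read one file and return its Median column, renamed after the file.
read_median <- function(path) {
  item <- read.csv(path, header = TRUE)["Median"]  # one-column data frame
  colnames(item) <- tools::file_path_sans_ext(basename(path))
  item
}

# Full paths avoid having to paste the folder name back on inside the function.
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
final_result <- do.call(cbind, lapply(paths, read_median))

# Write the combined table to a new file outside the input folder.
write.csv(final_result, file.path(tempdir(), "medians_combined.csv"), row.names = FALSE)
print(final_result)
```

Note that cbind only lines the columns up cleanly when every file has the same number of rows, which matches the "same format" premise of the question.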

Reading the fourth column of csv files and combining them in R
