Combining separate data frames in a folder into a single data frame in R

Date: 2019-09-12 15:07:18

Tags: r dataframe

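I have a folder containing several data frame files in tsv format (df1.txt, df2.txt, df3.txt, etc.). I need to extract two columns from each df ("freq", "cdr") and aggregate them into a single large tsv data frame containing those two columns, plus a third column indicating the file name each row came from ("file", "cdr", "freq").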

Individual df1:  "xxx" "freq"  "cdr" "zzz"
                  23   0.112   abc   ej
                  25   0.743   bbc   tj

final df:  "file"  "freq"  "cdr"
            df1     0.112   abc
            df1     0.743   bbc
            df2     0.444   abd
            df2     0.911   ccd

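I tried using "list.files", but that only gives me a list of the file (df) names. I have considered using "parse", but I am not sure how that function works. As an R newbie, I would greatly appreciate any help.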

data.all <- list.files("/mnt/data/OUTPUT/", pattern="*.txt", full.names = TRUE)
sdata <- data.all[, "file", "freq", "cdr"

2 answers:

Answer 0 (score: 1)

How about using the tidyverse package?

library(tidyverse)

# List the files in the directory (data_dir) whose names end in .txt
data_dir <- "your/directory"
your_df <- fs::dir_ls(data_dir, regexp = "\\.txt$")

# Map read_delim over the file paths; .id stores each path in a "source" column
your_df <- your_df %>%
  map_dfr(read_delim, "\t", escape_double = FALSE, trim_ws = TRUE, .id = "source") %>%
  # Reduce each path to the bare file name, e.g. "your/directory/df1.txt" -> "df1"
  mutate(source = str_replace(basename(source), "\\.txt$", ""))

# Select the desired columns, renaming source to file
your_df <- your_df %>%
  select(file = source, freq, cdr)

Answer 1 (score: 0)

Similar to another question, why not simply use a for loop?
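The loop-based approach suggested in this answer could look roughly like the base R sketch below. It is not the answerer's code, which did not survive on this page. The function name combine_tsv_folder is hypothetical, and the sketch assumes every .txt file in the folder is tab-separated with a header row containing "freq" and "cdr" columns.

```r
# Hypothetical helper (illustrative name, not from the original answer):
# reads every .txt file in data_dir and stacks the freq and cdr columns.
combine_tsv_folder <- function(data_dir) {
  # Full paths of every file ending in .txt (pattern is a regular expression)
  files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

  # One list slot per file; this avoids growing a data frame inside the loop
  pieces <- vector("list", length(files))

  for (i in seq_along(files)) {
    # Read one tab-separated file; assumes the first row is a header
    df <- read.delim(files[i], header = TRUE, sep = "\t", stringsAsFactors = FALSE)

    # File name without directory or extension, e.g. "df1"
    file_label <- tools::file_path_sans_ext(basename(files[i]))

    # Keep the two wanted columns and tag every row with its source file
    pieces[[i]] <- data.frame(
      file = rep(file_label, nrow(df)),
      freq = df$freq,
      cdr = df$cdr,
      stringsAsFactors = FALSE
    )
  }

  # Stack the per-file pieces into a single data frame
  do.call(rbind, pieces)
}
```

Calling combine_tsv_folder("/mnt/data/OUTPUT/") would then return one data frame with the columns file, freq and cdr, which can be written back out as tsv with write.table(..., sep = "\t", quote = FALSE, row.names = FALSE). Collecting the pieces in a list and binding them once at the end is a deliberate choice here, since calling rbind on every iteration copies the accumulated data each time.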