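I have a folder containing multiple data frame files in TSV format (df1.txt, df2.txt, df3.txt, etc.). I need to extract two columns ("freq", "cdr") from each df and aggregate them into a single large TSV data frame containing those two columns, plus a third column indicating the name of the file they came from ("file", "freq", "cdr").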
Individual df1: "xxx" "freq" "cdr" "zzz"
23 0.112 abc ej
25 0.743 bbc tj
final df: "file" "freq" "cdr"
df1 0.112 abc
df1 0.743 bbc
df2 0.444 abd
df2 0.911 ccd
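I tried using "list.files", but that only gives me a list of the file (df) names. I have considered using "parse", but I am not sure how that function works. As an R newbie, I would really appreciate your help.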
data.all <- list.files("/mnt/data/OUTPUT/", pattern="*.txt", full.names = TRUE)
sdata <- data.all[, "file", "freq", "cdr"]
Answer 0 (score: 1)
How about using the tidyverse packages?
library(tidyverse)
# List the files in the directory (data_dir) whose names match the regexp (end in .txt)
data_dir <- "your/directory"
your_df <- fs::dir_ls(data_dir, regexp = "\\.txt$")
# Map read_delim over the file paths; .id = "source" stores each file's path in a "source" column
your_df <- your_df %>%
  map_dfr(read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE, .id = "source") %>%
  # Reduce the path to the bare file name (escape the dot so only a literal ".txt" suffix is removed)
  mutate(source = str_replace(basename(source), "\\.txt$", ""))
# Select the desired columns, renaming source to file
your_df <- your_df %>%
  select(file = source, freq, cdr)
Answer 1 (score: 0)
Similar to another question, I would simply use a for loop.
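One way such a for loop might look in base R is sketched below. This is only an illustration under stated assumptions: the function name combine_tsv and the temporary demo directory are hypothetical, the files are assumed to be tab-separated with a header row containing "freq" and "cdr", and the "xxx" and "zzz" values written for df2 are placeholders, since the question does not show them.

```r
# Hypothetical helper: read every .txt file in data_dir and stack the
# "freq" and "cdr" columns, tagging each row with its source file name.
combine_tsv <- function(data_dir) {
  # Full paths of all files whose names end in .txt
  files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

  # One list slot per file, filled inside the loop
  pieces <- vector("list", length(files))

  for (i in seq_along(files)) {
    # read.delim defaults to tab-separated input with a header row
    df <- read.delim(files[i], stringsAsFactors = FALSE)

    # File name without directory or extension, e.g. "df1"
    name <- tools::file_path_sans_ext(basename(files[i]))

    # Keep only the two columns of interest plus the file label
    pieces[[i]] <- data.frame(
      file = rep(name, nrow(df)),
      freq = df$freq,
      cdr  = df$cdr,
      stringsAsFactors = FALSE
    )
  }

  # Bind the per-file pieces into one data frame
  do.call(rbind, pieces)
}

# Demo on a temporary directory (placeholder values for df2's xxx and zzz)
demo_dir <- file.path(tempdir(), "tsv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("xxx\tfreq\tcdr\tzzz", "23\t0.112\tabc\tej", "25\t0.743\tbbc\ttj"),
           file.path(demo_dir, "df1.txt"))
writeLines(c("xxx\tfreq\tcdr\tzzz", "11\t0.444\tabd\tqq", "12\t0.911\tccd\trr"),
           file.path(demo_dir, "df2.txt"))

final_df <- combine_tsv(demo_dir)
print(final_df)
```

To save the result as a TSV file, something like write.table(final_df, "combined.tsv", sep = "\t", quote = FALSE, row.names = FALSE) should work; the output file name here is also just an example.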