Reading stdin into R

Time: 2014-04-05 19:31:15

Tags: r

I have a file structured as follows:

123

Jhon: NewYork, Boston, gainesville

Mike: LosAngeles

Almudena: Baltimore, SanDiego, Austin, Memphis

Anna: Washington, Oklahoma, Nashville, Denver, Phenix, Tucson

...

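and so on for 123 names, each with at most 50 cities. I would like to read the file into a usable table in R, for example a table with 123 rows and 51 columns (the name plus up to 50 cities). Ideally the table would have blanks where there is no city (for example, the row for a person with only two US cities would have 48 blanks).

Another, even more useful option would be a two-column table (or matrix) of the form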

Name City
Jhon NewYork
Jhon Boston
Jhon gainesville
Mike LosAngeles
...

2 answers:

Answer 0 (score: 1)

I'm not sure whether a ready-made function exists for this format, but writing an importer for this file is not hard:

ll <- readLines("input.txt")

## keep only lines with "name: cities"
ll <- ll[grep(":", ll)]

## split at ":" to divide in name and cities
s <- strsplit(ll, ":")

## split by "," to divide cities
s <- lapply(s, function(x) {
  return(cbind(x[1], strsplit(x[2], ",")[[1]]))
})

## bind list of matrices to one matrix
m <- do.call(rbind, s)

## remove whitespace in front of the cities
m[, 2] <- gsub("^\\s+", "", m[, 2])
m

#      [,1]       [,2]
# [1,] "Jhon"     "NewYork"
# [2,] "Jhon"     "Boston"
# [3,] "Jhon"     "gainesville"
# [4,] "Mike"     "LosAngeles"
# [5,] "Almudena" "Baltimore"
# [6,] "Almudena" "SanDiego"
# [7,] "Almudena" "Austin"
# [8,] "Almudena" "Memphis"
# [9,] "Anna"     "Washington"
#[10,] "Anna"     "Oklahoma"
#[11,] "Anna"     "Nashville"
#[12,] "Anna"     "Denver"
#[13,] "Anna"     "Phenix"
#[14,] "Anna"     "Tucson"
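
The question also asked for a wide table with one row per name and blank cells for missing cities. The following is only a minimal base-R sketch of that variant, not part of the original answer. The sample lines, the column names `City1`, `City2`, ... and the variable names are assumptions made for illustration.

```r
# Sample lines standing in for the file. Swap in readLines("input.txt"),
# or readLines(file("stdin")) when the data is piped to Rscript.
ll <- c("123",
        "Jhon: NewYork, Boston, gainesville",
        "Mike: LosAngeles",
        "Almudena: Baltimore, SanDiego, Austin, Memphis")

## keep only lines with "name: cities" and split them at ":"
ll <- ll[grepl(":", ll, fixed = TRUE)]
parts <- strsplit(ll, ":", fixed = TRUE)

## names on the left, a trimmed vector of cities on the right
nms <- trimws(vapply(parts, `[`, character(1), 1))
cities <- lapply(parts, function(x) {
  trimws(strsplit(x[2], ",", fixed = TRUE)[[1]])
})

## pad every city vector with "" up to the longest one
## (assumption: set n_max <- 50 instead to force 51 columns)
n_max <- max(lengths(cities))
padded <- do.call(rbind, lapply(cities, function(x) {
  c(x, rep("", n_max - length(x)))
}))

## one row per name, one column per city slot
wide <- data.frame(Name = nms, padded, stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("City", seq_len(n_max))
wide
```

Padding with `""` matches the blanks requested in the question; `NA` may be more convenient if the table is analysed further in R.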

Answer 1 (score: 0)

I was working on this problem today and found this old thread while searching. Here is how I would handle it with the tidyverse.

Tab-separated: readLines("input.txt") %>% read_tsv

Comma-separated: readLines("input.txt") %>% read_csv
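
For the "name: cities" layout in the question, a tidyverse route to the two-column table might look like the sketch below. It is not the answer's own code. It assumes the tibble and tidyr packages are installed, uses made-up sample lines in place of the file, and relies on `separate()` and `separate_rows()` from tidyr.

```r
library(tibble)
library(tidyr)   # also provides the %>% pipe

# Sample lines standing in for readLines("input.txt")
ll <- c("123",
        "Jhon: NewYork, Boston, gainesville",
        "Mike: LosAngeles",
        "Almudena: Baltimore, SanDiego, Austin, Memphis")

## keep only lines with "name: cities"
ll <- ll[grepl(":", ll, fixed = TRUE)]

long <- tibble(line = ll) %>%
  ## split each line into the name and the raw city list
  separate(line, into = c("Name", "City"), sep = ":\\s*") %>%
  ## give every city its own row
  separate_rows(City, sep = ",\\s*")

long
```

The result has one row per (Name, City) pair, which is the second layout the question asked for.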