Output is not in the expected format after strsplit

Asked: 2013-07-08 10:36:28

Tags: r output strsplit

My input file, named "locaddr", contains the following records:

"Shelbourne Road, Dublin, Ireland"                                     
"1 Hatch Street Upper, Dublin, Ireland"                               
"98 Haddington Road, Dublin, Ireland"       
"11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland"
"Winterstraße 17, 69190 Walldorf, Germany"

I applied R's strsplit function to this file with the following code:

testmat <- strsplit(locaddr,split=",")
outmat <- matrix(unlist(testmat), nrow=nrow(locaddr), ncol=3, byrow=T)

The final result I get is:

      Street                        City                    Country
 [1,] "Shelbourne Road"             " Dublin"               " Ireland"       
 [2,] "1 Hatch Street Upper"        " Dublin"               " Ireland"       
 [3,] "98 Haddington Road"          " Dublin"               " Ireland"       
 [4,] "11 Mount Argus Close"        " Harold's Cross"       " Dublin 6W"     
 [5,] " Co. Dublin"                 " Ireland"              "Winterstraße 17"
 [6,] " 69190 Walldorf"             " Germany"              "Caughley Road"  
 [7,] " Broseley"                   " Shropshire TF12 5AT"  " UK"            
 [8,] "Pappelweg 30"                " 48499 Salzbergen"     " Germany"       
 [9,] "60 Grand Canal Street Upper" " Dublin 4"             " Ireland"       
[10,] "Wieslocher Straße"           " 68789 Sankt Leon-Rot" " Germany"

As the output above shows, the desired result is the last three terms of each record. Instead, almost everything ends up mixed together.
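For context, here is a minimal sketch of why the fields shift. It assumes the addresses are held in a plain character vector (the name `addr` is made up for illustration). Because the records have different numbers of comma-separated fields, unlist() flattens them into one long vector and the record boundaries are lost, so filling a 3-column matrix row by row pushes later fields into the wrong rows.

```r
# Two sample addresses: one with 3 comma-separated fields, one with 5
addr <- c("Shelbourne Road, Dublin, Ireland",
          "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland")

parts <- strsplit(addr, split = ",")

# Number of fields per record; they differ, so the list is ragged
n_fields <- sapply(parts, length)
print(n_fields)

# unlist() drops the record boundaries and leaves one flat vector
flat <- unlist(parts)
print(length(flat))
```

With 8 fields in total, a 3-column layout cannot line up with the 2 original records.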

My requirement is this: although the addresses vary in length, after strsplit I need to pick only the last three terms and store them as Street, City, Country.

Thank you very much for your help and time.

2 Answers:

Answer 0 (score: 2)

Next time, please provide some handy reproducible code.

Here is how I would try to solve this problem.

x <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")

# split on ,
splitx <- strsplit(x, ",")

# for every list element (lapply climbs the list element-wise)
# subset last 3 elements
last3 <- lapply(splitx, tail, n = 3)

# merge them together by row
do.call("rbind", last3)

     [,1]                   [,2]              [,3]      
[1,] "Shelbourne Road"      " Dublin"         " Ireland"
[2,] "1 Hatch Street Upper" " Dublin"         " Ireland"
[3,] "98 Haddington Road"   " Dublin"         " Ireland"
[4,] " Dublin 6W"           " Co. Dublin"     " Ireland"
[5,] "Winterstraße 17"      " 69190 Walldorf" " Germany"
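The result above still carries a leading space in the second and third columns and has no column labels. One possible refinement, sketched here as an assumption rather than as part of the original answer, is to split on a comma followed by optional whitespace and then set the column names to the labels the question asks for. The two addresses are just a subset of the sample data.

```r
x <- c("Shelbourne Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland")

# Split on a comma plus any following whitespace, so no leading spaces remain
splitx <- strsplit(x, ",\\s*")

# Keep the last three fields of each record and bind them row-wise
out <- do.call(rbind, lapply(splitx, tail, n = 3))

# Column labels taken from the layout requested in the question
colnames(out) <- c("Street", "City", "Country")
print(out)
```

This sketch assumes every record has at least three fields. A shorter record would yield fewer than three elements from tail(), and rbind would then recycle values with a warning.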

Answer 1 (score: 2)

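This is basically a variation on Roman's answer, but it is meant to keep (possibly) multi-part street addresses together. It assumes that the last two comma-separated values are the city and the country, and then pools all preceding elements into a single street field.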

# read data
y <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")
# split and output
result <- lapply(y, function(x) {
    splitx <- strsplit(x, ", ")[[1]]
    rowtail <- tail(splitx, n = 2)
    if(length(splitx)>3)
        multi <- paste(splitx[1:(length(splitx)-2)],collapse=", ")
    else
        multi <- splitx[1]
    return(c(multi,rowtail))
    })
# rbind back together
do.call(rbind,result)

This yields:

     [,1]                                              [,2]             [,3]     
[1,] "Shelbourne Road"                                 "Dublin"         "Ireland"
[2,] "1 Hatch Street Upper"                            "Dublin"         "Ireland"
[3,] "98 Haddington Road"                              "Dublin"         "Ireland"
[4,] "11 Mount Argus Close, Harold's Cross, Dublin 6W" "Co. Dublin"     "Ireland"
[5,] "Winterstraße 17"                                 "69190 Walldorf" "Germany"
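The same idea can be written a little more compactly with head(fields, -2), which returns everything except the last two elements and so removes the need for the if/else branch. The following is only a sketch: the helper name `split_address` is made up, and stopping on records with fewer than three fields is an assumed policy, not something the question specifies.

```r
# Hypothetical helper: returns a named vector of street, city, country
split_address <- function(address) {
  fields <- strsplit(address, ",\\s*")[[1]]
  n <- length(fields)
  # Assumed policy: refuse records that cannot fill all three columns
  stopifnot(n >= 3)
  c(Street  = paste(head(fields, -2), collapse = ", "),
    City    = fields[n - 1],
    Country = fields[n])
}

y <- c("Shelbourne Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland")

# Bind the named vectors row-wise, then convert to a data frame
out <- as.data.frame(do.call(rbind, lapply(y, split_address)),
                     stringsAsFactors = FALSE)
print(out)
```

A data frame keeps the Street, City and Country labels attached to the columns, which may be more convenient for later filtering than a bare character matrix.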