Splitting strings in R based on character position

Time: 2016-11-20 12:03:10

Tags: r strsplit

I'm trying to split these strings in R (entries of a column) into three separate columns:

João Moutinho Monaco, 30,  M(C) 
Clinton N'Jie Marseille, 23,  FW
Frederic Sammaritano Dijon, 30,  AM(LR)

so that they become:

Player                Team           Pos
João Moutinho         Monaco         30,  M(C) 
Clinton N'Jie         Marseille      23,  FW
Frederic Sammaritano  Dijon          30,  AM(LR)

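I can find the positions of characters with gregexpr and nchar, but I don't know how to use those positions with strsplit. Or is there another package that makes this easier?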
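Since the character positions are already available, one option is to skip strsplit and cut each string with substr. This is a minimal base-R sketch, not taken from either answer below, and it assumes (as in the sample data) that the team is the single word directly before the first comma and that the player name contains no comma.

```r
x <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
       "Frederic Sammaritano Dijon, 30,  AM(LR)")

# Character position of the first comma in each string
comma <- regexpr(",", x)

# Start of the word that ends at the first comma (the team).
# "[^ ,]+," cannot span a space or a comma, so its leftmost match
# is exactly the word glued to the first comma.
team_start <- regexpr("[^ ,]+,", x)

df <- data.frame(
  Player = trimws(substr(x, 1, team_start - 1)),   # everything before the team
  Team   = substr(x, team_start, comma - 1),        # the team word itself
  Pos    = trimws(substr(x, comma + 1, nchar(x))), # everything after the first comma
  stringsAsFactors = FALSE
)
print(df)
```

If a team name has two words, the position of the team start can no longer be found this way and the split needs extra information (for example a list of known team names).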

2 answers:

Answer 0 (score: 2)

We can use gsub to insert a delimiter and then read the vector with read.csv:

read.csv(text=gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", "\\1;\\2;\\3", v1), sep=";",
      header=FALSE, col.names = c("Player", "Team", "Pos"), stringsAsFactors=FALSE)
#                Player      Team         Pos
#1        João Moutinho    Monaco   30,  M(C)
#2        Clinton N'Jie Marseille     23,  FW
#3 Frederic Sammaritano     Dijon 30,  AM(LR)
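To make the two steps in that call visible, the sketch below splits it apart and keeps the intermediate ';'-delimited strings. The variable name delimited is just for illustration. Note that the pattern assumes the player name is exactly two words (the first two \S+ groups), which is why the update below is needed for names like "Angel Di María".

```r
v1 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
        "Frederic Sammaritano Dijon, 30,  AM(LR)")

# Step 1: replace the separators with ';'.
#   (\\S+\\s+\\S+) -> first two words (Player)
#   (\\S+)         -> word just before the first comma (Team)
#   (.*)           -> everything after that comma and its spaces (Pos)
delimited <- gsub("^(\\S+\\s+\\S+)\\s+(\\S+),\\s+(.*)", "\\1;\\2;\\3", v1)
print(delimited)

# Step 2: read.csv parses each string as one ';'-separated line.
# read.csv only treats double quotes as quotes, so the apostrophe in N'Jie is safe.
df <- read.csv(text = delimited, sep = ";", header = FALSE,
               col.names = c("Player", "Team", "Pos"), stringsAsFactors = FALSE)
print(df)
```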
Update

If there are more patterns, and the "Team" name is only one word (i.e. the word just before the first ','),

read.csv(text= sub("(\\s+[A-Za-z]+),(\\s+\\d+),(.*)", ";\\1;\\2\\3", v2), 
      header=FALSE, sep=";", col.names = c("Player", "Team", "Pos"), stringsAsFactors=FALSE)
#                Player       Team         Pos
#1        João Moutinho     Monaco    30  M(C)
#2        Clinton N'Jie  Marseille      23  FW
#3 Frederic Sammaritano      Dijon  30  AM(LR)
#4       Angel Di María        PSG   28 M(CLR)
#5    Jean Michael Seri       Nice     25 M(C)
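The three pieces can also be extracted directly as capture groups with regexec and regmatches, without building ';'-delimited text for read.csv. This is an alternative sketch, not the answer's own code, under the same assumption (a one-word team directly before the first comma). Unlike the sub() version, it leaves no leading space in Team and keeps the comma in Pos, matching the layout asked for in the question.

```r
v2 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
        "Frederic Sammaritano Dijon, 30,  AM(LR)",
        "Angel Di María PSG, 28, M(CLR)", "Jean Michael Seri Nice, 25, M(C)")

# Group 1: text before the last space preceding the first comma (Player)
# Group 2: the single word glued to the first comma (Team)
# Group 3: everything after the first comma and its spaces (Pos)
pattern <- "^([^,]+)[[:space:]]+([^,[:space:]]+),[[:space:]]*(.*)$"

parts <- regmatches(v2, regexec(pattern, v2))

# Each element is c(full_match, player, team, pos); keep the three groups.
# A non-matching string gives character(0), which becomes a row of NAs.
m <- do.call(rbind, lapply(parts, `[`, 2:4))
df <- data.frame(Player = m[, 1], Team = m[, 2], Pos = m[, 3],
                 stringsAsFactors = FALSE)
print(df)
```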

data

v1 <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW", 
        "Frederic Sammaritano Dijon, 30,  AM(LR)")
v2 <- c(v1, "Angel Di María PSG, 28, M(CLR)","Jean Michael Seri Nice, 25, M(C)")


Answer 1 (score: 1)

Using the word function:
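The same word-based idea (the last word before the first comma is the team, everything before it is the player) can be sketched in base R with strsplit, which the question asked about. This is a substitute sketch, not the answer's own code; the helper name split_row is hypothetical, and it assumes single spaces between words and a one-word team.

```r
x <- c("João Moutinho Monaco, 30,  M(C)", "Clinton N'Jie Marseille, 23,  FW",
       "Frederic Sammaritano Dijon, 30,  AM(LR)")

# Hypothetical helper: split one string into Player / Team / Pos
split_row <- function(s) {
  pieces <- strsplit(s, ",", fixed = TRUE)[[1]]       # e.g. "Clinton N'Jie Marseille", " 23", "  FW"
  words  <- strsplit(pieces[1], " ", fixed = TRUE)[[1]]
  c(Player = paste(head(words, -1), collapse = " "),  # all words except the last
    Team   = tail(words, 1),                          # last word before the first comma
    Pos    = trimws(paste(pieces[-1], collapse = ","))) # re-join the rest, keeping its commas
}

# vapply gives a 3 x n matrix named by the helper's result; transpose to n x 3
m  <- t(vapply(x, split_row, character(3), USE.NAMES = FALSE))
df <- as.data.frame(m, stringsAsFactors = FALSE)
print(df)
```

Re-joining pieces[-1] with "," restores the original spacing of the position field ("30,  M(C)"), which splitting on ", " and pasting back with a fixed separator would not.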