I'm trying to split a column in an R data frame where the values contain multiple spaces, but I want to split only on the first space. Example data frame:
df <- data.frame(game = c(1, 2, 3, 4, 5, 6), date = c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5", "Thursday Apr 6", "Friday Apr 7", "Saturday Apr 8"))
I'm attempting to use tidyr to split the df 'date' column at the first space so that the day is in its own column:
  game       day  date
1    1    Monday Apr 3
2    2   Tuesday Apr 4
3    3 Wednesday Apr 5
4    4  Thursday Apr 6
5    5    Friday Apr 7
6    6  Saturday Apr 8
The above is the question. Below is what I've tried and what went wrong.
According to the tidyr documentation, the default for 'sep' is 'a regular expression that matches any sequence of non-alphanumeric values.' So, if I do this:
df %>% separate(date, c("day", "date"))
That splits on a space, but it splits on both spaces (the space after 'Monday' and the space after 'Apr' in 'Monday Apr 3'). The result is:
  game       day date
1    1    Monday  Apr
2    2   Tuesday  Apr
3    3 Wednesday  Apr
4    4  Thursday  Apr
5    5    Friday  Apr
6    6  Saturday  Apr
Warning message:
Too many values at 6 locations: 1, 2, 3, 4, 5, 6
I can add a regex to select only the first space (I checked that this regex works in Sublime Text):
df %>% separate(date, c("day", "date"), sep='^[^\\s]*\\K\\s')
But this gives me a warning about too few values and no split.
So what is going wrong? Or how can I make this work? Or what am I misunderstanding?
Answer 0 (score: 9)
You need to specify the extra argument as "merge":
library(tidyr)
df %>% separate(date, c("day", "date"), extra = "merge")
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
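To see why extra = "merge" helps, here is a rough base R sketch of the same idea: split only at the first run of non-alphanumeric characters and keep everything after it together. It illustrates the behaviour and is not tidyr's actual implementation; the variable names are made up for the example.

```r
# Illustrative only: mimic "split once, keep the rest together" in base R.
dates <- c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5")

# Position and length of the first run of non-alphanumeric characters
first_sep <- regexpr("[^[:alnum:]]+", dates)
sep_start <- as.integer(first_sep)
sep_len   <- attr(first_sep, "match.length")

# Everything before the first separator, and everything after it
day  <- substr(dates, 1, sep_start - 1)
rest <- substr(dates, sep_start + sep_len, nchar(dates))

data.frame(day = day, date = rest)
```

The second space stays inside the remainder because only the first separator is ever located.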
Answer 1 (score: 1)
Psidom has you covered on the first warning message about too many values. Regarding your second approach, you end up with too few values, in part because \\K does not work with stringi, which separate is using. You can see this for yourself with stringi::stri_split_regex(df$date, '^[^\\s]*\\K\\s'). So that regex produces no split at all, and you end up with the warning message about too few values.
You could specify sep as
# a space not followed by a digit
df %>% separate(date, c("day", "date"), sep = "\\s(?!\\d)")
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
You can't use \\K, but if you need a variable-length look-behind, the quantifier needs to be bounded:
# a space preceded by 3 - 6 characters and "day".
# 3 - 6 characters allows "Monday" and "Wednesday"
"(?<=.{3,6}day)\\s"
# same idea
"(?<=\\S{3,6}day)\\s"
# same idea
"(?<=.?.?.?...day)\\s"
# same idea, but using ^ to anchor and not using "day"
"(?<=^\\S{0,9})\\s"
# space followed by some other characters, a space, digit(s) and the end of the line
"\\s(?=.+\\s\\d+$)"
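The patterns in this answer can be checked outside of separate. The sketch below uses base R with perl = TRUE, which is an assumption made for the illustration: it is a different regex engine from stringi, so it shows what each pattern matches under PCRE rather than what separate does internally.

```r
x <- "Monday Apr 3"

# \\K is supported when perl = TRUE: only the first space is replaced
with_k <- sub("^[^\\s]*\\K\\s", "|", x, perl = TRUE)

# A space not followed by a digit: the space before "3" is skipped,
# so the only match is the space after "Monday"
lookahead_pos <- as.integer(gregexpr("\\s(?!\\d)", x, perl = TRUE)[[1]])

# Splitting on the lookahead pattern therefore gives two pieces
pieces <- strsplit(x, "\\s(?!\\d)", perl = TRUE)[[1]]

print(with_k)
print(lookahead_pos)
print(pieces)
```

A pattern that behaves as expected here may still behave differently in another engine, which is the point the answer makes about \\K.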
Answer 2 (score: 1)
We can use base R:
cbind(df[1], read.csv(text=sub("\\s+", ",", df$date),
header=FALSE, col.names = c("day", "date")))
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
Another option is extract from tidyr:
library(tidyr)
extract(df, date, into = c("day", "date"), "(\\S+)\\s+(.*)")
# game day date
#1 1 Monday Apr 3
#2 2 Tuesday Apr 4
#3 3 Wednesday Apr 5
#4 4 Thursday Apr 6
#5 5 Friday Apr 7
#6 6 Saturday Apr 8
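Both approaches in this answer act on the first whitespace only: sub replaces just the first match (gsub would replace every match), and the extract pattern captures the first run of non-space characters and then everything after the whitespace. A small base R sketch of both points, with regexec and regmatches used as stand-ins for illustration rather than as a description of what extract does internally:

```r
x <- c("Monday Apr 3", "Wednesday Apr 5")

# sub() touches only the first run of whitespace; gsub() touches all of them
first_only <- sub("\\s+", ",", x)
all_spaces <- gsub("\\s+", ",", x)

# Capture groups: (\\S+) is the day, (.*) is everything after the first gap
m <- regmatches(x, regexec("(\\S+)\\s+(.*)", x))

# Drop the full match (first element) and keep the two groups per row
groups <- do.call(rbind, lapply(m, function(g) g[-1]))
colnames(groups) <- c("day", "date")

print(first_only)
print(all_spaces)
print(groups)
```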