I have a character array, as shown below.
char[1:3]
[1] "ByShubham SharmaTOP 500 REVIEWERon 12 November 2017"
[2] "ByJitender Bhatiaon 16 November 2017"
[3] "ByMridul K.on 15 August 2017"
I would like the output to be
Name             Badge             Date
---------------------------------------------------
Shubham Sharma   TOP 500 REVIEWER  12 November 2017
Jitender Bhatia  NA                16 November 2017
Mridul K.        NA                15 August 2017
Answer 0 (score: 3)
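Assuming that an "on" preceded by a non-lowercase character should be removed, we insert semicolons to separate the fields, remove "By" and "on", and finally read the result with read.table using the semicolon as the separator.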
# s is the input character vector (see the reproducible definition further down)
ss0 <- sub("By(.*) (\\d+ \\S+ \\d{4})$", "\\1;\\2", s) # drop By, insert ; before date
ss1 <- sub("([^a-z])on", "\\1 ", ss0) # replace 'on' with a space if not after lower
ss2 <- sub("^(.*[a-z])([A-Z].*)", "\\1;\\2", ss1) # insert ; between lower & upper
ss3 <- sub("^([^;]*);([^;]*)$", "\\1;NA;\\2", ss2) # ; to ;NA; if only 2 fields
read.table(text = ss3, sep = ";", as.is = TRUE, strip.white = TRUE,
           col.names = c("Name", "Badge", "Date"))
which gives:
               Name            Badge             Date
1    Shubham Sharma TOP 500 REVIEWER 12 November 2017
2 Jitender Bhatiaon             <NA> 16 November 2017
3         Mridul K.             <NA>   15 August 2017
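This works for the sample input under the assumption stated above (note that the second name keeps its trailing "on", because there it follows a lowercase letter), but you may have to modify it according to the rules that hold for the entire data set.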
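Under the rule assumed above, an "on" that follows a lowercase letter is kept. If, in the full data, "on" always sits directly before the date and a badge is always a run of uppercase letters, digits and spaces, a single pattern could capture all three fields at once. The following is only a minimal sketch under those two assumptions, neither of which is confirmed by the question; the names pat, parts and out are illustrative.

```r
# Input vector from the question, repeated so the block is self-contained
s <- c("ByShubham SharmaTOP 500 REVIEWERon 12 November 2017",
       "ByJitender Bhatiaon 16 November 2017",
       "ByMridul K.on 15 August 2017")

# Assumed layout: "By" + name + optional all-caps badge + "on " + date.
# Group 1 is lazy so the badge (group 2) can claim the uppercase run;
# group 2 wraps an optional non-capturing group, so it is "" when absent.
pat <- "^By(.*?)((?:[A-Z][A-Z0-9 ]+)?)on (\\d+ \\S+ \\d{4})$"

# One character vector per input: full match, then the three groups
parts <- regmatches(s, regexec(pat, s, perl = TRUE))

out <- data.frame(Name  = sapply(parts, `[`, 2),
                  Badge = sapply(parts, `[`, 3),
                  Date  = sapply(parts, `[`, 4),
                  stringsAsFactors = FALSE)

# Turn an empty badge into a missing value
out$Badge[!nzchar(out$Badge)] <- NA_character_
out
```

On the three sample strings this yields "Jitender Bhatia" rather than "Jitender Bhatiaon". A badge that contains lowercase letters would break the assumption and end up in the Name column.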
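If the input is short, another possibility is to edit the input by hand, removing "By" and "on" and inserting semicolons between the fields as appropriate, and then to use the read.table statement above.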
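As an illustration of the manual route, the three sample lines might look like this after hand editing. This is a sketch only: the variable names edited and DF are made up here, and NA is typed in explicitly because read.table does not turn an empty character field into a missing value.

```r
# Hand-edited copy of the sample lines: "By" and "on" removed,
# fields separated by semicolons, NA typed in where there is no badge
edited <- c("Shubham Sharma;TOP 500 REVIEWER;12 November 2017",
            "Jitender Bhatia;NA;16 November 2017",
            "Mridul K.;NA;15 August 2017")

# Same read.table call as in the regex approach, applied to the edited text
DF <- read.table(text = edited, sep = ";", as.is = TRUE, strip.white = TRUE,
                 col.names = c("Name", "Badge", "Date"))
DF
```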
Note: the input s used above, in reproducible form:
s <- c("ByShubham SharmaTOP 500 REVIEWERon 12 November 2017",
       "ByJitender Bhatiaon 16 November 2017",
       "ByMridul K.on 15 August 2017")
Update: (1) Slightly modified the rule and changed the code accordingly. (2) Simplified.
Answer 1 (score: 1)
With only three rows, I would rather write the data frame from scratch:
df <- data.frame(Name = c("Shubham Sharma", "Jitender Bhatia", "Mridul K."),
                 Badge = c("TOP 500 REVIEWER", NA, NA),
                 Date = as.Date(c("2017-11-12", "2017-11-16", "2017-08-15")),
                 stringsAsFactors = FALSE)
> df
             Name            Badge       Date
1  Shubham Sharma TOP 500 REVIEWER 2017-11-12
2 Jitender Bhatia             <NA> 2017-11-16
3       Mridul K.             <NA> 2017-08-15
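This data frame stores Date as a Date object, whereas the read.table approach leaves it as text such as "12 November 2017". In an English locale, as.Date(x, format = "%d %B %Y") converts such strings, but %B depends on the locale. The sketch below avoids that by looking the month up in the built-in month.name constant; it assumes full English month names, and the variable names are illustrative.

```r
# Date strings in the form produced by the read.table approach
d <- c("12 November 2017", "16 November 2017", "15 August 2017")

# Split each string into day, month name and year
p <- strsplit(d, " ", fixed = TRUE)
day   <- as.integer(sapply(p, `[`, 1))
month <- match(sapply(p, `[`, 2), month.name)  # 1..12, NA if not an English month name
year  <- as.integer(sapply(p, `[`, 3))

# Assemble ISO 8601 strings, which as.Date parses without a format argument
dates <- as.Date(sprintf("%04d-%02d-%02d", year, month, day))
dates
```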