I have this data frame:
df1 <- data.frame(Note = c("Profit before tax 240 tSEK",
                           "Earnings per share 0.240 ",
                           "Ali de Margin 37 %"),
                  Line = c(6, 2, 2))
I want the following:
Note                Val    Unit  Line
Profit before tax   240    tSEK  6
Earnings per share  0.240        2
Ali de Margin       37     %     2
How can I do this?
Answer 0 (score: 3)
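You can use the data.table function tstrsplit with a regular expression that uses lookarounds to split the variable Note on any space that is followed by a number (with or without a decimal point) or preceded by a digit: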
library(data.table)
# Split Note on a space followed by a number, or on a space preceded by a digit
setDT(df1)[, c("Note", "Val", "Unit"):=tstrsplit(Note, "( (?=[0-9.]+))|((?<=\\d) )", perl=TRUE)]
df1
# Note Line Val Unit
#1: Profit before tax 6 240 tSEK
#2: Earnings per share 2 0.240 NA
#3: Ali de Margin 2 37 %
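The NA in the second row comes from how strsplit() (which tstrsplit() uses for the splitting) treats a match at the very end of a string: the trailing space after "0.240" is matched, but no empty string is produced after it, so that row yields only two pieces and tstrsplit() fills the missing third piece with its default fill value, NA. A quick base-R check with the same pattern, as a sketch:

```r
# Same split pattern as in the answer above
pat <- "( (?=[0-9.]+))|((?<=\\d) )"
x <- c("Profit before tax 240 tSEK", "Earnings per share 0.240 ")

parts <- strsplit(x, pat, perl = TRUE)
lengths(parts)   # [1] 3 2
parts[[2]]       # [1] "Earnings per share" "0.240"
```

If an empty string is preferred over NA, it can be replaced afterwards, e.g. `df1[is.na(Unit), Unit := ""]`.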
Answer 1 (score: 1)
You can also play around with the regexpr and regmatches functions:
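notes <- as.character(df1$Note)
# Find the first number (integer or decimal) in each string
pattern <- regexpr("[[:digit:]]+(\\.[[:digit:]]+)?", notes)
# Label: everything before the number, minus the separating space
note <- substr(notes, 1, pattern - 2)
# Value: the matched number itself
value <- regmatches(notes, pattern)
# Unit: everything after the number, skipping the separating space
unit <- substr(notes, pattern + attr(pattern, "match.length") + 1, nchar(notes))
result <- data.frame(note = note, value = value, unit = unit, line = df1$Line)
result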
# note value unit line
#1 Profit before tax 240 tSEK 6
#2 Earnings per share 0.240 2
#3 Ali de Margin 37 % 2
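To see what the substr() arithmetic is doing, here it is on the first row alone. regexpr() returns the start position of the first match and stores its length in the "match.length" attribute; the label ends two characters before the match (skipping the space) and the unit starts one character after it. The pattern here, an integer with an optional decimal part, is chosen for illustration:

```r
x <- "Profit before tax 240 tSEK"
m <- regexpr("[[:digit:]]+(\\.[[:digit:]]+)?", x)

start <- as.integer(m)           # 19: position of the "2" in "240"
len   <- attr(m, "match.length") # 3: "240" is three characters long

label <- substr(x, 1, start - 2)               # "Profit before tax"
value <- regmatches(x, m)                      # "240"
unit  <- substr(x, start + len + 1, nchar(x))  # "tSEK"
```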
Answer 2 (score: 0)
One option is tidyr::extract. The extract function lets you define a regex with capture groups and splits the column into one new column per group:
library(tidyr)
extract(df1, Note, into = c("Note", "Val", "Unit"),
        regex = "^([[:alpha:][:blank:]]+)\\s([[:digit:].]+)\\s*(.*)")
# Note Val Unit Line
# 1 Profit before tax 240 tSEK 6
# 2 Earnings per share 0.240 2
# 3 Ali de Margin 37 % 2
**Regex explanation:**
- `^([[:alpha:][:blank:]]+)`: Group 1, one or more letters/spaces
- `\\s`: the single space between Group 1 and Group 2
- `([[:digit:].]+)`: Group 2, one or more digits/dots
- `\\s*`: any spaces after the number, so they do not end up in Group 3
- `(.*)`: Group 3, anything after that until the end of the string
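If you would rather not depend on tidyr, the same three capture groups (label, number, unit) can be pulled out in base R with regexec() and regmatches(). This is a swapped-in base-R alternative, not what extract() does internally; perl = TRUE is used so the pattern is matched with Perl-style backtracking, where the greedy `\\s*` consumes the space after the number before Group 3 starts. A sketch:

```r
df1 <- data.frame(Note = c("Profit before tax 240 tSEK",
                           "Earnings per share 0.240 ",
                           "Ali de Margin 37 %"),
                  Line = c(6, 2, 2),
                  stringsAsFactors = FALSE)

# Group 1: label, Group 2: number, Group 3: unit (spaces after the number are skipped)
pat <- "^([[:alpha:][:blank:]]+)\\s([[:digit:].]+)\\s*(.*)$"

# Each list element holds the full match followed by the three groups
m <- regmatches(df1$Note, regexec(pat, df1$Note, perl = TRUE))
parts <- do.call(rbind, m)  # one row per string, columns: match, g1, g2, g3

out <- data.frame(Note = parts[, 2], Val = parts[, 3], Unit = parts[, 4],
                  Line = df1$Line, stringsAsFactors = FALSE)
out
```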