Converting strings to a data.frame using R

Time: 2015-09-05 18:42:03

Tags: r

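I have over 1,000 rows of strings that I extracted from a column of an Excel sheet. Here is what the data looks like (3 rows):

Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)

Tree sparrow(2%)

Chicken(1%)

I need to put the data into a table (for this example: 8 columns x 3 rows). Can anyone help?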

x <- c("Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)", 
"Tree sparrow(2%)", "Chicken(1%)")

2 answers:

Answer 0: (score: 2)

There is most likely a more concise way, but you could try something like this:

library(stringi)
library(data.table)

# Drop empty lines, if any
txt <- Filter(function(x) !stri_isempty(stri_trim(x)), x)
# Extract matches; [0-9]+ (rather than [1-9]+) so that values containing a zero, such as 10%, are captured
matches <- stri_match_all_regex(txt, "([\\w\\s]+)\\(([0-9]+)%\\);?")

matches[[1]]

##      [,1]               [,2]           [,3]
## [1,] "Chicken(31%);"    "Chicken"      "31"
## [2,] "Duck(16%);"       "Duck"         "16"
## [3,] "Wild duck(14%);"  "Wild duck"    "14"
## [4,] "Turkey(10%);"     "Turkey"       "10"
## [5,] "Pigeon(4%);"      "Pigeon"       "4" 
## [6,] "Goose(4%);"       "Goose"        "4" 
## [7,] "Wild bird(4%);"   "Wild bird"    "4" 
## [8,] "Tree sparrow(2%)" "Tree sparrow" "2" 

# Rearrange: one named list of numeric values per input string
rows <- lapply(
   matches,
   function(x) setNames(as.list(as.numeric(x[, 3])), x[, 2]))

# Bind the rows; columns missing from a row are filled with NA
rbindlist(rows, fill=TRUE)

##    Chicken Duck Wild duck Turkey Pigeon Goose Wild bird Tree sparrow
## 1:      31   16        14     10      4     4         4            2
## 2:      NA   NA        NA     NA     NA    NA        NA            2
## 3:       1   NA        NA     NA     NA    NA        NA           NA

Regex explanation

([\\w\\s]+) # At least one word character or whitespace character *, 1st group
\\( # Left parenthesis
([0-9]+) # At least one digit, 2nd group. You can replace + with {1,2}
% # Percent sign
\\) # Right parenthesis
;? # Optional semicolon

* This could be \\w[\\w\\s]+ so that the name has to start with a word character
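
To see why the digit class in the second group matters, here is a small sketch in base R (no packages), run on a shortened record from the question. The helper name `extract_all` is hypothetical and only used for this illustration.

```r
# A shortened record from the question's data
s <- "Chicken(31%);Duck(16%);Turkey(10%);Goose(4%)"

# Hypothetical helper: return every substring of `text` that matches `pattern`
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

# [1-9]+ cannot match the "0" in "10", so the Turkey entry is silently dropped
narrow <- extract_all("[\\w\\s]+\\([1-9]+%\\)", s)

# [0-9]+ accepts every digit, so all four entries are found
wide <- extract_all("[\\w\\s]+\\([0-9]+%\\)", s)

print(narrow)
print(wide)
```

Comparing `length(narrow)` with `length(wide)` is a quick way to check whether a pattern loses entries on real data.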

Answer 1: (score: 1)

Here is a possible solution:

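
A minimal base R sketch of one way to build the table without extra packages, assuming the `x` vector from the question. The helper name `parse_record` is hypothetical, and this is an illustration of the general split-and-align approach rather than a definitive implementation.

```r
x <- c("Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)",
       "Tree sparrow(2%)", "Chicken(1%)")

# Hypothetical helper: turn one "Name(12%);Name(3%)" record
# into a named numeric vector
parse_record <- function(s) {
  parts <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  # The digits between "(" and "%)" are the value (optional spaces tolerated)
  values <- as.numeric(sub("^.*\\([[:space:]]*([0-9]+)%\\)$", "\\1", parts))
  # Everything before the "(" is the name
  names(values) <- trimws(sub("\\(.*$", "", parts))
  values
}

records <- lapply(x, parse_record)

# Column order follows first appearance across all records
species <- unique(unlist(lapply(records, names)))

# Index each record by name; species it lacks become NA
result <- as.data.frame(
  do.call(rbind, lapply(records, function(r) unname(r[species]))),
  stringsAsFactors = FALSE
)
names(result) <- species

print(result)
```

Column names are assigned after the data frame is created so that names containing spaces, such as `Wild duck`, are kept unchanged.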