Manipulating a dataset with R

Time: 2017-12-05 11:11:04

Tags: r

I have a dataset with entries like the following:

Data: 05.12.2017   |   Acceleration: 0,0042414... 

Data: 05.12.2017   |   Acceleration: 0,004235235... 

Data: 05.12.2017   |   Acceleration: 0,04235235...

Data: 05.12.2017   |   Acceleration: 0,0023414... 

I want to reshape the data so that the name before the ":" becomes the column name.

In other words, I want this:

Data         |  Acceleration         

05.12.2017   |  0,0042414... 

05.12.2017   |  0,004235235... 

05.12.2017   |  0,04235235...

05.12.2017   |  0,0023414...

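Is it possible to do this?

3 answers:

Answer 0: (score: 2)

You can set new names on the dataset and then remove those label strings from the entries. This is not a general approach, though, because the names and prefixes are hard-coded.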

library(stringr)

# name the columns explicitly
names(your_data_set) <- c("Data", "Acceleration")

# strip the label prefix from every entry in each column
your_data_set$Data <- str_replace_all(your_data_set$Data, "Data: ", "")
your_data_set$Acceleration <- str_replace_all(your_data_set$Acceleration, "Acceleration: ", "")

Answer 1: (score: 2)

A solution that uses some reshaping with the tidyr package:

# example dataset
df = data.frame(x = c("Data: 05.12.2017", "Data: 05.12.2017"),
                y = c("Acceleration: 0.0042414", "Acceleration: 0.0042243"),
                stringsAsFactors = FALSE)

df

#                  x                       y
# 1 Data: 05.12.2017 Acceleration: 0.0042414
# 2 Data: 05.12.2017 Acceleration: 0.0042243


library(dplyr)
library(tidyr)

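df %>%
  gather() %>%                                       # stack every column into key/value pairs
  select(value) %>%
  separate(value, c("v1", "v2"), sep = ":\\s*") %>%  # split label from value, dropping the space after ":"
  group_by(v1) %>%
  mutate(row_num = row_number()) %>%                 # row index within each label, so spread() can pair values
  spread(v1, v2) %>%
  select(-row_num)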

# # A tibble: 2 x 2
#   Acceleration        Data
# *        <chr>       <chr>
# 1    0.0042414  05.12.2017
# 2    0.0042243  05.12.2017
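
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same reshaping with the newer verbs, assuming a recent tidyr and dplyr are installed. The name `result` and the added ungroup() step are choices made for this illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# same example dataset as above
df <- data.frame(x = c("Data: 05.12.2017", "Data: 05.12.2017"),
                 y = c("Acceleration: 0.0042414", "Acceleration: 0.0042243"),
                 stringsAsFactors = FALSE)

result <- df %>%
  pivot_longer(everything()) %>%                     # columns "name" and "value"
  select(value) %>%
  separate(value, c("v1", "v2"), sep = ":\\s*") %>%  # label in v1, value in v2
  group_by(v1) %>%
  mutate(row_num = row_number()) %>%                 # row index within each label
  ungroup() %>%
  pivot_wider(names_from = v1, values_from = v2) %>% # one column per label
  select(-row_num)

result
```

Here the columns come out in order of first appearance (Data, then Acceleration) rather than alphabetically.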

Answer 2: (score: 1)

Hope this helps!

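#column headers: the text before ":" in the first row, trimmed of whitespace
names(df) <- sapply(df[1, ], function(x) trimws(gsub(":.*", "", x)))
#column values: the text after ":"; assigning into df[] keeps df a data frame
df[] <- lapply(df, function(x) trimws(gsub(".*:", "", x)))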
#now you can easily format columns as date and numeric
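
The conversion mentioned in the last comment could look something like the sketch below. It assumes the dates are in day.month.year form and that the comma is a decimal separator, as in the question. The data frame `cleaned` and its values are hypothetical stand-ins for the result of the steps above.

```r
# hypothetical cleaned data: character columns with the labels already removed
cleaned <- data.frame(Data = c("05.12.2017", "05.12.2017"),
                      Acceleration = c("0,0042414", "0,04235235"),
                      stringsAsFactors = FALSE)

# parse day.month.year strings into Date objects
cleaned$Data <- as.Date(cleaned$Data, format = "%d.%m.%Y")

# replace the decimal comma with a point, then convert to numeric
cleaned$Acceleration <- as.numeric(sub(",", ".", cleaned$Acceleration, fixed = TRUE))

str(cleaned)
```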


> #sample data
> dput(df)
structure(list(V1 = structure(c(2L, 1L, 1L, 1L), .Label = c("                 Data: 05.12.2017", 
"Data: 05.12.2017"), class = "factor"), V2 = structure(c(3L, 
2L, 4L, 1L), .Label = c(" Acceleration: 0,0023414", " Acceleration: 0,004235235", 
" Acceleration: 0,0042414", " Acceleration: 0,04235235"), class = "factor")), .Names = c("V1", 
"V2"), class = "data.frame", row.names = c(NA, -4L))