My dataset has only one column, and it is untidy: Column1 holds many rows, with each record starting at a date line. An example:
Column1
date: 28-Oct-2017
company: BB KISS
classification: Software
roundsize: 1.2
cumulative: 1.2
round: Seed
investors: Private
headquartered: Darmstadt
country: Germany
region: DACH
description: Software development for crypto currency and blockchain
url: https://bbkiss.de/
To extract the text after the ":" I use:
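# drop the key up to the FIRST colon (a greedy '.*:' would also eat "https:" in the url row)
df$extract <- sub('^[^:]*:\\s*', '', df$Column1)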
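A note on the pattern: a greedy `.*:` matches up to the last colon in a line, which matters for the url row, while anchoring on the first colon keeps the whole value. The sketch below compares the two on two of the sample lines using base R only; the names `x`, `greedy`, `first_colon` and `keys` are made up for illustration.

```r
# Two of the sample lines from the question
x <- c("date: 28-Oct-2017", "url: https://bbkiss.de/")

# Greedy match: removes everything up to the LAST colon in each line
greedy <- sub(".*:", "", x)

# Anchored match: removes only the key, the first colon, and following whitespace
first_colon <- sub("^[^:]*:\\s*", "", x)

# The key is whatever precedes the first colon
keys <- sub(":.*$", "", x)

print(data.frame(keys, greedy, first_colon))
```

With the anchored version, the url value stays intact and no leading space is left on the values.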
I want to assign date, company, classification and the remaining fields to new columns, like this:
date company classification roundsize cumulative round ...
28-Oct-2017 BB KISS Software 1.2 1.2 Seed ...
How can I do this?
Answer 0 (score: 1)
You can separate and then spread it with {tidyr}:
tab <- tibble::tribble(
~ column1,
"date: 28-Oct-2017",
"company: BB KISS",
"classification: Software",
"roundsize: 1.2",
"cumulative: 1.2"
)
library(tidyr)
tab %>%
separate(column1, into = c("A", "B"), sep = ": ") %>%
spread(key = A, value = B)
#> # A tibble: 1 x 5
#> classification company cumulative date roundsize
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Software BB KISS 1.2 28-Oct-2017 1.2
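In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same idea with the newer function follows, assuming a recent tidyr is installed; the column names `key` and `value` and the object `wide` are chosen here for illustration. `extra = "merge"` is added as a precaution so that a value which itself contains ": " would not be truncated.

```r
library(tidyr)

# A few of the sample lines, including the url row
tab <- tibble::tribble(
  ~column1,
  "date: 28-Oct-2017",
  "company: BB KISS",
  "url: https://bbkiss.de/"
)

wide <- tab %>%
  # split into exactly two pieces at the first ": "
  separate(column1, into = c("key", "value"), sep = ": ", extra = "merge") %>%
  # one column per key
  pivot_wider(names_from = key, values_from = value)

print(wide)
```

Unlike `spread()`, which sorts the new columns alphabetically, `pivot_wider()` keeps them in order of first appearance, which matches the layout asked for in the question.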
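Answer 1 (score: 0)
I created a sample dataset consisting of 2 (identical) companies. You can get everything working with tidyr and dplyr. You need to create an ID so that each company ends up on its own row; without it, spread fails because the same key occurs more than once.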
library(tidyr)
library(dplyr)
df_new <- df1 %>%
separate(Column1, into = c("cols", "data"), sep = ": ") %>%
group_by(cols) %>%
mutate(id = row_number()) %>% # n-th occurrence of each key gets id n (one id per company)
spread(cols, data)
df_new
# A tibble: 2 x 13
id classification company country cumulative date description headquartered investors region round roundsize url
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Software BB KISS Germany 1.2 28-Oct-2017 "Software developmen~ Darmstadt Private DACH Seed 1.2 https:~
2 2 Software BB KISS Germany 1.2 28-Oct-2017 "Software developmen~ Darmstadt Private DACH Seed 1.2 https:~
Data:
df1 <- structure(list(Column1 = c("date: 28-Oct-2017", "company: BB KISS",
"classification: Software", "roundsize: 1.2", "cumulative: 1.2",
"round: Seed", "investors: Private", "headquartered: Darmstadt",
"country: Germany", "region: DACH", "description: Software development for crypto currency and blockchain ",
"url: https://bbkiss.de/", "date: 28-Oct-2017", "company: BB KISS",
"classification: Software", "roundsize: 1.2", "cumulative: 1.2",
"round: Seed", "investors: Private", "headquartered: Darmstadt",
"country: Germany", "region: DACH", "description: Software development for crypto currency and blockchain ",
"url: https://bbkiss.de/")), class = "data.frame", row.names = c(NA,
-24L))
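The per-key `row_number()` id above assumes every record contains every key. Since the question states that each record starts with a date line, another option is to start a new id at every date row, which also copes with records that lack some fields. The following is a sketch under that assumption; the data frame `df_demo`, the second record ("30-Oct-2017", "Example Co") and the object `df_wide` are hypothetical and only there to show a missing field.

```r
library(dplyr)
library(tidyr)

# Hypothetical demo data: the second record is invented and has no "round" line
df_demo <- data.frame(
  Column1 = c("date: 28-Oct-2017", "company: BB KISS", "round: Seed",
              "date: 30-Oct-2017", "company: Example Co"),
  stringsAsFactors = FALSE
)

df_wide <- df_demo %>%
  separate(Column1, into = c("cols", "data"), sep = ": ", extra = "merge") %>%
  # a new record begins at every "date" row, so a running count gives the record id
  mutate(id = cumsum(cols == "date")) %>%
  pivot_wider(names_from = cols, values_from = data)

print(df_wide)
```

Fields missing from a record come out as `NA` instead of shifting values into the wrong row.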