Question

In my dataset only one column is untidy: Column1 contains many rows, and each record starts with a date line. For example:

Column1
date: 28-Oct-2017
company: BB KISS
classification: Software
roundsize: 1.2
cumulative: 1.2
round: Seed
investors: Private
headquartered: Darmstadt
country: Germany
region: DACH
description: Software development for crypto currency and blockchain 
url: https://bbkiss.de/

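To extract the part after the ":", I use:

df$extract <- sub('^[^:]*:[[:space:]]*', '', df$Column1)  # strip up to the FIRST colon only; a greedy '.*:' would also eat "https:" in the url line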
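Reshaping needs both halves of each line (the field name and its value), not just the value. A minimal base-R sketch, assuming the `key: value` format shown above (the three sample lines are copied from the example), splits each line at its first colon only, so the colon inside `https://` stays part of the value:

```r
# Sample lines in the "key: value" format from the question
lines <- c("date: 28-Oct-2017", "company: BB KISS", "url: https://bbkiss.de/")

# Key: everything before the first colon
key <- sub(":.*$", "", lines)
# Value: drop everything up to and including the first colon, plus following spaces
value <- sub("^[^:]*:[[:space:]]*", "", lines)

data.frame(key, value)
```

The `key` vector is what later becomes the column names, and `value` fills the cells.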

I want to move date, company, classification and all the other fields into separate columns, like this:

date          company  classification  roundsize  cumulative  round ...
28-Oct-2017   BB KISS  Software        1.2        1.2         Seed  ...

How can I do this?

Answer 1

You can split each line with separate() and then spread it into columns, both from {tidyr}:

tab <- tibble::tribble(
  ~ column1, 
  "date: 28-Oct-2017",
  "company: BB KISS",
  "classification: Software",
  "roundsize: 1.2",
  "cumulative: 1.2"
)
library(tidyr)
tab %>% 
  separate(column1, into = c("A", "B"), sep = ": ") %>%
  spread(key = A, value = B)
#> # A tibble: 1 x 5
#>   classification company cumulative date        roundsize
#>   <chr>          <chr>   <chr>      <chr>       <chr>    
#> 1 Software       BB KISS 1.2        28-Oct-2017 1.2
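Since tidyr 1.0.0, `spread()` is superseded by `pivot_wider()`. A sketch of the same pipeline on the same `tab` data using the newer function; `extra = "merge"` is an addition that keeps a value whole if it happens to contain ": " itself:

```r
library(tidyr)

tab <- tibble::tribble(
  ~column1,
  "date: 28-Oct-2017",
  "company: BB KISS",
  "classification: Software",
  "roundsize: 1.2",
  "cumulative: 1.2"
)

wide <- tab %>%
  # split on the first ": "; extra = "merge" keeps any later ": " inside the value
  separate(column1, into = c("key", "value"), sep = ": ", extra = "merge") %>%
  # pivot_wider() replaces spread(); new columns appear in order of first occurrence
  pivot_wider(names_from = key, values_from = value)

wide
```

Unlike `spread()`, which sorts the new columns alphabetically (as in the output above), `pivot_wider()` keeps the fields in the order they appear in the data by default.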

Answer 2

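I created an example dataset consisting of two (identical) companies. You can get this working with tidyr and dplyr. You need to create an ID so that spread() can tell the records apart; without it, the same key appears more than once and spread() stops with a duplicate-identifiers error.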

library(tidyr)
library(dplyr)

df_new <- df1 %>% 
  separate(Column1, into = c("cols", "data"), sep = ": ") %>% 
  group_by(cols) %>%
  mutate(id = row_number()) %>% # create id per company
  spread(cols, data)

df_new
# A tibble: 2 x 13
     id classification company country cumulative date        description           headquartered investors region round roundsize url    
  <int> <chr>          <chr>   <chr>   <chr>      <chr>       <chr>                 <chr>         <chr>     <chr>  <chr> <chr>     <chr>  
1     1 Software       BB KISS Germany 1.2        28-Oct-2017 "Software developmen~ Darmstadt     Private   DACH   Seed  1.2       https:~
2     2 Software       BB KISS Germany 1.2        28-Oct-2017 "Software developmen~ Darmstadt     Private   DACH   Seed  1.2       https:~
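`group_by(cols) %>% mutate(id = row_number())` numbers the n-th occurrence of each field, which lines up only if every record contains every field; if one record is missing a line, later values shift into the wrong row. Since each record in the question begins with a `date:` line, a possibly more robust sketch starts a new id at every `date` line. The second record below (its date and company name) is hypothetical, made up to show a missing `round` field:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the second record has no "round" line
df1 <- data.frame(Column1 = c(
  "date: 28-Oct-2017", "company: BB KISS", "round: Seed",
  "date: 01-Nov-2017", "company: Example GmbH"
), stringsAsFactors = FALSE)

df_new <- df1 %>%
  separate(Column1, into = c("cols", "data"), sep = ": ", extra = "merge") %>%
  # Assumes every record starts with a "date" line: each one opens a new id
  mutate(id = cumsum(cols == "date")) %>%
  pivot_wider(names_from = cols, values_from = data)

df_new
```

The missing field simply becomes `NA` for that record instead of pulling a value from the next company.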

Data:

df1 <- structure(list(Column1 = c("date: 28-Oct-2017", "company: BB KISS", 
"classification: Software", "roundsize: 1.2", "cumulative: 1.2", 
"round: Seed", "investors: Private", "headquartered: Darmstadt", 
"country: Germany", "region: DACH", "description: Software development for crypto currency and blockchain ", 
"url: https://bbkiss.de/", "date: 28-Oct-2017", "company: BB KISS", 
"classification: Software", "roundsize: 1.2", "cumulative: 1.2", 
"round: Seed", "investors: Private", "headquartered: Darmstadt", 
"country: Germany", "region: DACH", "description: Software development for crypto currency and blockchain ", 
"url: https://bbkiss.de/")), class = "data.frame", row.names = c(NA, 
-24L))
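Both answers leave every column as character (`<chr>` in the printed tibbles). If numeric and date columns are needed, a base-R sketch on a one-row result of that shape (the data frame below is a trimmed, hand-built stand-in for that output):

```r
# Stand-in for the reshaped result: every column stored as character
wide <- data.frame(
  date = "28-Oct-2017", company = "BB KISS",
  roundsize = "1.2", cumulative = "1.2",
  stringsAsFactors = FALSE
)

# Let R guess numeric columns; as.is = TRUE keeps text as character, not factor
wide[] <- lapply(wide, type.convert, as.is = TRUE)

# "28-Oct-2017": %b expects English month abbreviations,
# so the result depends on the session's LC_TIME locale
wide$date <- as.Date(wide$date, format = "%d-%b-%Y")

str(wide)
```

`wide[] <- lapply(...)` replaces the columns in place, so the result stays a data frame.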

Extract repeating strings and assign them to columns
