Question

I have a data.frame in R and, for simplicity, I want to separate a single column. It looks like this:

V1
Value_is_the_best_one
This_is_the_prettiest_thing_I've_ever_seen
Here_is_the_next_example_of_what_I_want

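My real data is very large (millions of rows), so I would like to use tidyr's separate function (because it is very fast) to split out only the first few instances. I would like the result to look like this: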

V1       V2     V3     V4 
Value    is     the    best_one
This     is     the    prettiest_thing_I've_ever_seen
Here     is     the    next_example_of_what_I_want

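As you can see, the separator is _, and the V4 column can contain a varying number of separators. I want to keep V4 (not discard it), but I don't want to worry about how many pieces it contains. There will always be four columns (i.e., none of my rows has only V1-V3).

Here is the tidyr command I have been starting with: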

separate(df, V1, c("V1", "V2", "V3", "V4"), sep="_")

This truncates V4 (everything after the fourth piece is dropped) and emits a warning, which is not a big deal.

Answer 1

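You need the extra argument with the "merge" option. It allows only as many splits as you have new columns defined, so the remainder of the string stays in the last column.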

separate(df, V1, c("V1", "V2", "V3", "V4"), extra = "merge")

     V1 V2  V3                             V4
1 Value is the                       best_one
2  This is the prettiest_thing_I've_ever_seen
3  Here is the    next_example_of_what_I_want
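For anyone who wants to reproduce this, here is a minimal self-contained sketch. The data frame is rebuilt from the three rows in the question, and sep = "_" is passed explicitly, which is an assumption on my part (the answer above relies on the default separator); the name res is just illustrative.

```r
library(tidyr)

# Example data copied from the question
df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# extra = "merge" splits at most length(into) - 1 times,
# so everything after the third "_" stays together in V4.
# sep = "_" is set explicitly here (assumption, see lead-in).
res <- separate(df, V1, into = c("V1", "V2", "V3", "V4"),
                sep = "_", extra = "merge")
res
```

Setting sep explicitly should also keep characters such as the apostrophe in "I've" from ever being treated as a separator, which may matter if fewer columns than pieces are requested on other data.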

Answer 2

Here is another option with extract:

library(tidyr)
extract(df1, V1, into = paste0("V", 1:4), "([^_]+)_([^_]+)_([^_]+)_(.*)")
#      V1 V2  V3                             V4
# 1 Value is the                       best_one
# 2  This is the prettiest_thing_I've_ever_seen
# 3  Here is the    next_example_of_what_I_want

Another option is stri_split (from the stringi package), where we can specify the number of splits.
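The stri_split idea is mentioned without code, so here is a rough sketch of how it might look. It uses stri_split_fixed from stringi with n = 4; the input vector x and the result name out are illustrative, and the conversion to a data frame is my own addition rather than something stated in the answer.

```r
library(stringi)

# Example strings copied from the question
x <- c("Value_is_the_best_one",
       "This_is_the_prettiest_thing_I've_ever_seen",
       "Here_is_the_next_example_of_what_I_want")

# n = 4 caps the number of pieces; the last piece keeps the unsplit remainder.
# simplify = TRUE returns a character matrix with one row per input string.
parts <- stri_split_fixed(x, "_", n = 4, simplify = TRUE)

# Convert the matrix to a data frame and name the columns V1..V4
out <- as.data.frame(parts, stringsAsFactors = FALSE)
names(out) <- paste0("V", 1:4)
out
```

Because the pattern is a fixed string rather than a regular expression, this variant may be worth benchmarking against separate on data with millions of rows.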
