Parse and then check whether there is more in R

Time: 2018-08-03 15:43:33

Tags: r parsing

    Name             Parsed      Rank
WBA-Y*08:03:01    WBA-Y*08:03      1
WBA-Y*08:169      WBA-Y*08:169     2
WBA-Y*08:03:15    WBA-Y*08:03      3
WBA-Y*08:03:02    WBA-Y*08:03      4

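These are three specific columns in my data frame. I have already parsed the Name column and ranked/sorted the rows based on other values. I am now trying to go through the names, keep the ones that have a third number (and therefore a second colon) at the top, and move the rest to the bottom.

This is the expected output for this example: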

    Name             Parsed      Rank
WBA-Y*08:03:01    WBA-Y*08:03      1
WBA-Y*08:03:15    WBA-Y*08:03      2
WBA-Y*08:03:02    WBA-Y*08:03      3
WBA-Y*08:169      WBA-Y*08:169     4

Since WBA-Y*08:169 has only two numbers and one colon, it is moved to the bottom. How can I do this? Can I use gsub or sub?

2 answers:

Answer 0: (score: 2)

df = read.table(text = "
Name             Parsed      Rank
WBA-Y*08:03:01    WBA-Y*08:03      1
WBA-Y*08:169      WBA-Y*08:169     2
WBA-Y*08:03:15    WBA-Y*08:03      3
WBA-Y*08:03:02    WBA-Y*08:03      4
", header = TRUE, stringsAsFactors = FALSE)

library(tidyverse)

df %>%
  mutate(v_Name = str_count(Name, ":")) %>%  # count how many : you have for each Name value
  arrange(desc(v_Name)) %>%                  # arrange descending by those counts
  mutate(Rank = row_number())                # update rank to be the row number

#             Name       Parsed Rank v_Name
# 1 WBA-Y*08:03:01  WBA-Y*08:03    1      2
# 2 WBA-Y*08:03:15  WBA-Y*08:03    2      2
# 3 WBA-Y*08:03:02  WBA-Y*08:03    3      2
# 4   WBA-Y*08:169 WBA-Y*08:169    4      1

If needed, you can drop v_Name by adding %>% select(-v_Name) at the end.

Answer 1: (score: 1)
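Since the question asks whether gsub can be used, here is a minimal base R sketch of the same count-and-sort idea, for readers who would rather not load the tidyverse. The names n_colons and res are illustrative and not part of the original answer; the data frame is rebuilt inline so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained.
df <- data.frame(
  Name   = c("WBA-Y*08:03:01", "WBA-Y*08:169", "WBA-Y*08:03:15", "WBA-Y*08:03:02"),
  Parsed = c("WBA-Y*08:03", "WBA-Y*08:169", "WBA-Y*08:03", "WBA-Y*08:03"),
  Rank   = 1:4,
  stringsAsFactors = FALSE
)

# Count colons with gsub: delete every character that is not a colon,
# then measure the length of what is left.
n_colons <- nchar(gsub("[^:]", "", df$Name))

# Sort by descending colon count. order() keeps tied rows in their
# original relative order, so the existing ranking is preserved within ties.
res <- df[order(-n_colons), ]
res$Rank <- seq_len(nrow(res))  # renumber Rank from 1
rownames(res) <- NULL
res
```

This gives the same row order as the pipeline above, without the helper column in the result.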

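Here is a base R option to fix it. Use grepl to check the "Name" column for this pattern: two digits followed by a colon, twice, followed by one or more digits at the end ($) of the string. Then use order on that result to reorder the first and second columns, and update those columns in place so that Rank keeps its 1 to 4 sequence.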

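# df is the data frame defined in the first answer.
# grepl() returns TRUE for names with a third number; negating it makes
# order() place those rows first. Only columns 1:2 are reordered, so Rank stays 1..4.
df[1:2] <- df[order(-grepl("(\\d{2}:){2}\\d+$", df$Name)), 1:2]
df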
#            Name       Parsed Rank
#1 WBA-Y*08:03:01  WBA-Y*08:03    1
#2 WBA-Y*08:03:15  WBA-Y*08:03    2
#3 WBA-Y*08:03:02  WBA-Y*08:03    3
#4   WBA-Y*08:169 WBA-Y*08:169    4
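To see why this works, it can help to look at the intermediate values. The sketch below applies the same pattern to the four example names; the variable names nm, has_third and idx are illustrative only.

```r
nm <- c("WBA-Y*08:03:01", "WBA-Y*08:169", "WBA-Y*08:03:15", "WBA-Y*08:03:02")

# TRUE where the name ends in "dd:dd:" followed by one or more digits.
has_third <- grepl("(\\d{2}:){2}\\d+$", nm)
has_third
# [1]  TRUE FALSE  TRUE  TRUE

# Negating turns TRUE into -1 and FALSE into 0, so matching rows sort first;
# tied rows keep their original relative order.
idx <- order(-has_third)
idx
# [1] 1 3 4 2

nm[idx]
```

Note that the pattern requires exactly two digits before each colon, so a name with a different field width would be treated as a non-match and sent to the bottom.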