Question

I am working with a data frame from a database that looks like this:

username    elements
username1   """interfaces"".""dual()"""
username1   """interfaces"".""f_capitalaccrualcurrentyear"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username4   """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3   """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""

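So there are two columns, "username" and "elements". A user can use one or more elements in a single transaction. When there are multiple elements, they are separated by commas within the transaction. I need to split the elements out, one per row, while keeping each one tagged with its username. In the end, I want it to look like this: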

username    elements
username1   """interfaces"".""dual()"""
username1   """interfaces"".""f_capitalaccrualcurrentyear"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username2   """interfaces"".""dnow_completion""
username2   ""interfaces"".""dnow_s_daily_prod_ta"""
username4   """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3   """interfaces"".""dnow_completion""
username3   ""interfaces"".""dnow_s_daily_prod_ta"""

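I have been trying to iterate over the data frame, split the elements that contain commas, and then pair each piece with the corresponding username.

I have been trying the code below, but it is extremely inefficient. I am new to R, so I am guessing there must be a more efficient way to do this.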

interface.data <- data.frame(
    username = character(),
    elements = character()
)
for (row in seq_len(nrow(input))) { ## input is the data frame that comes from the database
     myrowbrk <- as.character(input[row, "elements"])
     ## split on commas to get one entry per element
     myrowelements <- strsplit(myrowbrk, ",", fixed = TRUE)[[1]]
     user <- input[row, "username"]
     interface.newdata <- data.frame(
         username = user,
         elements = myrowelements
     )
     ## append to the accumulated frame (slow: the whole frame is copied on every iteration)
     interface.data <- rbind(interface.data, interface.newdata)
}

output <- interface.data

Answer 1

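You can do this with the tidyr package. My solution takes two steps to get the data into the desired format: 1) separate the elements column at the comma, and 2) reshape the data from wide to long.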

library(tidyr)
library(dplyr) #Needed for select() below

#Separate the 'elements' column from your 'df' data frame using the comma character
#Set the new variable names as a sequence of 1 to the max number of expected columns
df2 <- separate(data = df, 
                   col = elements, 
                   into = as.character(seq(1,2,1)),
                   sep = ",")
#This code gives a warning because not every row has a string with a comma. 
#Empty entries are filled with NA

#Then change from wide to long format, dropping NA entries
#Drop the column that indicates the name of the column from which the elements entry was obtained (i.e., 1 or 2)
df2 <- df2 %>%
  pivot_longer(cols = "1":"2",
               values_to = "elements",
               values_drop_na = TRUE) %>%
  select(-name)
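
For comparison, the same split can be sketched in base R without a loop or extra packages. This is only an illustration under some assumptions: the small `df` below is made-up sample data modelled on the question (with the quoting simplified), `parts` and `long` are names chosen here, and it assumes a comma never appears inside an element name.

```r
# Hypothetical sample input modelled on the question's data (quoting simplified)
df <- data.frame(
  username = c("username1", "username2", "username4"),
  elements = c('"interfaces"."dual()"',
               '"interfaces"."dnow_completion","interfaces"."dnow_s_daily_prod_ta"',
               '"interfaces"."dnow_s_downtime_stat_with_lat_long"'),
  stringsAsFactors = FALSE
)

# Split every 'elements' string at the commas in one vectorized call.
# The result is a list holding one character vector per input row.
parts <- strsplit(df$elements, ",", fixed = TRUE)

# Repeat each username once per piece, then flatten the pieces into one column
long <- data.frame(
  username = rep(df$username, lengths(parts)),
  elements = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

Because `strsplit()`, `rep()` and `unlist()` each work on the whole column at once, the result is built in a single step instead of growing a data frame row by row. If you prefer to stay within tidyr, `separate_rows(df, elements, sep = ",")` should give the same long format in one call.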
