I'm working with a data frame that comes from a database, in the following form:
username elements
username1 """interfaces"".""dual()"""
username1 """interfaces"".""f_capitalaccrualcurrentyear"""
username2 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
username4 """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3 """interfaces"".""dnow_completion"",""interfaces"".""dnow_s_daily_prod_ta"""
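So there are two columns, "username" and "elements". A user can use one or more elements in a single transaction. When there is more than one element, they are separated by commas within the transaction. I need to split the elements so that there is one per row, each still tagged with its username. In the end, I'd like it to look like this: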
username elements
username1 """interfaces"".""dual()"""
username1 """interfaces"".""f_capitalaccrualcurrentyear"""
username2 """interfaces"".""dnow_completion""
username2 ""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion""
username2 ""interfaces"".""dnow_s_daily_prod_ta"""
username2 """interfaces"".""dnow_completion""
username2 ""interfaces"".""dnow_s_daily_prod_ta"""
username4 """interfaces"".""dnow_s_downtime_stat_with_lat_long"""
username3 """interfaces"".""dnow_completion""
username3 ""interfaces"".""dnow_s_daily_prod_ta"""
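I've been trying to loop over the data frame, split the elements that contain a comma, and then put them back together with the corresponding username.
I've been trying the code below, but it is extremely inefficient. I'm new to R, so I'm guessing there must be a more efficient way to do this.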
interface.data <- data.frame(
  username = c(),
  elements = c()
)
for (row in 1:nrow(input)) { ## input is the frame that comes from the database
  myrowbrk <- input[row, "elements"]
  myrowelements <- chartr(",", "\n", myrowbrk)
  user <- input[row, "username"]
  interface.newdata <- data.frame(
    username = user,
    elements = c(myrowelements)
  )
  interface.final <- rbind(interface.data, interface.newdata)
}
output <- interface.final
Answer 0 (score: 1)
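You can do this with the tidyr package. My solution takes two steps to get the data into the desired format: 1) split the elements column at the comma, and 2) reshape the result from wide to long.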
library(tidyr)
library(dplyr)  # provides select()
# Separate the 'elements' column of your 'df' data frame at the comma character
# Name the new columns "1" and "2", i.e. 1 up to the maximum number of elements expected per row
df2 <- separate(data = df,
                col = elements,
                into = as.character(seq(1, 2, 1)),
                sep = ",")
# This call gives a warning because not every row has a string with a comma.
# Missing pieces are filled with NA
# Then change from wide to long format, dropping NA entries
# Drop the 'name' column, which records the column each entry came from (i.e., 1 or 2)
df2 <- df2 %>%
  pivot_longer(cols = c("1", "2"),
               values_to = "elements",
               values_drop_na = TRUE) %>%
  select(-name)
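If the maximum number of elements per row is not known in advance, tidyr's separate_rows() may be a simpler fit, since it splits and lengthens in one call and needs no fixed list of column names. The following is only a minimal sketch: the 'df' built here is a hypothetical three-row sample with simplified quoting, standing in for the real frame from the database.

```r
library(tidyr)

# Hypothetical sample standing in for the data frame from the database
# (quoting simplified compared with the rows shown in the question)
df <- data.frame(
  username = c("username1", "username2", "username4"),
  elements = c('"interfaces"."dual()"',
               '"interfaces"."dnow_completion","interfaces"."dnow_s_daily_prod_ta"',
               '"interfaces"."dnow_s_downtime_stat_with_lat_long"'),
  stringsAsFactors = FALSE
)

# Split 'elements' at each comma and emit one row per piece,
# repeating 'username' for every piece that came from the same row
out <- separate_rows(df, elements, sep = ",")
```

Note that this splits at every comma, so an element that itself contains a comma (for example a function call with several arguments) would also be broken apart and would need a more specific separator pattern.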