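I am trying to combine several columns that contain missing values. I am new to R and am having a lot of trouble with this. For example, I am trying to turn this:
ID A B A D
1 fill NA NA Market
2 NA Ball fill NA
3 NA NA NA Market
4 fill Ball NA NA
into this: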
ID A B D
1 fill NA Market
2 fill Ball NA
3 NA NA Market
4 fill Ball NA
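I am having trouble because the database has about 1,500 columns, many of them with duplicate names. I tried using melt and groupby, but I could not get it to work. When a column is duplicated and one copy has a value in a given row, the other columns with the same name have no value in that row, if that makes sense. I do not know how to write a function that works and identifies the 50 or so duplicated columns (that is, roughly 25 columns that are duplicated) without going through the database by hand. There may also be some columns that occur three times, as in three columns named A, but their values never overlap.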
Answer 0 (score: 1)
You can try
# your data
d <- read.table(text="ID A B A D
1 fill NA NA Market
2 NA Ball fill NA
3 NA NA NA Market
4 fill Ball NA NA", header=TRUE)
d
#   ID    A    B  A.1      D
# 1  1 fill <NA> <NA> Market
# 2  2 <NA> Ball fill   <NA>
# 3  3 <NA> <NA> <NA> Market
# 4  4 fill Ball <NA>   <NA>
As you can see, read.table marks duplicated column names with a .n suffix (the second A becomes A.1).
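With around 1,500 columns you probably do not want to find the duplicates by eye. Here is a minimal sketch of how the renaming works and how to list the affected names. It assumes the only suffixes are the .1, .2, ... ones added on import, so a genuine column name ending in a dot and digits would be misread as a duplicate. The variable names are just for illustration.

```r
# Column names as they appear in the raw file (duplicates included)
raw_names <- c("ID", "A", "B", "A", "D")

# This is the renaming that read.table applies when check.names = TRUE
renamed <- make.names(raw_names, unique = TRUE)

# Strip a trailing ".<digits>" suffix to recover the original name
base_names <- sub("\\.[0-9]+$", "", renamed)

# Names that occur more than once, i.e. the columns that need merging
dup_names <- unique(base_names[duplicated(base_names)])
dup_names
```

On a real data frame you would run the last two steps on names(d) instead of the hand-written vector.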
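Below, we use stringi to strip that suffix so the names are duplicated again, then use tidyverse to reshape to long format, drop the NA values, and spread back to wide format: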
library(tidyverse)
library(stringi)
d %>%
  gather(key, value, -ID) %>%                       # long format: one row per ID/column pair
  mutate(key2=stri_extract_first_words(key)) %>%    # "A.1" -> "A"
  filter(!is.na(value)) %>%                         # keep only cells that hold a value
  select(ID, key2, value) %>%
  spread(key2, value)                               # back to wide, one column per name
#   ID    A    B      D
# 1  1 fill <NA> Market
# 2  2 fill Ball   <NA>
# 3  3 <NA> <NA> Market
# 4  4 fill Ball   <NA>
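In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider. This is a rough equivalent of the pipeline above, not a drop-in replacement. It assumes all the value columns are character (hence stringsAsFactors = FALSE) and swaps the stringi call for a plain regular expression.

```r
library(dplyr)
library(tidyr)

# Same example data, read as character instead of factor
d <- read.table(text = "ID A B A D
1 fill NA NA Market
2 NA Ball fill NA
3 NA NA NA Market
4 fill Ball NA NA", header = TRUE, stringsAsFactors = FALSE)

res <- d %>%
  # long format: one row per ID/column pair
  pivot_longer(-ID, names_to = "key", values_to = "value") %>%
  # drop the ".1", ".2", ... suffix so duplicates share a name again
  mutate(key = sub("\\.[0-9]+$", "", key)) %>%
  # keep only the cells that actually hold a value
  filter(!is.na(value)) %>%
  # back to wide format, one column per original name
  pivot_wider(names_from = key, values_from = value)

res
```

The columns come out in order of first appearance in the long data, so reorder them afterwards if that matters. An ID whose columns are all NA would be dropped by the filter step.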
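Answer 1 (score: 0)
You can use base R's split.default
to split the data.frame by matching column names and merge the data within each subgroup. If it matters, you may need an extra step to get the column order right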
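# group the columns by name, then for each row take the value from a non-NA column
# (max.col breaks ties at random, so this relies on duplicates never overlapping)
data.frame(lapply(split.default(df, names(df)),
    function(x) x[cbind(1:NROW(x), max.col(!is.na(x)))]))
#     A    B      D ID
#1 fill <NA> Market  1
#2 fill Ball   <NA>  2
#3 <NA> <NA> Market  3
#4 fill Ball   <NA>  4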
Data
df = structure(list(ID = 1:4, A = structure(c(1L, NA, NA, 1L), .Label = "fill", class = "factor"),
B = structure(c(NA, 1L, NA, 1L), .Label = "Ball", class = "factor"),
A = structure(c(NA, 1L, NA, NA), .Label = "fill", class = "factor"),
D = structure(c(1L, NA, 1L, NA), .Label = "Market", class = "factor")), .Names = c("ID",
"A", "B", "A", "D"), class = "data.frame", row.names = c(NA,
-4L))
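The extra step for the column order could look something like the sketch below. It rebuilds the example data with character columns instead of factors (my assumption, to keep the output simple) and indexes the result by unique(names(df)), since split.default returns the groups sorted by name. It also uses ties.method = "first" so the result is deterministic.

```r
# Example data with a duplicated column name; check.names = FALSE keeps both "A"s
df <- data.frame(
  ID = 1:4,
  A  = c("fill", NA, NA, "fill"),
  B  = c(NA, "Ball", NA, "Ball"),
  A  = c(NA, "fill", NA, NA),
  D  = c("Market", NA, "Market", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Group the columns by name (the groups come back sorted alphabetically)
groups <- split.default(df, names(df))

# For every row, take the value from the first non-NA column in the group;
# if the whole row is NA, column 1 is picked and the result stays NA
merged <- lapply(groups, function(x) {
  x[cbind(seq_len(nrow(x)), max.col(!is.na(x), ties.method = "first"))]
})

# Reassemble and restore the original left-to-right column order
res <- data.frame(merged, stringsAsFactors = FALSE)[unique(names(df))]
res
```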