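I managed to get my raw data into this form (in R): each product combination (combinations of 3) with its count. As you can see, though, there are duplicates: some rows contain the same products, only in a different order. I need a way to combine these rows regardless of order and add up their sums (num) to get the combined total. This is only part of the whole dataset. Please help me find a way.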
pages sum(num)
Badezimmer,Baumarkt,Büromöbel 6
Badezimmer,Baumarkt,Dekoration 14
Badezimmer,Baumarkt,Flur 30
Badezimmer,Baumarkt,Garten 18
Badezimmer,Baumarkt,Heimtextilien 100
Badezimmer,Baumarkt,Kinder 28
Badezimmer,Büromöbel,Baumarkt 16
Badezimmer,Flur,Baumarkt 40
Answer 0 (score: 3)
Here is one possibility:
df1$pages <- as.character(df1$pages) # prevent use of factors
df1$pages <- sapply(sapply(df1$pages,function(x) strsplit(x,",")),function(x) paste(sort(unlist(x)),collapse=',')) #split at commas, order words alphabetically, and restore the description
df1 <- aggregate(sum.num. ~ ., df1, sum) #sum over identical 'pages'
# pages sum.num.
#1 Badezimmer,Baumarkt,Büromöbel 22
#2 Badezimmer,Baumarkt,Dekoration 14
#3 Badezimmer,Baumarkt,Flur 70
#4 Badezimmer,Baumarkt,Garten 18
#5 Badezimmer,Baumarkt,Heimtextilien 100
#6 Badezimmer,Baumarkt,Kinder 28
Data:
df1 <- structure(list(pages = structure(1:8,
.Label = c("Badezimmer,Baumarkt,Büromöbel",
"Badezimmer,Baumarkt,Dekoration", "Badezimmer,Baumarkt,Flur",
"Badezimmer,Baumarkt,Garten", "Badezimmer,Baumarkt,Heimtextilien",
"Badezimmer,Baumarkt,Kinder", "Badezimmer,Büromöbel,Baumarkt",
"Badezimmer,Flur,Baumarkt"), class = "factor"),
sum.num. = c(6L, 14L, 30L, 18L, 100L, 28L, 16L, 40L)),
.Names = c("pages", "sum.num."), class = "data.frame",
row.names = c(NA, -8L))
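A self-contained sketch of the same approach, assuming R 3.2 or later: it rebuilds the eight example rows with data.frame() (stringsAsFactors = FALSE, so no factor conversion is needed), uses vapply() over strsplit() in place of the nested sapply() calls, and then sums with aggregate(). The object name res is only illustrative.

```r
# The eight example rows from the question, as plain character strings
df1 <- data.frame(
  pages = c("Badezimmer,Baumarkt,Büromöbel", "Badezimmer,Baumarkt,Dekoration",
            "Badezimmer,Baumarkt,Flur", "Badezimmer,Baumarkt,Garten",
            "Badezimmer,Baumarkt,Heimtextilien", "Badezimmer,Baumarkt,Kinder",
            "Badezimmer,Büromöbel,Baumarkt", "Badezimmer,Flur,Baumarkt"),
  sum.num. = c(6L, 14L, 30L, 18L, 100L, 28L, 16L, 40L),
  stringsAsFactors = FALSE
)

# Sort the products inside each combination so their order no longer matters
df1$pages <- vapply(strsplit(df1$pages, ",", fixed = TRUE),
                    function(p) paste(sort(p), collapse = ","),
                    character(1))

# Sum the counts of all rows that now share the same 'pages' key
res <- aggregate(sum.num. ~ pages, data = df1, FUN = sum)
res
```

res has six rows; the Büromöbel and Flur combinations add up to 22 and 70, matching the output listed above.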
Answer 1 (score: 3)
Here is an option using cSplit from library(splitstackshape). We convert the 'data.frame' to a 'data.table', creating a row id column with keep.rownames=TRUE, then split the 'pages' column at ',' and convert it to 'long' format with cSplit. Grouped by the row id, we sort the 'pages' values and paste them back together, and also take the first value of 'sum.num.'. Finally, we take the sum of 'sum.num.' grouped by 'pages'.
Note: 'df1' is taken from @RHertel's post.
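For readers without splitstackshape, here is a rough base-R sketch of the same four steps (an assumption, not the answerer's code): add a row id, split 'pages' into long format, sort and paste the products per row id while keeping the first 'sum.num.', then sum by 'pages'. The names parts, long, keys, first_num, by_row and res are illustrative only.

```r
# Example rows from the question, rebuilt as a plain data.frame
df1 <- data.frame(
  pages = c("Badezimmer,Baumarkt,Büromöbel", "Badezimmer,Baumarkt,Dekoration",
            "Badezimmer,Baumarkt,Flur", "Badezimmer,Baumarkt,Garten",
            "Badezimmer,Baumarkt,Heimtextilien", "Badezimmer,Baumarkt,Kinder",
            "Badezimmer,Büromöbel,Baumarkt", "Badezimmer,Flur,Baumarkt"),
  sum.num. = c(6L, 14L, 30L, 18L, 100L, 28L, 16L, 40L),
  stringsAsFactors = FALSE
)

# Step 1: a row id, playing the role of keep.rownames = TRUE
df1$rn <- seq_len(nrow(df1))

# Step 2: long format, one product per row, repeating the row id and count
parts <- strsplit(df1$pages, ",", fixed = TRUE)
long <- data.frame(
  rn = rep(df1$rn, lengths(parts)),
  pages = unlist(parts),
  sum.num. = rep(df1$sum.num., lengths(parts)),
  stringsAsFactors = FALSE
)

# Step 3: per row id, sort and paste the products and keep the first count
keys <- tapply(long$pages, long$rn, function(p) paste(sort(p), collapse = ","))
first_num <- tapply(long$sum.num., long$rn, function(n) n[1])
by_row <- data.frame(pages = as.vector(keys), sum.num. = as.vector(first_num),
                     stringsAsFactors = FALSE)

# Step 4: sum the counts grouped by the order-independent 'pages' key
res <- aggregate(sum.num. ~ pages, data = by_row, FUN = sum)
res
```

Taking only the first 'sum.num.' per row id matters: the long format repeats each row's count once per product, so summing at that stage would count every row three times.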