I'm trying to take a DataFrame like this (sorry for the formatting, I'm typing on my phone):
'Date' 'Color' 'Jar'
0 '05-10-2017' 'Red' 1
1 '05-10-2017' 'Green' 2
2 '05-10-2017' 'Blue' 1
3 '05-10-2017' 'Red' 2
4 '05-10-2017' 'Blue' 1
5 '05-11-2017' 'Red' 2
6 '05-11-2017' 'Green' 1
7 '05-11-2017' 'Red' 2
8 '05-11-2017' 'Green' 1
9 '05-11-2017' 'Blue' 1
10 '05-11-2017' 'Blue' 2
11 '05-11-2017' 'Red' 2
12 '05-11-2017' 'Blue' 2
13 '05-11-2017' 'Blue' 1
14 '05-12-2017' 'Green' 2
15 '05-12-2017' 'Blue' 1
16 '05-12-2017' 'Red' 1
17 '05-12-2017' 'Blue' 2
18 '05-12-2017' 'Blue' 2
and derive a DataFrame that looks like the one below, where each column holds the number of instances for each date.
Date  Jar 1 Red  Jar 2 Red  Jar 1 Green  Jar 2 Green  Jar 1 Blue  Jar 2 Blue
05-10-2017
05-11-2017
05-12-2017
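I tried using groupby for this and was able to get the count of each color per day, but I'm not sure how to split the color columns by the Jar they came from. I've also read that query or loc might be better options for this task. Any direction would be greatly appreciated.
Answer 0 (score: 1)
Let's try this: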
df_out = df.assign(count=1).pivot_table(index='Date',columns=['Jar','Color'], values='count',aggfunc='sum', fill_value=0)
df_out.columns = df_out.columns.map('{0[0]} {0[1]}'.format)
df_out.add_prefix('Jar ')
Output:
Jar 1 Blue Jar 1 Green Jar 1 Red Jar 2 Blue Jar 2 Green \
Date
05-10-2017 2 0 1 0 1
05-11-2017 2 2 0 2 0
05-12-2017 1 0 1 2 1
Jar 2 Red
Date
05-10-2017 1
05-11-2017 3
05-12-2017 0
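For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pivot_table idea. The frame is rebuilt from the data in the question; the variable names (dates, colors, jars) are mine, and the column labels are built with a list comprehension instead of map plus add_prefix, which should give the same labels.

```python
import pandas as pd

# Sample data copied from the question (19 rows).
dates = ["05-10-2017"] * 5 + ["05-11-2017"] * 9 + ["05-12-2017"] * 5
colors = ["Red", "Green", "Blue", "Red", "Blue",
          "Red", "Green", "Red", "Green", "Blue", "Blue", "Red", "Blue", "Blue",
          "Green", "Blue", "Red", "Blue", "Blue"]
jars = [1, 2, 1, 2, 1,
        2, 1, 2, 1, 1, 2, 2, 2, 1,
        2, 1, 1, 2, 2]
df = pd.DataFrame({"Date": dates, "Color": colors, "Jar": jars})

# Add a helper column of ones and sum it per (Date, Jar, Color).
df_out = df.assign(count=1).pivot_table(
    index="Date", columns=["Jar", "Color"], values="count",
    aggfunc="sum", fill_value=0)

# Flatten the (Jar, Color) MultiIndex into labels such as "Jar 1 Blue".
df_out.columns = [f"Jar {jar} {color}" for jar, color in df_out.columns]
print(df_out)
```

Note that add_prefix returns a new frame, so in a script (as opposed to an interactive session) its result has to be assigned to keep the prefixed names.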
Answer 1 (score: 1)
Option 1
pd.crosstab
df1
Date Color Jar
0 05-10-2017 Red 1
1 05-10-2017 Green 2
2 05-10-2017 Blue 1
3 05-10-2017 Red 2
4 05-10-2017 Blue 1
5 05-11-2017 Red 2
6 05-11-2017 Green 1
7 05-11-2017 Red 2
8 05-11-2017 Green 1
9 05-11-2017 Blue 1
10 05-11-2017 Blue 2
11 05-11-2017 Red 2
12 05-11-2017 Blue 2
13 05-11-2017 Blue 1
14 05-12-2017 Green 2
15 05-12-2017 Blue 1
16 05-12-2017 Red 1
17 05-12-2017 Blue 2
18 05-12-2017 Blue 2
df1 = pd.crosstab(df1.Date, [df1.Jar, df1.Color])
df1.columns = df1.columns.map('{0[0]} {0[1]}'.format) # borrowed this line from https://stackoverflow.com/a/46102413/4909087
df1 = df1.add_prefix('Jar ')
df1
Jar 1 Blue Jar 1 Green Jar 1 Red Jar 2 Blue Jar 2 Green \
Date
05-10-2017 2 0 1 0 1
05-11-2017 2 2 0 2 0
05-12-2017 1 0 1 2 1
Jar 2 Red
Date
05-10-2017 1
05-11-2017 3
05-12-2017 0
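The question lists the columns in a specific order (Red, Green, Blue, each with Jar 1 before Jar 2), while crosstab returns them sorted. If the order matters, one option is to reindex afterwards. This is a sketch on a small made-up sample, not the full data from the question; the names counts and wanted are mine.

```python
import pandas as pd

# Small made-up sample with the same columns as the question's frame.
df = pd.DataFrame({
    "Date":  ["05-10-2017", "05-10-2017", "05-10-2017", "05-11-2017", "05-11-2017"],
    "Color": ["Red", "Green", "Blue", "Red", "Red"],
    "Jar":   [1, 2, 1, 2, 2],
})

counts = pd.crosstab(df["Date"], [df["Jar"], df["Color"]])
counts.columns = [f"Jar {jar} {color}" for jar, color in counts.columns]

# Column order requested in the question; combinations that never occur
# in the data are added as all-zero columns.
wanted = [f"Jar {jar} {color}"
          for color in ("Red", "Green", "Blue")
          for jar in (1, 2)]
counts = counts.reindex(columns=wanted, fill_value=0)
print(counts)
```

The reindex step also guarantees a fixed set of columns even when some Jar/Color combination is missing from the input.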
Option 2
pd.get_dummies and df.groupby
df1 = df1.set_index('Date')
df1 = pd.get_dummies(df1.Jar.astype(str).str.cat(df1.Color, sep=' '))\
.add_prefix('Jar ').groupby(level=0).sum()
df1
Jar 1 Blue Jar 1 Green Jar 1 Red Jar 2 Blue Jar 2 Green \
Date
05-10-2017 2 0 1 0 1
05-11-2017 2 2 0 2 0
05-12-2017 1 0 1 2 1
Jar 2 Red
Date
05-10-2017 1
05-11-2017 3
05-12-2017 0
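To make the get_dummies route easier to follow, here is the same idea split into steps on a small made-up sample (the names keys, dummies and counts are mine). Passing dtype=int is my own choice so the indicator columns are integers regardless of pandas version.

```python
import pandas as pd

# Small made-up sample with the same columns as the question's frame.
df = pd.DataFrame({
    "Date":  ["05-10-2017", "05-10-2017", "05-10-2017", "05-11-2017", "05-11-2017"],
    "Color": ["Red", "Green", "Blue", "Red", "Red"],
    "Jar":   [1, 2, 1, 2, 2],
})

# Step 1: one string key per row, e.g. "Jar 1 Red".
keys = "Jar " + df["Jar"].astype(str) + " " + df["Color"]

# Step 2: one 0/1 indicator column per distinct key.
dummies = pd.get_dummies(keys, dtype=int)

# Step 3: summing the indicators per date gives the counts.
counts = dummies.groupby(df["Date"]).sum()
print(counts)
```

Only keys that actually occur become columns, so a combination that is absent from the data will not show up as a zero column here.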
Performance
100 loops, best of 3: 13.4 ms per loop # pivot_table
100 loops, best of 3: 9.05 ms per loop # stacking, grouping, unstacking
100 loops, best of 3: 10.4 ms per loop # crosstab
100 loops, best of 3: 3.57 ms per loop # get_dummies
(df * 10000)
10 loops, best of 3: 42.8 ms per loop # pivot_table
1 loop, best of 3: 913 ms per loop # stacking, grouping, unstacking
10 loops, best of 3: 43.1 ms per loop # crosstab
1 loop, best of 3: 885 ms per loop # get_dummies
Which one you should use depends on your data.
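Since the ranking changes with the size of the frame, it is worth timing the candidates on your own data. A rough sketch with timeit, assuming a tiny made-up base frame repeated 1000 times (the factor is arbitrary) and comparing only two of the approaches; the function names are mine.

```python
import timeit
import pandas as pd

# Tiny made-up base frame with the same columns as the question's frame.
base = pd.DataFrame({
    "Date":  ["05-10-2017", "05-10-2017", "05-11-2017", "05-11-2017"],
    "Color": ["Red", "Blue", "Red", "Green"],
    "Jar":   [1, 2, 2, 1],
})
# Repeat the rows to get a larger frame; the factor 1000 is arbitrary.
big = pd.concat([base] * 1000, ignore_index=True)

def with_pivot_table(df):
    return df.assign(count=1).pivot_table(
        index="Date", columns=["Jar", "Color"], values="count",
        aggfunc="sum", fill_value=0)

def with_crosstab(df):
    return pd.crosstab(df["Date"], [df["Jar"], df["Color"]])

# Time each approach over a few calls and report the average per call.
for func in (with_pivot_table, with_crosstab):
    seconds = timeit.timeit(lambda: func(big), number=5)
    print(f"{func.__name__}: {seconds / 5 * 1000:.2f} ms per call")
```

The absolute numbers will differ from the ones quoted above depending on machine and pandas version, so treat them as relative only.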
Answer 2 (score: 1)
Or you can try this
get_dummies
Edit: same approach, but a simpler and faster version
This version should outperform the [...] method (or perform the same).
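For what it's worth, since the question started from groupby: a compact way to get the same kind of table is to count with size() and then unstack the Jar and Color levels. This is only a sketch on a small made-up sample, and not necessarily the code this answer had in mind.

```python
import pandas as pd

# Small made-up sample with the same columns as the question's frame.
df = pd.DataFrame({
    "Date":  ["05-10-2017", "05-10-2017", "05-10-2017", "05-11-2017", "05-11-2017"],
    "Color": ["Red", "Green", "Blue", "Red", "Red"],
    "Jar":   [1, 2, 1, 2, 2],
})

counts = (df.groupby(["Date", "Jar", "Color"])
            .size()                                    # rows per (Date, Jar, Color)
            .unstack(["Jar", "Color"], fill_value=0)   # move Jar/Color to columns
            .sort_index(axis=1))                       # stable column order

# Flatten the (Jar, Color) MultiIndex into labels such as "Jar 1 Blue".
counts.columns = [f"Jar {jar} {color}" for jar, color in counts.columns]
print(counts)
```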