
Question

I'm trying to take a dataframe like this (sorry for the formatting, I'm typing on my phone):

    'Date'          'Color'    'Jar'
0   '05-10-2017'    'Red'      1
1   '05-10-2017'    'Green'    2
2   '05-10-2017'    'Blue'     1
3   '05-10-2017'    'Red'      2
4   '05-10-2017'    'Blue'     1
5   '05-11-2017'    'Red'      2
6   '05-11-2017'    'Green'    1
7   '05-11-2017'    'Red'      2
8   '05-11-2017'    'Green'    1
9   '05-11-2017'    'Blue'     1
10  '05-11-2017'    'Blue'     2
11  '05-11-2017'    'Red'      2
12  '05-11-2017'    'Blue'     2
13  '05-11-2017'    'Blue'     1
14  '05-12-2017'    'Green'    2
15  '05-12-2017'    'Blue'     1
16  '05-12-2017'    'Red'      1
17  '05-12-2017'    'Blue'     2
18  '05-12-2017'    'Blue'     2

and derive a table that looks like the one below, where each column is populated with the number of instances for each date.

Date.                     Jar 1 Red    Jar 2 Red   Jar 1 Green  Jar 2 Green Jar 1 Blue Jar 2 Blue
05-10-2017
05-11-2017
05-12-2017

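I tried using groupby for this and was able to get a count of each color per day, but I'm not sure how to split the color columns by the Jar they came from. I've also read that query or loc might be better options for this task. Any direction would be appreciated.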

Answer 1

Let's try this:

df_out = df.assign(count=1).pivot_table(index='Date',columns=['Jar','Color'], values='count',aggfunc='sum', fill_value=0)

df_out.columns = df_out.columns.map('{0[0]} {0[1]}'.format)

df_out.add_prefix('Jar ')

Output:

            Jar 1 Blue  Jar 1 Green  Jar 1 Red  Jar 2 Blue  Jar 2 Green  \
Date                                                                      
05-10-2017           2            0          1           0            1   
05-11-2017           2            2          0           2            0   
05-12-2017           1            0          1           2            1   

            Jar 2 Red  
Date                   
05-10-2017          1  
05-11-2017          3  
05-12-2017          0
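
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same pivot_table idea. The list-based construction of `df` and the f-string used to flatten the column labels are my own choices for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question (rows 0-18).
dates = ["05-10-2017"] * 5 + ["05-11-2017"] * 9 + ["05-12-2017"] * 5
colors = ["Red", "Green", "Blue", "Red", "Blue",
          "Red", "Green", "Red", "Green", "Blue", "Blue", "Red", "Blue", "Blue",
          "Green", "Blue", "Red", "Blue", "Blue"]
jars = [1, 2, 1, 2, 1,
        2, 1, 2, 1, 1, 2, 2, 2, 1,
        2, 1, 1, 2, 2]
df = pd.DataFrame({"Date": dates, "Color": colors, "Jar": jars})

# Count rows per (Date, Jar, Color); combinations missing on a given date become 0.
df_out = df.assign(count=1).pivot_table(
    index="Date", columns=["Jar", "Color"], values="count",
    aggfunc="sum", fill_value=0,
)

# Flatten the (Jar, Color) MultiIndex into single labels such as "Jar 1 Red".
df_out.columns = [f"Jar {jar} {color}" for jar, color in df_out.columns]
print(df_out)
```

The list comprehension does the same job as the `map(...format)` line followed by `add_prefix`, just in one step.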

Answer 2

Option 1

pd.crosstab

df1

          Date  Color  Jar
0   05-10-2017    Red    1
1   05-10-2017  Green    2
2   05-10-2017   Blue    1
3   05-10-2017    Red    2
4   05-10-2017   Blue    1
5   05-11-2017    Red    2
6   05-11-2017  Green    1
7   05-11-2017    Red    2
8   05-11-2017  Green    1
9   05-11-2017   Blue    1
10  05-11-2017   Blue    2
11  05-11-2017    Red    2
12  05-11-2017   Blue    2
13  05-11-2017   Blue    1
14  05-12-2017  Green    2
15  05-12-2017   Blue    1
16  05-12-2017    Red    1
17  05-12-2017   Blue    2
18  05-12-2017   Blue    2

out = pd.crosstab(df1.Date, [df1.Jar, df1.Color])
out.columns = out.columns.map('{0[0]} {0[1]}'.format) # borrowed this line from https://stackoverflow.com/a/46102413/4909087
out = out.add_prefix('Jar ')
out

            Jar 1 Blue  Jar 1 Green  Jar 1 Red  Jar 2 Blue  Jar 2 Green  \
Date                                                                      
05-10-2017           2            0          1           0            1   
05-11-2017           2            2          0           2            0   
05-12-2017           1            0          1           2            1   

            Jar 2 Red  
Date                   
05-10-2017          1  
05-11-2017          3  
05-12-2017          0

Option 2

pd.get_dummies and df.groupby

df1 = df1.set_index('Date')
df1 = pd.get_dummies(df1.Jar.astype(str).str.cat(df1.Color, sep=' '))\
                               .add_prefix('Jar ').groupby(level=0).sum()
df1

            Jar 1 Blue  Jar 1 Green  Jar 1 Red  Jar 2 Blue  Jar 2 Green  \
Date                                                                      
05-10-2017           2            0          1           0            1   
05-11-2017           2            2          0           2            0   
05-12-2017           1            0          1           2            1   

            Jar 2 Red  
Date                   
05-10-2017          1  
05-11-2017          3  
05-12-2017          0
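
One thing to watch with all of these: a Jar/Color pair that never occurs anywhere in the data gets no column at all, and the columns come out sorted rather than in the order shown in the question. A rough sketch of one way to handle both, assuming the full set of colors and jars is known up front (the small frame and the `wanted` index are made up for illustration):

```python
import pandas as pd

# Hypothetical frame in which "Green" never appears at all.
df = pd.DataFrame({
    "Date": ["05-10-2017", "05-10-2017", "05-11-2017"],
    "Color": ["Red", "Blue", "Red"],
    "Jar": [1, 2, 2],
})

counts = pd.crosstab(df["Date"], [df["Jar"], df["Color"]])

# Every (Jar, Color) pair we want a column for, in the order used in the question:
# Jar 1 Red, Jar 2 Red, Jar 1 Green, Jar 2 Green, Jar 1 Blue, Jar 2 Blue.
wanted = pd.MultiIndex.from_tuples(
    [(jar, color) for color in ["Red", "Green", "Blue"] for jar in [1, 2]],
    names=["Jar", "Color"],
)

# Reindexing adds the missing pairs as all-zero columns and fixes the order.
counts = counts.reindex(columns=wanted, fill_value=0)
counts.columns = [f"Jar {jar} {color}" for jar, color in counts.columns]
print(counts)
```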

Performance

Small

100 loops, best of 3: 13.4 ms per loop # pivot_table
100 loops, best of 3: 9.05 ms per loop # stacking, grouping, unstacking
100 loops, best of 3: 10.4 ms per loop # crosstab
100 loops, best of 3: 3.57 ms per loop # get_dummies

Large (df * 10000)

10 loops, best of 3: 42.8 ms per loop # pivot_table
1 loop, best of 3: 913 ms per loop    # stacking, grouping, unstacking
10 loops, best of 3: 43.1 ms per loop # crosstab
1 loop, best of 3: 885 ms per loop    # get_dummies

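Which one you should use depends on your data: in these timings get_dummies is fastest on the small frame, while pivot_table and crosstab scale far better on the large one.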
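
The timings list a "stacking, grouping, unstacking" approach whose code isn't shown here. As a guess at what it might look like, here is a groupby/size/unstack sketch next to crosstab, with a small timing loop. The tiny frame, the function names, and the reading of "df * 10000" as the frame repeated with pd.concat are all assumptions on my part.

```python
import timeit

import pandas as pd

# Hypothetical miniature frame; swap in your own data.
df = pd.DataFrame({
    "Date": ["05-10-2017", "05-10-2017", "05-11-2017", "05-11-2017"],
    "Color": ["Red", "Red", "Blue", "Red"],
    "Jar": [1, 1, 2, 2],
})


def via_groupby(frame):
    # Group on all three keys, count rows, then move Jar and Color into the columns.
    sizes = frame.groupby(["Date", "Jar", "Color"]).size()
    return sizes.unstack(["Jar", "Color"], fill_value=0).sort_index(axis=1)


def via_crosstab(frame):
    return pd.crosstab(frame["Date"], [frame["Jar"], frame["Color"]])


# Assumption: a "large" frame is the small one repeated; 1000 copies keeps this quick.
big = pd.concat([df] * 1000, ignore_index=True)

for fn in (via_groupby, via_crosstab):
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms")
```

Absolute numbers will differ by machine and pandas version, so it is worth rerunning this on data shaped like your own before picking a method.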

Answer 3

Or you can try this

get_dummies

Edit: same approach, but a simpler and faster version

This version should outperform the ... method (or perform the same).

Pivot and conditionally count in a pandas dataframe

3 answers:
