pandas - counting how often values in a column are used

Time: 2018-03-08 07:37:01

Tags: python pandas

I am trying to count how many different users use each value. Here is an example.

import pandas as pd
user = ['1', '2', '3', '1']
val = [['a','b','c'],['a'],['c','d'],['a','d']]
df = pd.DataFrame({'user': user, 'val': val})

user    val
 1      [a, b, c]
 2      [a]
 3      [c, d]
 1      [a, d]

My expected output is:

val     count
 a      2
 b      1
 c      2
 d      2

2 answers:

Answer 0: (score: 1)

You can use Counter together with itertools.chain:

import pandas as pd
from collections import Counter
from itertools import chain

user = ['1', '2', '3', '1']
val = [['a','b','c'],['a'],['c','d'],['a','d']]
df = pd.DataFrame({'user': user, 'val': val})

# Collect each user's distinct values, then count the values across users.
per_user = df.groupby("user").val.apply(lambda s: set(chain.from_iterable(s)))
pd.Series(Counter(chain.from_iterable(per_user)))

Answer 1: (score: 0)

You first need to flatten the lists within each group and keep the unique values with set, then flatten again and count with Counter:

from collections import Counter

# One set of distinct values per user.
s = df.groupby('user')['val'].apply(lambda x: set([item for sub in x for item in sub]))

df = (pd.Series(Counter([item for sublist in s for item in sublist]))
       .sort_index()
       .rename_axis('val')
       .reset_index(name='count'))
print (df)
  val  count
0   a      2
1   b      1
2   c      2
3   d      2

Or, using value_counts:

df = (pd.Series([item for sublist in s for item in sublist])
       .value_counts()
       .sort_index()
       .rename_axis('val')
       .reset_index(name='count'))

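For comparison, the same per-user counting can probably be done without Counter. This is only a sketch, not taken from either answer: it assumes pandas 0.25 or later (where DataFrame.explode is available) and rebuilds the question's sample frame. The names pairs and counts are made up for illustration.

```python
import pandas as pd

user = ['1', '2', '3', '1']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# One row per (user, value) pair; explode is assumed available (pandas >= 0.25).
pairs = df.explode('val')

# Count distinct users per value, so a value repeated by one user counts once.
counts = (pairs.groupby('val')['user']
               .nunique()
               .rename_axis('val')
               .reset_index(name='count'))
print(counts)
```

Because nunique counts distinct users per value, there is no separate step for building a set per user; whether that is clearer than the Counter version is a matter of taste.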