Python: turning a dict with tuple keys into a nested dict

Posted: 2019-11-28 02:53:42

Tags: pandas dataframe dictionary pandas-groupby multi-index

I'm analyzing a Facebook conversation and I want to know how many messages each person sends in each hour of the day. Using pandas, I did:

data['n_msg_by_hour'] = df.groupby(['author', df['date'].dt.hour])['_id'].count()

The returned Series object has the form I want:

Djézeune  0     4866
          1     4549
          2     4463
          3     3841
          4     2560
          5     1029
          6      396
          7      239
          8       76
          9       56
          10      40
          11      88
          12     340
          13     685
          14    1253
          15    1712
          16    2224
          17    2650
          18    2439
          19    2951
          20    3347
          21    3575
          22    4696
          23    4741
Vinssan   0      108
          1      129
          2       84
          3       72
          4        8
          5       17
          6        4
          7        1
          8        1
          9        1
          11       4
          12      26
          13      37
          14      81
          15     114
          16      92
          17     123
          18      83
          19      95
          20      58
          21     112
          22      87
          23     109
Name: _id, dtype: int64

However, when I do

data['n_msg_by_hour'].to_dict()

I get a dictionary with tuples as keys, like this:

{
('Djézeune', 0):4866,
('Djézeune', 1):4549,
('Djézeune', 10):40,
('Djézeune', 11):88,
('Djézeune', 12):340,
('Djézeune', 13):685,
('Djézeune', 14):1253,
...
('Vinssan', 0):108,
('Vinssan', 1):129,
('Vinssan', 10):0,
('Vinssan', 11):4,
('Vinssan', 12):26,
('Vinssan', 13):37,
('Vinssan', 14):81,
}

But I would like a nested dictionary instead, so that I can then dump it to JSON:

{
'Djézeune': {0: 4866, 1: 4549, 10: 40, 11: 88, 12: 340, 13: 685, 14: 1253, ...},
'Vinssan': {0: 108, 1: 129, 10: 0, 11: 4, 12: 26, 13: 37, 14: 81, ...}
}

Is it possible to do this easily, for example with some option or a level-based function, without iterating over the dictionary keys?

Each row of the DataFrame is one message, with (at least) the author, date and _id columns used above.
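To reproduce the problem without the original conversation, here is a minimal sketch built on a few invented rows. The author names come from the question. The timestamps and _id values are hypothetical, since the real sample row is not shown. It runs the question's groupby and shows that to_dict() keys every count by the full (author, hour) tuple:

```python
import pandas as pd

# Hypothetical sample data: only the column names (author, date, _id)
# come from the question; the timestamps and ids are invented.
df = pd.DataFrame({
    "author": ["Djézeune", "Djézeune", "Djézeune", "Vinssan", "Vinssan"],
    "date": pd.to_datetime([
        "2019-11-01 00:15", "2019-11-01 00:40", "2019-11-02 10:05",
        "2019-11-01 00:20", "2019-11-03 11:30",
    ]),
    "_id": ["m1", "m2", "m3", "m4", "m5"],
})

# Same groupby as in the question: count messages per (author, hour of day).
n_msg_by_hour = df.groupby(["author", df["date"].dt.hour])["_id"].count()

# On a Series with a two-level MultiIndex, to_dict() uses the whole
# index tuple as the key, which gives the flat dict from the question.
flat = n_msg_by_hour.to_dict()
print(flat)
```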

1 answer:

Answer 0 (score: 1)

This is probably easiest to do by grouping on the first level of the index and iterating over the resulting Series objects:

In [320]: s = pd.Series(np.random.random(48), index=pd.MultiIndex.from_product([["DJ", "Vin"], range(24)]))

In [321]: d = {k: v.droplevel(0).to_dict() for k, v in s.groupby(level=0)}

In [322]: d
Out[322]:
{'DJ': {0: 0.8731657595223525,
  1: 0.6806768452816228,
  2: 0.6376297431476246,
  ...
  23: 0.9995968607512785},
 'Vin': {0: 0.19255930821536904,
  1: 0.944802244484905,
  2: 0.1171672201795304,
  ...
  23: 0.7387196132363647}}
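For comparison, here is a self-contained sketch on a small subset of the counts shown in the question. It runs the answer's comprehension and then one alternative that is not part of the answer: Series.unstack, which pivots the authors into columns so that DataFrame.to_dict() returns {author: {hour: count}} directly. This assumes you want hours in which an author never posted filled with 0, as in the question's desired output (Vinssan at hour 10). That is what fill_value=0 does; without it those cells would be NaN and the counts would become floats. The sketch also shows that json.dumps turns the integer hour keys into strings:

```python
import json

import pandas as pd

# A subset of the (author, hour) -> message count Series from the question.
s = pd.Series(
    [4866, 4549, 40, 108, 129],
    index=pd.MultiIndex.from_tuples(
        [("Djézeune", 0), ("Djézeune", 1), ("Djézeune", 10),
         ("Vinssan", 0), ("Vinssan", 1)],
        names=["author", "date"],
    ),
)

# The answer's approach: group on the first index level, drop that level,
# and convert each per-author Series into a {hour: count} dict.
nested = {k: v.droplevel(0).to_dict() for k, v in s.groupby(level=0)}

# Alternative (not from the answer): authors become columns, hours become rows.
# fill_value=0 fills missing (author, hour) pairs with 0 instead of NaN.
nested_filled = s.unstack(level=0, fill_value=0).to_dict()

# JSON object keys must be strings, so json.dumps writes the hours as "0", "1", ...
print(json.dumps(nested_filled, ensure_ascii=False, indent=2))
```

The comprehension keeps only the hours that actually occur for each author. The unstack version gives every author the same set of hours, which can be more convenient for plotting or comparison.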