Pandas - column expansion of a list of dicts - how to optimise?

Time: 2018-06-13 14:27:38

Tags: python pandas pandas-groupby

My dataframe `test` contains 3 columns: `id`, `name` and `values`. Here is how it looks:

    name                  values
0   impressions           [{'value': 17686, 'end_time': '2018-06-12T07:0...
1   reach                 [{'value': 6294, 'end_time': '2018-06-12T07:00...
2   follower_count        [{'value': 130, 'end_time': '2018-06-12T07:00:...
3   email_contacts        [{'value': 1, 'end_time': '2018-06-12T07:00:00...
4   phone_call_clicks     [{'value': 0, 'end_time': '2018-06-12T07:00:00...
5   text_message_clicks   [{'value': 0, 'end_time': '2018-06-12T07:00:00...
6   get_directions_clicks [{'value': 0, 'end_time': '2018-06

A single cell of test['values'] looks like this:

[{'end_time': '2018-06-12T07:00:00+0000', 'value': 17686},
 {'end_time': '2018-06-13T07:00:00+0000', 'value': 4064}]

I can expand it into separate columns, but the result is wide: the first and second `value` end up in separate columns, and so do the first and second `end_time`.

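I would like to know whether:

a. there is a faster way to expand a list of dicts;

b. there is a way to unpivot the data so that value 1 and value 2 are in one column, and date 1 and date 2 in another.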
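One possible approach to both (a) and (b), not taken from the answers below, is `DataFrame.explode` (added in pandas 0.25, with `ignore_index` from 1.1, so newer than this question) followed by a single `pd.DataFrame` call on the exploded dicts. This is a minimal sketch on a small hypothetical stand-in for `test`: the `id` values and the `reach` timestamp are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `test`: the impressions row uses the values cell
# shown above; the 'id' values and the reach timestamp are made up.
test = pd.DataFrame({
    'id': ['a', 'b'],
    'name': ['impressions', 'reach'],
    'values': [
        [{'end_time': '2018-06-12T07:00:00+0000', 'value': 17686},
         {'end_time': '2018-06-13T07:00:00+0000', 'value': 4064}],
        [{'end_time': '2018-06-12T07:00:00+0000', 'value': 6294}],
    ],
})

# (b) One row per dict: explode repeats 'id' and 'name' for every list element.
long = test.explode('values', ignore_index=True)

# (a) Build the end_time/value columns in one call from the plain dicts,
# instead of calling pd.Series on every cell.
expanded = pd.DataFrame(long['values'].tolist())
result = pd.concat([long.drop(columns='values'), expanded], axis=1)
print(result)
```

Rows whose list is empty come out of `explode` as NaN; on real data they would probably need to be dropped (and the index reset) before the `pd.DataFrame` step.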

2 answers:

Answer 0 (score: 4)

If the input data is JSON, it is better to use json_normalize:

j = [{'description': 'Total number 1', 'id': 'a', 'name': 'impressions', 'period': 'day', 'title': 'Impressions', 'values': [{'end_time': '2018-06-12T07:00:00+0000', 'value': 17686}, {'end_time': '2018-06-13T07:00:00+0000', 'value': 4064}]},
      {'description': 'fn', 'id': 'b', 'name': 'impressions', 'period': 'day', 'title': 'Impressions', 'values': [{'end_time': '2018-06-12T07:00:00+0000', 'value': 17686}, {'end_time': '2018-06-13T07:00:00+0000', 'value': 4064}]}]

import pandas as pd

df = pd.json_normalize(j, 'values')
print(df)
                   end_time  value
0  2018-06-12T07:00:00+0000  17686
1  2018-06-13T07:00:00+0000   4064
2  2018-06-12T07:00:00+0000  17686
3  2018-06-13T07:00:00+0000   4064

But if you also need to keep the original columns, pass them as the meta argument:

import pandas as pd

df = pd.json_normalize(j, 'values', ['description', 'id', 'name', 'period', 'title'])
print(df)
                   end_time  value     description id         name period  \
0  2018-06-12T07:00:00+0000  17686  Total number 1  a  impressions    day   
1  2018-06-13T07:00:00+0000   4064  Total number 1  a  impressions    day   
2  2018-06-12T07:00:00+0000  17686              fn  b  impressions    day   
3  2018-06-13T07:00:00+0000   4064              fn  b  impressions    day   

         title  
0  Impressions  
1  Impressions  
2  Impressions  
3  Impressions  
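This answer starts from a list of JSON records. If the data already sits in a DataFrame like `test`, one way to reuse the same `json_normalize` call is to turn the frame back into records with `to_dict('records')` first. A minimal sketch, assuming a one-row hypothetical stand-in for `test` (the `id` value is made up):

```python
import pandas as pd

# Hypothetical stand-in for `test`; the values cell is copied from the question,
# the id is made up.
test = pd.DataFrame({
    'id': ['a'],
    'name': ['impressions'],
    'values': [[{'end_time': '2018-06-12T07:00:00+0000', 'value': 17686},
                {'end_time': '2018-06-13T07:00:00+0000', 'value': 4064}]],
})

# Back to a list of dicts, then let json_normalize expand 'values'
# while repeating 'id' and 'name' as meta columns.
records = test.to_dict('records')
df = pd.json_normalize(records, 'values', ['id', 'name'])
print(df)
```

As in the answer's output, the record columns come first and the meta columns are appended after them.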

Answer 1 (score: 1)

You can use two apply calls and stack (plus set_index and reset_index) to create the value and end_time columns at the same time:

import pandas as pd

(test.set_index('name')['values']
     .apply(pd.Series).stack()
     .apply(pd.Series).reset_index().drop(columns='level_1'))

The output looks like this:

          name                  end_time  value
0  impressions  2018-06-12T07:00:00+0000  17686
1  impressions  2018-06-13T07:00:00+0000   4064
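To check this pipeline end to end, here is a self-contained run on a one-row stand-in for `test`, built from the values cell shown in the question (the `id` column is left out because the pipeline does not use it). It uses `drop(columns=...)` because pandas 2.0 no longer accepts the axis of `drop` as a positional argument.

```python
import pandas as pd

# One-row stand-in for `test`, built from the values cell shown in the question.
test = pd.DataFrame({
    'name': ['impressions'],
    'values': [[{'end_time': '2018-06-12T07:00:00+0000', 'value': 17686},
                {'end_time': '2018-06-13T07:00:00+0000', 'value': 4064}]],
})

out = (test.set_index('name')['values']
           .apply(pd.Series)   # one column per list position (0, 1)
           .stack()            # one row per (name, position) pair
           .apply(pd.Series)   # each dict becomes end_time / value columns
           .reset_index()      # name and the unnamed position level (level_1) become columns
           .drop(columns='level_1'))
print(out)
```

Each `apply(pd.Series)` builds one Series per element, which is usually the slow part on large frames.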