NaN values when resampling a pandas DataFrame

Asked: 2017-10-15 09:09:19

Tags: python pandas resampling

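I have a pandas DataFrame with two different columns:

  • a datetime index column;
  • a column containing dicts.

If I run a custom resampler that returns a new dict as its result, I get NaN values in the resampled DataFrame.

Is it possible to run a resampling that does not return numbers?

Thanks, FB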

EDIT1: Here is a data sample:

2017-10-15 06:55:14.237039000,"{'SMA120C': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA115_L': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA121_CT': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA110_4L': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}, 'SMA111': {'status': 9, 'program': 5, 'velocity': 2188, 'totalProduction': 1488, 'dailyProduction': 4051, 'onlineHours': 4672, 'workingHours': 2399, 'errorInfo': 'Error No :687.808'}}"
2017-10-15 06:55:18.584042000,"{'SMA120C': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA115_L': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA121_CT': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA110_4L': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}, 'SMA111': {'status': 7, 'program': 2, 'velocity': 6004, 'totalProduction': 6661, 'dailyProduction': 3353, 'onlineHours': 10, 'workingHours': 6845, 'errorInfo': 'Error No :468.4648'}}"
2017-10-15 06:55:22.881817000,"{'SMA120C': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA115_L': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA121_CT': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA110_4L': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}, 'SMA111': {'status': 3, 'program': 2, 'velocity': 6297, 'totalProduction': 8210, 'dailyProduction': 4639, 'onlineHours': 9978, 'workingHours': 2088, 'errorInfo': 'Error No :554.4214'}}"
2017-10-15 06:55:27.234606000,"{'SMA120C': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA115_L': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA121_CT': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA110_4L': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}, 'SMA111': {'status': 4, 'program': 10, 'velocity': 7441, 'totalProduction': 5332, 'dailyProduction': 3378, 'onlineHours': 836, 'workingHours': 537, 'errorInfo': 'Error No :732.317'}}"
2017-10-15 06:55:31.593890000,
2017-10-15 06:55:35.978696000,"{'SMA120C': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA115_L': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA121_CT': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA110_4L': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}, 'SMA111': {'status': 4, 'program': 10, 'velocity': 611, 'totalProduction': 2065, 'dailyProduction': 7027, 'onlineHours': 9835, 'workingHours': 108, 'errorInfo': 'Error No :98.62041'}}"
2017-10-15 06:55:40.296786000,"{'SMA120C': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA115_L': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA121_CT': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA110_4L': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}, 'SMA111': {'status': 3, 'program': 2, 'velocity': 530, 'totalProduction': 9965, 'dailyProduction': 9802, 'onlineHours': 839, 'workingHours': 7992, 'errorInfo': 'Error No :817.9922'}}"
2017-10-15 06:55:44.655286000,"{'SMA120C': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA115_L': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA121_CT': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA110_4L': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}, 'SMA111': {'status': 1, 'program': 9, 'velocity': 4611, 'totalProduction': 2600, 'dailyProduction': 6396, 'onlineHours': 9232, 'workingHours': 3880, 'errorInfo': 'Error No :379.0488'}}"
2017-10-15 06:55:48.957150000,"{'SMA120C': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA115_L': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA121_CT': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA110_4L': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}, 'SMA111': {'status': 5, 'program': 2, 'velocity': 3566, 'totalProduction': 2809, 'dailyProduction': 3220, 'onlineHours': 2997, 'workingHours': 3118, 'errorInfo': 'Error No :308.7919'}}"
2017-10-15 06:55:53.299944000,

I simply filter out the rows whose second column does not contain a string-based dict.
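The filtering step described above could look roughly like the following. This is only a sketch under assumptions: the column names `timestamp` and `fetched_data` are chosen for illustration, and the inline CSV is a trimmed-down stand-in for the sample data, not the real file.

```python
import io

import pandas as pd

# Trimmed-down stand-in for the sample above: timestamp, then a quoted dict string.
# The second row has an empty second column, like the rows being filtered out.
raw = io.StringIO('''\
2017-10-15 06:55:27.234606000,"{'SMA111': {'status': 4, 'totalProduction': 5332}}"
2017-10-15 06:55:31.593890000,
2017-10-15 06:55:35.978696000,"{'SMA111': {'status': 4, 'totalProduction': 2065}}"
''')

# Column names are assumptions made for this sketch.
df = pd.read_csv(
    raw,
    header=None,
    names=['timestamp', 'fetched_data'],
    index_col='timestamp',
    parse_dates=True,
)

# Empty fields are read as NaN, so dropna removes the rows without a dict string.
filtered = df.dropna(subset=['fetched_data'])
print(len(df), len(filtered))
```

Note that after this step the remaining values are still strings, not dicts.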

EDIT2:

The resampler function:

def custom_resampler(array_like):
    ref_el = {}
    data = {}
    for element in filter(lambda item: item is not None, array_like):
        for machine in element.keys():
                if not ref_el.get(machine, None):
                    ref_el[machine] = element[machine].get('totalProduction', 0) if isinstance(element[machine], dict) else 0
                    data[machine] = {
                        '0': [],
                        '1': [],
                        '2':[],
                        '3':[],
                        '4':[],
                        '5':[],
                        '6': [],
                        '7':[],
                        '8':[],
                        '9':[],
                        '10':[]
                    }
                else:
                    status = str(element[machine]['status'])
                    total_prod_diff = element[machine].get('totalProduction', 0) - ref_el[machine]
                    data[machine][status].append(
                        total_prod_diff
                    )
                    ref_el[machine] = element[machine].get('totalProduction', 0)

1 answer:

Answer 0 (score: 2)

You first need to convert the column from strings to dictionaries:

import ast
# 'col' is a placeholder for the name of the column holding the dict strings;
# missing values become empty dicts before parsing.
df['col'] = df['col'].fillna('{}').apply(ast.literal_eval)

Then add a return statement at the end of the function so that it outputs the aggregated dictionary:

def custom_resampler(array_like):
    ref_el = {}
    data = {}
    for element in filter(lambda item: item is not None, array_like['fetched_data']):
        for machine in element.keys():
                if not ref_el.get(machine, None):
                    ref_el[machine] = element[machine].get('totalProduction', 0) if isinstance(element[machine], dict) else 0
                    data[machine] = {
                        '0': [],
                        '1': [],
                        '2':[],
                        '3':[],
                        '4':[],
                        '5':[],
                        '6': [],
                        '7':[],
                        '8':[],
                        '9':[],
                        '10':[]
                    }
                else:
                    status = str(element[machine]['status'])
                    total_prod_diff = element[machine].get('totalProduction', 0) - ref_el[machine]
                    data[machine][status].append(
                        total_prod_diff
                    )
                    ref_el[machine] = element[machine].get('totalProduction', 0)
    # return the output dict
    return [ref_el]
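
As a complementary sketch (not the answer's method), the two steps can be tried end to end on a tiny Series. To keep each result a plain Python dict regardless of how `apply` treats non-numeric return values, this version iterates over the resampling bins explicitly. The helper `last_total_per_machine`, the 10-second bin width, and the single-machine sample data are all assumptions made for illustration.

```python
import ast

import pandas as pd

index = pd.to_datetime([
    '2017-10-15 06:55:14', '2017-10-15 06:55:18',
    '2017-10-15 06:55:22', '2017-10-15 06:55:27',
])
# Simplified stand-in for the dict column: strings, with one missing value.
raw = pd.Series([
    "{'SMA111': {'status': 9, 'totalProduction': 1488}}",
    "{'SMA111': {'status': 7, 'totalProduction': 6661}}",
    None,
    "{'SMA111': {'status': 4, 'totalProduction': 5332}}",
], index=index)

# Step 1: strings -> dicts; missing rows become empty dicts.
parsed = raw.fillna('{}').apply(ast.literal_eval)


def last_total_per_machine(elements):
    """Hypothetical aggregator: {machine: last seen totalProduction} for one bin."""
    result = {}
    for element in elements:
        for machine, info in element.items():
            result[machine] = info.get('totalProduction', 0)
    return result  # without this return, every bin would yield None


# Step 2: walk over the 10-second bins and aggregate each one by hand.
labels, results = [], []
for label, group in parsed.resample('10s'):
    labels.append(label)
    results.append(last_total_per_machine(group))

resampled = pd.Series(results, index=labels)
print(resampled)
```

The key point carried over from the answer is the explicit `return`: a Python function that falls off the end returns `None`, which is what surfaces as NaN in the resampled result.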