Tiling data to create a pandas DataFrame

Time: 2017-09-04 02:59:27

Tags: python pandas dataframe

I'm new to python pandas. Just a quick and simple question. Suppose I have two lists, "weeks" and "machine":

weeks = [1, 3, 5]
machine = ['M1', 'M1', 'M2', 'M2']

My plan is to put these lists into a DataFrame, but I get "ValueError: arrays must all be same length". I am looking for the following output:

final_weeks = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
final_machine = ['M1', 'M1', 'M1', 'M1', 'M1', 'M2', 'M2', 'M2', 'M2', 'M2']

tempDict = {'weeks': final_weeks, 'machine': final_machine}

I can build the two lists, but not the DataFrame. Why am I getting the ValueError? This is what I have done so far:

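import itertools
import pandas as pd

# df is the existing DataFrame with "weeks" and "machine" columns (not shown here)
maxWeek = df["weeks"].max()
uniqueMachine = set(df.machine)

unionWeekList = [item for item in range(1, maxWeek+1)]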
# Output = [1, 2, 3, 4, 5]

final_weeks = unionWeekList * len(uniqueMachine)
# [1,2,3,4,5,1,2,3,4,5]

machines = [[item]* maxWeek for item in uniqueMachine]
# Output: [[M1,M1,M1,M1,M1], [M2,M2,M2,M2,M2]]

final_machines = list(itertools.chain.from_iterable(machines))
# Flattened list = [M1,M1,M1,M1,M1,M2,M2,M2,M2,M2]

tmpDict = {'week': final_weeks, 'machine': final_machines}

# new dataframe
newdf = pd.DataFrame.from_records(tmpDict)

# ValueError: arrays must all be same length
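
As a minimal sketch of where this kind of error comes from: pandas raises a ValueError whenever the lists in a dict passed to the DataFrame constructor differ in length. The snippet below assumes the raw three-item and four-item lists are passed directly, which the question does not show explicitly, and the exact wording of the message varies between pandas versions.

```python
import pandas as pd

weeks = [1, 3, 5]                    # 3 items
machine = ['M1', 'M1', 'M2', 'M2']   # 4 items

# Unequal lengths: the constructor refuses to align the columns.
try:
    pd.DataFrame({'weeks': weeks, 'machine': machine})
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Expanded lists of equal length (10 items each) build without error.
final_weeks = [1, 2, 3, 4, 5] * 2
final_machine = ['M1'] * 5 + ['M2'] * 5
ok = pd.DataFrame({'weeks': final_weeks, 'machine': final_machine})
print(ok.shape)
```

So it is worth checking len() on both lists right before the DataFrame is built.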

2 answers:

Answer 0 (score: 1)

Try this. I think it gets what you need (PS: to get exactly the output you want, follow cᴏʟᴅsᴘᴇᴇᴅ's answer).

import pandas as pd
weeks = [1,3,5]
machine = ['M1', 'M1', 'M2', 'M2']
newdf = pd.DataFrame(machine)
newdf.groupby(0).apply(lambda x : (x.reindex(range(1,max(weeks)+1)).ffill().bfill()))
Out[364]: 
       0
0       
M1 1  M1
   2  M1
   3  M1
   4  M1
   5  M1
M2 1  M2
   2  M2
   3  M2
   4  M2
   5  M2
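
A different route to the same machine/week pairs, offered only as a sketch and not as this answer's method, is to build the cross product with pd.MultiIndex.from_product and flatten it with to_frame. It assumes weeks should run from 1 to max(weeks), as in the question; the names idx and flat are illustrative.

```python
import pandas as pd

weeks = [1, 3, 5]
machine = ['M1', 'M1', 'M2', 'M2']

# Every (machine, week) pair, with machines as the outer level.
idx = pd.MultiIndex.from_product(
    [sorted(set(machine)), range(1, max(weeks) + 1)],
    names=['machine', 'week'],
)

# Turn the index levels into ordinary columns and put "week" first.
flat = idx.to_frame(index=False)[['week', 'machine']]
print(flat)
```

This gives a flat two-column frame with a default integer index instead of the grouped index shown above.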

Answer 1 (score: 0)

You can use the DataFrame constructor with numpy.repeat and numpy.tile:

import numpy as np
import pandas as pd

#unique machines
uniq = np.sort(np.unique(np.array(machine)))
#repeated range
rng = np.arange(min(weeks), max(weeks)+1)

df = pd.DataFrame({'machine': np.repeat(uniq, len(rng)),
                   'week':np.tile(rng, len(uniq))}, columns=['week','machine'])

print (df)
   week machine
0     1      M1
1     2      M1
2     3      M1
3     4      M1
4     5      M1
5     1      M2
6     2      M2
7     3      M2
8     4      M2
9     5      M2
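
To see why the two NumPy calls pair up correctly, here is a small sketch with a shortened week range (1 to 3, chosen only to keep the arrays short). np.repeat repeats each element in place, while np.tile repeats the whole array end to end, so position i of one array always lines up with the matching position of the other.

```python
import numpy as np

uniq = np.array(['M1', 'M2'])
rng = np.arange(1, 4)  # weeks 1, 2, 3

repeated = np.repeat(uniq, len(rng))  # each machine repeated 3 times in a row
tiled = np.tile(rng, len(uniq))       # the week range repeated once per machine

print(repeated)
print(tiled)
```

Both arrays have len(uniq) * len(rng) elements, which is what lets the DataFrame constructor accept them.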

Timing comparison with cᴏʟᴅsᴘᴇᴇᴅ's solution:

weeks = [1, 3, 5, 8, 13, 15, 17, 23, 24, 26]
machine = ['M{}'.format(x) for x in range(1, 51)]
print (machine)

In [29]: %%timeit
    ...: uniq = np.sort(np.unique(np.array(machine)))
    ...: #repeated range
    ...: rng = np.arange(min(weeks), max(weeks)+1)
    ...: 
    ...: df = pd.DataFrame({'machine': np.repeat(uniq, len(rng)),
    ...:                    'week':np.tile(rng, len(uniq))}, columns=['week','machine'])
    ...: 
1000 loops, best of 3: 636 µs per loop

In [30]: %%timeit
    ...: uniq_machine = sorted(set(machine))
    ...: df = pd.DataFrame(np.repeat(np.array(uniq_machine)\
    ...:                           .reshape(1, len(uniq_machine)), max(weeks), 0), 
    ...:                   index=range(1, max(weeks) + 1))
    ...: 
    ...: out = df.unstack().reset_index(level=0, drop=True)
    ...: out = out.reset_index()
    ...: out.columns = ['week', 'machine']
    ...: 
1000 loops, best of 3: 1.46 ms per loop
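
The timings are only comparable if both snippets build the same rows. The sketch below wraps each one in a function (the names with_repeat_tile and with_unstack are made up for illustration) and compares the results on the small lists from the question. Note that the first snippet starts at min(weeks) and the second at 1, so the two agree here only because min(weeks) is 1.

```python
import numpy as np
import pandas as pd

weeks = [1, 3, 5]
machine = ['M1', 'M1', 'M2', 'M2']

def with_repeat_tile(weeks, machine):
    # First timed snippet: repeat the machines, tile the week range.
    uniq = np.sort(np.unique(np.array(machine)))
    rng = np.arange(min(weeks), max(weeks) + 1)
    return pd.DataFrame({'machine': np.repeat(uniq, len(rng)),
                         'week': np.tile(rng, len(uniq))},
                        columns=['week', 'machine'])

def with_unstack(weeks, machine):
    # Second timed snippet: one column per machine, then unstack to long form.
    uniq_machine = sorted(set(machine))
    df = pd.DataFrame(
        np.repeat(np.array(uniq_machine).reshape(1, len(uniq_machine)),
                  max(weeks), 0),
        index=range(1, max(weeks) + 1))
    out = df.unstack().reset_index(level=0, drop=True)
    out = out.reset_index()
    out.columns = ['week', 'machine']
    return out

a = with_repeat_tile(weeks, machine)
b = with_unstack(weeks, machine)

# Compare column by column so dtype differences do not matter.
same = (a['week'].tolist() == b['week'].tolist()
        and a['machine'].tolist() == b['machine'].tolist())
print(same)
```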