I'm new to Python pandas and have a quick, simple question. Suppose I have two columns, "weeks" and "machine":
weeks = [1,3,5]
machine = ['M1', 'M1', 'M2', 'M2']
My plan is to put these lists into a DataFrame, but I get "ValueError: arrays must all be same length". This is the output I'm looking for:
final_weeks = [1,2,3,4,5,1,2,3,4,5]
final_machine = ['M1', 'M1', 'M1', 'M1', 'M1', 'M2', 'M2', 'M2', 'M2', 'M2']
tempDict = {'weeks': final_weeks, 'machine': final_machine}
I can build the two lists, but not the DataFrame. Why do I get the ValueError? Here is what I have done so far:
import itertools
import pandas as pd

# df is my existing DataFrame (not shown) with "weeks" and "machine" columns
maxWeek = df["weeks"].max()
uniqueMachine = set(df.machine)
unionWeekList = list(range(1, maxWeek + 1))
# Output = [1, 2, 3, 4, 5]
final_weeks = unionWeekList * len(uniqueMachine)
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
machines = [[item] * maxWeek for item in uniqueMachine]
# Output: [['M1', 'M1', 'M1', 'M1', 'M1'], ['M2', 'M2', 'M2', 'M2', 'M2']]
final_machines = list(itertools.chain.from_iterable(machines))
# Flattened list = ['M1', 'M1', 'M1', 'M1', 'M1', 'M2', 'M2', 'M2', 'M2', 'M2']
tmpDict = {'week': final_weeks, 'machine': final_machines}
# new dataframe
newdf = pd.DataFrame.from_records(tmpDict)
# ValueError: arrays must all be same length
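For context, here is a minimal sketch of where this kind of error comes from, using the two raw lists from the top of the question. Building a DataFrame from a dict whose lists have different lengths raises a ValueError; the exact wording of the message differs between pandas versions. The names `raw`, `err` and `ok` are only for illustration.

```python
import pandas as pd

weeks = [1, 3, 5]                      # 3 items
machine = ['M1', 'M1', 'M2', 'M2']     # 4 items

raw = {'weeks': weeks, 'machine': machine}

err = None
try:
    pd.DataFrame(raw)                  # columns of unequal length
except ValueError as exc:
    err = exc
    print('ValueError:', exc)

# Lists of equal length are accepted without complaint.
ok = pd.DataFrame({'weeks': [1, 2, 3, 4, 5] * 2,
                   'machine': ['M1'] * 5 + ['M2'] * 5})
print(ok.shape)
```

So a useful first check is to print `len()` of every list just before it goes into the dict.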
Answer 0 (score: 1)
Try this. I think it gets what you need (PS: to get exactly what you want, follow cᴏʟᴅsᴘᴇᴇᴅ's answer).
weeks = [1,3,5]
machine = ['M1', 'M1', 'M2', 'M2']
newdf = pd.DataFrame(machine)
newdf.groupby(0).apply(lambda x: x.reindex(range(1, max(weeks) + 1)).ffill().bfill())
Out[364]:
       0
0
M1 1  M1
   2  M1
   3  M1
   4  M1
   5  M1
M2 1  M2
   2  M2
   3  M2
   4  M2
   5  M2
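To see why the `reindex(...).ffill().bfill()` chain fills each group, here is a small sketch on a single group. It assumes the M2 rows sit at index labels 2 and 3, as they do in `newdf` above; the names `m2`, `expanded` and `filled` are illustrative and not part of the answer.

```python
import pandas as pd

weeks = [1, 3, 5]

# The M2 rows of newdf occupy index labels 2 and 3.
m2 = pd.Series(['M2', 'M2'], index=[2, 3])

# reindex introduces labels 1..5; labels missing from m2 become NaN.
expanded = m2.reindex(range(1, max(weeks) + 1))

# ffill copies values forward (labels 4 and 5); bfill then covers label 1.
filled = expanded.ffill().bfill()
print(filled)
```

Note that this trick only works if each machine has at least one row whose index label falls inside 1..max(weeks); a group with no label in that range would stay entirely NaN.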
Answer 1 (score: 0)
You can use the DataFrame constructor with numpy.repeat and numpy.tile:
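import numpy as np
import pandas as pd

# unique machines (np.unique already returns them sorted)
uniq = np.sort(np.unique(np.array(machine)))
# full week range, from the smallest to the largest week
rng = np.arange(min(weeks), max(weeks) + 1)
df = pd.DataFrame({'machine': np.repeat(uniq, len(rng)),
                   'week': np.tile(rng, len(uniq))}, columns=['week', 'machine'])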
print (df)
   week machine
0     1      M1
1     2      M1
2     3      M1
3     4      M1
4     5      M1
5     1      M2
6     2      M2
7     3      M2
8     4      M2
9     5      M2
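As a quick illustration of how the two NumPy calls differ (a sketch on the same toy data, with `uniq` and `rng` following the names used above): `np.repeat` repeats each element in place, while `np.tile` repeats the whole array end to end, so pairing them yields every machine/week combination. The week range is shortened to 1..3 here purely to keep the output small.

```python
import numpy as np

uniq = np.array(['M1', 'M2'])
rng = np.arange(1, 4)                  # weeks 1..3, shortened for display

repeated = np.repeat(uniq, len(rng))   # each machine repeated len(rng) times
tiled = np.tile(rng, len(uniq))        # the week range repeated once per machine

print(repeated.tolist())  # ['M1', 'M1', 'M1', 'M2', 'M2', 'M2']
print(tiled.tolist())     # [1, 2, 3, 1, 2, 3]
```

Both arrays have length `len(uniq) * len(rng)`, so the DataFrame constructor receives columns of equal length.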
Comparison with cᴏʟᴅsᴘᴇᴇᴅ's solution:
weeks = [1, 3, 5, 8, 13, 15, 17, 23, 24, 26]
machine = ['M{}'.format(x) for x in range(1, 51)]
print (machine)
In [29]: %%timeit
    ...: uniq = np.sort(np.unique(np.array(machine)))
    ...: #repeated range
    ...: rng = np.arange(min(weeks), max(weeks)+1)
    ...:
    ...: df = pd.DataFrame({'machine': np.repeat(uniq, len(rng)),
    ...:                    'week':np.tile(rng, len(uniq))}, columns=['week','machine'])
    ...:
1000 loops, best of 3: 636 µs per loop
In [30]: %%timeit
    ...: uniq_machine = sorted(set(machine))
    ...: df = pd.DataFrame(np.repeat(np.array(uniq_machine)\
    ...:                   .reshape(1, len(uniq_machine)), max(weeks), 0),
    ...:                   index=range(1, max(weeks) + 1))
    ...:
    ...: out = df.unstack().reset_index(level=0, drop=True)
    ...: out = out.reset_index()
    ...: out.columns = ['week', 'machine']
    ...:
1000 loops, best of 3: 1.46 ms per loop
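These timings depend on hardware and library versions. Below is a sketch for re-running the comparison outside IPython with the standard `timeit` module. The function names `repeat_tile` and `unstack_based` are made up here; the second one simply wraps the snippet timed in `In [30]`. No timings are quoted because they vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

weeks = [1, 3, 5, 8, 13, 15, 17, 23, 24, 26]
machine = ['M{}'.format(x) for x in range(1, 51)]


def repeat_tile():
    # np.repeat / np.tile approach from this answer
    uniq = np.unique(np.array(machine))
    rng = np.arange(min(weeks), max(weeks) + 1)
    return pd.DataFrame({'machine': np.repeat(uniq, len(rng)),
                         'week': np.tile(rng, len(uniq))},
                        columns=['week', 'machine'])


def unstack_based():
    # the snippet timed in In [30], wrapped in a function
    uniq_machine = sorted(set(machine))
    df = pd.DataFrame(np.repeat(np.array(uniq_machine)
                                .reshape(1, len(uniq_machine)), max(weeks), 0),
                      index=range(1, max(weeks) + 1))
    out = df.unstack().reset_index(level=0, drop=True)
    out = out.reset_index()
    out.columns = ['week', 'machine']
    return out


for func in (repeat_tile, unstack_based):
    seconds = timeit.timeit(func, number=100)
    print('{}: {:.3f} ms per call'.format(func.__name__, seconds / 100 * 1000))
```

Because `min(weeks)` is 1 in this data, both functions should produce the same rows in the same order, which makes the comparison like for like.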