I have the following data set up in pandas.
import numpy as np
import pandas as pd
events = ['event1', 'event2', 'event3', 'event4', 'event5', 'event6']
wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
df = pd.DataFrame({"event_no": events, "well_array": wells,
"trace_per_well": traces_per_well})
df["total_traces"] = df['trace_per_well'].apply(np.sum)
df['supposed_traces_no'] = df['well_array'].apply(lambda x: len(x)*24)
df['pass'] = df['total_traces'] == df['supposed_traces_no']
print(df)
The output is shown below:
event_no well_array trace_per_well total_traces supposed_traces_no pass
0 event1 [1, 2] [24, 24] 48 48 True
1 event2 [1, 3] [24, 21] 45 48 False
2 event3 [1] [18] 18 24 False
3 event4 [4, 5, 6] [24, 24, 24] 72 72 True
4 event5 [4, 5, 6] [24, 21, 24] 69 72 False
5 event6 [7, 8] [18, 21] 39 48 False
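I want to create two new columns: one holding the items of the numpy arrays in column trace_per_well
that are not equal to 24, and another holding the corresponding array elements
from column well_array.
The result should look like this.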
event_no well_array trace_per_well total_traces supposed_traces_no pass wrong_trace_in_well wrong_well
0 event1 [1, 2] [24, 24] 48 48 True NaN NaN
1 event2 [1, 3] [24, 21] 45 48 False 21 3
2 event3 [1] [18] 18 24 False 18 1
3 event4 [4, 5, 6] [24, 24, 24] 72 72 True NaN NaN
4 event5 [4, 5, 6] [24, 21, 24] 69 72 False 21 5
5 event6 [7, 8] [18, 21] 39 48 False (18, 21) (7, 8)
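Thank you very much for your help!
Answer 0 (score: 2)
I would do this with a list comprehension. A single pass over the data produces the result, which is then assigned to the appropriate columns.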
v = pd.Series(
[list(zip(*((x, y) for x, y in zip(X, Y) if x != 24)))
for X, Y in zip(df['trace_per_well'], df['well_array'])])
df['wrong_trace_in_well'] = v.str[0]
df['wrong_well'] = v.str[-1]
df[['wrong_trace_in_well', 'wrong_well']]
wrong_trace_in_well wrong_well
0 NaN NaN
1 (21,) (3,)
2 (18,) (1,)
3 NaN NaN
4 (21,) (5,)
5 (18, 21) (7, 8)
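To see why the rows with no mismatch come out as NaN, here is a small sketch of the indexing step on its own. The sample series is hand-built for illustration (it is not taken from the DataFrame above); it relies on `.str[i]` indexing into each list element and returning NaN when the list is too short.

```python
import pandas as pd

# Hand-built stand-in for `v`: each row is the list produced by list(zip(*...)).
# An empty list means no trace count differed from 24 in that row.
v = pd.Series([[], [(21,), (3,)], [(18, 21), (7, 8)]])

# .str[i] indexes into each element; out-of-range positions become NaN.
first = v.str[0]   # wrong trace counts
last = v.str[-1]   # matching wells

print(first)
print(last)
```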
Alternatively, if you don't mind making more than one pass over the data:
df['wrong_trace_in_well'] = [[x for x in X if x != 24] for X in df['trace_per_well']]
df['wrong_well'] = [
[y for x, y in zip(X, Y) if x != 24]
for X, Y in zip(df['trace_per_well'], df['well_array'])]
df[['wrong_trace_in_well', 'wrong_well']]
wrong_trace_in_well wrong_well
0 [] []
1 [21] [3]
2 [18] [1]
3 [] []
4 [21] [5]
5 [18, 21] [7, 8]
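Neither variant above matches the layout in the question exactly (NaN for no mismatch, a bare number for one, a tuple for several). One possible way to get there is sketched below. The helper name `collapse` is made up for this example, and the three-row DataFrame is a cut-down copy of the question's data.

```python
import numpy as np
import pandas as pd

def collapse(values):
    """Hypothetical helper: NaN for no items, the bare item for one, a tuple for several."""
    if len(values) == 0:
        return np.nan
    if len(values) == 1:
        return values[0]
    return tuple(values)

# Cut-down version of the question's data (events 1, 2 and 6).
df = pd.DataFrame({
    "well_array": [np.array([1, 2]), np.array([1, 3]), np.array([7, 8])],
    "trace_per_well": [np.array([24, 24]), np.array([24, 21]), np.array([18, 21])],
})

# One boolean mask per row marks the traces that differ from 24.
masks = [t != 24 for t in df["trace_per_well"]]

# Apply the same mask to both arrays so traces and wells stay aligned.
df["wrong_trace_in_well"] = [
    collapse(t[m].tolist()) for t, m in zip(df["trace_per_well"], masks)]
df["wrong_well"] = [
    collapse(w[m].tolist()) for w, m in zip(df["well_array"], masks)]

print(df[["wrong_trace_in_well", "wrong_well"]])
```

Mixing NaN, scalars and tuples in one column keeps it at object dtype, so this layout is mainly useful for display; the list-valued columns above are easier to process further.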