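I am trying to build a denormalized DataFrame from a report I received. I need to assign each record to a group. The group label comes from a row in the column, and there is arbitrary text and nan between one group label and the next. How can I overwrite the row values once a condition is met? The loop I wrote seems to overwrite only the next value when the condition is met, rather than continuing until the next condition is met. Please see my sample data and code below. Essentially, I need to set rows to Primary, Secondary, or any other group I decide on, and that label must carry through the rows until the next designated group is hit.
Current data: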
Primary
Week#
1
nan
nan
nan
2
nan
nan
nan
Secondary
Week#
1
nan
nan
nan
2
nan
nan
nan
Code:
for index, obj in enumerate(df['col0']):
    l = len(df['col0'])
    if obj == 'Primary':
        if index > 0:
            previous = df['col0'][index - 1]
        if index < (l - 1):
            next_ = df['col0'][index + 1]
            next_ = obj
            print(next_, obj)
    if obj == 'Secondary':
        if index > 0:
            previous = df['col0'][index - 1]
        if index < (l - 1):
            next_ = df['col0'][index + 1]
            next_ = obj
            print(next_, obj)
Expected output:
Primary
Primary
Primary
Primary
Primary
Primary
Primary
Primary
Primary
Primary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
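Answer 0: (score: 1)
A more idiomatic pandas way to do this is to keep only the values you care about and then forward-fill them over the rows you do not want. For example: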
df["col0_cleaned"] = df["col0"].where(df["col0"].isin(["Primary", "Secondary"])).ffill()
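To try the one-liner above in isolation, here is a minimal self-contained sketch. The DataFrame is a hypothetical reconstruction of the question's column, and it assumes the "nan" entries are real missing values rather than the string "nan"; the variable names `values` and `groups` are only for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the question's column. The "nan" entries
# are modelled as real missing values here, which is an assumption.
block = ["Week#", "1", None, None, None, "2", None, None, None]
values = ["Primary"] + block + ["Secondary"] + block
df = pd.DataFrame({"col0": values})

groups = ["Primary", "Secondary"]

# where() keeps a value only if it is one of the group labels and turns
# everything else into NaN; ffill() then carries each label forward until
# the next label appears.
df["col0_cleaned"] = df["col0"].where(df["col0"].isin(groups)).ffill()

print(df["col0_cleaned"].tolist())
```

Adding another group should only require extending the `groups` list, since the rest of the expression does not depend on the specific labels.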
If we take this one step at a time, it becomes clearer what is happening:
df["isin"] = df["col0"].isin(["Primary", "Secondary"])
df["where"] = df["col0"].where(df["col0"].isin(["Primary", "Secondary"]))
df["ffill"] = df["col0"].where(df["col0"].isin(["Primary", "Secondary"])).ffill()
This gives me
In [350]: df
Out[350]:
         col0   isin      where      ffill
0     Primary   True    Primary    Primary
1       Week#  False        NaN    Primary
2           1  False        NaN    Primary
3         nan  False        NaN    Primary
4         nan  False        NaN    Primary
5         nan  False        NaN    Primary
6           2  False        NaN    Primary
7         nan  False        NaN    Primary
8         nan  False        NaN    Primary
9         nan  False        NaN    Primary
10  Secondary   True  Secondary  Secondary
11      Week#  False        NaN  Secondary
12          1  False        NaN  Secondary
13        nan  False        NaN  Secondary
14        nan  False        NaN  Secondary
15        nan  False        NaN  Secondary
16          2  False        NaN  Secondary
17        nan  False        NaN  Secondary
18        nan  False        NaN  Secondary
19        nan  False        NaN  Secondary
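For comparison with the loop in the question: that loop assigns to the local name `next_`, which never writes anything back to the DataFrame, and it only looks one row ahead. A forward fill can be thought of as remembering the last group label seen and repeating it on every row. The plain-Python sketch below illustrates that idea; `fill_groups` and `sample` are hypothetical names, and this is not how pandas implements `ffill` internally.

```python
def fill_groups(values, groups):
    """Replace every value with the most recent group label seen so far."""
    current = None  # no group has been seen yet
    filled = []
    for value in values:
        if value in groups:
            current = value  # a new group starts on this row
        filled.append(current)  # every row gets the active group label
    return filled


# Shortened, hypothetical version of the question's column.
sample = ["Primary", "Week#", "1", "nan", "2", "Secondary", "Week#", "1", "nan"]
result = fill_groups(sample, {"Primary", "Secondary"})
print(result)
```

Rows that come before the first group label would get `None` here, which mirrors how the `where(...).ffill()` approach leaves leading rows as NaN.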