Question

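I am using json_normalize to parse the JSON entries of a pandas column. However, the output is a DataFrame with multiple rows, each of which has only one non-null entry. I want to combine all of these rows into a single row in pandas.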

   currency  custom.gt  custom.eq  price.gt  price.lt
0       NaN        4.0        NaN       NaN       NaN
1       NaN        NaN        NaN     999.0       NaN
2       NaN        NaN        NaN       NaN  199000.0
3       NaN        NaN      other       NaN       NaN
4       USD        NaN        NaN       NaN       NaN
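
For reference, a minimal sketch of how output shaped like this can arise. The `records` list is an assumed input, not the asker's actual data: each JSON entry carries only one nested key, so json_normalize emits one sparse row per entry.

```python
import pandas as pd

# Hypothetical JSON entries, one small dict per row of the original column.
records = [
    {"custom": {"gt": 4}},
    {"price": {"gt": 999}},
    {"price": {"lt": 199000}},
    {"custom": {"eq": "other"}},
    {"currency": "USD"},
]

# Nested keys are flattened into dotted column names such as "custom.gt".
sparse = pd.json_normalize(records)

# Every row has exactly one non-null entry; the rest are NaN.
non_null_per_row = sparse.notna().sum(axis=1)
print(sparse)
```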

Answer 1

You can do it like this:

import pandas as pd
from functools import reduce

df = pd.DataFrame.from_dict({"a": ["1", None, None], "b": [None, None, 1], "c": [None, 3, None]})

def red_func(x, y):
    # Keep the value found so far unless it is null; then take the next one.
    if pd.isna(x):
        return y
    return x

# Reduce each row to its first non-null value.
result = [*map(lambda x: reduce(red_func, x), [list(row) for i, row in df.iterrows()])]

Output:

In [135]: df
Out[135]:
      a    b    c
0     1  NaN  NaN
1  None  NaN  3.0
2  None  1.0  NaN

In [136]: [*map(lambda x: reduce(red_func, x), [list(row) for i, row in df.iterrows()])]
Out[136]: ['1', 3.0, 1.0]
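
Note that the snippet above reduces each row to one value. For the layout in the question, where each column holds a single non-null value, the same idea can be applied per column instead. This is a minimal sketch, assuming one non-null value per column; `first_non_null` and `one_row` are illustrative names, not part of the original answer.

```python
from functools import reduce

import pandas as pd

def first_non_null(x, y):
    # Keep x unless it is null, in which case fall through to y.
    return y if pd.isna(x) else x

df = pd.DataFrame({"a": ["1", None, None], "b": [None, None, 1], "c": [None, 3, None]})

# Reduce every column to its first non-null value and build a one-row frame.
one_row = pd.DataFrame([{col: reduce(first_non_null, df[col]) for col in df.columns}])
print(one_row)
```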

Answer 2

You can use ffill (forward fill) and bfill (backward fill), which are the pandas methods for filling NA values.

# fill NA values
# option 1:
df = df.ffill().bfill()

# option 2 (older pandas only; the `method` argument of fillna is deprecated since pandas 2.1):
df = df.fillna(method='ffill').fillna(method='bfill')

print(df)

   currency  custom.gt  custom.eq  price.gt  price.lt
0       USD        4.0      other     999.0  199000.0
1       USD        4.0      other     999.0  199000.0
2       USD        4.0      other     999.0  199000.0
3       USD        4.0      other     999.0  199000.0
4       USD        4.0      other     999.0  199000.0

Then you can use drop_duplicates to remove the duplicated rows, keeping the first one:

df = df.drop_duplicates(keep='first')
print(df)

   currency  custom.gt  custom.eq  price.gt  price.lt
0       USD        4.0      other     999.0  199000.0
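
This approach relies on each column holding exactly one distinct non-null value; if a column contains two different values, the filled rows will not be identical and drop_duplicates will keep more than one row. Under the same assumption, a shorter variant is to back-fill only and take the first row. A minimal sketch, rebuilding the question's frame by hand (`single` is an illustrative name):

```python
import numpy as np
import pandas as pd

# The sparse frame from the question, reconstructed manually.
df = pd.DataFrame({
    "currency": [np.nan, np.nan, np.nan, np.nan, "USD"],
    "custom.gt": [4.0, np.nan, np.nan, np.nan, np.nan],
    "custom.eq": [np.nan, np.nan, np.nan, "other", np.nan],
    "price.gt": [np.nan, 999.0, np.nan, np.nan, np.nan],
    "price.lt": [np.nan, np.nan, 199000.0, np.nan, np.nan],
})

# After a backward fill, row 0 holds the first non-null value of every column.
single = df.bfill().iloc[[0]]
print(single)
```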

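Depending on how often you have to repeat this task, I would also look at the structure of the JSON file to see whether a dictionary comprehension could help clean up the content so that json_normalize can parse it more easily the first time.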
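
As a rough sketch of that idea, assuming the JSON entries look like the hypothetical `records` list below (the real structure is not shown in the question): flatten each entry, merge them with a dictionary comprehension, and normalize the merged dict once. `flatten` is an illustrative helper, not a pandas function.

```python
import pandas as pd

# Hypothetical JSON entries; the actual file structure may differ.
records = [
    {"custom": {"gt": 4}},
    {"price": {"gt": 999}},
    {"price": {"lt": 199000}},
    {"custom": {"eq": "other"}},
    {"currency": "USD"},
]

def flatten(d, prefix=""):
    # Turn nested dicts into a flat dict with dotted keys, e.g. "custom.gt".
    out = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

# Merge all entries into one flat dict; later entries win on key clashes.
merged = {k: v for rec in records for k, v in flatten(rec).items()}

# A single dict yields a single row.
df = pd.json_normalize(merged)
print(df)
```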

How do I merge multiple rows, each with only one non-null entry, into a single row in a pandas DataFrame?

2 answers: