Question

我在json_data文件中有json记录。我使用pd.DataFrame(json_data)使用这些记录创建了一个新表pd_json_data。

我想操纵pd_json_data返回一个包含主键（url，hour）的新表，然后返回包含布尔值的更新列

小时基于支票数量。例如，如果支票数在第0行包含378，则新表应在小时中包含数字1到378，并在中更新如果小时中的数字是肯定支票中的数字。

关于如何处理这个问题的任何想法？

Answer 1

更新答案

制作假数据

df = pd.DataFrame({'number of checks': [5, 10, 300, 8], 
                   'positive checks':[[1,3,10], [10,11], [9,200], [1,8,7]], 
                   'url': ['a', 'b', 'c', 'd']})

输出

   number of checks positive checks url
0                 5      [1, 3, 10]   a
1                10        [10, 11]   b
2               300        [9, 200]   c
3                 8       [1, 8, 7]   d

迭代并创建新的数据帧，然后连接

dfs = []
for i, row in df.iterrows():
    hour =  np.arange(1, row['number of checks'] + 1)
    df_cur = pd.DataFrame({'hour' : hour,
                          'url': row['url'],
                          'updated': np.in1d(hour, row['positive checks'])})
    dfs.append(df_cur)
df_final = pd.concat(dfs)

    hour updated url
0       1    True   a
1       2   False   a
2       3    True   a
3       4   False   a
4       5   False   a
0       1   False   b
1       2   False   b
2       3   False   b
3       4   False   b
4       5   False   b
5       6   False   b
6       7   False   b
7       8   False   b
8       9   False   b
9      10    True   b
0       1   False   c
1       2   False   c

旧答案

现在构建新数据框

df1 = df[['url']].copy()
df1['hour'] = df['number of checks'].map(lambda x: list(range(1, x + 1)))
df1['updated'] = df.apply(lambda x: x['number of checks'] in x['positive checks'], axis=1)

输出

  url                                            hour updated
0   a                                 [1, 2, 3, 4, 5]   False
1   b                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    True
2   c  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...   False
3   d                        [1, 2, 3, 4, 5, 6, 7, 8]    True

无法从给定的JSON记录列表创建/操作Pandas DataFrame

1 个答案:

更新答案

旧答案