Question

I have the following DataFrame:

index_names = ['1c', '1s', '2c', '2s', '2s', '3c', '3s', '4c', '4s']

individual_atom_df = pd.DataFrame(columns=['Q0', 'Q1', 'Q2', 'Q3', 'Q4'], index=index_names)

This returns the following:

     Q0   Q1   Q2   Q3   Q4
1c  NaN  NaN  NaN  NaN  NaN
1s  NaN  NaN  NaN  NaN  NaN
2c  NaN  NaN  NaN  NaN  NaN
2s  NaN  NaN  NaN  NaN  NaN
2s  NaN  NaN  NaN  NaN  NaN
3c  NaN  NaN  NaN  NaN  NaN
3s  NaN  NaN  NaN  NaN  NaN
4c  NaN  NaN  NaN  NaN  NaN
4s  NaN  NaN  NaN  NaN  NaN

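This is as expected. The data to fill this DataFrame is a list of lists, where the length of each inner list follows the (2x + 1) rule. Here is an example of such a list: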

my_list = [[-1.064525],
 [-4e-06, -0.105246, 0.036201],
 [0.340138, -6e-06, -2e-06, -0.454872, 0.383145],
 [4e-06, -0.208369, -0.482417, -4e-06, 3e-06, -0.105177, -0.097678],
 [0.047612,
  3.5e-05,
  5e-06,
  0.734665,
  0.979878,
  -2.9e-05,
  1.5e-05,
  0.45498,
  -0.005097]]

Each inner list should occupy the DataFrame column that corresponds to its position in my_list, for example:

-1.064525: Q0-1c (because -1.064525 is my_list[0][0], it goes in Q0)

-4e-06: Q1-1c, -0.105246: Q1-1s, 0.036201: Q1-2c

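and so on, until the upper-right triangle of the DataFrame is filled with the values from my_list and the lower-left triangle is left as NaN.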

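I need to iterate over my_list and fill the DataFrame columns (the reason being that this is not the only list of lists; there is in fact a dictionary containing many lists of lists, see below).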

dictionary =  {'H5': [[0.355421],
  [-0.013164, -0.012894, 0.012746],
  [0.011902, 0.004148, 0.00579, -0.022556, 0.017715],
  [-0.007411, 0.015751, 0.003681, -0.0048, -0.020631, -0.004436, -0.002779],
  [-0.012934,
   -0.00844,
   -0.013543,
   0.003076,
   0.00371,
   -0.008476,
   -0.008116,
   -0.001628,
   0.006953]],
 'N1': [[-1.064525],
  [-4e-06, -0.105246, 0.036201],
  [0.340138, -6e-06, -2e-06, -0.454872, 0.383145],
  [4e-06, -0.208369, -0.482417, -4e-06, 3e-06, -0.105177, -0.097678],
  [0.047612,
   3.5e-05,
   5e-06,
   0.734665,
   0.979878,
   -2.9e-05,
   1.5e-05,
   0.45498,
   -0.005097]]}

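I have made an attempt, but I am still very new to DataFrames, and I would greatly appreciate some help with how to fill the DataFrame with the contents of my_list. This is what I tried: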

for kk in dictionary:

    # define dataframe
    individual_atom_df = pd.DataFrame(columns=['Q0', 'Q1', 'Q2', 'Q3', 'Q4'], index=index_names)

    # jj is a loop over Q0, Q1, Q2....
    for idx, val in enumerate(individual_atom_df):
        individual_atom_df[val].append(dictionary[kk][idx])

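The DataFrame generated for each dictionary element will then be written to a .json file using the following (which will go at the end of the loop):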

coord_string = df.to_string().splitlines()

coord_data = {

    'File origin': file_directory,
    'Error list': error_array,
    'Data': coord_string

}

with open("file_name.json", "w") as coord_json:
    json.dump(file_name, coord_json, indent=4)

Answer 1

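First, in the outer loop, each iteration overwrites the DataFrame. You need to save it somehow, perhaps in a dictionary defined outside the loop. That said, what you can do inside the loop is something like the following: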

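import numpy as np
import pandas as pd

# example data: inner lists of length 1, 3 and 5 (the 2x + 1 rule)
l1 = [np.random.rand()]
l2 = [np.random.rand() for i in range(3)]
l3 = [np.random.rand() for i in range(5)]
ll = [l1, l2, l3]

# find max length
maxlen = max(len(i) for i in ll)

# extend shorter lists by filling with NaN (note: this modifies the lists in place)
for col in ll:
    col.extend((maxlen-len(col)) * [np.nan])

# convert to array; transpose so that each list becomes a column
arr = np.asarray(ll).T

df = pd.DataFrame(
    arr,
    columns=[f'Q{i}' for i in range(len(ll))],  # Q0, Q1, Q2, as in the question
    index=['1c', '1s', '2c', '2s', '2s']
    )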
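Applied to the dictionary from the question, the same idea might look like the sketch below. It is not the only way to do this. It assumes each inner list belongs in the column at the same position and starts at the first row. The helper name build_frame is made up for illustration, and the dictionary is cut down to the first three lists per key to keep the example short. Instead of padding the lists, it writes each list into the top rows of an all-NaN frame, so the input lists are left unmodified.

```python
import numpy as np
import pandas as pd

index_names = ['1c', '1s', '2c', '2s', '2s', '3c', '3s', '4c', '4s']
columns = ['Q0', 'Q1', 'Q2', 'Q3', 'Q4']

# First three lists per key from the question's dictionary (truncated for brevity).
dictionary = {
    'H5': [[0.355421],
           [-0.013164, -0.012894, 0.012746],
           [0.011902, 0.004148, 0.00579, -0.022556, 0.017715]],
    'N1': [[-1.064525],
           [-4e-06, -0.105246, 0.036201],
           [0.340138, -6e-06, -2e-06, -0.454872, 0.383145]],
}


def build_frame(list_of_lists, columns, index):
    """Hypothetical helper: the j-th inner list fills the top of column j."""
    # Start from an all-NaN frame so unfilled cells stay NaN.
    df = pd.DataFrame(np.nan, index=index, columns=columns, dtype=float)
    for j, values in enumerate(list_of_lists):
        # Positional assignment avoids trouble with the duplicated '2s' label.
        df.iloc[:len(values), j] = values
    return df


# One DataFrame per dictionary key, kept in a dict defined outside any loop.
frames = {key: build_frame(lists, columns, index_names)
          for key, lists in dictionary.items()}

print(frames['N1'])
```

Columns with no corresponding list (Q3 and Q4 in this truncated data) simply remain NaN.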

Does that help?
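On the output step: the snippet in the question passes file_name to json.dump, where coord_data was presumably intended. Below is a minimal sketch of writing one frame to JSON under that assumption. The function name dump_frame, the placeholder values for the file origin and error list, and the temporary output path are all made up for illustration.

```python
import json
import os
import tempfile

import pandas as pd


def dump_frame(df, path, file_origin, error_list):
    """Hypothetical helper: write one DataFrame plus metadata to a JSON file."""
    coord_data = {
        'File origin': file_origin,
        'Error list': error_list,
        # One string per printed row of the DataFrame, header line first.
        'Data': df.to_string().splitlines(),
    }
    with open(path, 'w') as coord_json:
        # Dump the dictionary itself, not the file name.
        json.dump(coord_data, coord_json, indent=4)


# Tiny stand-in frame; in practice this would be one frame per dictionary key.
df = pd.DataFrame(
    {'Q0': [-1.064525, float('nan')], 'Q1': [-4e-06, -0.105246]},
    index=['1c', '1s'],
)

out_path = os.path.join(tempfile.mkdtemp(), 'N1.json')
dump_frame(df, out_path, file_origin='unknown', error_list=[])

# Read the file back to check what was written.
with open(out_path) as fh:
    loaded = json.load(fh)
print(loaded['Data'])
```

Using a separate file name per dictionary key (here N1.json) keeps later iterations from overwriting earlier output.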
