Question

I can't fill a Pandas DataFrame with values from lists of unequal length.

nx_lists_into_df is a list of lists of NumPy arrays (one inner list per sheet, one array per column).

I get the following error:

ValueError: Length of values does not match length of index

Here is the code:

import pandas as pd
from itertools import zip_longest
from numpy import array

# Column headers
df_cols = ["f1","f2"]

# Create one dataframe for each sheet
df1 = pd.DataFrame(columns=df_cols)
df2 = pd.DataFrame(columns=df_cols)

# Create list of dataframes to iterate through
df_list = [df1, df2]

# Lists to be put into the dataframes
nx_lists_into_df = [[array([0, 1, 3, 4, 7]),
                     array([2, 5, 6, 8])],
                    [array([0, 1, 2, 6, 7]),
                     array([3, 4, 5, 8])]]

# Loop through each sheet (i.e. each round of k folds)
for df, test_index_list in zip_longest(df_list, nx_lists_into_df):
    counter = -1
    # Loop through each column in that sheet (i.e. each fold)
    for col in df_cols:
        print(col)
        counter += 1
        # Add 1 to each index value to start indexing at 1
        df[col] = test_index_list[counter] + 1
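A minimal reproduction (a sketch using the first sheet's arrays) shows where the error comes from: the first assignment to the empty DataFrame makes pandas adopt that array's length as the row index, and the next, shorter array no longer fits. The exact wording of the message varies between pandas versions.

```python
import numpy as np
import pandas as pd

# An empty DataFrame: two columns, zero rows.
df = pd.DataFrame(columns=["f1", "f2"])

# First assignment: pandas adopts the array's length (5) as the row index.
df["f1"] = np.array([0, 1, 3, 4, 7]) + 1
print(len(df.index))  # 5

# Second assignment: 4 values no longer match the 5-row index.
error = None
try:
    df["f2"] = np.array([2, 5, 6, 8]) + 1
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```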

Thanks for your help.

Edit: the result should look like this:

Answer 1

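We'll use pd.Series to attach an appropriate index to each array, which lets us use the pd.DataFrame constructor without it complaining about unequal lengths.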
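A small sketch of why the Series wrapper matters (the names `arrays` and `cols` are illustrative, not from the question): raw NumPy arrays of unequal length are rejected by the constructor, while Series carry their own indexes, so the constructor aligns them on the union of those indexes and pads the gaps with NaN.

```python
import numpy as np
import pandas as pd

arrays = [np.array([0, 1, 3, 4, 7]), np.array([2, 5, 6, 8])]
cols = ["f1", "f2"]

# Raw arrays of unequal length: the constructor refuses them.
raw_error = None
try:
    pd.DataFrame(dict(zip(cols, arrays)))
except ValueError as exc:
    raw_error = exc
    print("ValueError:", exc)

# Wrapped in Series, each column has its own index (0..4 and 0..3),
# so the constructor aligns them and fills the missing row with NaN.
df = pd.DataFrame({c: pd.Series(a) for c, a in zip(cols, arrays)})
print(df)
```

Because f2 now contains a NaN, it is upcast to float, which is why the output above shows 2.0, 5.0 and so on.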

df1, df2 = (
    pd.DataFrame(dict(zip(df_cols, map(pd.Series, d))))
    for d in nx_lists_into_df
)

print(df1)

   f1   f2
0   0  2.0
1   1  5.0
2   3  6.0
3   4  8.0
4   7  NaN

print(df2)

   f1   f2
0   0  3.0
1   1  4.0
2   2  5.0
3   6  8.0
4   7  NaN

Setup

import pandas as pd
from numpy import array

nx_lists_into_df = [[array([0, 1, 3, 4, 7]),
                     array([2, 5, 6, 8])],
                    [array([0, 1, 2, 6, 7]),
                     array([3, 4, 5, 8])]]

# Column headers
df_cols = ["f1","f2"]

Answer 2

You can predefine the size of the DataFrame by setting its index range to the length of the longest column you will add (or to any larger size), like this:
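Rather than hard-coding 5, the row count can be derived from the data. A sketch, assuming nx_lists_into_df and df_cols as defined in the question (n_rows is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

nx_lists_into_df = [[np.array([0, 1, 3, 4, 7]), np.array([2, 5, 6, 8])],
                    [np.array([0, 1, 2, 6, 7]), np.array([3, 4, 5, 8])]]
df_cols = ["f1", "f2"]

# Length of the longest array across all sheets and columns.
n_rows = max(len(a) for arrays in nx_lists_into_df for a in arrays)

# One pre-sized, NaN-filled DataFrame per sheet.
df_list = [pd.DataFrame(columns=df_cols, index=range(n_rows))
           for _ in nx_lists_into_df]
```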

df1 = pd.DataFrame(columns=df_cols, index=range(5))
df2 = pd.DataFrame(columns=df_cols, index=range(5))

print(df1)
    f1   f2
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN
3  NaN  NaN
4  NaN  NaN

(Same for df2.)

The DataFrame is automatically filled with NaN.

Then use .loc to write each array into the first len(y) rows of its column, like this:

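# Rebuild the list so it holds the new, pre-sized DataFrames
df_list = [df1, df2]

for x in range(len(nx_lists_into_df)):
    for col_idx, y in enumerate(nx_lists_into_df[x]):
        # Fill rows 0..len(y)-1 of this column; the remaining rows stay NaN
        df_list[x].loc[range(len(y)), df_cols[col_idx]] = y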


print(df1)
  f1   f2
0  0    2
1  1    5
2  3    6
3  4    8
4  7  NaN

print(df2)
  f1   f2
0  0    3
1  1    4
2  2    5
3  6    8
4  7  NaN

The first loop iterates over the first dimension of the array (i.e. the number of DataFrames you want to create).

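The second loop iterates over the DataFrame's columns, where y holds the current column's values and df_cols[col_idx] is the corresponding column name (f1 or f2).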

Because the row and column indexers select exactly len(y) cells, the same number as there are values in y, no length mismatch occurs.

You may also want to look at the enumerate(iterable, start=0) function as a way to get rid of those "counter" variables.
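For instance, the question's loop can drop the manual counter. This sketch assumes the pre-sized DataFrames from this answer (rebuilt here so the snippet runs on its own) and keeps the question's +1 offset:

```python
import numpy as np
import pandas as pd

nx_lists_into_df = [[np.array([0, 1, 3, 4, 7]), np.array([2, 5, 6, 8])],
                    [np.array([0, 1, 2, 6, 7]), np.array([3, 4, 5, 8])]]
df_cols = ["f1", "f2"]

n_rows = max(len(a) for arrays in nx_lists_into_df for a in arrays)
df_list = [pd.DataFrame(columns=df_cols, index=range(n_rows))
           for _ in nx_lists_into_df]

for df, test_index_list in zip(df_list, nx_lists_into_df):
    # enumerate yields (position, column name), replacing the manual counter
    for counter, col in enumerate(df_cols):
        values = test_index_list[counter] + 1  # start indexing at 1
        # .loc slices are label-based and inclusive: rows 0..len(values)-1
        df.loc[: len(values) - 1, col] = values
```

Iterating over zip(df_cols, test_index_list) would pair each column name with its array directly, without needing a position at all.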

Hope this helps.

Answer 3

If I understand correctly, this can be done with pd.concat.

But see @pir's solution for a scalable version.

import pandas as pd
from numpy import array

# Lists to be put into the dataframes
nx_lists_into_df = [[array([0, 1, 3, 4, 7]),
                     array([2, 5, 6, 8])],
                    [array([0, 1, 2, 6, 7]),
                     array([3, 4, 5, 8])]]

df1 = pd.concat([pd.DataFrame({'A': nx_lists_into_df[0][0]}),
                 pd.DataFrame({'B': nx_lists_into_df[0][1]})],
                 axis=1)

#    A    B
# 0  0  2.0
# 1  1  5.0
# 2  3  6.0
# 3  4  8.0
# 4  7  NaN

df2 = pd.concat([pd.DataFrame({'C': nx_lists_into_df[1][0]}),
                 pd.DataFrame({'D': nx_lists_into_df[1][1]})],
                 axis=1)

#    C    D
# 0  0  3.0
# 1  1  4.0
# 2  2  5.0
# 3  6  8.0
# 4  7  NaN
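One way to generalize the same concat idea to any number of sheets and columns is sketched below (assuming the question's nx_lists_into_df and df_cols). Each array becomes a named pd.Series, and concatenating along axis=1 aligns them on their indexes, padding shorter columns with NaN:

```python
import numpy as np
import pandas as pd

nx_lists_into_df = [[np.array([0, 1, 3, 4, 7]), np.array([2, 5, 6, 8])],
                    [np.array([0, 1, 2, 6, 7]), np.array([3, 4, 5, 8])]]
df_cols = ["f1", "f2"]

# One DataFrame per sheet; the Series name becomes the column header.
df_list = [
    pd.concat([pd.Series(a, name=c) for c, a in zip(df_cols, arrays)], axis=1)
    for arrays in nx_lists_into_df
]
df1, df2 = df_list
print(df1)
```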

Filling Pandas columns with lists of unequal length
