Question

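I have an Excel file whose rows hold values in no consistent order. I want to organize each row so that the same kind of data always lands in the same dedicated column. For example:

I want to format the file as:

Both the attribute name and its value should be carried over in every row, not just the attribute name.

Python code: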

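    import glob

    import pandas as pd

    # Read the workbook; the loop body must be indented under the for statement
    for f in glob.glob("../Book1.xlsx"):
        df = pd.read_excel(f)

    # Sort the values of each row independently
    df1 = df.apply(lambda x: sorted(x.values), axis=1)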

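But this sorts all the values together, producing the following format:

I want only the Attribute columns to be sorted, and each Value column should always stay attached to its Attribute. For example, Paper_size has the value Legal, so after sorting Legal should stay next to Paper_size and not end up beside another attribute.

Can this be done in Python?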

Answer 1

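There may be a more "pythonic" or more "pandas" way to do this, but I couldn't find one. Reordering and renaming the columns by hand was the only way I managed to get that result: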

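    # Using your same code.
    # Note: on pandas 0.23 and later, add result_type="expand" to the apply call,
    # otherwise it returns a Series of lists rather than a DataFrame.
    df1 = df.apply(lambda x: sorted(x.values), axis=1)

    cols = df1.columns.tolist()

    # Positions picked by hand so each attribute is followed by its value
    new_cols = [cols[4], cols[3], cols[1], cols[7], cols[5], cols[2], cols[6], cols[0]]

    df2 = df1[new_cols]

    df2.columns = ['Attribute_1', 'Value', 'Attribute_2', 'Value', 'Attribute_3', 'Value', 'Attribute_4', 'Value']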

Now df2 looks like:

        Attribute_1  Value   Attribute_2   Value Attribute_3 Value Attribute_4 Value
    NaN  Paper_size  Legal  Color_family  Yellow   Ring_type     D    Tab_type   A-Z
    NaN  Paper_size  Legal  Color_family  Yellow   Ring_type     D    Tab_type   A-Z
    NaN  Paper_size  Legal  Color_family  Yellow   Ring_type     D    Tab_type   A-Z
    NaN  Paper_size  Legal  Color_family  Yellow   Ring_type     D    Tab_type   A-Z

This appears to match the output in your second image.
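One caveat about the hand-picked index list: it only works because of where each string happens to fall in alphabetical order. The small check below illustrates this, using the eight values from the output above as an assumed example row. A different set of attributes or values would need a different index list.

```python
# Assumed example row, taken from the values shown in the df2 output above.
row = ["Paper_size", "Legal", "Color_family", "Yellow",
       "Ring_type", "D", "Tab_type", "A-Z"]

# sorted() orders attribute names and values together, alphabetically.
ordered = sorted(row)
print(ordered)

# The hard-coded positions undo that mixing for this particular data only.
picked = [ordered[i] for i in [4, 3, 1, 7, 5, 2, 6, 0]]
print(picked)
```

If a value such as Legal were replaced by Letter, the positions would still line up here, but a value starting with a letter after P would shift them and silently pair attributes with the wrong values.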

It's not pretty, but I figured I'd leave it here until someone offers a better solution.
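As one possible alternative, here is a minimal sketch that treats each row as a set of attribute/value pairs and rebuilds it in a fixed attribute order, so no positions need to be hard-coded. It assumes the cells alternate attribute, value, attribute, value; the sample frame, the `ORDER` list, and the `align_pairs` helper are made up for illustration and are not from the original workbook.

```python
import pandas as pd

# Hypothetical sample data standing in for Book1.xlsx: every row holds the
# same attribute/value pairs, but in a different order.
df = pd.DataFrame([
    ["Paper_size", "Legal", "Color_family", "Yellow", "Ring_type", "D", "Tab_type", "A-Z"],
    ["Color_family", "Yellow", "Tab_type", "A-Z", "Paper_size", "Legal", "Ring_type", "D"],
    ["Ring_type", "D", "Paper_size", "Legal", "Tab_type", "A-Z", "Color_family", "Yellow"],
])

# Assumed target order of the attributes.
ORDER = ["Paper_size", "Color_family", "Ring_type", "Tab_type"]


def align_pairs(row, order=ORDER):
    """Return the row's (attribute, value) pairs rearranged to follow `order`."""
    values = row.tolist()
    # Cells 0, 2, 4, ... are attribute names; cells 1, 3, 5, ... are their values.
    pairs = dict(zip(values[0::2], values[1::2]))
    aligned_row = []
    for attr in order:
        # If a row lacks an attribute, keep the name and leave the value as None.
        aligned_row.extend([attr, pairs.get(attr)])
    return aligned_row


aligned = pd.DataFrame(
    [align_pairs(row) for _, row in df.iterrows()],
    index=df.index,
)

# Label the columns Attribute_1, Value_1, Attribute_2, Value_2, ...
aligned.columns = [
    f"{kind}_{i}" for i in range(1, len(ORDER) + 1) for kind in ("Attribute", "Value")
]
print(aligned)
```

Numbering the value columns (Value_1, Value_2, ...) avoids the duplicate 'Value' labels, which make column selection by name ambiguous. Attributes that are not listed in `ORDER` are dropped by this sketch, so the list would need to cover every attribute present in the real file.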
