Create a pandas DataFrame with unequal numbers of rows and columns

Asked: 2018-07-03 03:28:14

Tags: python pandas

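I have a JSON file whose keys I want to use as the rows of a DataFrame. I then take all the values from all the keys and put them into a flattened list, which I want to use as the columns. But there are eight values and five keys.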

JSON:

{
"student1": [
"view_grades",
"view_classes"
],
"student2": [
"view_grades",
"view_classes"
],
"teacher": [
"view_grades",
"change_grades",
"add_grades",
"delete_grades",
"view_classes"
],
"principle": [
"view_grades",
"view_classes",
"change_classes",
"add_classes",
"delete_classes"
]
}

convert.py

import json

import pandas as pd


def json_to_csv():
    with open('C:/Users/Elitebook/Documents/GitHub/permissions.json') as json_file:
        # convert to a Python dict
        py_dict = json.load(json_file)
        # collect all the values (permissions) from the dict, flatten them, and keep only the unique ones
        # (itervalues() is Python 2; on Python 3 use values())
        permissions = sorted(set([key for value in py_dict.itervalues() for key in value]))

        # create a dataframe from the Python dictionary
        # (this is the call that raises the AssertionError)
        pd.DataFrame.from_dict(py_dict, orient='index', columns=permissions)

I get the error AssertionError: 8 columns passed, passed data had 5 columns. I want to end up with 8 columns and 5 rows, so that I can then put whatever I need into the value cells of the DataFrame.

2 Answers:

Answer 0 (score: 2)

So, based on your description, I take your columns and rows to be:

columns = [
"view_grades",
"view_classes",
"change_grades",
"add_grades",
"delete_grades",
"change_classes",
"add_classes",
"delete_classes"]

rows = [
"student1",
"student2",
"teacher",
"principle"]

What you want to do is set the rows as the index:

df = pd.DataFrame(index=rows, columns=columns)

print(df)
+-----------+-------------+--------------+---------------+------------+---------------+----------------+-------------+----------------+
|           | view_grades | view_classes | change_grades | add_grades | delete_grades | change_classes | add_classes | delete_classes |
+-----------+-------------+--------------+---------------+------------+---------------+----------------+-------------+----------------+
| student1  | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
| student2  | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
| teacher   | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
| principle | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
+-----------+-------------+--------------+---------------+------------+---------------+----------------+-------------+----------------+
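To see where the "5 columns" in the error comes from, and to fill the empty frame with real values, here is a minimal sketch. It assumes the JSON has already been loaded into `py_dict` (inlined below so the snippet runs on its own) and that True/False is the value you want in each cell; the names `raw`, `role`, and `perms` are just illustrative.

```python
import pandas as pd

# Same data as permissions.json, inlined so the sketch is self-contained.
py_dict = {
    "student1": ["view_grades", "view_classes"],
    "student2": ["view_grades", "view_classes"],
    "teacher": ["view_grades", "change_grades", "add_grades",
                "delete_grades", "view_classes"],
    "principle": ["view_grades", "view_classes", "change_classes",
                  "add_classes", "delete_classes"],
}

# Without `columns`, from_dict lays each list out as one row, padded to the
# length of the longest list (5 items). Asking for 8 column names on top of
# that is what triggers "8 columns passed, passed data had 5 columns".
raw = pd.DataFrame.from_dict(py_dict, orient="index")
print(raw.shape)

rows = list(py_dict)
columns = sorted({perm for perms in py_dict.values() for perm in perms})

# Start with every cell False, then switch on the permissions each role has.
df = pd.DataFrame(False, index=rows, columns=columns)
for role, perms in py_dict.items():
    df.loc[role, perms] = True

print(df)
```

The point is that the permission lists are cell contents to be looked up, not ready-made rows, so the frame has to be built from the index and column labels first and filled afterwards.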

Answer 1 (score: 1)

Here is what you can do:

import json
from collections import defaultdict

import pandas as pd


def json_to_csv():
    with open('C:/Users/Elitebook/Documents/GitHub/permissions.json') as json_file:
        # convert to python dict
        py_dict = json.load(json_file)

        # first get a list of all the values(permissions) from the dict, flatten the list and return only unique values
        # this is not necessary anymore since the code below automatically gets a list of unique permissions
        # but if you still want to do it this way it's quite possible
        # permissions = sorted(set([key for value in py_dict.itervalues() for key in value]))

        # create a dictionary of dictionaries in which to put values and populate it
        final = defaultdict(dict)

        # loop through the outer dictionary {'principle': ...}
        for k, v in py_dict.items():
            # loop through the inner list ['add_classes', 'change_classes' ...]
            for i in v:
                # create a key final['principle']['add_classes'] in the final dictionary
                # and set its value to True
                final[k][i] = True

        # This is what final looks like
        # defaultdict(<class 'dict'>,
        #     {'principle': {'add_classes': True,
        #                    'change_classes': True,
        #                    'delete_classes': True,
        #                    'view_classes': True,
        #                    'view_grades': True},
        #      'student1': {'view_classes': True, 'view_grades': True},
        #      'student2': {'view_classes': True, 'view_grades': True},
        #      'teacher': {'add_grades': True,
        #                  'change_grades': True,
        #                  'delete_grades': True,
        #                  'view_classes': True,
        #                  'view_grades': True}})

        # now create the dataframe
        # fillna basically replaces whatever is not available (eg. can student1 add_grades?) by False.
        df = pd.DataFrame(final).fillna(False)

Output:

                student1  student2  teacher  principle
add_classes        False     False    False       True
add_grades         False     False     True      False
change_classes     False     False    False       True
change_grades      False     False     True      False
delete_classes     False     False    False       True
delete_grades      False     False     True      False
view_classes        True      True     True       True
view_grades         True      True     True       True

If you want it the other way around, just transpose the DataFrame:

df.T

Output:

           add_classes  add_grades  change_classes     ...       delete_grades  view_classes  view_grades
student1         False       False           False     ...               False          True         True
student2         False       False           False     ...               False          True         True
teacher          False        True           False     ...                True          True         True
principle         True       False            True     ...               False          True         True
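Since the function is called json_to_csv, here is a rough sketch of one way to get from the dictionary to CSV text. It swaps the defaultdict loop for a plain membership test (a different technique from the code above, which gives boolean columns without needing fillna) and writes to an in-memory buffer. The inlined `py_dict` and the names `table`, `buffer`, and `csv_text` are assumptions for illustration only.

```python
import io

import pandas as pd

# Same data as permissions.json, inlined so the sketch is self-contained.
py_dict = {
    "student1": ["view_grades", "view_classes"],
    "student2": ["view_grades", "view_classes"],
    "teacher": ["view_grades", "change_grades", "add_grades",
                "delete_grades", "view_classes"],
    "principle": ["view_grades", "view_classes", "change_classes",
                  "add_classes", "delete_classes"],
}

# Unique permissions across all roles, sorted for a stable column order.
permissions = sorted({perm for perms in py_dict.values() for perm in perms})

# One row per role, one boolean per permission (True if the role has it).
table = pd.DataFrame(
    [[perm in perms for perm in permissions] for perms in py_dict.values()],
    index=list(py_dict),
    columns=permissions,
)

# Write to an in-memory buffer; pass a file path to to_csv to write a file instead.
buffer = io.StringIO()
table.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

This gives roles as rows directly, so no transpose is needed before exporting.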