Append all values of a dictionary (each containing a list) to a pandas df

Time: 2020-08-18 21:07:58

Tags: pandas dictionary

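I have a column in which each row contains a Python dictionary with multiple keys and values. Each value is a list. The entry at index [0] looks like this: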

{'Paradigms': ['Agile Software Development',
  'Scrum',
  'DevOps',
  'Serverless Architecture'],
 'Platforms': ['Kubernetes',
  'Linux',
  'Windows',
  'Eclipse',
  'PagerDuty',
  'Apache2',
  'Docker',
  'AWS EC2',
  'Amazon Web Services (AWS)',
  'Sysdig',
  'Apache Kafka',
  'AWS Lambda',
  'Azure',
  'OpenStack'],
 'Storage': ['AWS S3',
  'MongoDB',
  'Cassandra',
  'MySQL',
  'PostgreSQL',
  'AWS DynamoDB',
  'Spring Data MongoDB',
  'AWS RDS',
  'MySQL/MariaDB',
  'Datadog',
  'Memcached'],
 'Languages': ['Java',
  'PHP',
  'SQL',
  'Bash',
  'Perl',
  'JavaScript',
  'Python',
  'C#',
  'Go'],
 'Frameworks': ['Ruby on Rails (RoR)',
  'AWS HA',
  '.NET',
  'Serverless Framework',
  'Selenium',
  'CodeIgniter',
  'Express.js'],
 'Other': ['Cisco',
  'Content Delivery Networks (CDN)',
  'Kubernetes Operations (Kops)',
  'Prometheus',
  'VMware ESXi',
  'Bash Scripting',
  'Scrum Master',
  'Infrastructure as Code',
  'Performance Tuning',
  'Serverless',
  'System Administration',
  'Linux System Administration',
  'Code Review'],
 'Libraries/APIs': ['Node.js',
  'Jenkins Pipeline',
  'jQuery',
  'React',
  'Selenium Grid'],
 'Tools': ['Jenkins',
  'Bitbucket',
  'GitHub',
  'AWS ECS',
  'AWS IAM',
  'Amazon CloudFront CDN',
  'Terraform',
  'AWS CloudFormation',
  'Git Flow',
  'Artifactory',
  'Nginx',
  'Grafana',
  'Zabbix',
  'Docker Compose',
  'AWS CLI',
  'AWS ECR',
  'Chef',
  'Jira',
  'Git',
  'Postfix',
  'MongoDB Shell',
  'Wowza',
  'Amazon SQS',
  'AWS SES',
  'Subversion (SVN)',
  'TeamCity',
  'Microsoft Visual Studio',
  'Google Kubernetes Engine (GKE)',
  'VMware ESX',
  'Fluentd',
  'Sumo Logic',
  'Slack',
  'Apache ZooKeeper',
  'AWS Fargate',
  'Ansible',
  'ELK (Elastic Stack)',
  'Microsoft Team Foundation Server',
  'Azure Kubernetes Service (AKS)']}

I just want to get the values and add them to a new column.

I have tried:

# convert dict values to str
for index, row in toptal["skills"].items():
    for key, val in row.items():
        row.update({key: str(val)})

# reverse dict keys and values
for index, row in toptal["skills"].items():
    inv_dict = {v: k for k, v in row.items()}

# map inv_dict to new column
toptal["skills_list"] = toptal["skills"].apply(
    lambda x: {k for k, v in inv_dict.items()}
)

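The problem seems to be the final comprehension inside the lambda function. How can I iterate over all key-value pairs of the dictionary (within that row) so that every value is assigned to that row? I want to get this output: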

row 1: ['Agile Software Development','Scrum','DevOps', 'Serverless Architecture'], ['Kubernetes','Linux','Windows','Eclipse','PagerDuty','Apache2','Docker','AWS EC2','Amazon Web Services (AWS)','Sysdig','Apache Kafka','AWS Lambda','Azure','OpenStack'],['AWS S3','MongoDB','Cassandra','MySQL','PostgreSQL','AWS DynamoDB','Spring Data MongoDB','AWS RDS','MySQL/MariaDB','Datadog','Memcached']...

I have been able to add each list to consecutive rows, but I want all of the lists to be contained in a single row.

Thanks for your help!

1 answer:

Answer 0: (score: 1)

IIUC, you can try json_normalize, which expands each dictionary key into its own column:

import pandas as pd

#dictionary given
d={'Paradigms': ['Agile Software Development', 'Scrum', 'DevOps', 'Serverless Architecture'], 'Platforms': ['Kubernetes', 'Linux', 'Windows', 'Eclipse', 'PagerDuty', 'Apache2', 'Docker', 'AWS EC2', 'Amazon Web Services (AWS)', 'Sysdig', 'Apache Kafka', 'AWS Lambda', 'Azure', 'OpenStack'], 'Storage': ['AWS S3', 'MongoDB', 'Cassandra', 'MySQL', 'PostgreSQL', 'AWS DynamoDB', 'Spring Data MongoDB', 'AWS RDS', 'MySQL/MariaDB', 'Datadog', 'Memcached'], 'Languages': ['Java', 'PHP', 'SQL', 'Bash', 'Perl', 'JavaScript', 'Python', 'C#', 'Go'], 'Frameworks': ['Ruby on Rails (RoR)', 'AWS HA', '.NET', 'Serverless Framework', 'Selenium', 'CodeIgniter', 'Express.js'], 'Other': ['Cisco', 'Content Delivery Networks (CDN)', 'Kubernetes Operations (Kops)', 'Prometheus', 'VMware ESXi', 'Bash Scripting', 'Scrum Master', 'Infrastructure as Code', 'Performance Tuning', 'Serverless', 'System Administration', 'Linux System Administration', 'Code Review'], 'Libraries/APIs': ['Node.js', 'Jenkins Pipeline', 'jQuery', 'React', 'Selenium Grid'], 'Tools': ['Jenkins', 'Bitbucket', 'GitHub', 'AWS ECS', 'AWS IAM', 'Amazon CloudFront CDN', 'Terraform', 'AWS CloudFormation', 'Git Flow', 'Artifactory', 'Nginx', 'Grafana', 'Zabbix', 'Docker Compose', 'AWS CLI', 'AWS ECR', 'Chef', 'Jira', 'Git', 'Postfix', 'MongoDB Shell', 'Wowza', 'Amazon SQS', 'AWS SES', 'Subversion (SVN)', 'TeamCity', 'Microsoft Visual Studio', 'Google Kubernetes Engine (GKE)', 'VMware ESX', 'Fluentd', 'Sumo Logic', 'Slack', 'Apache ZooKeeper', 'AWS Fargate', 'Ansible', 'ELK (Elastic Stack)', 'Microsoft Team Foundation Server', 'Azure Kubernetes Service (AKS)']}

#Create a dataframe with dictionaries like above

df=pd.DataFrame({'d':[d,d]})
print(df)
#                                                   d
#0  {'Paradigms': ['Agile Software Development', '...
#1  {'Paradigms': ['Agile Software Development', '...

#use json_normalize
print(pd.json_normalize(df['d']))
                                           Paradigms                                          Platforms                                            Storage                                          Languages                                         Frameworks                                              Other                                     Libraries/APIs                                              Tools
0  [Agile Software Development, Scrum, DevOps, Se...  [Kubernetes, Linux, Windows, Eclipse, PagerDut...  [AWS S3, MongoDB, Cassandra, MySQL, PostgreSQL...  [Java, PHP, SQL, Bash, Perl, JavaScript, Pytho...  [Ruby on Rails (RoR), AWS HA, .NET, Serverless...  [Cisco, Content Delivery Networks (CDN), Kuber...  [Node.js, Jenkins Pipeline, jQuery, React, Sel...  [Jenkins, Bitbucket, GitHub, AWS ECS, AWS IAM,...
1  [Agile Software Development, Scrum, DevOps, Se...  [Kubernetes, Linux, Windows, Eclipse, PagerDut...  [AWS S3, MongoDB, Cassandra, MySQL, PostgreSQL...  [Java, PHP, SQL, Bash, Perl, JavaScript, Pytho...  [Ruby on Rails (RoR), AWS HA, .NET, Serverless...  [Cisco, Content Delivery Networks (CDN), Kuber...  [Node.js, Jenkins Pipeline, jQuery, React, Sel...  [Jenkins, Bitbucket, GitHub, AWS ECS, AWS IAM,...

Edit: use .values() to get only the values of each dictionary:

df['d']=df['d'].apply(lambda x: list(x.values()))
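
Applied to the column from the question, the same idea might look like the sketch below. The small `toptal` frame is a made-up stand-in for the real data, and the `skills_flat` column is an extra that was not asked for; it assumes you may also want one flat list per row rather than a list of lists.

```python
import pandas as pd
from itertools import chain

# Hypothetical miniature version of the question's "skills" column
toptal = pd.DataFrame({
    "skills": [
        {"Paradigms": ["Scrum", "DevOps"], "Languages": ["Java", "Go"]},
        {"Paradigms": ["Agile Software Development"], "Languages": ["Python"]},
    ]
})

# One list of lists per row, following each dictionary's key order;
# the original "skills" column is left untouched
toptal["skills_list"] = toptal["skills"].apply(lambda d: list(d.values()))

# Optional (assumption): flatten every list in the row into a single list
toptal["skills_flat"] = toptal["skills"].apply(
    lambda d: list(chain.from_iterable(d.values()))
)

print(toptal[["skills_list", "skills_flat"]])
```

Note that the lambda uses its own argument (the row's dictionary). The attempt in the question ignored `x` and read `inv_dict`, which held only the last row's data after the loop finished.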