How to match values from one column against the multiple values in a second column

Posted: 2019-02-28 09:08:12

Tags: python pandas

I have a dataframe:

Name  Dept
abc   Genteic|Biology|Chemical Engineering
def   Physics|Chemical Engineering|Astrophysics
xyz   Chemical Engineering|Astrophysics
klm   Biology|Astrophysics
nop   Chemical Engineering|Astrophysics

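The first column contains names, and the second column lists the departments each person is associated with. I want to know how many people work in each department, for example how many people are associated with Biology. My code for doing this is: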

import csv
import json

import pandas as pd
import requests
from requests.exceptions import ConnectionError, ReadTimeout

# The request URL is not shown in the question; URL_TEMPLATE is a
# hypothetical placeholder for the real author-retrieval endpoint.
URL_TEMPLATE = "<author-retrieval-url>/{}"


def author_name(term):
    # Pass the current author term, not the whole dataframe.
    return get_url(term)


def get_url(term):
    print(term)
    try:
        resp = requests.get(URL_TEMPLATE.format(term), timeout=10)
    except (ConnectionError, ReadTimeout):
        return []
    data = json.loads(resp.content)
    print(data)

    try:
        subject_area = data['author-retrieval-response']['subject-areas']['subject-area']
        if subject_area != 'null':
            # '@abbrev' holds the '|'-separated subject areas.
            return subject_area['@abbrev'].split('|')
    except (KeyError, TypeError):
        # Missing or malformed subject-area data.
        pass
    return []


if __name__ == '__main__':
    with open('out.csv', 'w', encoding='utf-8', newline='') as out:
        csvwriter = csv.writer(out)
        header = ['Scopus ID', 'Title', 'Abstract', 'Affiliation', 'Authors',
                  'Citation', 'Pub_Date']

        dataframe = pd.read_csv('author.csv', usecols=['auth_name'])
        for _, row in dataframe.iterrows():
            term = str(row['auth_name'])
            response = author_name(term)
            csvwriter.writerow(response)

Any help would be greatly appreciated. Thanks!!

1 Answer:

Answer 0 (score: 1)

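I wrote you a very simple Python script that does what you asked. I ignored the fact that the input file is a CSV file and that libraries exist for parsing it. The following is just a quick-and-dirty solution to point you in the right direction. I suggest improving the snippet as follows:

  • Use the csv library to handle the file.
  • Use a dictionary for the variables. EDIT: already done.
  • Try to get rid of the string comparisons (use the subjects as dictionary keys). EDIT: already done.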
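Applying the first suggestion and taking the dictionary idea one step further, a minimal sketch could use csv.DictReader together with collections.Counter, so no department names need to be declared up front. This assumes a genuinely comma-separated input with a Name,Dept header row (unlike the whitespace-aligned sample in this post); the function name count_departments is hypothetical.

```python
import csv
from collections import Counter


def count_departments(lines):
    """Count how many people list each department.

    `lines` is any iterable of CSV text lines (an open file works). It is
    assumed to have a header row with "Name" and "Dept" columns, where
    Dept holds department names separated by "|".
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # A set, so a department repeated in one cell counts that person once.
        depts = {d.strip() for d in row["Dept"].split("|") if d.strip()}
        counts.update(depts)
    return counts


if __name__ == "__main__":
    # Sample rows from the question, rewritten as comma-separated CSV.
    sample = [
        "Name,Dept",
        "abc,Genteic|Biology|Chemical Engineering",
        "def,Physics|Chemical Engineering|Astrophysics",
        "xyz,Chemical Engineering|Astrophysics",
        "klm,Biology|Astrophysics",
        "nop,Chemical Engineering|Astrophysics",
    ]
    for dept, n in count_departments(sample).most_common():
        print("There are %s people working in %s" % (n, dept))
```

With a real file, pass the open file object instead of the list, e.g. count_departments(open("input.csv", newline="")).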

input.csv

abc   Genteic|Biology|Chemical Engineering
def   Physics|Chemical Engineering|Astrophysics
xyz   Chemical Engineering|Astrophysics
klm   Biology|Astrophysics
nop   Chemical Engineering|Astrophysics

main.py

# One counter per department.
counters = {"Biology": 0, "Genteic": 0, "Chemical Engineering": 0, "Physics": 0, "Astrophysics": 0}

with open("input.csv", "r") as csv_file:
    for line in csv_file.read().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only on the first run of whitespace: "Chemical Engineering"
        # itself contains a space, so line.split(" ") would cut it apart.
        name, professions = line.split(None, 1)
        for subj in professions.split("|"):
            counters[subj] += 1

for subj in counters:
    print("There are %s people working in %s" % (counters[subj], subj))

Run it with python3 main.py; it prints one line per department with the number of people associated with it.
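Since the question is tagged pandas and the data already lives in a DataFrame, a vectorized alternative is to split the Dept column on "|", explode it into one row per (person, department) pair, and count. This is a sketch assuming pandas 0.25 or later (where Series.explode was added) and the column names Name and Dept from the question; department_counts is a hypothetical helper name.

```python
import pandas as pd


def department_counts(df: pd.DataFrame) -> pd.Series:
    """Return the number of people associated with each department."""
    depts = (
        df["Dept"]
        .str.split("|")   # single-character pattern is treated literally
        .explode()        # one row per (person, department) pair
        .str.strip()
    )
    return depts.value_counts()


if __name__ == "__main__":
    df = pd.DataFrame({
        "Name": ["abc", "def", "xyz", "klm", "nop"],
        "Dept": [
            "Genteic|Biology|Chemical Engineering",
            "Physics|Chemical Engineering|Astrophysics",
            "Chemical Engineering|Astrophysics",
            "Biology|Astrophysics",
            "Chemical Engineering|Astrophysics",
        ],
    })
    print(department_counts(df))
```

Note that a department repeated inside a single cell would be counted twice here; add a deduplication step if that can occur in your data.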