有关合并熊猫数据框的问题

时间:2020-11-06 04:39:13

标签: python pandas dataframe

我有两个来自.csv文件的数据框,并且基于它们共享的通用col名称(“ NAME”)将它们组合在一起,而我想做的是在另一列上显示两个因素的差异。但是我得到的错误是

secret_number = 777
guess_number = int(input("Enter a number: "))

while True:
    if guess_number == secret_number:
        print("Well done, muggle! You are free now.")
        break
    else:
        print("Ha ha! You're stuck in my loop!")
        guess_number = int(input("Enter a number: "))

这是我的代码:

Traceback (most recent call last):
  File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '\ufeff2010'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\nhoss\OneDrive\Desktop\Senior_Project\responserate.py", line 22, in <module>
    combinedresponse['DIFFERENCE'] = combinedresponse['\ufeff2010'] - combinedresponse['2000']
  File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc
    raise KeyError(key) from err
KeyError: '\ufeff2010'
[Finished in 0.933s]

CSV文件:

responserate2010.csv


import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import string

response2000 = pd.read_csv(r'C:\Users\nhoss\OneDrive\Desktop\Senior_Project\2000ResponseRates.csv', skiprows=0)

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

response2010 = pd.read_csv(r'C:\Users\nhoss\OneDrive\Desktop\Senior_Project\responserate2010.csv', skiprows=0 )
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

combinedresponse = response2000.merge(response2010, on='NAME', how='inner')
combinedresponse['DIFFERENCE'] = combinedresponse['2010'] - combinedresponse['2000']
print(combinedresponse)

2000ResponseRates.csv:

2010,NAME,STATE,COUNTY_ID
52,Allegany County,36,3
64,Bronx County,36,5
68,Broome County,36,7
57,Cattaraugus County,36,9
64,Cayuga County,36,11
61,Chautauqua County,36,13
71,Chemung County,36,15
58,Chenango County,36,17
62,Clinton County,36,19
50,Columbia County,36,21
67,Cortland County,36,23
50,Delaware County,36,25
66,Dutchess County,36,27
70,Erie County,36,29
63,Fulton County,36,35
52,Essex County,36,31
59,Franklin County,36,33

1 个答案:

答案 0 :(得分:0)

请尝试以下操作:

combinedresponse  = pd.merge(respose2000, response2010, on="NAME", how="inner)
combinedresponse['DIFFERENCE'] = combinedresponse['2010'] - combinedresponse['2000']