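I want to rename the headers (column names) of a file using names provided by another file. That second file maps the names that may be contained in the headers to the desired names.
For example, here are the columns of aggregate.csv that need to be changed: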
TEXT,# 1 - A, # 12 - B,# 13 - C,# 3 - D
a, 1, 1, 1, 2
b, 1, 1, 1, 2
c, 1, 1, 1, 2
d, 1, 1, 1, 2
And here is mapping.csv, the other file, which maps the names found in the first file to the desired names:
old,new
A,A
B,A
C,A
D,D
E,D
F,D
G,G
H,G
I,G
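As you can see, the names in the header do not exactly match the old column; they only contain those names. In aggregate.csv every header follows the pattern # number - NAME, whereas in mapping.csv there is only NAME.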
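To make the mismatch concrete, here is a minimal sketch of reducing a header to its bare NAME. The helper name strip_prefix and the exact regular expression are my own assumptions for illustration; the pattern tolerates optional spaces around the number because both "# 1 - A" and "#1 - A" appear in my files.

```python
import re

# Hypothetical helper (not part of my real code): removes an optional
# leading "# number - " prefix and surrounding whitespace from a header.
PREFIX = re.compile(r"^\s*#\s*\d+\s*-\s*")

def strip_prefix(header):
    return PREFIX.sub("", header).strip()

headers = ["TEXT", "# 1 - A", " # 12 - B", "# 13 - C", "# 3 - D"]
cleaned = [strip_prefix(h) for h in headers]
print(cleaned)  # ['TEXT', 'A', 'B', 'C', 'D']
```

Only the leading prefix is removed, so a name that itself contains " - " is left intact.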
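So far I have tried:
import pandas as pd

file = 'mapping.csv'
dictionary = pd.read_csv(file)
# Build a {old: new} dict from the mapping file
header_map = dictionary.set_index("old").to_dict()["new"]
df = pd.read_csv("aggregate.csv")
# rename() takes the mapping through the `columns` keyword argument
df = df.rename(columns=header_map)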
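Since the column names are not exactly the same as the keys, how can I match them, and how can I add the columns together when they map to the same name in the dictionary?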
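The result I am after can be sketched as follows, under some assumptions: the data is inlined with io.StringIO instead of read from disk, TEXT is treated as the row label, and the helper new_name is hypothetical. The idea is to strip the prefix, look the bare name up in the mapping, and then sum the columns that end up sharing a name.

```python
import io
import re
import pandas as pd

# Shortened copies of the two files, inlined so the sketch is self-contained.
aggregate_csv = """TEXT,# 1 - A, # 12 - B,# 13 - C,# 3 - D
a, 1, 1, 1, 2
b, 1, 1, 1, 2
"""
mapping_csv = """old,new
A,A
B,A
C,A
D,D
"""

header_map = pd.read_csv(io.StringIO(mapping_csv)).set_index("old")["new"].to_dict()
df = pd.read_csv(io.StringIO(aggregate_csv), skipinitialspace=True)

def new_name(header):
    # Drop the "# number - " prefix, then map the bare name if it is known.
    name = re.sub(r"^\s*#\s*\d+\s*-\s*", "", header).strip()
    return header_map.get(name, name)

df = df.set_index("TEXT")          # keep the text column out of the sum
df.columns = [new_name(c) for c in df.columns]

# Columns that now share a name are summed by grouping the transposed frame.
df_sum = df.T.groupby(level=0).sum().T
print(df_sum)
```

With this input, A holds 3 for each row (1 + 1 + 1) and D holds 2. Setting TEXT as the index first keeps the frame purely numeric, so the transpose does not turn everything into object dtype.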
Until now, this function worked well:
import os
import re
import pandas as pd

def rename_columns(self, df, dictionary):
    """
    Rename the columns with the given dictionary
    Maybe we don't need the header map file.
    Probably it doesn't work on every dataframe depending on the csv file header
    Args:
        df: the dataframe to crunch
        dictionary: the former names mapped with the new ones.
    Returns:
        df_sum: The dataframe with the weird column names renamed
    """
    print("rename_columns")
    dictionary = pd.read_csv(os.path.join(os.getcwd(), dictionary))
    header_map = dictionary.set_index("old").to_dict()["new"]
    # Rename by removing the '#X - ' pattern
    df = df.rename(columns=lambda x: re.sub(r'#[0-9]* - (.*)', r'\1', x))
    # Use the mapping file to rename
    df = df.rename(columns=header_map)
    # Sum the columns that now share the same name
    df_sum = df.T.reset_index().groupby("index").sum().T
    return df_sum
But with the following csv:
Map Level, Precinct ID, Precinct Name,#1 - Christian-Democratic Movement,#1 - Georgian Dream,#1 - Giorgi Margvelashvili,#1 - Mikheil Saakashvili,#1 - United National Movement,#10 - Georgian Group,#10 - Giorgi Liluashvili,#10 - Self-governance to People,#10 - Traditionalists - Our Georgia and Women's Party,#10 - United Democratic Movement,#11 - Greens Party,#11 - Nugzar Avaliani,#11 - People's Party,#11 - Sportsman's Union,#12 - Future Georgia,#12 - Levan Chachua,#12 - National Party of Radical Democrats of Georgia,#13 - Akaki Asatiani,#13 - Freedom Party,#13 - Giorgi Chikhladze,#13 - Merab Kostava Society,#13 - Teimuraz Mzhavia,#15 - Public Movement,#16 - Labour Council of Georgia,#16 - Mamuka Melikishvili,#17 - Nestan Kirtadze,#18 - Avtandil Margiani,#18 - Kartlos Gharibashvili,#18 - Tamaz Bibiluri,#2 - Christian-Democratic Movement,#2 - Georgian Group,#2 - Levan Gachechiladze,#2 - Nino Burjanadze,#2 - Republican party,#2 - United National Movement,#21 - Mikheil Saluashvili,#22 - Mamuka Chokhonelidze,#23 - Teimuraz Bobokhidze,#3 - Alliance of Patriots,#3 - Arkadi (Badri) Patarkatsishvili,#3 - Christian-Democratic Movement,#3 - Davit Bakradze,#3 - European Georgia,#3 - National Council,#3 - United Communist Party,#3 - United Opposition,#3 - We Ourselves,#4 - Alliance for Georgia,#4 - Alliance of Patriots,#4 - European Georgia,#4 - Labour,#4 - New Rights,#4 - Republican party,#4 - Shalva Natelashvili,#4 - Traditionalists - Our Georgia and Women's Party,#4 - United Communist Party,#5 - Davit Gamkrelidze,#5 - Democratic Movement - Free Georgia,#5 - Free Georgia,#5 - Giorgi Targamadze,#5 - Industry Will Save Georgia,#5 - Labour,#5 - New Rights,#5 - Right Wing Alliance Topadze Industrialists,#6 - Christian Democratic Alliance,#6 - Christian-Democratic Movement,#6 - Free Georgia,#6 - Georgian Group,#6 - Georgian Politics,#6 - Giorgi (Gia) Maisashvili,#6 - Koba Davitashvili,#6 - Labour,#6 - Movement for Fair Georgia,#6 - National Party of Radical Democrats of Georgia,#6 - 
Our Country,#6 - Sportsman's Union,#6 - Tortladze Democratic Party,#6 - Unity Hall,#7 - Christian Democratic Alliance,#7 - Future Georgia,#7 - Irina Sarishvili-Chanturia,#7 - Labour,#7 - Movement for Fair Georgia,#7 - National Forum,#7 - Non-Parliamentary Opposition,#7 - Sergo Javakhidze,#8 - Freedom Party,#8 - Georgian Group,#8 - Labour Council of Georgia,#8 - Merab Kostava Society,#8 - National Democratic Party of Georgia,#8 - New Rights,#8 - Nino Chanishvili,#8 - People's Party,#8 - Public Movement,#8 - Right Wing Alliance Topadze Industrialists,#8 - Sportsman's Union,#8 - United Communist Party,#8 - Way of Georgia,#9 - Armed Veterans Patriots,#9 - Freedom Party,#9 - Our Country,#9 - Sportsman's Union,#9 - Zurab Kharatishvili,Armed Veterans Patriots,Average votes per minute (08:00-12:00),Average votes per minute (12:00-15:00),Average votes per minute (12:00-17:00),Average votes per minute (15:00-20:00),Average votes per minute (17:00-20:00),Christian Democrats,Election,For United Georgia,Freedom Party,Future Georgia,Georgia,Georgian Party,Georgian Unity and Development Party,Greens Party,In the Name of the Lord,Initiative Group,Invalid Ballots (%),Labour,Labour Council of Georgia,Leftist Alliance,Lord Our Righteousness,Mamulishvili,Merab Kostava Society,More Ballots Than Votes (#),More Votes Than Ballots (#),National Democratic Party of Georgia,National Forum,National Party of Radical Democrats of Georgia,New Christian Democrats,New Rights,Nikoloz Ivanishvili Public Democrats,Our Country,Our Georgia,Overall Results,Party of Future,Party of People,People's Movement,People's Party,Progressive Democratic Movement,Public Alliance of Whole Georgia,Reformers,Republican party,Self-governance to People,Socialist Workers Party,Solidarity,Sportsman's Union,State for the People,Total Voter Turnout (#),Total Voter Turnout (%),Union of Georgian Traditionalists,United Communist Party,Unity - New Georgia,Way of Georgia
Precinct,1,83-1,51.28,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30.77,,,,,,,,,,,,,,17.95,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2008 Adjara Supreme Council - Majoritarian Re-run,,,,,,,,,,,,,,,,,,,,,,,,,,,Christian-Democratic Movement,,,,,,,,,,,,,,39,4.5,,,,
...
It returns:
we are cleaning file : C:\Users\antoi\Documents\Programming\Richmond\data/raw/aggregated\Khelvachauri-aggregated.csv
rename_colummns
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1505, in na_op
result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py", line 208, in evaluate
return _evaluate(op, op_str, a, b, **eval_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py", line 123, in _evaluate_numexpr
result = _evaluate_standard(op, op_str, a, b)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py", line 68, in _evaluate_standard
return op(a, b)
TypeError: unsupported operand type(s) for +: 'int' and 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1529, in safe_na_op
return na_op(lvalues, rvalues)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1507, in na_op
result = masked_arith_op(x, y, op)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1009, in masked_arith_op
com.values_from_object(yrav[mask]))
TypeError: unsupported operand type(s) for +: 'int' and 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\antoi\Documents\Programming\Richmond\scrapper.py", line 232, in clean_directory
self.clean_csv(df, fname)
File "C:\Users\antoi\Documents\Programming\Richmond\scrapper.py", line 251, in clean_csv
df = self.merge_sum_similar(df)
File "C:\Users\antoi\Documents\Programming\Richmond\scrapper.py", line 349, in merge_sum_similar
df_sum['New Right'] = df_sum['New Rights'] + df_sum['New Right']
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1583, in wrapper
result = safe_na_op(lvalues, rvalues)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1533, in safe_na_op
lambda x: op(x, rvalues))
File "pandas/_libs/algos.pyx", line 690, in pandas._libs.algos.arrmap
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1533, in <lambda>
lambda x: op(x, rvalues))
TypeError: unsupported operand type(s) for +: 'int' and 'str'
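One possible explanation, which is only a guess since merge_sum_similar is not shown here: this csv has text columns such as Election, and transposing a frame with mixed types gives object dtype, so strings can end up in columns that are later added together. A minimal sketch of summing only the numeric columns follows; the sample values are made up and the variable names are hypothetical.

```python
import re
import pandas as pd

# Tiny made-up frame that imitates the layout of the problematic csv.
raw = pd.DataFrame({
    "Precinct Name": ["83-1", "83-2"],
    "#4 - New Rights": [10.0, 5.0],
    "#5 - New Rights": [2.5, float("nan")],
    "Election": ["2008 Adjara Supreme Council - Majoritarian Re-run"] * 2,
})

# Keep text columns aside so they never take part in the arithmetic.
numeric = raw.select_dtypes(include="number")
other = raw.select_dtypes(exclude="number")

# Strip the "#X - " prefix, then sum numeric columns that share a name.
numeric = numeric.rename(columns=lambda c: re.sub(r"^\s*#\s*\d+\s*-\s*", "", c))
summed = numeric.T.groupby(level=0).sum().T

result = pd.concat([other, summed], axis=1)
print(result)
```

In this sketch the missing value is skipped by sum(), and the text columns are carried through unchanged instead of being summed.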