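I have a set of data files with the columns 'Name', 'Gender', 'Count', one file per year. I need to concatenate all the files for a given period, sum the counts for each unique name, and add a new column with the number of consonants. I cannot extract the string value from 'Names'. How can I do this? Here is my code: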
import os
import re
import pandas as pd

PATH = ...

def consonants_dynamics(years):
    names_by_year = {}
    for year in years:
        names_by_year[year] = pd.read_csv(PATH+"\\yob{}.txt".format(year), names=['Names', 'Gender', 'Count'])
    names_all = pd.concat(names_by_year, names=['Year', 'Pos'])
    dynamics = names_all.groupby('Names').sum().sort_values(by='Count', ascending=False).unstack('Names')
    dynamics['Consonants'] = dynamics.apply(count_vowels(dynamics.Names), axis = 1)
    return dynamics.head(10)

def count_vowels(name):
    vowels = re.compile('A|E|I|O|U|a|e|i|o|u')
    return len(name) - len(vowels.findall(name))
If I run something like
a = consonants_dynamics(i for i in range(1900, 2001, 10))
I get the following error message:
<ipython-input-9-942fc155267e> in consonants_dynamics(years)
...
---> 12 dynamics['Consonants'] = dynamics.apply(count_vowels(dynamics.Names), axis = 1)
AttributeError: 'Series' object has no attribute 'Names'
I have tried various approaches, but none of them worked. What should I do?
Answer 0: (score: 1)
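After the unstack, dynamics becomes a Series object that no longer has a Names column, so dynamics.Names fails. I think you should remove .unstack('Names') and then use dynamics.index:
# map over the index so the result lines up with the rows of dynamics
dynamics['Consonants'] = dynamics.index.map(count_vowels)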
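To make the cause of the error concrete, here is a minimal sketch with made-up data (the names and counts are illustrative and do not come from the real files, and count_consonants is a hypothetical helper name). It shows that unstacking the only index level returns a Series, while skipping the unstack keeps a DataFrame whose index holds the names.

```python
import pandas as pd

# Hypothetical sample standing in for the concatenated yearly files.
names_all = pd.DataFrame({
    'Names': ['James', 'James', 'John', 'Robert'],
    'Count': [10, 20, 40, 100],
})

def count_consonants(name):
    # Count characters that are not vowels (assumes purely alphabetic names).
    return sum(1 for ch in name if ch.lower() not in 'aeiou')

grouped = names_all.groupby('Names').sum().sort_values(by='Count', ascending=False)

# Unstacking a single-level index returns a Series,
# so attribute access such as .Names is no longer available.
unstacked = grouped.unstack('Names')
is_series = isinstance(unstacked, pd.Series)

# Without the unstack, the names are available as grouped.index.
grouped['Consonants'] = grouped.index.map(count_consonants)
print(grouped)
```

Mapping over the index keeps the result in row order, so the new column needs no index alignment.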
Answer 1: (score: 1)
Convert the index with to_series and apply the function:
print (dynamics)
        Count
Names
James       2
John        3
Robert     10

def count_vowels(name):
    vowels = re.compile('A|E|I|O|U|a|e|i|o|u')
    return len(name) - len(vowels.findall(name))

dynamics['Consonants'] = dynamics.index.to_series().apply(count_vowels)
EDIT: A solution without a custom function, using only str.len and str.count:
pat = 'A|E|I|O|U|a|e|i|o|u'
s = dynamics.index.to_series()
dynamics['Consonants_new'] = s.str.len() - s.str.count(pat)
print (dynamics)
        Count  Consonants_new  Consonants
Names
James       2               3           3
John        3               3           3
Robert     10               4           4
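As a variation on the str.count approach, the long alternation pattern can be replaced by a character class with a case-insensitive flag. This is only a sketch: the small frame is made up to mirror the printed example, and it assumes the names contain nothing but letters.

```python
import re
import pandas as pd

# Made-up frame indexed by name, mirroring the printed example.
dynamics = pd.DataFrame(
    {'Count': [2, 3, 10]},
    index=pd.Index(['James', 'John', 'Robert'], name='Names'),
)

s = dynamics.index.to_series()
# '[aeiou]' with re.IGNORECASE matches the same letters as 'A|E|I|O|U|a|e|i|o|u'.
dynamics['Consonants'] = s.str.len() - s.str.count('[aeiou]', flags=re.IGNORECASE)
print(dynamics)
```

Because s is built from the index of dynamics, the resulting Series shares that index and the assignment aligns row by row.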
A solution without to_series is to add as_index=False to groupby so that a DataFrame with a Names column is returned:
names_all = pd.DataFrame({
    'Names': ['James', 'James', 'John', 'John', 'Robert', 'Robert'],
    'Count': [10, 20, 10, 30, 80, 20]
})
# parentheses allow the method chain to continue on the next line
dynamics = (names_all.groupby('Names', as_index=False).sum()
            .sort_values(by='Count', ascending=False))
pat = 'A|E|I|O|U|a|e|i|o|u'
dynamics['Consonants'] = dynamics['Names'].str.len() - dynamics['Names'].str.count(pat)
print (dynamics)
    Names  Count  Consonants
2  Robert    100           4
1    John     40           3
0   James     30           3
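Putting the pieces together, the function from the question could be restructured roughly as follows. This is a sketch rather than either answerer's code: load_year is a hypothetical loader (the real one would call pd.read_csv on the yob{year}.txt files), and the in-memory SAMPLE data is invented so the example runs on its own.

```python
import pandas as pd

PAT = '[AEIOUaeiou]'  # same letters as 'A|E|I|O|U|a|e|i|o|u'

def consonants_dynamics(years, load_year):
    """Sum counts per name over `years` and add a consonant count column.

    `load_year` is a hypothetical callable that returns a DataFrame with
    the columns 'Names', 'Gender', 'Count' for a single year.
    """
    names_by_year = {year: load_year(year) for year in years}
    names_all = pd.concat(names_by_year, names=['Year', 'Pos'])
    # as_index=False keeps 'Names' as a regular column; only 'Count' is summed.
    dynamics = (names_all.groupby('Names', as_index=False)['Count'].sum()
                .sort_values(by='Count', ascending=False))
    names = dynamics['Names']
    dynamics['Consonants'] = names.str.len() - names.str.count(PAT)
    return dynamics.head(10)

# Invented sample data standing in for the yearly files.
SAMPLE = {
    1900: pd.DataFrame({'Names': ['John', 'James'],
                        'Gender': ['M', 'M'],
                        'Count': [30, 10]}),
    1910: pd.DataFrame({'Names': ['John', 'Robert'],
                        'Gender': ['M', 'M'],
                        'Count': [10, 100]}),
}

result = consonants_dynamics([1900, 1910], SAMPLE.__getitem__)
print(result)
```

Selecting ['Count'] before sum() avoids aggregating the 'Gender' strings, and passing the loader as an argument keeps the file path out of the function.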