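I need to write a helper function that I can apply elsewhere in my program to reformat a string.
My first function, process_DrugCount(dataframe), returns three data frames that look like this: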
        MemberID          DSFS  DrugCount
    2   61221204   2- 3 months          1
    8   30786520   1- 2 months          1
    11  28420460  10-11 months          1
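My second function, replaceMonth(string), is a helper function that reformats DSFS values (for example, "2- 3 months" to "2_3").
The following code only works inside process_DrugCount(), not inside replaceMonth():
    DrugCount_Y1.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True)
How would I rewrite it inside replaceMonth()? Here is all of my code: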
    import pandas as pd

    def process_DrugCount(drugcount):
        dc = pd.read_csv("DrugCount.csv")
        # Map the string counts to integers; '7+' is capped at 7.
        sub_map = {'1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7+': 7}
        dc['DrugCount'] = dc.DrugCount.map(sub_map)
        dc['DrugCount'] = dc.DrugCount.astype(int)
        # Split the data into one frame per year.
        dc_grouped = dc.groupby(dc.Year, as_index=False)
        DrugCount_Y1 = dc_grouped.get_group('Y1')
        DrugCount_Y2 = dc_grouped.get_group('Y2')
        DrugCount_Y3 = dc_grouped.get_group('Y3')
        DrugCount_Y1.drop('Year', axis=1, inplace=True)
        DrugCount_Y2.drop('Year', axis=1, inplace=True)
        DrugCount_Y3.drop('Year', axis=1, inplace=True)
        print(DrugCount_Y1)
        a = DrugCount_Y1.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True)  # WORKS HERE!
        return (DrugCount_Y1, DrugCount_Y2, DrugCount_Y3)

    # This function converts strings such as "1- 2 months" to "1_2".
    def replaceMonth(string):
        string.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True)  # Doesn't change the dash to an underscore.
        return a_new_string
Answer 0 (score: 0)
You don't actually need a special function, because one already exists: replace().
    In [32]: replacements = {
       ....:     'DSFS': {
       ....:         r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'
       ....:     },
       ....:     'DrugCount': {
       ....:         r'\+': ''
       ....:     }
       ....: }

    In [33]: dc
    Out[33]:
       MemberID  Year         DSFS  DrugCount
    0  48925661    Y2  9-10 months         7+
    1  90764620    Y3  8- 9 months          3
    2  61221204    Y1  2- 3 months          1

    In [34]: dc.replace(replacements, regex=True, inplace=True)

    In [35]: dc['DrugCount'] = dc.DrugCount.astype(int)

    In [36]: dc
    Out[36]:
       MemberID  Year  DSFS  DrugCount
    0  48925661    Y2  9_10          7
    1  90764620    Y3   8_9          3
    2  61221204    Y1   2_3          1

    In [37]: dc.dtypes
    Out[37]:
    MemberID      int64
    Year         object
    DSFS         object
    DrugCount     int32
    dtype: object
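
If a separate helper is still wanted, the same regular expression can be wrapped in a function that works on a single column. The following is only a minimal sketch: it assumes that replaceMonth receives a pandas Series of strings (not a plain Python string and not a whole DataFrame), and the sample values are made up for illustration.

```python
import pandas as pd

# Same pattern as above: two numbers separated by a dash, optional
# whitespace around the dash, then anything after the second number.
DSFS_PATTERN = r'(\d+)\s*-\s*(\d+).*'


def replaceMonth(series):
    """Return a new Series in which '2- 3 months' becomes '2_3'.

    Assumption: `series` is a pandas Series of strings, e.g. dc['DSFS'].
    """
    return series.str.replace(DSFS_PATTERN, r'\1_\2', regex=True)


# Illustrative data only; not taken from DrugCount.csv.
dsfs = pd.Series(['2- 3 months', '1- 2 months', '10-11 months'])
converted = replaceMonth(dsfs)
print(converted.tolist())
```

Because Series.str.replace returns a new Series, the result has to be assigned back, for example to the DSFS column of the frame.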
Answer 1 (score: 0)
It was easier than that. Maybe I didn't ask the question well. All I needed to do was:
    def replaceMonth(string):
        # Despite its name, `string` is expected to be a pandas Series (e.g. the DSFS column).
        replace_map = {
            "0- 1 month": "0_1", "1- 2 months": "1_2", "2- 3 months": "2_3",
            "3- 4 months": "3_4", "4- 5 months": "4_5", "5- 6 months": "5_6",
            "6- 7 months": "6_7", "7- 8 months": "7_8", "8- 9 months": "8_9",
            "9-10 months": "9_10", "10-11 months": "10_11", "11-12 months": "11_12",
        }
        # Values that are not keys of replace_map become NaN.
        a_new_string = string.map(replace_map)
        return a_new_string
This simply renames the values in the column.
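
As a possible alternative to listing every label by hand, the conversion can also be written for one plain Python string at a time using the standard re module. This is a sketch under stated assumptions, not the original poster's method: it assumes every DSFS label starts with two numbers separated by a dash, and it chooses to return unrecognized input unchanged.

```python
import re

# Two numbers separated by a dash, with optional whitespace around the dash.
DSFS_RE = re.compile(r'(\d+)\s*-\s*(\d+)')


def replaceMonth(string):
    """Convert one label such as '1- 2 months' to '1_2'.

    Assumption: `string` is a plain Python str. Input that does not start
    with the expected pattern is returned unchanged.
    """
    match = DSFS_RE.match(string.strip())
    if match is None:
        return string
    return '{}_{}'.format(match.group(1), match.group(2))


print(replaceMonth('1- 2 months'))
print(replaceMonth('10-11 months'))
```

Since Series.map also accepts a function, a helper of this shape could be applied to a whole column with dc['DSFS'].map(replaceMonth).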