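I'm writing a script that sorts people's names. I have it working with the csv module, but since it will be tied into a larger pandas project, I thought I'd convert it.
I need to split a single name field into first, middle, and last name fields. The original field has the first name first, for example: Richard Wayne Van Dyke.
I can split the name, but I want "Van Dyke" to be the last name.
Here is my code using the csv module: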
import csv

with open('inputfil.csv') as inf:
    docs = csv.reader(inf)
    next(docs, None)  # skip the header row
    for i in docs:
        #print i
        fullname = i[1]  # it's the second column in the input file
        namelist = fullname.split(' ')
        firstname = namelist[0]
        middlename = namelist[1]
        if len(namelist) == 2:
            lastname = namelist[1]
            middlename = ''
        elif len(namelist) == 3:
            lastname = namelist[2]
        elif len(namelist) == 4:
            lastname = namelist[2] + " " + namelist[3]  # gets Van Dyke in lastname
        print "First: " + firstname + " middle: " + middlename + " last: " + lastname
Here is the pandas-based code I'm struggling with:
import pandas as pd

df = pd.DataFrame({'Name': ['Richard Wayne Van Dyke', 'Gary Del Barco', 'Dave Allen Smith']})
df = df.fillna('')
df = df.astype(unicode)
splits = df['Name'].str.split(' ', expand=True)
df['firstName'] = splits[0]
if splits[2].notnull and splits[3].isnull:  # this works for Bret Allen Cardwell
    df['lastName'] = splits[2]
    df['middleName'] = splits[1]
    print "Case 1: First: " + df['firstName'] + " middle: " + df['middleName'] + " last: " + df['lastName']
elif splits[2].all() == 'Del':  # trying to get last name of "Del Barco"
    print 'del'
    df['middleName'] = ''
    df['lastName'] = splits[2] + " " + splits[3]
    print "Case 2: First: " + df['firstName'] + " middle: " + df['middleName'] + " last: " + df['lastName']
elif splits[3].notnull:  # trying to get last name of "Van Dyke"
    df['middleName'] = splits[1]
    df['lastName'] = splits[2] + " " + splits[3]
    print "Case 3: First: " + df['firstName'] + " middle: " + df['middleName'] + " last: " + df['lastName']
I'm missing something basic.
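One likely culprit: the `if`/`elif` chain runs once against whole columns rather than once per row, and `splits[2].notnull` without parentheses is a method object, which is always truthy. A per-row helper may be easier to reason about. The following is only a minimal sketch in Python 3, not a definitive solution: the name `split_name` and the `SURNAME_PREFIXES` set are assumptions made for illustration, and the prefix list would need to be extended for real data.

```python
# Hypothetical set of surname prefixes (an assumption); extend it for your data.
SURNAME_PREFIXES = {"van", "del"}


def split_name(fullname):
    """Return (first, middle, last) for a 'First [Middle ...] Last' string."""
    tokens = fullname.split()
    if not tokens:
        return ("", "", "")
    first, rest = tokens[0], tokens[1:]
    if not rest:
        return (first, "", "")
    # If the next-to-last word is a known prefix, treat it as part of the surname.
    if len(rest) >= 2 and rest[-2].lower() in SURNAME_PREFIXES:
        cut = len(rest) - 2
    else:
        cut = len(rest) - 1
    return (first, " ".join(rest[:cut]), " ".join(rest[cut:]))


for name in ["Richard Wayne Van Dyke", "Gary Del Barco", "Dave Allen Smith"]:
    print(split_name(name))
```

With pandas, the same helper could be applied row by row, for instance `df['Name'].apply(split_name)`, which yields a Series of tuples; passing `.tolist()` of that Series to `pd.DataFrame(..., index=df.index, columns=['firstName', 'middleName', 'lastName'])` should give the three columns.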
Answer 0 (score: 0)
library(XML)
doc <- xmlParse("Input.xml")
stringdata <- t(xpathSApply(doc, "//String", xmlAttrs))
df <- data.frame(stringdata, stringsAsFactors = FALSE)
# CONVERT CHARACTER COLUMNS TO NUMERIC
df[, c(1,3:6)] <- sapply(df[, c(1,3:6)], function(x) as.numeric(x))
head(df)
#          WC     CONTENT HEIGHT WIDTH VPOS HPOS
# 1 0.8520000       SHELL     30    92  472  902
# 2 0.5462500    MAATVELD     32   150  475 1016
# 3 0.5287500    RIJKSWEG     34   150  511  901
# 4 0.2966667         A20     31    55  515 1073
# 5 0.4427273 NIEUWERKERK     36   207  550  900
# 6 0.2633333         A/D     31    54  557 1130