I'm very new to Python, and I'm trying to use pandas (in an IPython Notebook, Python 3) to combine three columns. Here is the original data:
RegistrationID FirstName MiddleInitial LastName
1 John P Smith
2 Bill Missing Jones
3 Paul H Henry
I want:
RegistrationID FirstName MiddleInitial LastName FullName
1 John P Smith Smith, John, P
2 Bill Missing Jones Jones, Bill
3 Paul H Henry Henry, Paul, H
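I'm sure this is definitely not the right way to do it, but this is how I set it up in a for loop. Unfortunately, it just keeps running and never finishes.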
%matplotlib inline
import pandas as pd
from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
reg = pd.DataFrame.from_csv('regcontact.csv', index_col=RegistrationID)
for item, frame in regcombo['MiddleInitial'].iteritems():
    while frame == 'Missing':
        reg['FullName'] = reg.LastName.map(str) + ", " + reg.FirstName
    else: break
The idea was then to add another column for those who have a full name (i.e., including MiddleInitial):
for item, frame in regcombo['MiddleInitial'].iteritems():
    while frame != 'Missing':
        reg['FullName1'] = reg.LastName.map(str) + ", " + reg.FirstName + ", " + reg.MiddleInitial
    else: break
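Then I would combine them so that there are no null values. I've looked everywhere, but I can't figure it out. Any help would be greatly appreciated, and I apologize in advance if I've broken any conventions, as this is my first post.
Answer 0 (score: 1)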
This uses a list comprehension to create the new dataframe column, e.g. [(a, b, c) for a, b, c in some_iterable_item].
df['Full Name'] = [
    "{0}, {1} {2}"
    .format(last, first, middle if middle != 'Missing' else "").strip()
    for last, first, middle
    in df[['LastName', 'FirstName', 'MiddleInitial']].values]
>>> df
RegistrationID FirstName MiddleInitial LastName Full Name
0 1 John P Smith Smith, John P
1 2 Bill Missing Jones Jones, Bill
2 3 Paul H Henry Henry, Paul H
Here, some_iterable_item is the array of values from the dataframe:
>>> df[['LastName', 'FirstName', 'MiddleInitial']].values
array([['Smith', 'John', 'P'],
       ['Jones', 'Bill', 'Missing'],
       ['Henry', 'Paul', 'H']], dtype=object)
So, per our list comprehension model:
>>> [(a, b, c) for (a, b, c) in df[['LastName', 'FirstName', 'MiddleInitial']].values]
[('Smith', 'John', 'P'), ('Jones', 'Bill', 'Missing'), ('Henry', 'Paul', 'H')]
I then format the string:
>>> a = "Smith"
>>> b = "John"
>>> c = "P"
>>> "{0}, {1} {2}".format(a, b, c)
'Smith, John P'
I use a ternary to check if the middle name is 'Missing', so:
middle if middle != "Missing" else ""
is equivalent to:
if middle == 'Missing':
    middle = ""
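As a quick check of that equivalence, here is a minimal sketch that wraps both forms in functions and compares them on the question's values. The function names are hypothetical and used only for illustration.

```python
# Hypothetical helper names, used only to compare the two forms.
def blank_if_missing_ternary(middle):
    # Conditional expression, as used inside the list comprehension.
    return middle if middle != "Missing" else ""


def blank_if_missing_statement(middle):
    # The same logic written as an ordinary if statement.
    if middle == "Missing":
        middle = ""
    return middle


# Both forms agree for every middle initial in the sample data.
for value in ["P", "Missing", "H"]:
    assert blank_if_missing_ternary(value) == blank_if_missing_statement(value)
```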
Finally, I added .strip() to remove the trailing space that is left behind when the middle name is missing.
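Putting the pieces together, the following is a self-contained sketch of the approach above. It assumes the sample data from the question is built in memory rather than read from regcontact.csv.

```python
import pandas as pd

# Sample data copied from the question (an assumption: built in memory
# here instead of being read from the CSV file).
df = pd.DataFrame({
    "RegistrationID": [1, 2, 3],
    "FirstName": ["John", "Bill", "Paul"],
    "MiddleInitial": ["P", "Missing", "H"],
    "LastName": ["Smith", "Jones", "Henry"],
})

# Build "Last, First Middle"; the 'Missing' placeholder becomes an empty
# string, and strip() removes the trailing space that this leaves behind.
df["Full Name"] = [
    "{0}, {1} {2}".format(last, first, middle if middle != "Missing" else "").strip()
    for last, first, middle in df[["LastName", "FirstName", "MiddleInitial"]].values
]

print(df["Full Name"].tolist())
# ['Smith, John P', 'Jones, Bill', 'Henry, Paul H']
```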
Answer 1 (score: 1)
All you need to do is concatenate the columns with +:
>>> df.FirstName + ', ' + df.LastName + ', ' + df.MiddleInitial
0 John, Smith, P
1 Bill, Jones, Missing
2 Paul, Henry, H
dtype: object
To add a new column, you could just write:
df['FullName'] = df.FirstName + ', ' + ...
(In pandas, it is usually best to avoid explicit loops and use column-wise operations like this instead.)
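A minimal sketch of this column-wise approach, assuming the question's sample data is built in memory. It puts the last name first to match the format the question asks for, and it additionally strips the ', Missing' placeholder with Series.str.replace. Passing regex=False treats the pattern as a literal string.

```python
import pandas as pd

# Sample data copied from the question (an assumption: built in memory).
df = pd.DataFrame({
    "RegistrationID": [1, 2, 3],
    "FirstName": ["John", "Bill", "Paul"],
    "MiddleInitial": ["P", "Missing", "H"],
    "LastName": ["Smith", "Jones", "Henry"],
})

# Concatenate whole columns at once; no explicit Python loop is needed.
combined = df["LastName"] + ", " + df["FirstName"] + ", " + df["MiddleInitial"]

# Remove the literal ", Missing" suffix left by the placeholder value.
df["FullName"] = combined.str.replace(", Missing", "", regex=False)

print(df["FullName"].tolist())
# ['Smith, John, P', 'Jones, Bill', 'Henry, Paul, H']
```

Note that the replacement removes the literal text ', Missing' wherever it occurs in the combined string, so this sketch assumes that no first name is itself 'Missing'.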