Question

I'm new to Python and am trying to use Pandas (in an IPython Notebook, Python 3) to combine three columns. Here is the original data:

       RegistrationID  FirstName  MiddleInitial   LastName    
           1              John       P             Smith    
           2              Bill       Missing       Jones   
           3              Paul       H             Henry

I would like:

   RegistrationID FirstName MiddleInitial   LastName    FullName
     1              John       P             Smith   Smith, John, P 
     2              Bill       Missing       Jones   Jones, Bill 
     3              Paul       H             Henry   Henry, Paul, H

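I'm sure this is definitely not the right way to do it, but this is how I set it up in a for loop. Unfortunately, it just keeps running and never finishes.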

%matplotlib inline
import pandas as pd

from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))

reg = pd.DataFrame.from_csv('regcontact.csv', index_col=RegistrationID)

for item, frame in regcombo['MiddleInitial'].iteritems():
while frame == 'Missing':
   reg['FullName'] = reg.LastName.map(str) + ", " + reg.FirstName 
else: break

The idea was then to add another column for those who have a full name (i.e. including MiddleInitial):

for item, frame in regcombo['MiddleInitial'].iteritems():
while frame != 'Missing':
   reg['FullName1'] = reg.LastName.map(str) + ", " + reg.FirstName + ", " + reg.MiddleInitial
else: break

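And then combine them so that there are no null values. I've looked everywhere, but I can't figure it out. Any help would be greatly appreciated, and I apologize in advance if I've broken any conventions, as this is my first post.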

Answer 1

This uses a list comprehension to create the new dataframe column, e.g. [(a, b, c) for a, b, c in some_iterable_item].

df['Full Name'] = [
   "{0}, {1} {2}"
   .format(last, first, middle if middle != 'Missing' else "").strip() 
   for last, first, middle 
   in df[['LastName', 'FirstName', 'MiddleInitial']].values]

>>> df
   RegistrationID FirstName MiddleInitial LastName      Full Name
0               1      John             P    Smith  Smith, John P
1               2      Bill       Missing    Jones    Jones, Bill
2               3      Paul             H    Henry  Henry, Paul H

Here, some_iterable_item is the array of values from the dataframe:

>>> df[['LastName', 'FirstName', 'MiddleInitial']].values
array([['Smith', 'John', 'P'],
       ['Jones', 'Bill', 'Missing'],
       ['Henry', 'Paul', 'H']], dtype=object)

So, per our list comprehension model:

>>> [(a, b, c) for (a, b, c) in df[['LastName', 'FirstName', 'MiddleInitial']].values]
[('Smith', 'John', 'P'), ('Jones', 'Bill', 'Missing'), ('Henry', 'Paul', 'H')]

I then format the string:

>>> a = "Smith"
>>> b = "John"
>>> c = "P"
>>> "{0}, {1} {2}".format(a, b, c)
'Smith, John P'

I use a ternary to check if the middle name is 'Missing', so:

middle if middle != "Missing" else ""

is equivalent to:

if middle == 'Missing':
    middle = ""

Finally, I added .strip() to remove the extra space in case the middle name is missing.
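
For anyone who wants to run the steps above end to end, here is a minimal sketch. It assumes the sample data from the question, rebuilt by hand rather than read from regcontact.csv, and format_name is a hypothetical helper name used only to spell out the ternary as a plain if statement. It also iterates with zip over the three columns instead of .values; both yield one (last, first, middle) triple per row.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: the real data
# would be loaded from regcontact.csv instead).
df = pd.DataFrame({
    'RegistrationID': [1, 2, 3],
    'FirstName': ['John', 'Bill', 'Paul'],
    'MiddleInitial': ['P', 'Missing', 'H'],
    'LastName': ['Smith', 'Jones', 'Henry'],
})


def format_name(last, first, middle):
    """Hypothetical helper: build 'Last, First M', dropping a 'Missing' initial."""
    if middle == 'Missing':
        middle = ''
    # strip() removes the trailing space left behind when middle is empty.
    return "{0}, {1} {2}".format(last, first, middle).strip()


# One formatted string per row, assigned as a new column.
df['Full Name'] = [
    format_name(last, first, middle)
    for last, first, middle
    in zip(df['LastName'], df['FirstName'], df['MiddleInitial'])
]

print(df['Full Name'].tolist())
```

Pulling the formatting into a small function is only a readability choice; the inline ternary shown earlier does the same thing.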

Answer 2

All you need to do is add the columns:

>>> df.FirstName + ', ' + df.LastName + ', ' + df.MiddleInitial
0          John, Smith, P
1    Bill, Jones, Missing
2          Paul, Henry, H
dtype: object

To add a new column, you could just write:

df['FullName'] = df.FirstName + ', ' + ...

(In Pandas, one usually tries to avoid explicit loops in favor of operations on whole columns.)
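
As a rough sketch of how the loop-free approach might handle the 'Missing' placeholder, the example below concatenates the columns in the order the question asks for (last, first, middle) and then strips the literal ', Missing' suffix with Series.str.replace. The sample DataFrame is rebuilt by hand from the question, and the intermediate name full is just an illustrative variable, not something from the original answer.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption).
df = pd.DataFrame({
    'RegistrationID': [1, 2, 3],
    'FirstName': ['John', 'Bill', 'Paul'],
    'MiddleInitial': ['P', 'Missing', 'H'],
    'LastName': ['Smith', 'Jones', 'Henry'],
})

# Concatenate whole columns at once; no explicit loop is needed.
full = df['LastName'] + ', ' + df['FirstName'] + ', ' + df['MiddleInitial']

# Drop the literal ', Missing' text; regex=False requests a plain-text match.
df['FullName'] = full.str.replace(', Missing', '', regex=False)

print(df['FullName'].tolist())
```

One caveat with this shortcut: it removes ', Missing' wherever it appears in the combined string, so it would also alter a row whose first name happened to be 'Missing'. If that matters, replacing the placeholder in the MiddleInitial column before concatenating is the safer route.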

Concatenating columns in Pandas using a for loop
