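Pandas: create another column by splitting each row of the first column

Time: 2016-10-18 16:32:01

Tags: python pandas

The goal is to create a second column from the first column: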

column1, column2
Hello World, #HelloWorld
US Election, #USElection

I have a simple file with a single column:

columnOne
Hello World
US Election
Movie Night

I wrote the following function:

>>> def newColumn(row):
...     r = "#" + "".join(row.split(" "))
...     return r

Then I created the second column with pandas as follows:
df['column2'] = df.apply (lambda row: newColumn(row),axis=1)

But I end up with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 3972, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4064, in _apply_standard
    results[i] = func(v)
  File "<stdin>", line 1, in <lambda>
  File "<stdin>", line 2, in newColumn
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: ("'Series' object has no attribute 'split'", u'occurred at index 0')

So I changed the split to the following:

r = "".join(row.str.split(" "))

But that did not help.

4 answers:

Answer 0: (score: 3)

This should do the trick:

df['new_column'] = df['old_column'].apply(lambda x: "#"+x.replace(' ', ''))

Example:

>>> names = ['Hello World', 'US Election', 'Movie Night']
>>> df = pd.DataFrame(data = names, columns=['Names'])
>>> df
     Names
0    Hello World
1    US Election
2    Movie Night

>>> df['Names2'] = df['Names'].apply(lambda x: "#"+x.replace(' ', ''))
>>> df
     Names         Names2
0    Hello World   #HelloWorld
1    US Election   #USElection
2    Movie Night   #MovieNight

Answer 1: (score: 3)

Try a list comprehension:

import pandas

df = pandas.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

df['column2'] = ['#' + item.replace(' ', '') for item in df.columnOne]

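The comprehension calls `.replace` on every item, so it assumes every cell is a string. Below is a sketch of a guarded variant, assuming the column might contain missing values. The question's data does not, and the `None` row here is made up for illustration:

```python
import pandas as pd

# Hypothetical data with a missing entry; the question's file has none
df = pd.DataFrame({'columnOne': ['Hello World', None, 'Movie Night']})

# Only transform real strings; leave anything else (None/NaN) as it is
df['column2'] = [
    '#' + item.replace(' ', '') if isinstance(item, str) else item
    for item in df.columnOne
]
```

Assigning a list aligns by position, so the list must have exactly one item per row.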

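Answer 2: (score: 3)

Your general approach is fine; you just ran into one problem. When you call apply on the whole DataFrame, it passes either a row or a column to the function being applied. In your case you want neither a row nor a column: you want the string inside each cell of the first column. So instead of running df.apply, you need df['columnOne'].apply.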
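To make the difference concrete, here is a minimal sketch of what each form of `apply` hands to the function. The frame mirrors the question's data, and `new_column` is an illustrative rename of the asker's `newColumn`:

```python
import pandas as pd

df = pd.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

def new_column(text):
    # Expects a plain string, e.g. 'Hello World'
    return '#' + ''.join(text.split(' '))

# DataFrame.apply with axis=1 passes each row as a Series, which has no .split
row_types = df.apply(lambda row: type(row).__name__, axis=1)

# Series.apply passes each cell value, here a str
cell_types = df['columnOne'].apply(lambda cell: type(cell).__name__)

# Two equivalent fixes: pick the cell out of the row, or apply to the column
via_rows = df.apply(lambda row: new_column(row['columnOne']), axis=1)
via_column = df['columnOne'].apply(new_column)
```

Both fixes give the same values. Applying to the column is the shorter of the two, since the function never needs the rest of the row.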

Here is what I would do:

import pandas as pd

df = pd.DataFrame(['First test here', 'Second test'], columns=['A'])

# Note that this function expects a string, and returns a string
def new_string(s):
    # Get rid of the spaces
    s = s.replace(' ','')
    # Add the hash
    s = '#' + s
    return s

# Then apply it to the first column, and save the result in the second, new column
df['B'] = df['A'].apply(new_string)

Or, if you really want a one-liner:

df['B'] = df['A'].apply(lambda x: '#' + x.replace(' ',''))

Answer 3: (score: 3)

You can use str.replace, as MaxU commented, or Series.replace with the parameter regex=True, to replace all whitespace with an empty string:

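# regex=True is passed explicitly: newer pandas versions treat the str.replace pattern as a literal by default
df['column2'] = '#' + df.column1.str.replace(r'\s+', '', regex=True)
df['column3'] = '#' + df.column1.replace(r'\s+', '', regex=True)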

print (df)
       column1      column2      column3
0  Hello World  #HelloWorld  #HelloWorld
1  US Election  #USElection  #USElection
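The pattern `\s+` does more than strip single spaces. Here is a small sketch, assuming a recent pandas where `regex=True` has to be passed explicitly to `str.replace`. The sample rows with irregular whitespace are made up:

```python
import pandas as pd

# Hypothetical rows: a single space, a double space, and a tab
df = pd.DataFrame({'column1': ['Hello World', 'US  Election', 'Movie\tNight']})

# \s+ matches any run of whitespace, so double spaces and tabs are removed too
df['column2'] = '#' + df['column1'].str.replace(r'\s+', '', regex=True)
df['column3'] = '#' + df['column1'].replace(r'\s+', '', regex=True)

# Without regex=True, Series.replace only swaps cells whose whole value equals ' '
unchanged = df['column1'].replace(' ', '')
```

Both vectorized forms avoid a Python-level function call per row. The last line shows why `regex=True` matters for `Series.replace`: without it, no cell matches and the column comes back as it was.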