Goal: create a second column from the first column
column1, column2
Hello World, #HelloWorld
US Election, #USElection
I have a simple file with a single column:
columnOne
Hello World
US Election
Movie Night
I wrote the following function:
>>> def newColumn(row):
...     r = "#" + "".join(row.split(" "))
...     return r
Then I created the second column with pandas:
df['column2'] = df.apply(lambda row: newColumn(row), axis=1)
But I end up with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 3972, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4064, in _apply_standard
    results[i] = func(v)
  File "<stdin>", line 1, in <lambda>
  File "<stdin>", line 2, in newColumn
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: ("'Series' object has no attribute 'split'", u'occurred at index 0')
So I changed the split to the following:
r = "".join(row.str.split(" "))
But that did not help.
Answer 0 (score: 3)
This should do the trick:
df['new_column'] = df['old_column'].apply(lambda x: "#"+x.replace(' ', ''))
Example:
>>> import pandas as pd
>>> names = ['Hello World', 'US Election', 'Movie Night']
>>> df = pd.DataFrame(data = names, columns=['Names'])
>>> df
         Names
0  Hello World
1  US Election
2  Movie Night
>>> df['Names2'] = df['Names'].apply(lambda x: "#"+x.replace(' ', ''))
>>> df
         Names       Names2
0  Hello World  #HelloWorld
1  US Election  #USElection
2  Movie Night  #MovieNight
Answer 1 (score: 3)
Try a list comprehension:
import pandas
df = pandas.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})
df['column2'] = ['#' + item.replace(' ', '') for item in df.columnOne]
In [2]: df
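As a self-contained sketch of the same idea, the snippet below builds the list first and then assigns it. The intermediate name `hashtags` is only illustrative. Note that a list is assigned by position, so it must have exactly one entry per row.

```python
import pandas as pd

df = pd.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

# Build the new values as a plain Python list, one entry per row.
hashtags = ['#' + item.replace(' ', '') for item in df['columnOne']]

# Assigning a list works because its length matches the number of rows.
df['column2'] = hashtags

print(df['column2'].tolist())  # ['#HelloWorld', '#USElection', '#MovieNight']
```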
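Answer 2 (score: 3)
Your general approach is totally fine, you just ran into a couple of problems. When you use apply on an entire DataFrame, it passes either a whole row or a whole column to the function being applied. In your case you want neither a row nor a column: you want the string inside each cell of the first column. That is why you get "'Series' object has no attribute 'split'": with axis=1 the function receives each row as a Series, not a string. So instead of calling df.apply, you want df['columnOne'].apply.
Here is what I would do: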
import pandas as pd
df = pd.DataFrame(['First test here', 'Second test'], columns=['A'])
# Note that this function expects a string, and returns a string
def new_string(s):
    # Get rid of the spaces
    s = s.replace(' ', '')
    # Add the hash
    s = '#' + s
    return s
# Then, apply it to the first column, and save it in the second, new column
df['B'] = df['A'].apply(new_string)
Or, if you really want a one-liner:
df['B'] = df['A'].apply(lambda x: '#' + x.replace(' ',''))
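To make the difference concrete, here is a minimal sketch that records what each form of apply hands to the function. The names `row_types` and `cell_types` are illustrative and not part of the question. It also shows that DataFrame.apply can be kept if the function indexes into the row first.

```python
import pandas as pd

df = pd.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

# DataFrame.apply with axis=1 passes each row to the function as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()

# Series.apply passes each cell value, here a plain Python string.
cell_types = df['columnOne'].apply(lambda s: type(s).__name__).tolist()

print(row_types)
print(cell_types)

# Keeping DataFrame.apply also works, as long as the string is pulled out of the row.
df['column2'] = df.apply(lambda row: '#' + row['columnOne'].replace(' ', ''), axis=1)
print(df['column2'].tolist())
```

For a single source column, the Series.apply form is the simpler of the two, since there is no row object to unpack.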
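Answer 3 (score: 3)
You can use str.replace, as MaxU commented, or Series.replace with the parameter regex=True, to replace all whitespace with an empty string:
# A raw string avoids the invalid-escape warning for '\s'.
# regex=True is passed explicitly because str.replace treats the pattern as literal text by default in pandas 2.0 and later.
df['column2'] = '#' + df.column1.str.replace(r'\s+', '', regex=True)
df['column3'] = '#' + df.column1.replace(r'\s+', '', regex=True)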
print (df)
       column1      column2      column3
0  Hello World  #HelloWorld  #HelloWorld
1  US Election  #USElection  #USElection
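The sketch below compares a literal replacement with the `\s+` pattern. The sample strings containing a double space and a tab are made up for illustration and are not from the question. The literal version only removes space characters, while the regex version removes any run of whitespace.

```python
import pandas as pd

# Illustrative data: a double space and a tab, in addition to a single space.
df = pd.DataFrame({'column1': ['Hello World', 'US  Election', 'Movie\tNight']})

# Literal replacement: every ' ' character is removed, tabs are left alone.
literal = '#' + df['column1'].str.replace(' ', '', regex=False)

# Regex replacement: any run of whitespace (spaces, tabs, newlines) is removed.
pattern = '#' + df['column1'].str.replace(r'\s+', '', regex=True)

print(literal.tolist())
print(pattern.tolist())
```

If the data only ever contains single spaces, the two give the same result and the literal form is enough.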