Here is my starting df:
import numpy as np
import pandas as pd
df = pd.DataFrame(['alpha', 'beta'], columns = ['text'])
df
    text
0  alpha
1   beta
This is the final result I want:
    text        first        second        third
0  alpha  alpha-first  alpha-second  alpha-third
1   beta   beta-first   beta-second   beta-third
I wrote a custom function, parse(), which works without problems:
def parse(text):
    return [text + '-first', text + '-second', text + '-third']
Now I try to apply parse() to the initial df, and this is where the errors occur:
1) If I try the following:
df = df.reindex(columns = list(df.columns) + ['first', 'second', 'third']) # Create empty columns
df[['first', 'second', 'third']] = df.text.apply(parse)
I get:
ValueError: Must have equal len keys and value when setting with an ndarray
2) A slightly different version:
df = df.reindex(columns = list(df.columns) + ['first', 'second', 'third']).astype(object) # Create empty columns of "object" type
df[['first', 'second', 'third']] = df.text.apply(parse)
I get:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast
to indexing result of shape (3,2)
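Where am I going wrong?
Edit
I should clarify that parse() itself is a more complex function in the real problem I am trying to solve. (It takes a paragraph, finds 3 specific types of strings in it, and outputs those strings as a list of length 3.) In the code above I used a somewhat arbitrary, simple definition of parse() as a stand-in, to avoid getting bogged down in details unrelated to the two errors I am getting.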
Answer 0 (score: 2)
No need for apply:
import pandas as pd
df = pd.DataFrame(['alpha', 'beta'], columns = ['text'])
for i in ['first', 'second', 'third']:
    df[i] = df.text + '-' + i
# text first second third
# 0 alpha alpha-first alpha-second alpha-third
# 1 beta beta-first beta-second beta-third
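In general, when vectorized column operations like the one above are not an option, the order of preference for the "type of process" you choose for your calculation should be: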
pd.Series.apply
pd.DataFrame.apply
pd.DataFrame.iterrows
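If the real parse() cannot be vectorized, as the edit to the question suggests, pd.Series.apply is the next choice. The attempts in the question fail because apply returns a single Series whose elements are lists, not three separate columns. A minimal sketch, assuming parse() always returns a list of exactly three items; the parse() below is only the simplified stand-in from the question, and the names parsed, expanded and result are illustrative:

```python
import pandas as pd

def parse(text):
    # Stand-in for the real, more complex function (assumed to return 3 items).
    return [text + '-first', text + '-second', text + '-third']

df = pd.DataFrame(['alpha', 'beta'], columns=['text'])

# apply() gives one Series of lists; tolist() turns it into a list of rows.
parsed = df['text'].apply(parse)
expanded = pd.DataFrame(parsed.tolist(), index=df.index,
                        columns=['first', 'second', 'third'])

# Join on the shared index so each parsed row lines up with its source text.
result = df.join(expanded)
print(result)
```

Passing index=df.index keeps the rows aligned even if df does not have a default 0..n-1 index.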
Answer 1 (score: 1)
This can be done in several ways:
Option 1:
def f(s):
    return pd.DataFrame(np.repeat(s, 3).values.reshape(len(s), -1),
                        columns=['first','second','third']) \
             .apply(lambda c: c+'-'+c.name)
In [183]: df[['first','second','third']] = f(df.text)
In [184]: df
Out[184]:
text first second third
0 alpha alpha-first alpha-second alpha-third
1 beta beta-first beta-second beta-third
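To make the intermediate steps of f() easier to follow, here is a sketch of the same idea split into separate statements. It uses to_numpy() instead of .values, which is a substitution on my part rather than the code above, and the names grid and wide are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series(['alpha', 'beta'], name='text')

# Repeat every value 3 times, then fold into one row per original value.
grid = np.repeat(s.to_numpy(), 3).reshape(len(s), -1)

wide = pd.DataFrame(grid, index=s.index,
                    columns=['first', 'second', 'third'])

# Column-wise apply: c.name is the column label, appended as a suffix.
wide = wide.apply(lambda c: c + '-' + c.name)
print(wide)
```

Note that this only works because each new column is the original text plus a suffix derived from the column name; it does not generalize to an arbitrary parse().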
Answer 2 (score: 1)
Here is a one-liner with pd.DataFrame.assign:
df.assign(**{x: df['text']+'-'+x for x in ['first', 'second', 'third']})
# text first second third
# 0 alpha alpha-first alpha-second alpha-third
# 1 beta beta-first beta-second beta-third
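One thing worth keeping in mind: assign returns a new DataFrame and does not modify df in place, so the result has to be bound to a name. A short sketch illustrating this (the names suffixes and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame(['alpha', 'beta'], columns=['text'])
suffixes = ['first', 'second', 'third']

# assign() builds a new DataFrame; the original df keeps its single column.
result = df.assign(**{x: df['text'] + '-' + x for x in suffixes})

print(list(df.columns))      # ['text']
print(list(result.columns))  # ['text', 'first', 'second', 'third']
```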
Answer 3 (score: 0)
Check this:
istr.ignore(std::numeric_limits<std::streamsize>::max());
max = istr.gcount();