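I'm trying to tokenize the sentences in a pandas DataFrame, but I've run into some trouble.
I know that this code converts only a single row: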
TextBlob(df['H'][0]).words
But when I try to apply it inside a for loop, I get an error:
for i, row in df.H():
    ifor_val = TextBlob(df['H'][i]).words
    df.at[i,'ifor'] = H
Error message: TypeError: 'Series' object is not callable
Edit:
data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)
Desired output:
H marks
['the','quick','brown', 'fox'....] 99
['the','weather','is', 'nice'....] 98
Solution:
from nltk.tokenize import word_tokenize

df['H'] = df['H'].apply(word_tokenize)
df['H'].head()
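Answer 0 (score: 0)
You probably want to apply a function to every row of the DataFrame. In that case, you can use a lambda to apply the function once per row across the whole DataFrame.
Assuming H is the column you are targeting, and each row holds the exact text you want to pass to TextBlob, the following adds a column named 'output' that holds the result of the TextBlob call: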
df['output'] = df.apply(lambda row: TextBlob(row['H']).words, axis=1)
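As a minimal, self-contained sketch of the row-wise idea described in this answer: `simple_tokenize` is a hypothetical stand-in for `TextBlob(text).words` (a plain whitespace split, so it runs without TextBlob installed), and the column names `output_rowwise` and `output_colwise` are made up for illustration. It compares applying a lambda across rows with applying the function to the single column directly.

```python
import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for TextBlob(text).words: a plain whitespace split.
    return text.split()


df = pd.DataFrame({
    "H": ["the quick brown fox jumps over the road", "the weather is nice today"],
    "marks": [99, 98],
})

# Row-wise: with axis=1 the lambda receives each row as a Series.
df["output_rowwise"] = df.apply(lambda row: simple_tokenize(row["H"]), axis=1)

# Column-wise: apply the function to the one column that is actually needed.
df["output_colwise"] = df["H"].apply(simple_tokenize)

print(df[["output_rowwise", "output_colwise"]])
```

Both forms should yield the same lists here; the column-wise form is usually the simpler choice when only one column is involved.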
Answer 1 (score: 0)
This gives me what you are after:
import pandas as pd

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)
print(df)
# Copy every column except "H", then re-insert "H" as a column of empty lists.
df2 = df.drop("H",axis=1).copy()
df2.insert(loc=0, column='H', value=[[] for x in range(df.shape[0])])
# Split each sentence on whitespace and append the words to that row's list.
for index, row in df2.iterrows():
    vals = df.loc[index,"H"].split()
    for word in vals:
        df2.loc[index,"H"].append(word)
print(df2)
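Answer 2 (score: 0)
If you want to iterate over the (index, value) pairs of a column (the values are strings in this case), you need the column's items() method (named iteritems() in pandas versions before 2.0, where it was removed):
for i, s in df.H.items():
    pass  # Do stuff with your values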
It is better to add a new column than to overwrite the old one.
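A minimal sketch of that loop, under a few assumptions: it uses `Series.items()`, the current name for `iteritems()`; `str.split()` stands in for the real tokenizer; and `H_tokens` is a hypothetical name for the new column.

```python
import pandas as pd

df = pd.DataFrame({
    "H": ["the quick brown fox jumps over the road", "the weather is nice today"],
    "marks": [99, 98],
})

# Collect the tokens for each index label.
# str.split() is only a stand-in for the real tokenizer.
tokens_by_index = {}
for i, s in df["H"].items():
    tokens_by_index[i] = s.split()

# Store the result in a new column ("H_tokens" is a made-up name),
# leaving the original "H" column untouched.
df["H_tokens"] = pd.Series(tokens_by_index)

print(df)
```

Keeping the original column makes it easy to re-run the tokenization with a different tokenizer later.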