Question

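I'm trying to tokenize the sentences in a pandas DataFrame, but I'm running into some trouble.

I know this code converts only a single row: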

TextBlob(df['H'][0]).words

But when I try to apply it in a for loop, I get an error:

for i, row in df.H():
    ifor_val = TextBlob(df['H'][i]).words
    df.at[i,'ifor'] = H

Error message: TypeError: 'Series' object is not callable

Edit:

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)

Desired output:

H                                  marks
['the','quick','brown', 'fox'....]   99
['the','weather','is', 'nice'....]   98

Solution:

df['H'] = df['H'].apply(word_tokenize)
df['H'].head()

Answer 1

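You probably want to apply a function to every row of the DataFrame. In that case, you can use a lambda to apply the function once per row across the whole DataFrame.

Assuming H is the column you are targeting, and each row holds the exact text you want to send to TextBlob, the following adds a column named 'output' containing the result of the TextBlob call: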

df['output'] = df.apply(lambda row: TextBlob(row['H']).words, axis=1)
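
For anyone who wants to try the row-wise idea without installing TextBlob, here is a minimal sketch. It assumes a hypothetical stand-in tokenizer, `simple_tokenize`, built on a regular expression; it is not TextBlob's tokenizer and will differ on punctuation and contractions. The sample data is taken from the question.

```python
import re

import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for TextBlob(text).words:
    # returns every run of word characters as one token.
    return re.findall(r"\w+", text)


df = pd.DataFrame({
    "H": ["the quick brown fox jumps over the road", "the weather is nice today"],
    "marks": [99, 98],
})

# axis=1 hands each row to the lambda; the returned lists are stored
# one per row in the new 'output' column, leaving 'H' untouched.
df["output"] = df.apply(lambda row: simple_tokenize(row["H"]), axis=1)

print(df["output"].tolist())
```

Swapping `simple_tokenize(row["H"])` for `TextBlob(row["H"]).words` should give the TextBlob version, provided TextBlob and its corpora are installed.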

Answer 2

This gave me the output you are after:

import pandas as pd

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)
print(df)

# Copy everything except H, then put back an H column of empty lists.
df2 = df.drop("H", axis=1).copy()
df2.insert(loc=0, column='H', value=[[] for x in range(df.shape[0])])

# Split each sentence on whitespace and append its words to that row's list.
for index, row in df2.iterrows():
    vals = df.loc[index, "H"].split()
    for word in vals:
        df2.loc[index, "H"].append(word)

print(df2)
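
If whitespace splitting is all that is needed, the same result can probably be had without the explicit loops. The sketch below uses pandas' vectorized `Series.str.split()`; like the loop above, it splits on whitespace only, so it is not a substitute for a real tokenizer such as TextBlob.

```python
import pandas as pd

df = pd.DataFrame({
    "H": ["the quick brown fox jumps over the road", "the weather is nice today"],
    "marks": [99, 98],
})

# Work on a copy so the original sentences stay available in df.
df2 = df.copy()

# str.split() with no arguments splits each string on runs of whitespace
# and stores the resulting list of words in that row.
df2["H"] = df2["H"].str.split()

print(df2)
```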

Answer 3

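If you want to iterate over the (index, value) pairs of a column (the values are strings in this case), you need that column's items() method (named iteritems() in older pandas; it was removed in pandas 2.0):

for i, s in df.H.items():
    pass  # Do stuff with your values

It is better to add a new column than to overwrite the old one.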
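
A minimal sketch of that loop, filled in under a few assumptions: the column name `H_tokens` is made up for illustration, and `str.split()` stands in for whatever tokenizer you actually use (for example `TextBlob(s).words`). Calling `df.H()` in the question failed because `df.H` is a Series, not a function; `items()` is the method that yields the pairs.

```python
import pandas as pd

df = pd.DataFrame({
    "H": ["the quick brown fox jumps over the road", "the weather is nice today"],
    "marks": [99, 98],
})

# Collect one token list per row, keyed by the row's index label.
tokens = {}
for i, s in df["H"].items():
    tokens[i] = s.split()  # stand-in tokenizer: whitespace split

# Store the lists in a new column (hypothetical name) instead of overwriting H.
# Building a Series from the dict aligns each list with its index label.
df["H_tokens"] = pd.Series(tokens)

print(df)
```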

Tokenize each row in a DataFrame - loop not working