How to transform a Pandas DataFrame to show token counts from the original DataFrame?

Question

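I am trying to convert a Pandas DataFrame that contains sentences into one that shows the number of words in each of those sentences, across all columns and all rows.

I have tried apply, transform, lambda functions and nested loops.

Works fine for a single column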

dat.direction.str.split().str.len()

Approach 1 (fails)

def token_count(x):
    if type(x) == str:
        return x.split().str.len()
    else:
        return 0

dat.apply(token_count)
dat.transform(token_count)

Approach 2 (fails)

dat.apply(lambda x:x.str.split().str.len())
dat.apply(lambda x:x.split().str.len())
dat.transform(lambda x:x.str.split().str.len())
dat.transform(lambda x:x.split().str.len())

Approach 3 (fails; tried before resorting to nested for loops)

dat.iloc[1,3].split(" ").str.len()

Output for a single column

Error from approach 1 (the counts should not be 0)

....................
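
A likely reason for the zeros, sketched below: `DataFrame.apply` hands each whole column to the function as a `Series`, so the `type(x) == str` check is never true and the `else` branch always runs. The frame `df` is a hypothetical stand-in for `dat`, borrowed from the setup used in the answers.

```python
import pandas as pd

# Hypothetical stand-in for `dat` (same sample data as the answers use)
df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

def token_count(x):
    # Approach 1 as posted; the string branch is never reached here
    if type(x) == str:
        return x.split().str.len()
    else:
        return 0

# Each call receives a whole column, not a single cell
seen = df.apply(lambda x: type(x).__name__)
zeros = df.apply(token_count)

print(seen.tolist())   # type name of what the function received, per column
print(zeros.tolist())  # one 0 per column
```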

Error from approach 3

AttributeError: 'list' object has no attribute 'str'
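
The `.str` accessor exists only on a pandas `Series`. A single cell is a plain Python `str`, and `str.split()` returns a plain `list`, so the length has to come from the built-in `len`. A minimal sketch of an element-wise version of `token_count`, assuming a hypothetical frame `df` in place of `dat`:

```python
import pandas as pd

# Hypothetical stand-in for `dat`
df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

def token_count(x):
    # x is a single cell here, so use plain Python string methods
    if isinstance(x, str):
        return len(x.split())
    return 0  # non-string cells (numbers, missing values) count as 0

# One cell: built-in len on the list returned by split
single = len(df.iloc[1, 1].split())

# Whole frame: Series.map applies the function to every cell of each column
counts = df.apply(lambda col: col.map(token_count))
print(counts)
```

`Series.map` is used instead of a frame-level element-wise method so the sketch does not depend on the pandas version.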

Expected output

Answer 1

How about this:

import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

pd.concat([df[col].str.split().str.len() for col in df.columns], axis=1)

Answer 2

`stack`

stack into one dimension
do your thing
unstack to get back to the original shape

df.stack().str.split().str.len().unstack()

   col1  col2
0     4     2
1     4     5
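
The one-liner can be unpacked into its three steps to see what each one produces. This is only an illustrative breakdown; the intermediate names `stacked`, `lengths` and `counts` are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

# 1. stack: one long Series indexed by (row label, column name)
stacked = df.stack()

# 2. do your thing: the .str accessor is available because this is a Series
lengths = stacked.str.split().str.len()

# 3. unstack: move the column names back out of the index
counts = lengths.unstack()
print(counts)
```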

Using `count` instead

df.stack().str.count(r'\s+').unstack() + 1
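
One caveat with counting whitespace runs and adding 1: the result differs from `split()` when a string is empty or has leading or trailing whitespace. A small sketch comparing the two, plus a variant that counts runs of non-whitespace (`\S+`) instead; the sample Series `s` is made up for illustration.

```python
import pandas as pd

# Made-up edge cases: padded string, empty string, ordinary string
s = pd.Series(["  hello world ", "", "a b c"])

by_split = s.str.split().str.len()        # tokens produced by split()
by_gaps = s.str.count(r"\s+") + 1         # whitespace runs plus one
by_tokens = s.str.count(r"\S+")           # runs of non-whitespace

print(by_split.tolist())   # [2, 0, 3]
print(by_gaps.tolist())    # [4, 1, 3]
print(by_tokens.tolist())  # [2, 0, 3]
```

On clean, single-spaced sentences like the ones in the setup, all three give the same counts.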

`applymap`

df.applymap(lambda s: len(s.split()))
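
Note that `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`, which does the same element-wise work. A hedged sketch that picks whichever method the installed version provides; the name `elementwise` is just a local helper for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

# Prefer DataFrame.map where it exists, fall back to applymap on older pandas
elementwise = df.map if hasattr(df, "map") else df.applymap

counts = elementwise(lambda s: len(s.split()))
print(counts)
```

Like the original line, this assumes every cell is a string; a missing value or a number would raise an `AttributeError` on `.split()`.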

`apply`

df.apply(lambda s: s.str.split().str.len())

Setup

Thanks to Ian

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

Answer 3

You can take your first approach, the one that works for a single column, and loop over every column in the DataFrame.

out = pd.DataFrame(index=dat.index)
for col in dat:
    out[col] = dat[col].str.split().str.len()
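
If some cells are missing, `.str.len()` yields `NaN` for them and the column becomes float. A possible variant of the loop that reports 0 for missing cells, in the spirit of the `else: return 0` branch in the question. The frame `dat` below is invented sample data, and the sketch assumes every column holds only strings or missing values.

```python
import pandas as pd

# Invented sample data with one missing cell
dat = pd.DataFrame({
    "col1": ["this is a sentence", None],
    "col2": ["one more", "this is the last sentence"],
})

out = pd.DataFrame(index=dat.index)
for col in dat:
    # Missing cells come back as NaN; treat them as 0 tokens, then restore ints
    out[col] = dat[col].str.split().str.len().fillna(0).astype(int)

print(out)
```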
