Question

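I am trying to create a new column in a DataFrame that contains the word count for the respective row. I am looking for the total number of words, not the frequency of each distinct word. I assumed there would be a simple/quick way to do this common task, but after googling around and reading a handful of SO posts (1, 2, 3) I am stuck. I have tried the solutions put forward in the linked SO posts, but I get a lot of attribute errors back.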

words = df['col'].split()
df['totalwords'] = len(words)

results in

AttributeError: 'Series' object has no attribute 'split'

and

f = lambda x: len(x["col"].split()) -1
df['totalwords'] = df.apply(f, axis=1)

results in

AttributeError: ("'list' object has no attribute 'split'", 'occurred at index 0')

Answer 1

`str.split` + `str.len`

`str.len` works on any non-numeric column.

df['totalwords'] = df['col'].str.split().str.len()
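A minimal sketch of the line above on made-up data (the frame contents and the `tokens` series are assumptions for illustration). It also shows that `.str.len()` counts elements when a column already holds lists, which is what the second error in the question suggests:

```python
import pandas as pd

# Hypothetical sample data; the real column contents are not shown in the question.
df = pd.DataFrame({'col': ['This is one sentence', 'and another', 'single']})

# Split each string on whitespace, then count the resulting tokens per row.
df['totalwords'] = df['col'].str.split().str.len()
print(df['totalwords'].tolist())  # [4, 2, 1]

# If the column already contains lists of tokens, skip the split step.
tokens = pd.Series([['a', 'b', 'c'], ['d']])
print(tokens.str.len().tolist())  # [3, 1]
```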

`str.count`

If your words are separated by single spaces, you can simply count the spaces and add 1.

df['totalwords'] = df['col'].str.count(' ') + 1
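To illustrate why the single-space condition matters, here is a small sketch on made-up strings (the sample values are assumptions). Counting spaces overcounts as soon as there are double, leading, or trailing spaces; counting runs of non-whitespace with the regex `\S+` is one possible alternative that is not part of the original answer:

```python
import pandas as pd

# Hypothetical strings: clean, double-spaced, and padded with spaces.
s = pd.Series(['one two three', 'one  two', ' padded '])

# Spaces + 1 is only correct for the first string.
by_spaces = (s.str.count(' ') + 1).tolist()

# Splitting on whitespace ignores repeated and surrounding spaces.
by_split = s.str.split().str.len().tolist()

# str.count takes a regex, so runs of non-whitespace can be counted directly.
by_regex = s.str.count(r'\S+').tolist()

print(by_spaces)  # [3, 3, 3]
print(by_split)   # [3, 2, 1]
print(by_regex)   # [3, 2, 1]
```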

List comprehension

This is faster than you might think!

df['totalwords'] = [len(x.split()) for x in df['col'].tolist()]
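The list comprehension calls `x.split()` on every value, so it fails if the column contains missing values. A hedged sketch of one way to guard against that, assuming that a missing value should count as zero words (that choice and the sample data are assumptions, not part of the original answer):

```python
import pandas as pd

# Hypothetical data with a missing value in the middle.
df = pd.DataFrame({'col': ['This is one sentence', None, 'and another']})

# Only split actual strings; anything else (None/NaN) is counted as 0 words.
df['totalwords'] = [
    len(x.split()) if isinstance(x, str) else 0
    for x in df['col'].tolist()
]
print(df['totalwords'].tolist())  # [4, 0, 2]
```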

Answer 2

Here is a way to do it using .apply():

df['number_of_words'] = df.col.apply(lambda x: len(x.split()))

Example

Given this df:

>>> df
                    col
0  This is one sentence
1           and another

After applying .apply():

>>> df['number_of_words'] = df.col.apply(lambda x: len(x.split()))
>>> df
                    col  number_of_words
0  This is one sentence                4
1           and another                2

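Note: As pointed out in the comments and in this answer, .apply is not necessarily the fastest method. If speed is important, it is better to use one of @cᴏʟᴅsᴘᴇᴇᴅ's methods.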
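If you want to check the speed claim on your own data, a rough sketch with `timeit` could look like the following. The data size, repeat count, and the `candidates` dictionary are assumptions chosen for illustration; timings depend on the machine, the pandas version, and the data, so no numbers are shown here:

```python
import timeit
import pandas as pd

# Hypothetical benchmark data: 10,000 short strings.
df = pd.DataFrame({'col': ['This is one sentence', 'and another'] * 5_000})

# Three of the approaches from this thread, wrapped so timeit can call them.
candidates = {
    'str.split + str.len': lambda: df['col'].str.split().str.len(),
    'apply': lambda: df['col'].apply(lambda x: len(x.split())),
    'list comprehension': lambda: [len(x.split()) for x in df['col'].tolist()],
}

# Time 5 runs of each approach and print them from fastest to slowest.
timings = {name: timeit.timeit(fn, number=5) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name}: {seconds:.4f} s for 5 runs')
```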

Answer 3

Here is one way using pd.Series.str.split and pd.Series.map:

df['word_count'] = df['col'].str.split().map(len)

The above assumes that df['col'] is a series of strings.

Example:

import pandas as pd

df = pd.DataFrame({'col': ['This is an example', 'This is another', 'A third']})

df['word_count'] = df['col'].str.split().map(len)

print(df)

#                   col  word_count
# 0  This is an example           4
# 1     This is another           3
# 2             A third           2

Answer 4

Using list and map, with cold's data:

list(map(lambda x : len(x.split()),df.col))
Out[343]: [4, 3, 2]

Answer 5

df['count_words'] = df['tweet'].apply(lambda x: len(x.split()))

df['count_words'].head(10)

I am doing Twitter sentiment analysis, and this works well for me.
