Question

I'm having trouble counting the number of items in a pandas string Series when a row contains no string.

When each row has one or more items, I can count the number of words. However, if a row has no value (an empty string), running str.split(',') on it still gives me a count of one.

These existing answers did not work for me, because each of them gives one for an empty string: Answer 1, Answer 2.

How can I handle this in a pandas one-liner? Thanks in advance.

Taking the first answer as an example:

import pandas as pd

df = pd.DataFrame(['one apple','','box of oranges','pile of fruits outside', 'one banana', 'fruits'])
df.columns = ['fruits']

The accepted answer is

count = df['fruits'].str.split().apply(len).value_counts()
count.index = count.index.astype(str) + ' words:'
count.sort_index(inplace=True)
count

which gives

Out[13]:  
0 words:    1
1 words:    1
2 words:    2
3 words:    1
4 words:    1
Name: fruits, dtype: int64

I want the second (empty) string to count as zero, but every solution I have tried gives me one.
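A minimal reproduction of the problem, as a sketch: it assumes the real data is comma-separated (the question mentions str.split(',')), and the series named commas is a hypothetical comma-separated version of the example data. With whitespace splitting the empty row counts as 0, but with an explicit ',' separator it counts as 1.

```python
import pandas as pd

# Space-separated sample (as in the question) and a hypothetical comma-separated version
spaced = pd.Series(["one apple", "", "box of oranges"])
commas = pd.Series(["one,apple", "", "box,of,oranges"])

# split() with no argument: '' becomes [], so its length is 0
spaced_counts = spaced.str.split().str.len()

# split(',') with an explicit separator: '' becomes [''], so its length is 1
comma_counts = commas.str.split(",").str.len()

print(spaced_counts.tolist())  # [2, 0, 3]
print(comma_counts.tolist())   # [2, 1, 3]
```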

Answer 1

Use str.split and count the elements with str.len:

df['wordcount'] = df.fruits.str.split().str.len()
print(df)
                   fruits  wordcount
0               one apple          2
1                                  0
2          box of oranges          3
3  pile of fruits outside          4
4              one banana          2
5                  fruits          1

Replace ' ' with ',' for your actual data.

Answer 2

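When you use split(), an empty string returns an empty list; however, when you use split(','), an empty string returns a list containing one empty string. That is why the example does not carry over to your solution.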
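The difference comes from Python's built-in str.split, whose behavior pandas' .str.split mirrors element by element for ordinary string data. A quick check in plain Python:

```python
# With no separator, str.split splits on runs of whitespace and drops empty fields
no_sep = "".split()

# With an explicit separator, every field is kept, including a single empty one
with_sep = "".split(",")

print(no_sep, len(no_sep))      # [] 0
print(with_sep, len(with_sep))  # [''] 1
```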

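You can try the following: first, split the strings on commas, as in your example, which I assume matches your real data. Then, if split returns a list containing only an empty string, the function returns 0; otherwise it returns the length of the list of words.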

pd.Series(['mytext', '']).str.split(',').apply(lambda x: 0 if x==[''] else len(x))
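A vectorized variant of the same idea, as a sketch assuming comma-separated strings with no missing values: compute the lengths with .str.len() and then use Series.where to replace the result with 0 wherever the original string is empty. This avoids a Python lambda per row, although the split itself still builds one list per element.

```python
import pandas as pd

s = pd.Series(["mytext", "", "a,b,c"])

# str.split(',') maps '' to [''], so str.len() reports 1 for that row
lengths = s.str.split(",").str.len()

# Keep the computed length where the string is non-empty; use 0 otherwise
counts = lengths.where(s != "", 0)

print(counts.tolist())  # [1, 0, 3]
```

Missing values (NaN) would pass through as NaN here, since NaN != "" is True; handle them separately if your data contains any.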

Answer 3

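In your question you refer to str.split(','), but the example only uses str.split(). The method behaves differently depending on whether you pass a separator argument.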
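The two modes also differ on non-empty input: without an argument, split() ignores leading and trailing whitespace and treats a run of whitespace as a single separator, whereas split(',') keeps every field, so consecutive or trailing commas produce empty items that get counted. A short illustration with hypothetical sample strings:

```python
import pandas as pd

s = pd.Series(["  one   apple ", "a,,b", "a,b,"])

# No argument: split on runs of whitespace; the result never contains empty fields
ws_counts = s.str.split().str.len().tolist()

# Explicit ',': every field is kept, including empty ones
comma_counts = s.str.split(",").str.len().tolist()

print(ws_counts)     # [2, 1, 1]
print(comma_counts)  # [1, 3, 3]
```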

What are you actually trying to do?

Empty string in a pandas Series counts as 1 when getting the number of words in the string
