Question

Edit: @Kyle Petryszak provided the best solution I can think of.

I changed it to actually count \n, since Twitter appears to count it as well.

    from emoji import UNICODE_EMOJI

    num_emoji = sum(tweet.count(emoji) for emoji in UNICODE_EMOJI) # accurately count and track emoji
    ignored_chars = UNICODE_EMOJI.copy() # thanks to https://stackoverflow.com/q/56214183/11456464

    num_other = sum(0 if char in ignored_chars else 1 for char in tweet)
    print(num_emoji, num_other, str((num_emoji * 2) + num_other + 2)) # not sure what exactly
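The arithmetic in that last line can be wrapped in a small helper. This is only a sketch under stated assumptions: `weighted_length`, its `emoji_chars` parameter and the `extra` padding are hypothetical names, and the weight of 2 per emoji and the `+ 2` are taken from the snippet above, not from any Twitter documentation. It also checks single code points only, so a multi-code-point emoji sequence would be counted once per matching code point.

```python
def weighted_length(tweet, emoji_chars, emoji_weight=2, extra=2):
    """Return (num_emoji, num_other, total) for a tweet string.

    emoji_chars is any container of single-character emoji (a hypothetical
    parameter; in the snippet above this role is played by UNICODE_EMOJI).
    """
    num_emoji = sum(1 for ch in tweet if ch in emoji_chars)
    # Everything that is not an emoji is counted, including '\n'.
    num_other = sum(1 for ch in tweet if ch not in emoji_chars)
    total = num_emoji * emoji_weight + num_other + extra
    return num_emoji, num_other, total


# Two lines in the style of the tweet: hourglass emoji followed by a value.
sample = "\u231b: 5\n\u23f3: 2"
print(weighted_length(sample, {"\u231b", "\u23f3"}))  # (2, 7, 13)
```

Passing the emoji set in as an argument keeps the helper independent of whichever emoji table you end up using.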

OP

I have a string (a tweet) built like this:

def construct_tweet(pihole, sys):
    tweet = ': ' + pihole[0]
    tweet += '\n⁉: ' + pihole[1]
    tweet += '\n: ' + pihole[2]
    tweet += '\n⁉⏭: ' + pihole[3]
    tweet += '\n⁉: ' + pihole[4]
    tweet += '\n: ' + pihole[5]
    tweet += '\n: ' + pihole[6]
    tweet += '\n⌛: ' + pihole[7]
    tweet += '\n⚖️x̅: ' + sys[1]
    tweet += '\n: ' + sys[2]
    tweet += '\n: ' + sys[3]
    tweet += '\n: ' + sys[4]
    tweet += '\n: ' + sys[5]
    tweet += '\n️⏳: ' + sys[0]
    # print(tweet) # always print tweet to console so we can see the output locally
    return tweet

The generated version is always the same as the one below, give or take 1-3 characters:

: 811,593
⁉: 32,143
: 18,527|57.64%
⁉⏭: 8,805
⁉: 4,811
: 5
: 2
⌛: 2019-05-19 08:37
⚖️x̅: 0.0, 0.0, 0.0
: 460M/1G|37.5%
: ens4, tun0, tun1
: 8G/28G|28.57%
: Linux-5.0.0-1006-gcp-x86_64-with-Ubuntu-19.10-eoan
️⏳: 2019-05-19 03:40

Everyone tells me this is only 244 characters:

28 emoji

⁉
⁉⏭⁉

⌛️x̅

️⏳

and 216 other characters (counting spaces, special characters, a-z, A-Z, 0-9):

: 811,593
: 30,488
: 17,292|56.72%
: 8,533
: 4,663
: 5
: 2
: 2019-05-19 08:37
: 2019-05-19 03:40
: 0.0, 0.0, 0.0
: 461M/1G|37.6%
: ens4, tun0, tun1
: 8G/28G|28.57%
: Linux-5.0.0-1006-gcp-x86_64-with-Ubuntu-19.10-eoan

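How can I accurately count and keep track of the emoji (as one variable) and all other characters (as a separate variable)?

The only character that should not be counted is '\n'.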

Answer 1

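Here is a piece of code I wrote that should steer you in the right direction:

You take a tweet and sum up all the emoji in the string.
Next, you create a filter, ignored_chars, containing every character you do not want counted as an "other" character.
Finally, you count the characters in the string that remain.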

from emoji import UNICODE_EMOJI

tweet = "SOME TEXT \n\n\n\nGOES HERE"

num_emoji = sum(tweet.count(emoji) for emoji in UNICODE_EMOJI)
ignored_chars = UNICODE_EMOJI.copy()
ignored_chars['\n'] = 0
num_other = sum(0 if char in ignored_chars else 1 for char in tweet)
print(num_emoji, num_other)

Output:

3 19

Edit: set a dictionary key instead of appending to a string.

Answer 2

Here is a solution that does not require any external libraries, just in case.

def is_emoji(c): return ord(c) > 0x2100
def is_newline(c): return c == '\n'

num_emoji = sum((is_emoji(c) and not is_newline(c)) for c in tweet)
num_normal = sum(not (is_emoji(c) or is_newline(c)) for c in tweet)

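The choice of 0x2100 is almost arbitrary: "normal" characters in English text are unlikely to be above it, while emoji generally are (a few, such as ⁉ at U+2049, sit just below it). If you know your "normal" text is ASCII only, you can replace it with the more obvious 127.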
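To see how much the cutoff matters, here is a minimal sketch of the same idea with the threshold as a parameter. `split_counts` and `threshold` are hypothetical names, and the sample string is made up in the style of the tweet. Note that anything above the cutoff counts as an "emoji" here, including combining marks and variation selectors.

```python
def split_counts(tweet, threshold=0x2100):
    """Count code points above the threshold as emoji, the rest as normal.

    Newlines are skipped entirely, as in the answer above.
    """
    num_emoji = sum(1 for c in tweet if c != "\n" and ord(c) > threshold)
    num_normal = sum(1 for c in tweet if c != "\n" and ord(c) <= threshold)
    return num_emoji, num_normal


line = "\u2049\u23ed: 8,805\n\u231b: 2"
print(split_counts(line))       # (2, 11): U+2049 is below 0x2100, so it counts as normal
print(split_counts(line, 127))  # (3, 10): with the ASCII cutoff it counts as emoji
```

With pure-ASCII body text, the 127 cutoff is the safer of the two, since it also catches the emoji that sit below 0x2100.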

How to get a character count like Twitter