I need the word count of the following Unicode string. Using str_word_count:
$input = 'Hello, chào buổi sáng';
$count = str_word_count($input);
echo $count;
The result is:
7
This is apparently wrong.
How can I get the expected result (4)?
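A likely explanation, assuming PHP 8's default "C" locale: str_word_count only treats ASCII letters (plus ' and -) as word characters, so every multibyte UTF-8 character such as "à", "ổ" or "á" acts as a separator and splits a word in two. Passing format 1 makes the function return the pieces it found instead of a number, which shows where the 7 comes from:

```php
<?php
// Format 1: return the list of "words" instead of their count.
// Bytes of multibyte UTF-8 characters are not letters in the "C" locale,
// so "chào" becomes "ch" + "o", "buổi" becomes "bu" + "i", and so on.
$input = 'Hello, chào buổi sáng';
$fragments = str_word_count($input, 1);

echo implode(' | ', $fragments), "\n"; // Hello | ch | o | bu | i | s | ng
echo count($fragments), "\n";          // 7
```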
Answer 0 (score: 3)
$tags = 'Hello, chào buổi sáng';
$word = explode(' ', $tags);
echo count($word);
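Splitting on a single space works for this input, but repeated spaces, tabs or line breaks yield empty or merged pieces. A minimal sketch that splits on runs of whitespace and discards empty pieces instead; count_words_split is a hypothetical helper name, and it assumes that the /u modifier lets PHP's PCRE binding treat the subject as UTF-8 (so \s also covers Unicode spaces):

```php
<?php
// Hypothetical helper: split on runs of whitespace and drop empty pieces,
// so leading, trailing or repeated separators do not inflate the count.
function count_words_split(string $s): int
{
    $parts = preg_split('/\s+/u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false on error (for example invalid UTF-8).
    return $parts === false ? 0 : count($parts);
}

echo count(explode(' ', "Hello,  chào buổi sáng")), "\n"; // 5: the double space yields an empty piece
echo count_words_split("Hello,  chào buổi sáng"), "\n";   // 4
```

Like explode(), this still counts a standalone punctuation token (for example a lone "-") as a word.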
Answer 1 (score: 1)
Here is a quick-and-dirty, regex-based (Unicode-aware) word count function:
function mb_count_words($string) {
    preg_match_all('/[\pL\pN\pPd]+/u', $string, $matches);
    return count($matches[0]);
}
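A “word” is any run of one or more of the following:
- Unicode letters (\pL)
- Unicode numbers (\pN)
- Unicode dash punctuation (\pPd)
This means the following string contains 5 “words” (4 plain ones and 1 hyphenated one):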
echo mb_count_words('Hello, chào buổi sáng, chào-sáng');
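One caveat: the character class above has no \pM (combining marks), so text in decomposed form, where an accent is stored as a separate combining character after the base letter, gets split at each accent. A hedged variant that also accepts combining marks inside a word; mb_count_words_marks is a hypothetical name, and the "\u{...}" escapes need PHP 7.0 or later:

```php
<?php
// Original pattern from the answer, for comparison.
function mb_count_words($string) {
    preg_match_all('/[\pL\pN\pPd]+/u', $string, $matches);
    return count($matches[0]);
}

// Hypothetical variant: \pM lets combining accents stay inside a word.
function mb_count_words_marks(string $string): int
{
    preg_match_all('/[\pL\pM\pN\pPd]+/u', $string, $matches);
    return count($matches[0]);
}

// "sáng" written as "sa" + U+0301 COMBINING ACUTE ACCENT + "ng".
$decomposed = "sa\u{0301}ng";
echo mb_count_words($decomposed), "\n";       // 2: split at the accent
echo mb_count_words_marks($decomposed), "\n"; // 1
```

Normalizing the input to NFC first (for example with the intl extension's Normalizer) would be an alternative to widening the pattern.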
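This function is not suited to very large texts, although it should cope with most blocks of text found on the internet. The reason is that preg_match_all has to build and fill a large array, only to throw it away after counting it, which is very inefficient. A more efficient approach is to walk through the text character by character, detect sequences of Unicode whitespace, and increment a counter variable. That is not especially difficult, but it is tedious and takes time to write.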
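A minimal sketch of the "count as you go" idea, as a variation rather than the character-by-character scanner the answer describes: it reuses the same pattern but asks preg_match for one word at a time from a byte offset and bumps a counter, so only the current match is held in memory instead of an array of all matches. mb_count_words_incremental is a hypothetical name, and whether this is also faster (not just lighter on memory) likely depends on the PHP version, since each call may re-validate the UTF-8 subject:

```php
<?php
// Hypothetical incremental counter: find one word at a time and count it,
// instead of collecting every match into an array first.
function mb_count_words_incremental(string $string): int
{
    $count = 0;
    $offset = 0;
    $length = strlen($string);
    while ($offset < $length
        && preg_match('/[\pL\pN\pPd]+/u', $string, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $count++;
        // PREG_OFFSET_CAPTURE reports byte offsets; resume right after this word.
        $offset = $m[0][1] + strlen($m[0][0]);
    }
    return $count;
}

echo mb_count_words_incremental('Hello, chào buổi sáng, chào-sáng'), "\n"; // 5
```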
Answer 2 (score: 0)
I use this code to count words. You can try it:
$s = 'Hello, chào buổi sáng';
$s1 = array_map('trim', explode(' ', $s));
$s2 = array_filter($s1, function($value) { return $value !== ''; });
echo count($s2);
Answer 3 (score: 0)
You can use this function to count the Unicode words in a given string:
All credit goes to the author.