The task is to create a single function that parallelizes pd.DataFrame.apply. E.g., given a textual input, I tried:
from multiprocessing import Pool
import numpy as np
import pandas as pd
from nltk import word_tokenize
def apply_me(df_this):
    # The tokenizer is hard-coded here -- this is what I want to avoid.
    return df_this[0].astype(str).apply(word_tokenize)

def parallelize_apply(df, apply_me, num_partitions):
    # Split the frame into chunks and tokenize each chunk in its own worker.
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_partitions)
    df = pd.concat(pool.map(apply_me, df_split))
    pool.close()
    pool.join()
    return df
text = """Let's try something.
I have to go to sleep.
Today is June 18th and it is Muiriel's birthday!
Muiriel is 20 now.
The password is "Muiriel"."""
df = pd.DataFrame(text.split('\n'))
parallelize_apply(df, apply_me, 2)
[out]:

Out[8]:
0         [Let, 's, try, something, .]
1       [I, have, to, go, to, sleep, .]
2    [Today, is, June, 18th, and, it, is, Muiriel, ...
3             [Muiriel, is, 20, now, .]
4    [The, password, is, ``, Muiriel, '', .]
Name: 0, dtype: object

My question is whether there is a way to pass the word_tokenize() function into parallelize_apply() as an argument, instead of hard-coding it inside apply_me().
To simplify the apply_me() function, I tried:

from functools import partial
from multiprocessing import Pool
import numpy as np
import pandas as pd
from nltk import word_tokenize
def apply_me(df_this, func):
    # Apply whatever callable is passed in instead of a hard-coded tokenizer.
    return df_this[0].astype(str).apply(func)

def parallelize_apply(df, func, num_partitions):
    df_split = np.array_split(df, num_partitions)
    with Pool(num_partitions) as pool:
        # Bind func so pool.map only has to hand each worker a chunk.
        _apply_me = partial(apply_me, func=func)
        df = pd.concat(pool.map(_apply_me, df_split))
    # The with-block shuts the pool down on exit, so no explicit close()/join().
    return df.tolist()
text = """Let's try something.
I have to go to sleep.
Today is June 18th and it is Muiriel's birthday!
Muiriel is 20 now.
The password is "Muiriel"."""
df = pd.DataFrame(text.split('\n'))
parallelize_apply(df, word_tokenize, 6)
It achieves the same output, but it still requires the intermediate apply_me function for the multiprocessing to run as desired. Is there a way to remove it?
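For reference, one way to drop the named intermediate entirely is to map func straight over the rows, so each worker receives a single string rather than a DataFrame chunk. This is a minimal sketch, not the code above: parallelize_map is an illustrative name, and it assumes the per-row callable (here word_tokenize) is a picklable top-level function:

from multiprocessing import Pool
import pandas as pd
from nltk import word_tokenize

def parallelize_map(df, func, num_workers):
    # A Series iterates over its values, so pool.map feeds func one row
    # (one string) at a time and returns a list of per-row results,
    # matching the df.tolist() output of parallelize_apply above.
    with Pool(num_workers) as pool:
        return pool.map(func, df[0].astype(str))

df = pd.DataFrame("Let's try something.\nI have to go to sleep.".split('\n'))
parallelize_map(df, word_tokenize, 2)

On platforms that spawn rather than fork worker processes (e.g. Windows), the call would additionally need an if __name__ == '__main__': guard, which the snippets above omit as well.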