My list of sentences is as follows:
pylist=['This is an apple', 'This is an orange', 'The pineapple is yellow','A grape is red']
If I define a list of stopwords, for example
stopwords=['This', 'is', 'an', 'The']
is there a way to apply it to the whole list so that my output is
pylist=['apple','orange','pineapple is yellow','A grape is red']
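PS: I tried using apply with a function defined to remove the stopwords, along the lines of [removewords(x) for x in pylist], but without success (and I'm not sure whether this is the most efficient approach).
Thanks!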
Answer 0 (score: 2)
I don't think your expected output is what you really want: the stopword "is" is still included in it.
My attempt is as follows:
pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow', 'A grape is red']
stopwords = ['This', 'is', 'an', 'The']
stopwords = set(w.lower() for w in stopwords)

def remove_words(s, stopwords):
    # Split on whitespace, drop stopwords case-insensitively, then rejoin.
    s_split = s.split()
    s_filtered = [w for w in s_split if w.lower() not in stopwords]
    return " ".join(s_filtered)

result = [remove_words(x, stopwords) for x in pylist]
result
The result is
['apple', 'orange', 'pineapple yellow', 'A grape red']
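To make the search reasonably efficient (a lookup in a set takes constant time on average), I store the lowercase form of the stopwords in a set. As a rule, stopword removal should be case-insensitive.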
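To make the point about lookup cost concrete, here is a minimal sketch that times membership tests against a list and against a set. The 10,000-entry stopword list and the probe word are made up purely for illustration; real stopword lists are much shorter, so the gap in practice is smaller.

```python
import timeit

# Hypothetical, artificially large stopword list (an assumption for this demo).
stopword_list = [f"word{i}" for i in range(10_000)]
stopword_set = set(stopword_list)

# Worst case for the list: the probe is the last element, so every entry is scanned.
probe = "word9999"

# Time 1,000 membership tests against each container.
list_time = timeit.timeit(lambda: probe in stopword_list, number=1_000)
set_time = timeit.timeit(lambda: probe in stopword_set, number=1_000)

print(f"list lookup: {list_time:.4f}s, set lookup: {set_time:.4f}s")
```

The list test scans its elements one by one, while the set test hashes the probe once, which is why the set version should come out considerably faster here.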
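Side note: removing stopwords is often helpful, sometimes even necessary. Be aware, however, that there are cases where removing them is not advisable: https://towardsdatascience.com/why-you-should-avoid-removing-stopwords-aa7a353d2a52
Update: if you really are sure that you need to get rid of every possible stopword, make sure you don't miss any. Take yatu's suggestion and have a look at nltk. This matters especially if, next year, you find yourself having to add Spanish palabras de parada, French mot d'arrêt and German Stopp-Wörter.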
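As a rough sketch of how the function above might be extended to several languages, the example below merges per-language stopword sets before filtering. The `STOPWORDS_BY_LANG` dictionary and its tiny word sets are hypothetical placeholders; in practice they could be filled from a library such as nltk, whose stopwords corpus includes Spanish, French and German lists.

```python
# Hypothetical, deliberately tiny stopword sets; real lists are much longer.
STOPWORDS_BY_LANG = {
    "english": {"this", "is", "an", "the"},
    "spanish": {"esta", "es", "una", "la"},
    "german": {"das", "ist", "ein", "der"},
}

def remove_stopwords(sentence, languages):
    """Drop every word whose lowercase form is a stopword in any given language."""
    # Merge the selected per-language sets into one set for O(1) lookups.
    stop = set().union(*(STOPWORDS_BY_LANG[lang] for lang in languages))
    return " ".join(w for w in sentence.split() if w.lower() not in stop)

print(remove_stopwords("This is an apple", ["english"]))
print(remove_stopwords("Esta es una manzana", ["english", "spanish"]))
```

Keeping the sets separate per language and merging them on demand means a new language can be added without touching the filtering logic.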
Answer 1 (score: 1)
You can use a nested list comprehension, and define stopwords as a set to bring the lookup complexity down to O(1):
pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow',
          'A grape is red']
stopwords = set(['This', 'is', 'an', 'The'])
[' '.join([w for w in s.split() if w not in stopwords]) for s in pylist]
# ['apple', 'orange', 'pineapple yellow', 'A grape red']
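Note, however, that for a more general approach you could use the stopwords from nltk's English corpus:
# The corpus has to be downloaded once with nltk.download('stopwords')
from nltk.corpus import stopwords
stop_w = set(stopwords.words('english'))
[' '.join([w for w in s.split() if w.lower() not in stop_w]) for s in pylist]
# ['apple', 'orange', 'pineapple yellow', 'grape red']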