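I am trying to write a function that counts the lowercase and uppercase letters in each item of an array, and then creates a new column in the original table that classifies each row as having more uppercase or more lowercase letters.
Here is my array: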
[array([['MARZIA HAS LIGMA LWIAY #0044'],
['This Slinky Montage Is Bizarrely Satisfying to Watch'],
['MAKING HER DREAM COME TRUE! (MAKE A WISH)'],
...,
['THE EVOLUTION OF FORTNITE! 2011 - 2019'],
["India's trucks are works of art"],
['Several airlines change flight routes after Iranian missile downs American drone']],
dtype=object)]
Upper=0
Lower=0
for c in range(len(dummy_array)):
    for i in c:
        if i.isupper():
            Upper +=1
        elif i.islower():
            Lower +=1
        else:
            pass
if d[Upper] > d[Lower]:
    train_data["Caps_in_title"] = 1
else:
    train_data["Caps_in_title"] = 0
print(Upper)
print(Lower)
Answer 0 (score: 0)
This should work:
import pandas as pd

# setup: each cell holds a one-element list, mirroring the rows of the array in the question
df = pd.DataFrame({
    "text": [['MARZIA HAS LIGMA LWIAY #0044'],
             ['This Slinky Montage Is Bizarrely Satisfying to Watch'],
             ['MAKING HER DREAM COME TRUE! (MAKE A WISH)'],
             ['THE EVOLUTION OF FORTNITE! 2011 - 2019'],
             ["India's trucks are works of art"],
             ['Several airlines change flight routes after Iranian missile downs American drone']]
})

def character_case_ratio(xs):
    """
    Count the uppercase and lowercase letters in xs[0] and compare the counts.
    Returns 1 if uppercase letters outnumber lowercase letters, otherwise 0.
    """
    upper_count = len([x for x in xs[0] if (x.isupper() and x.isalpha())])
    lower_count = len([x for x in xs[0] if (x.islower() and x.isalpha())])
    if upper_count > lower_count:
        return 1
    return 0

# apply the function above over your input text, and create a new column in the DataFrame
df["case_ratio"] = df["text"].apply(character_case_ratio)
Output:
                                                 text  case_ratio
0                      [MARZIA HAS LIGMA LWIAY #0044]           1
1  [This Slinky Montage Is Bizarrely Satisfying t...           0
2         [MAKING HER DREAM COME TRUE! (MAKE A WISH)]           1
3            [THE EVOLUTION OF FORTNITE! 2011 - 2019]           1
4                   [India's trucks are works of art]           0
5  [Several airlines change flight routes after I...           0
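If the data is still in the form shown in the question (a list holding one NumPy object array of shape (n, 1)), the same idea can be applied after flattening it. The following is only a minimal sketch under assumptions: `dummy_array` here is a shortened stand-in built from three of the titles in the question, `count_cases` is a hypothetical helper name, and `train_data` is assumed to have one row per title in the same order as the array.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's data: a list holding one (n, 1) object array.
dummy_array = [np.array([
    ["MARZIA HAS LIGMA LWIAY #0044"],
    ["This Slinky Montage Is Bizarrely Satisfying to Watch"],
    ["India's trucks are works of art"],
], dtype=object)]


def count_cases(title):
    """Hypothetical helper: return (uppercase_count, lowercase_count) for one string."""
    upper = sum(1 for ch in title if ch.isupper())
    lower = sum(1 for ch in title if ch.islower())
    return upper, lower


# Flatten the (n, 1) array into a plain list of strings.
titles = dummy_array[0].ravel().tolist()

# Assumed layout of train_data: one row per title, in the same order as the array.
train_data = pd.DataFrame({"title": titles})

counts = [count_cases(t) for t in titles]
train_data["upper_count"] = [u for u, _ in counts]
train_data["lower_count"] = [l for _, l in counts]

# 1 when uppercase letters outnumber lowercase letters, otherwise 0.
train_data["Caps_in_title"] = (
    train_data["upper_count"] > train_data["lower_count"]
).astype(int)

print(train_data)
```

Keeping the two counts as separate columns makes the classification easy to inspect, and the comparison then produces one value per row instead of assigning a single value to the whole column as the loop in the question does.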