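I am trying to iterate over one column of a DataFrame in Python 3. For each row, I need to extract the countries mentioned and the number of times each country is mentioned. For example, if I have this row: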
['[Aydemir, Deniz', ' Gunduz, Gokhan', ' Asik, Nejla] Bartin Univ, Fac Forestry, Dept Forest Ind Engn, TR-74100 Bartin, Turkey', ' [Wang, Alice] Lulea Univ Technol, Wood Technol, Skelleftea, Sweden']
the output should be the list: ['Turkey', 'Sweden']
And if I have this row:
['[Fang, Qun', ' Cui, Hui-Wang] Zhejiang A&F Univ, Sch Engn, Linan 311300, Peoples R China', ' [Du, Guan-Ben] Southwest Forestry Univ, Kunming 650224, Yunnan, Peoples R China']
the output should be: ['China', 'China'].
I wrote this code, but it does not work properly:
from geotext import GeoText
sentence = df.iloc[0,0]
places = GeoText(sentence)
print(places.countries)
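It prints each country only once, and in some cases, when the country is the USA, it does not recognize the abbreviation. Can you help me figure out what to do? Here is some sample data: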
import pandas as pd
l = [['[Aydemir, Deniz\', \' Gunduz, Gokhan\', \' Asik, Nejla] Bartin Univ, Fac Forestry, Dept Forest Ind Engn, TR-74100 Bartin, Turkey\', \' [Wang, Alice] Lulea Univ Technol, Wood Technol, Skelleftea, Sweden',1990],
['[Fang, Qun\', \' Cui, Hui-Wang] Zhejiang A&F Univ, Sch Engn, Linan 311300, Peoples R China\', \' [Du, Guan-Ben] Southwest Forestry Univ, Kunming 650224, Yunnan, Peoples R China',2005],
['[Blumentritt, Melanie\', \' Gardner, Douglas J.\', \' Shaler, Stephen M.] Univ Maine, Sch Resources, Orono, ME USA\', \' [Cole, Barbara J. W.] Univ Maine, Dept Chem, Orono, ME 04469 USA',2012]]
dataf = pd.DataFrame(l, columns = ['Authors', 'Year'])
I also tried the code below, but I ran into the same problem: it does not return all the countries, only one country per row.
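import pycountry

def find_country(n):
    # Returns as soon as the first matching country is found,
    # which is why only one country per row comes back
    for c in pycountry.countries:
        if str(c.name).lower() in n.lower():
            return c.name

country1 = (dataf['Authors']
            .replace(r"\bUSA\b", "United States", regex=True)
            .apply(lambda x: find_country(x)))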
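Answer 0 (score: 1)
GeoText does not seem to detect "USA" correctly; it may be worth raising an issue with the package. As a workaround, I replace "USA" with "United States", which is detected correctly.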
import geotext

df = (dataf['Authors']
      # Replace the standalone abbreviation so GeoText can detect it
      .replace(r"\bUSA\b", "United States", regex=True)
      .apply(lambda x: geotext.GeoText(x).countries)
      )
I'm not sure what you were doing before, but this gets the list of countries for each row of Authors, including duplicates:
0 [Turkey, Sweden]
1 [China, China]
2 [United States, United States]
Name: Authors, dtype: object
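If other abbreviations turn out to be missed as well, the single USA replacement could be generalized into a small alias table applied before detection. The following is only a sketch using the standard library: the ALIASES table and the normalize_countries helper are hypothetical names, and the "UK" entry is an illustrative assumption, not something confirmed to be needed for GeoText.

```python
import re

# Hypothetical alias table (an assumption for illustration);
# extend it with whatever abbreviations actually appear in your data.
ALIASES = {
    "USA": "United States",
    "UK": "United Kingdom",
}

# One pattern with word boundaries, longest alias first,
# so that e.g. "USAID" is not touched.
_ALIAS_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b"
)

def normalize_countries(text):
    """Replace each known alias with its full country name."""
    return _ALIAS_PATTERN.sub(lambda m: ALIASES[m.group(1)], text)

print(normalize_countries("Univ Maine, Dept Chem, Orono, ME 04469 USA"))
# prints: Univ Maine, Dept Chem, Orono, ME 04469 United States
```

Under these assumptions, the helper would be applied to the column with dataf['Authors'].apply(normalize_countries) before passing each string to GeoText.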
As mentioned in the comments, if you want an actual list of lists, just call tolist() on the result:
df.tolist()
[['Turkey', 'Sweden'], ['China', 'China'], ['United States', 'United States']]
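The question also asks for the number of times each country is mentioned. One possible way to get that from the per-row lists is collections.Counter from the standard library. This is a minimal sketch: the rows variable simply copies the tolist() output shown above, and count_mentions is a hypothetical helper name.

```python
from collections import Counter

# Per-row country lists, copied from the tolist() output above
rows = [['Turkey', 'Sweden'], ['China', 'China'], ['United States', 'United States']]

def count_mentions(countries):
    """Return a {country: count} dict for one row."""
    return dict(Counter(countries))

# Counts per row
per_row = [count_mentions(r) for r in rows]
print(per_row)
# prints: [{'Turkey': 1, 'Sweden': 1}, {'China': 2}, {'United States': 2}]

# Counts over the whole column
total = Counter(c for r in rows for c in r)
print(total['China'])
# prints: 2
```

On the Series itself, the same helper could be used as df.apply(count_mentions), assuming df holds the per-row lists as in the answer.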