I have a dataframe. I want to get the country from its user_location column, whose value counts look like this:
India 3741
United States 2455
New Delhi, India 1721
Mumbai, India 1401
Washington, DC 1354
SpaceCoast,Florida 1
stuck in a book. 1
Beirut , Lebanon 1
Royston Vasey - Tralfamadore 1
Langham, Colchester 1
Name: user_location, Length: 26920, dtype: int64
Answer 0 (score: 1)
import pandas as pd

# Get states of India and USA (the table indices depend on the current page layout)
in_url = 'https://en.wikipedia.org/wiki/States_and_union_territories_of_India#States_and_Union_territories'
in_states = pd.read_html(in_url)[3].iloc[:, 0].tolist()
us_url = 'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States'
us_states = pd.read_html(us_url)[0].iloc[:, 0].tolist()
states = in_states + us_states
# Make a sample dataframe
df = pd.DataFrame({'Country': states})
0 Andhra Pradesh
1 Arunachal Pradesh
2 Assam
3 Bihar
4 Chhattisgarh
... ...
73 Virginia[E]
74 Washington
75 West Virginia
76 Wisconsin
77 Wyoming
# Map state names to country name
states_dict = {state: 'India' for state in in_states}
states_dict.update({state: 'USA' for state in us_states})
df['Country'] = df['Country'].map(states_dict)
0 India
1 India
2 India
3 India
4 India
... ...
73 USA
74 USA
75 USA
76 USA
77 USA
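Note that the scraped names can carry Wikipedia footnote markers (see "Virginia[E]" above), and an exact-match lookup will miss values that differ only in case or spacing. A minimal sketch of one way to normalize names before the lookup follows. The short lists are hypothetical stand-ins for the scraped ones, and clean_name and state_to_country are illustrative helper names, not part of the original answer.

```python
import re

# Hypothetical stand-ins for the lists scraped with pd.read_html above.
in_states = ["Andhra Pradesh", "Assam", "Bihar"]
us_states = ["Virginia[E]", "Washington", "West Virginia"]


def clean_name(name):
    """Drop footnote markers such as '[E]' and normalize case and whitespace."""
    return re.sub(r"\[[^\]]*\]", "", name).strip().lower()


# Build the lookup from cleaned names so 'Virginia[E]' is stored as 'virginia'.
states_dict = {clean_name(s): "India" for s in in_states}
states_dict.update({clean_name(s): "USA" for s in us_states})


def state_to_country(value, default="Others"):
    """Look up one state name; names not in the lookup fall back to `default`."""
    return states_dict.get(clean_name(value), default)
```

Assuming the column holds only strings, this could be applied with df['Country'].map(state_to_country). It still handles whole state names only, not free text such as "New Delhi, India".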
Answer 1 (score: 0)
# Trying to get all the locations of USA and India
import pandas as pd

us_url = 'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States'
us_tables = pd.read_html(us_url)  # fetch and parse the page once
us_states = us_tables[0].iloc[:, 0].tolist()
us_cities = us_tables[0].iloc[:, 1].tolist() + us_tables[0].iloc[:, 2].tolist() + us_tables[0].iloc[:, 3].tolist()
us_Federal_district = us_tables[1].iloc[:, 0].tolist()
us_Inhabited_territories = us_tables[2].iloc[:, 0].tolist()
us_Uninhabited_territories = us_tables[3].iloc[:, 0].tolist()
us_Disputed_territories = us_tables[4].iloc[:, 0].tolist()
us = us_states + us_cities + us_Federal_district + us_Inhabited_territories + us_Uninhabited_territories + us_Disputed_territories

in_url = 'https://en.wikipedia.org/wiki/States_and_union_territories_of_India#States_and_Union_territories'
in_tables = pd.read_html(in_url)  # fetch and parse the page once
in_states = in_tables[3].iloc[:, 0].tolist() + in_tables[3].iloc[:, 4].tolist() + in_tables[3].iloc[:, 5].tolist()
in_unions = in_tables[4].iloc[:, 0].tolist()
ind = in_states + in_unions

# Joined strings used for substring checks in the function below
usToStr = ' '.join([str(elem) for elem in us])
indToStr = ' '.join([str(elem) for elem in ind])
# Country name checker function
def checkl(T):
    # Lowercased tokens, split on whitespace and on commas
    TSplit_space = [x.lower().strip() for x in T.split()]
    TSplit_comma = [x.lower().strip() for x in T.split(',')]
    TSplit = list(set().union(TSplit_space, TSplit_comma))
    # Known place names that occur as substrings of the input
    res_ind = [ele for ele in ind if ele in T]
    res_us = [ele for ele in us if ele in T]
    if 'india' in TSplit or 'hindustan' in TSplit or 'bharat' in TSplit or T.lower() in indToStr.lower() or res_ind:
        T = 'India'
    elif 'US' in T or 'USA' in T or 'United States' in T or 'usa' in TSplit or 'united state' in TSplit or T.lower() in usToStr.lower() or res_us:
        T = 'USA'
    elif len(T.split(',')) > 1:
        if T.split(',')[0] in indToStr or T.split(',')[1] in indToStr:
            T = 'India'
        elif T.split(',')[0] in usToStr or T.split(',')[1] in usToStr:
            T = 'USA'
        else:
            T = "Others"
    else:
        T = "Others"
    return T
# Applying the function on the dataframe column
Others 74206
USA 47840
India 20291
Name: user_location, dtype: int64
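I am still quite new to Python. I think this code could be written in a better, more compact form, and, as mentioned in the previous answer, there are still many edge cases to handle. So I have also posted it on Code Review Stack Exchange. Any criticism or suggestions for improving the efficiency and readability of my code would be much appreciated.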
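One possible direction for a more compact checker, offered only as a sketch: compare whole tokens against keyword sets instead of testing substrings of one long joined string. A check like T.lower() in indToStr.lower() is true for any short input that happens to be a substring of the joined names (for example a single letter), which may account for some misclassifications. The keyword sets below are small hand-written placeholders, and candidates and classify_location are hypothetical names. In practice the sets would be built from the ind and us lists.

```python
import re

# Hypothetical keyword sets; in practice build these from the scraped
# `ind` and `us` lists, lowercased.
INDIA_TERMS = {"india", "hindustan", "bharat", "new delhi", "mumbai"}
USA_TERMS = {"usa", "united states", "washington", "florida"}


def candidates(text):
    """Return the lowercased comma-separated parts and single words of `text`."""
    parts = {part.strip().lower() for part in text.split(",")}
    words = set(re.findall(r"[a-z]+", text.lower()))
    return parts | words


def classify_location(text):
    """Classify a free-text location as 'India', 'USA' or 'Others'."""
    found = candidates(text)
    if found & INDIA_TERMS:  # any whole token is a known Indian term
        return "India"
    if found & USA_TERMS:  # any whole token is a known US term
        return "USA"
    return "Others"
```

Assuming the column holds only strings, it could be applied with df['user_location'].map(classify_location). Set intersection keeps each lookup cheap, and matching whole tokens avoids the accidental substring hits, at the cost of missing names that are misspelled or written without separators.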