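I have a text file in the format shown below. Strings followed by [edit] are state names, and strings followed by (...) are the regions within each state. I want to create a pandas DataFrame with the state in one column and the region in another. To get cleaner data, I also want to remove everything inside and after the [] or ().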
['Alabama[edit]\n',
'Auburn (Auburn University)[1]\n',
'Florence (University of North Alabama)\n',
'Jacksonville (Jacksonville State University)[2]\n',
'Livingston (University of West Alabama)[2]\n',
'Montevallo (University of Montevallo)[2]\n',
'Troy (Troy University)[2]\n',
'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]\n',
'Tuskegee (Tuskegee University)[5]\n',
'Alaska[edit]\n',
'Fairbanks (University of Alaska Fairbanks)[2]\n',
'Arizona[edit]\n',
'Flagstaff (Northern Arizona University)[6]\n',
'Tempe (Arizona State University)\n',
'Tucson (University of Arizona)\n',
Here is my code:
with open('university_towns.txt') as f:
    f = f.readlines()
    for line in f:
        lst = []
        state = ''
        region = ''
        dict1 = {}
        if line.endswith('[ed'):
            state = line.split('[')[0]
        else:
            if line.find('(') > 0:
                region = line.split(' ')[0]
            else:
                region = line.split('\n')[0].strip()
        dict1.update({'RegionName': region, 'State': state})
        lst.append(dict1)
lst = pd.DataFrame(lst, columns=["State", "RegionName"])
I get a strange DataFrame with 2 columns and 1 row, and no state name. What am I doing wrong, and how can I fix it?
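
For what it's worth, three things in the loop seem to explain that output. `lst`, `state` and `region` are re-created on every iteration, so only the last line survives and the state is forgotten. `line.endswith('[ed')` can never be true, because each state line ends with `[edit]\n`. And `line.split(' ')[0]` would cut multi-word region names short. Below is a minimal sketch of one possible fix, not the only way to do it. The function name `parse_university_towns` and the small `sample` list are made up for illustration, and the sketch assumes every state line ends in `[edit]` and that the region name is whatever comes before the first `(` or `[`.

```python
import re

import pandas as pd


def parse_university_towns(lines):
    """Build a State/RegionName DataFrame from lines in the format shown above.

    Hypothetical helper: assumes state lines end with '[edit]' and that the
    region name is the text before the first '(' or '['.
    """
    rows = []      # created once, outside the loop, so rows accumulate
    state = None   # remembered across iterations until the next state line
    for raw in lines:
        line = raw.strip()  # drops the trailing newline
        if not line:
            continue
        if line.endswith('[edit]'):
            # State line: keep the text before the first '['
            state = line.split('[')[0].strip()
        else:
            # Region line: keep the text before the first '(' or '['
            region = re.split(r'[(\[]', line, maxsplit=1)[0].strip()
            rows.append({'State': state, 'RegionName': region})
    return pd.DataFrame(rows, columns=['State', 'RegionName'])


# Illustrative subset of the data shown in the question
sample = [
    'Alabama[edit]\n',
    'Auburn (Auburn University)[1]\n',
    'Alaska[edit]\n',
    'Fairbanks (University of Alaska Fairbanks)[2]\n',
]
df = parse_university_towns(sample)
```

With the real file, the open file object can be passed directly, e.g. `with open('university_towns.txt') as f: df = parse_university_towns(f)`, since iterating over a file yields its lines.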