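I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({"Hashtags": ["[]", "[u'AAPHealthCare4All']", "[u'CBI',", "u'Delhi',", "u'Emergency']"]})
and I want to turn it into this:
pd.DataFrame({"Hashtags": [" ", "AAPHealthCare4All", "CBI", "Delhi", "Emergency"]})
That is, without the brackets, commas and quotes, and without losing or mangling any hashtag. [] should be replaced with a space. Basically I want to remove all "[", "]", "[u'" etc. I used the following code, but to no avail: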
for index, row in df.iterrows():
    if "RT @" in row["Tweet"]:
        df['Hashtags'] = df['Hashtags'].str.replace(r'[^[]]*\[|\][^]*|\[u\'*\'\]|\[\'*\'\]', '')
df.to_csv('string_HT.csv', index=False)
Answer 0 (score: 3)
You can apply the following expression to your hashtags:
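import ast
# Join the fragments, put commas between adjacent lists, parse the result as
# a tuple of lists, replace empty lists with [" "] and flatten with sum().
df['Hashtags'] = sum([x if x else [" "] for x
                      in ast.literal_eval(''.join(df['Hashtags'])
                                          .replace('][', '],['))],
                     [])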
Result:
[' ', 'AAPHealthCare4All', 'CBI', 'Delhi', 'Emergency']
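However, this flattens the column: the number of resulting items generally differs from the number of rows (here both happen to be 5), in which case assigning the list back to df['Hashtags'] fails, and the original index is not preserved either way. You may be using the DataFrame incorrectly.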
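If the flattened hashtags are meant to stand on their own rather than be written back into the original rows, one option is to put them into a new Series with a fresh default index instead of assigning them to df['Hashtags']. A minimal sketch, reusing the sample data from the question; the names parsed, flat and hashtags are illustrative only:

```python
import ast

import pandas as pd

df = pd.DataFrame({"Hashtags": ["[]", "[u'AAPHealthCare4All']",
                                "[u'CBI',", "u'Delhi',", "u'Emergency']"]})

# Re-join the fragments and separate adjacent list literals with commas,
# so the whole column parses as one tuple of Python lists.
parsed = ast.literal_eval("".join(df["Hashtags"]).replace("][", "],["))

# Replace each empty list with a single " " placeholder, then flatten.
flat = [tag for tags in parsed for tag in (tags or [" "])]

# A new Series with its own RangeIndex; its length need not match len(df).
hashtags = pd.Series(flat, name="Hashtags")
print(hashtags.tolist())
```

The nested list comprehension does the same flattening as sum(..., []) but avoids the repeated list concatenation that sum performs, which grows quadratically on long inputs.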
Answer 1 (score: 1)
You can use the extract function:
[Image: Jupyter notebook screenshot]
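A minimal sketch of an extract-based approach, assuming pandas' Series.str.extract and that every hashtag is wrapped in single quotes; the regular expression below is an illustrative assumption, not the code from the screenshot:

```python
import pandas as pd

df = pd.DataFrame({"Hashtags": ["[]", "[u'AAPHealthCare4All']",
                                "[u'CBI',", "u'Delhi',", "u'Emergency']"]})

# Capture the text between the first pair of single quotes; the u prefix,
# brackets and commas all lie outside the capture group.
# Rows without quoted text (here "[]") give NaN, which becomes " ".
df["Hashtags"] = (df["Hashtags"]
                  .str.extract(r"'([^']*)'", expand=False)
                  .fillna(" "))
print(df["Hashtags"].tolist())
```

Because only the quoted text is captured, a hashtag that itself starts or ends with u is left intact.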
Answer 2 (score: 1)
df['Hashtags'] = df['Hashtags'].str.strip("[u,]").str.strip("'").replace('', ' ')
print (df['Hashtags'].tolist())
[' ', 'AAPHealthCare4All', 'CBI', 'Delhi', 'Emergency']
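The double strip is necessary: with a single strip that also includes the quote character, every u at the start and end of the string is removed, including those that are part of the hashtag: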
df = pd.DataFrame({"Hashtags": [ "[]", "[u'uuAAPHealthCare4All']",
"[u'uCBIuu',","u'uDelhi',", "u'Emergency']"]})
print (df)
Hashtags
0 []
1 [u'uuAAPHealthCare4All']
2 [u'uCBIuu',
3 u'uDelhi',
4 u'Emergency']
df['Hashtags'] = df['Hashtags'].str.strip("[u,']")
print (df['Hashtags'])
0
1 AAPHealthCare4All
2 CBI
3 Delhi
4 Emergency
Name: Hashtags, dtype: object
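For comparison, here is the double strip from the first snippet applied to the same data. The first strip of "[u,]" stops at the opening and closing quotes, and the second strip removes only the quotes, so the u characters that belong to the hashtags survive. A minimal sketch that rebuilds the sample frame:

```python
import pandas as pd

df = pd.DataFrame({"Hashtags": ["[]", "[u'uuAAPHealthCare4All']",
                                "[u'uCBIuu',", "u'uDelhi',", "u'Emergency']"]})

# First strip: brackets, the u prefix and commas, stopping at the quotes.
# Second strip: the quotes themselves, leaving the inner u's alone.
# Finally, the empty string left by "[]" becomes a single space.
df["Hashtags"] = df["Hashtags"].str.strip("[u,]").str.strip("'").replace("", " ")
print(df["Hashtags"].tolist())
```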