Extracting a string between special characters in a CSV column

Time: 2017-06-08 19:34:23

Tags: python regex csv pandas

I want to extract the user handle from retweets, i.e. any username in the form "RT @username:xyzxyzxyz", into a new column. I did the following:

df = pd.read_csv("string.csv")
for index,row in df.iterrows(): 
    df['Influencers'] = df['Tweet'].str.extract("\(@*?)\:")
df.to_csv('string3.csv', index=False)

It produced the following error:

  File "C:\ANACONDA\lib\re.py", line 251, in _compile
    raise error, v # invalid expression

error: unbalanced parenthesis

Sample DF:

df=pd.DataFrame({"Tweet": ["RT @saikatd: Are editors involved in the transfer of Income Tax officials?","RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration"," Fairplay n equity 2 consumers 2 be ensured"]})

2 answers:

Answer 0: (score: 2)

Try this:

import pandas as pd
df = pd.read_csv("string.csv")
# Capture the handle that follows "RT", up to but not including the colon
df['Influencers'] = df['Tweet'].str.extract(r"RT\s+(@[^:]*)", expand=False)
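
For context on why the pattern in the question fails: `\(` is an escaped, literal parenthesis, so the closing `)` has no opening group to match. The sketch below uses only the standard `re` module to show both the failure and the working pattern; the variable names are illustrative and not from the original post.

```python
import re

# The question's pattern: "\(" is a literal "(", so ")" is left unbalanced.
broken = r"\(@*?)\:"
try:
    re.compile(broken)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    message = str(exc)  # wording varies between Python versions

# The pattern from this answer: one capture group holding "@" plus
# everything up to (but not including) the next colon.
fixed = re.compile(r"RT\s+(@[^:]*)")
match = fixed.search("RT @saikatd: Are editors involved in the transfer of Income Tax officials?")
handle = match.group(1)
print(compiled_ok, handle)
```

Inside a character class such as `[^:]` the colon needs no escaping, and `@` is never special, so the backslashes in the one-liner above are harmless but unnecessary.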

UPDATE:

In [34]: df
Out[34]:
                        Tweet
0      RT @username:xyzxyzxyz
1         Free text RT @user2
2                 Blah - blah
3  Text @another_user:aaaaaaa

In [35]: df['Influencers'] = df['Tweet'].str.extract("RT\s+(\@[^\:]*)", expand=False).fillna('Original')

In [36]: df
Out[36]:
                        Tweet Influencers
0      RT @username:xyzxyzxyz   @username
1         Free text RT @user2      @user2
2                 Blah - blah    Original
3  Text @another_user:aaaaaaa    Original
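
The same approach can be tried on the sample DataFrame from the question. This is a minimal sketch, assuming the "Original" placeholder is the desired label for tweets that are not retweets:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Tweet": [
    "RT @saikatd: Are editors involved in the transfer of Income Tax officials?",
    "RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration",
    " Fairplay n equity 2 consumers 2 be ensured",
]})

# Rows that do not match yield NaN; fillna replaces them with a label.
df["Influencers"] = (
    df["Tweet"]
    .str.extract(r"RT\s+(@[^:]*)", expand=False)
    .fillna("Original")
)
print(df["Influencers"].tolist())
```

No `iterrows()` loop is needed, because `str.extract` already operates on the whole column at once.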

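Answer 1: (score: 0)

Sorry, I solved that part of the problem, but I could not implement the else condition for the case above: my code produces blank rows for the else condition.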