Question

Suppose I have a pandas DataFrame like this:

         Word      Rating
   0     Bear      1
   1     Yuck      2
   2     Girl      3
   3     Yellow    4

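How can I use a regular expression in pandas to filter out the rows whose word starts with the letter "y", while keeping the result as a DataFrame? I know the regex pattern is r"\b[^y]\w+\b".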

Expected output:

         Word    Rating
    0    Bear    1
    2    Girl    3
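One detail worth checking in the question's pattern: [^y] is case-sensitive, so it does not exclude a capital "Y", and every word in the sample frame still matches. The sketch below rebuilds the frame from the table above (the variable names are illustrative) and compares the pattern as asked with a start-of-string class that excludes both cases:

```python
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame({"Word": ["Bear", "Yuck", "Girl", "Yellow"], "Rating": [1, 2, 3, 4]})

# The pattern from the question; str.match anchors at the start of each string
as_asked = df.Word.str.match(r"\b[^y]\w+\b")
print(as_asked.tolist())  # [True, True, True, True]: [^y] does not exclude "Y"

# Excluding both cases in the first character gives the expected output
filtered = df[df.Word.str.match(r"[^yY]")]
print(filtered)
```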

Answer 1

Use startswith:

In [1187]: df[~df.Word.str.startswith('Y')]
Out[1187]:
   Word  Rating
0  Bear       1
2  Girl       3

Or, with a regex match:

In [1203]: df[df.Word.str.match('^[^Y]')]
Out[1203]:
   Word  Rating
0  Bear       1
2  Girl       3
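A small variation on the match approach, as a sketch: str.match uses re.match semantics, so the pattern is already anchored at the start and the leading ^ is optional. If the Word column might contain missing values (the None row below is hypothetical, not from the question), passing na=False keeps the mask boolean and drops those rows as non-matches:

```python
import pandas as pd

# Same frame as the question, plus a hypothetical missing value
df = pd.DataFrame(
    {"Word": ["Bear", "Yuck", "Girl", "Yellow", None], "Rating": [1, 2, 3, 4, 5]}
)

# str.match already anchors at the start, so "^" can be dropped;
# na=False treats the missing Word as "no match", so that row is dropped too
mask = df.Word.str.match("[^Y]", na=False)
print(df[mask])
```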

Answer 2

No regex is needed. Just look at the first letter.
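A minimal sketch of the first-letter check, assuming the DataFrame from the question (this is an illustration, not the answerer's own code). The .str[0] accessor takes the first character of each word, and lowering it handles both "Y" and "y":

```python
import pandas as pd

df = pd.DataFrame({"Word": ["Bear", "Yuck", "Girl", "Yellow"], "Rating": [1, 2, 3, 4]})

# .str[0] takes the first character of every word
first_letter = df.Word.str[0]

# Keep rows whose first letter is not "y", in either case
result = df[first_letter.str.lower() != "y"]
print(result)
```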


Answer 3

Use lower and startswith to catch both uppercase 'Y' and lowercase 'y':

df[~df.Word.str.lower().str.startswith('y')]

Input:

df

     Word  Rating
0    Bear       1
1    Yuck       2
2    Girl       3
3  Yellow       4
4     yes       5
5   color       6

Output:

    Word  Rating
0   Bear       1
2   Girl       3
5  color       6
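If a regex is preferred (as the question asks), a roughly equivalent sketch uses str.contains with a start anchor and case=False, which makes the match case-insensitive so "^y" also catches "Y". The frame below reproduces the input shown above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Word": ["Bear", "Yuck", "Girl", "Yellow", "yes", "color"],
        "Rating": [1, 2, 3, 4, 5, 6],
    }
)

# "^y" anchors the pattern at the start; case=False also matches "Y"
starts_with_y = df.Word.str.contains(r"^y", case=False)
result = df[~starts_with_y]
print(result)
```

Unlike the lower().startswith version, this keeps the test in a single regex, which is easier to extend if more leading letters need to be excluded.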

Remove rows from a pandas DataFrame using a regular expression
