
Time: 2016-06-16 12:31:25

Tags: python string pandas search dataframe




import pandas as pd

data = {
    'opinion': ['He said it was too expensive',
                'She said it was too costly',
                'He thought it was not fast enough',
                'They thought is was not right and too much money',
                'Her view was that it was too small and too slow'],
}

df = pd.DataFrame(data, columns=['opinion'])


0   He said it was too expensive
1   She said it was too costly
2   He thought it was not fast enough
3   They thought is was not right and too much money
4   Her view was that it was too small and too slow



new_col = []  # one category label per row
for row in df['opinion']:
    if 'too expensive' in row or 'too costly' in row or 'too much money' in row:
        new_col.append('Too Expensive')
    elif 'not fast enough' in row or 'too slow' in row:
        new_col.append('Too Slow')

df['reason'] = new_col

    opinion                                           reason
0   He said it was too expensive                      Too Expensive
1   She said it was too costly                        Too Expensive
2   He thought it was not fast enough                 Too Slow
3   They thought is was not right and too much money  Too Expensive
4   Her view was that it was too small and too slow   Too Slow


4 answers:

Answer 0 (score: 2)

You can keep a list of dictionaries whose keys are the replacement values and whose values are lists containing the to_replace words:

words = [{'Too Expensive': ['too expensive', 'too costly', 'too much money'],
          'Too Slow': ['not fast enough', 'too slow']}]

Then loop over words and use str.contains with a regex to check all of the to_replace phrases at once, using .loc[] to identify the matching rows and assign the replacement:

for word in words:
    for replacement, to_replace in word.items():
        df.loc[df.opinion.str.contains('|'.join(to_replace)), 'reason'] = replacement

                                            opinion         reason
0                      He said it was too expensive  Too Expensive
1                        She said it was too costly  Too Expensive
2                 He thought it was not fast enough       Too Slow
3  They thought is was not right and too much money  Too Expensive
4   Her view was that it was too small and too slow       Too Slow
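One thing to watch with the approach above: the joined string is treated as a regular expression, so phrases containing characters such as `$`, `(` or `.` would not match literally. A minimal sketch of one way to guard against that, assuming the phrases are meant as plain text. The sample data and the `words` mapping here are made up for illustration and are not from the question:

```python
import re
import pandas as pd

# Hypothetical sample data containing regex metacharacters.
df = pd.DataFrame({'opinion': ['It costs $5.00 (too much)',
                               'Delivery was too slow',
                               'No complaints']})
df['reason'] = None  # start with an empty label column

# Hypothetical mapping: category label -> literal phrases that trigger it.
words = {'Too Expensive': ['$5.00', '(too much)'],
         'Too Slow': ['too slow']}

for replacement, to_replace in words.items():
    # re.escape makes each phrase match literally, so '$', '(' and '.'
    # are not interpreted as regex syntax.
    pattern = '|'.join(re.escape(phrase) for phrase in to_replace)
    # na=False treats missing opinions as "no match" so the mask stays boolean.
    mask = df['opinion'].str.contains(pattern, na=False)
    df.loc[mask, 'reason'] = replacement

print(df)
```

Rows that match no phrase keep the initial empty label, which makes uncategorized opinions easy to find afterwards.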


Answer 1 (score: 1)


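test_strings = ['too expensive', 'too costly', 'too much money']
new_col = []
for row in df['opinion']:
    for tester in test_strings:
        if tester in row:
            new_col.append("Too Expensive")
            break  # stop at the first match so each row gets one label
    else:
        # no phrase matched: keep new_col aligned with the rows
        new_col.append(None)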

Answer 2 (score: 0)


df['reason'] = ''

df.loc[df.opinion.str.lower().str.contains(r'too\s+(?:expensive|costly|much money)'), 'reason'] = 'Too Expensive'

df.loc[df.opinion.str.lower().str.contains(r'(?:not fast enough|too slow)'), 'reason'] = 'Too Slow'

In [309]: df
                                            opinion         reason
0                      He said it was too expensive  Too Expensive
1                        She said it was too costly  Too Expensive
2                 He thought it was not fast enough       Too Slow
3  They thought is was not right and too much money  Too Expensive
4   Her view was that it was too small and too slow       Too Slow
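As a possible variation on the same idea, `str.contains` accepts `case=False`, which avoids the separate `.str.lower()` call, and `na=False`, which keeps the mask boolean when an opinion is missing. A small sketch under those assumptions, using made-up sample rows rather than the question's data:

```python
import pandas as pd

# Hypothetical rows with mixed case and a missing value.
df = pd.DataFrame({'opinion': ['TOO EXPENSIVE for me',
                               'It is Too Slow',
                               None]})
df['reason'] = ''

# case=False makes the match case-insensitive; na=False maps missing
# values to False so the result can be used directly as a boolean mask.
expensive = df['opinion'].str.contains(
    r'too\s+(?:expensive|costly|much money)', case=False, na=False)
slow = df['opinion'].str.contains(
    r'not fast enough|too slow', case=False, na=False)

df.loc[expensive, 'reason'] = 'Too Expensive'
df.loc[slow, 'reason'] = 'Too Slow'

print(df)
```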

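Answer 3 (score: 0)

Pandas has a fast way to apply a function to each row, and .apply is designed for exactly that. Ideally a vectorized approach would be fastest, but I can't think of a way to do that here. .apply comes next in speed, and iterating over rows is the slowest, so it's best to avoid that whenever possible.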


def categorizer(x):
    # Map each trigger phrase to the category it should produce.
    main_dict = {"too much money": "too expensive", "too expensive": "too expensive",
                 "too costly": "too expensive", "too slow": "too slow",
                 "not fast enough": "not fast enough"}
    for key in main_dict:
        if key in x:
            return main_dict[key]
    # Implicitly returns None when no phrase matches.

df["Category"] = df["opinion"].apply(categorizer)
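The answer above says no vectorized route came to mind. One possible sketch, assuming NumPy is available, builds one boolean mask per category with `str.contains` and lets `numpy.select` pick the label of the first mask that is true. The category labels, the `'uncategorized'` default, and the extra sample row are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'opinion': ['He said it was too expensive',
                               'He thought it was not fast enough',
                               'Her view was that it was too small and too slow',
                               'No complaints at all']})

# One boolean mask per category; earlier entries win when several match.
conditions = [
    df['opinion'].str.contains('too expensive|too costly|too much money', na=False),
    df['opinion'].str.contains('not fast enough|too slow', na=False),
]
choices = ['too expensive', 'too slow']

# np.select returns, for each row, the choice of the first true condition,
# or the default when none is true.
df['Category'] = np.select(conditions, choices, default='uncategorized')

print(df)
```

Whether this is actually faster than `.apply` depends on the data size and the number of categories, so it is worth timing both on real data before switching.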