I have a pandas DataFrame that looks like this:
id    text
1000  Hi, how are you? [1000] Good thanks, yourself? [1000] I'm great.
2000  Is it hot there today? [2000] No, it's raining. [2000] Oh, too bad!
3000  What's your name [3000] Steve, and yours? [3000] Rita.
Here is the df:
import pandas as pd

df = pd.DataFrame([
    [1000, "Hi, how are you? [1000] Good thanks, yourself? [1000] I'm great."],
    [2000, "Is it hot there today? [2000] No, it's raining. [2000] Oh, too bad!"],
    [3000, "What's your name [3000] Steve, and yours? [3000] Rita."]], columns=['id', 'text'])
I want to add a new column that splits the 'text' column into a list, using the value in the 'id' column as the delimiter.
id    text                                                                  lines
1000  Hi, how are you? [1000] Good thanks, yourself? [1000] I'm great.      ["Hi, how are you?", "Good thanks, ...", "I'm great."]
2000  Is it hot there today? [2000] No, it's raining. [2000] Oh, too bad!   ["Is it hot there ...", "No, it's raining.", "Oh, too bad!"]
3000  What's your name [3000] Steve, and yours? [3000] Rita.                ["What's your name", "Steve, and yours?", "Rita."]
I tried:
df ['lines'] = df.apply(lambda x: x['text'].split(x['id']))
But I got a KeyError:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8543)()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError                                  Traceback (most recent call last)
<ipython-input-14-e50f764c5674> in <module>()
----> 1 df ['lines'] = df.apply(lambda x: x['text'].split(x['id']))
KeyError: ('text', 'occurred at index id')
Answer 0 (score: 1)
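Use axis=1 and an appropriate delimiter. Without axis=1, apply passes each column (not each row) to the lambda, so x['text'] is looked up in that column's index and raises the KeyError. The id is also an integer, while str.split needs a string, so format it into the delimiter: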
In [548]: df.apply(lambda x: x['text'].split(' [%s] ' % x['id']), axis=1)
Out[548]:
0 [Hi, how are you?, Good thanks, yourself?, I'm...
1 [Is it hot there today?, No, it's raining., Oh...
2 [What's your name, Steve, and yours?, Rita.]
dtype: object
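To store the result as the new 'lines' column, a minimal self-contained sketch might look like the following. It assumes every bracketed marker in the text matches that row's id exactly and is surrounded by single spaces; the sample ids are normalized to 1000, 2000 and 3000 for illustration. The name lines_zip is only an illustrative variable, not part of the question.

```python
import pandas as pd

# Sample data, assuming each marker in the text matches the row's id exactly.
df = pd.DataFrame(
    [
        [1000, "Hi, how are you? [1000] Good thanks, yourself? [1000] I'm great."],
        [2000, "Is it hot there today? [2000] No, it's raining. [2000] Oh, too bad!"],
        [3000, "What's your name [3000] Steve, and yours? [3000] Rita."],
    ],
    columns=["id", "text"],
)

# axis=1 hands each row to the lambda, so row["text"] and row["id"] are valid lookups.
# The integer id is formatted into a string delimiter such as " [1000] ".
df["lines"] = df.apply(lambda row: row["text"].split(f" [{row['id']}] "), axis=1)

# Alternative without apply: iterate over the two columns in parallel.
lines_zip = [text.split(f" [{i}] ") for i, text in zip(df["id"], df["text"])]

print(df["lines"].tolist())
```

Both variants should give the same lists; the zip version avoids building a Series for every row, which may matter on large frames. If a marker does not match the row's id, split simply leaves that part of the text unsplit.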