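I have a column that contains strings. I want to transform this column so that, in the end, only the first n words of each string are shown.
I know I need to split each string and then slice the resulting list to keep the first n words. After that, I can use join to put them back together. But I'm having trouble doing this.
I expected the following to work: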
import pandas as pd

data = [[1, "A complete sentence must have, at minimum, three things: a subject, verb, and an object. The subject is typically a noun or a pronoun."], [2, "And, if there's a subject, there's bound to be a verb because all verbs need a "], [3, "subject. Finally, the object of a sentence is the thing that's being acted upon by the subject."], [4, "So, you might say, Claire walks her dog. In this complete "]]
df = pd.DataFrame(data, columns=['id', 'text'])
# Intended: keep the first 3 words of each row
df['first_three'] = df['text'].str.split()[:3]
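But this splits every row and then keeps only the first 3 rows of the result, instead of keeping the first 3 words of each row.
So the column ends up looking like this: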
first_three
['A', 'complete', 'sentence', 'must', 'have,', 'at', 'minimum,', 'three', 'things:', 'a', 'subject,', 'verb,', 'and', 'an', 'object.', 'The', 'subject', 'is', 'typically', 'a', 'noun', 'or', 'a', 'pronoun.']
['And,', 'if', "there's", 'a', 'subject,', "there's", 'bound', 'to', 'be', 'a', 'verb', 'because', 'all', 'verbs', 'need', 'a']
['subject.', 'Finally,', 'the', 'object', 'of', 'a', 'sentence', 'is', 'the', 'thing', "that's", 'being', 'acted', 'upon', 'by', 'the', 'subject.']
NaN
I want the first_three column to look like this:
first_three
[A, complete, sentence]
[And, if, there's]
[subject, Finally, the]
[So, you, might]
Then I can join them and move on. I know this must be easy to solve, but I can't seem to find the solution. Thank you very much for your input.
Answer 0 (score: 0)
You can use apply to take the desired number of elements from each row's list.
df['first_three'] = df['text'].str.split().apply(lambda x : x[:3])
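To make the difference concrete, here is a minimal sketch contrasting the row slice from the question with a per-row slice. It uses shortened sample sentences (an assumption, not the asker's full data) and the `.str[:3]` accessor as an alternative to `apply`; both should give the same per-row result.

```python
import pandas as pd

# Shortened sample data, assumed for illustration only
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "A complete sentence must have three things",
        "And, if there's a subject",
        "subject. Finally, the object",
        "So, you might say",
    ],
})

words = df["text"].str.split()   # each row becomes a list of words

# Slicing the Series itself selects the first 3 ROWS (index 0, 1, 2).
# Assigning that back to the DataFrame leaves row 3 as NaN.
row_slice = words[:3]

# Slicing through the .str accessor applies [:3] to each row's list.
df["first_three"] = words.str[:3]

print(df["first_three"])
```

The `apply(lambda x: x[:3])` form from the answer and `.str[:3]` are interchangeable here; which one to use is mostly a matter of taste.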
If you also want to clean up some of the text, you can do the following:
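# Replace commas with spaces first, so that "And," becomes "And"
df['first_three'] = df['text'].str.replace(",", " ")
# Then split each row and keep its first 3 words
df['first_three'] = df['first_three'].apply(lambda x: x.split()[:3])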
Output
first_three
[A, complete, sentence]
[And, if, there's]
[subject., Finally, the]
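Since the question mentions joining the words back together afterwards, the whole split, slice, join sequence could be sketched in one chain as below. The variable `n`, the column name `first_n`, and the two sample sentences are assumptions made for this example.

```python
import pandas as pd

# Two sample rows, assumed for illustration only
df = pd.DataFrame({
    "text": [
        "A complete sentence must have three things",
        "So, you might say, Claire walks her dog.",
    ],
})

n = 3  # number of leading words to keep (hypothetical parameter)

# split -> list of words per row; .str[:n] -> first n words per row;
# .str.join(" ") -> glue each row's words back into one string
df["first_n"] = df["text"].str.split().str[:n].str.join(" ")

print(df["first_n"].tolist())
```

Punctuation attached to a word (such as the comma in "So,") is kept as-is; apply the comma replacement shown earlier first if that is not wanted.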