Question

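I have a column that contains strings. I want to transform this column so that it ends up showing only the first n words of each string.

I know I need to split the string, then slice the resulting list to keep the first n words, and then use join to put them back together. But I'm having trouble actually doing this.

I expected the following to work: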

import pandas as pd

data = [[1, "A complete sentence must have, at minimum, three things: a subject, verb, and an object. The subject is typically a noun or a pronoun."], [2, "And, if there's a subject, there's bound to be a verb because all verbs need a "], [3, "subject. Finally, the object of a sentence is the thing that's being acted upon by the subject."], [4, "So, you might say, Claire walks her dog. In this complete "]]
df = pd.DataFrame(data, columns=['id', 'text'])

df['first_three'] = df['text'].str.split()[:3]

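But this keeps the split result for the first 3 rows only, rather than keeping the first 3 words of each row.

So it ends up looking like this: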

first_three
['A', 'complete', 'sentence', 'must', 'have,', 'at', 'minimum,', 'three', 'things:', 'a', 'subject,', 'verb,', 'and', 'an', 'object.', 'The', 'subject', 'is', 'typically', 'a', 'noun', 'or', 'a', 'pronoun.']
['And,', 'if', "there's", 'a', 'subject,', "there's", 'bound', 'to', 'be', 'a', 'verb', 'because', 'all', 'verbs', 'need', 'a']
['subject.', 'Finally,', 'the', 'object', 'of', 'a', 'sentence', 'is', 'the', 'thing', "that's", 'being', 'acted', 'upon', 'by', 'the', 'subject.']
NaN

I want the first_three column to look like this:

first_three
[A, complete, sentence]
[And, if, there's]
[subject, Finally, the]
[So, you, might]

That way I can join them and move on. I know this must be easy to solve, but I can't seem to find the solution. Thanks a lot for any input.

Answer 1

You can use apply to extract the desired number of elements from each list.

df['first_three'] = df['text'].str.split().apply(lambda x : x[:3])
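For context on why the original attempt failed: a plain [:3] on a Series slices rows, and the assignment then aligns on the index, which is why the fourth row came out as NaN. As an alternative to apply, the .str accessor can apply the slice to each element. The sketch below uses shortened sample strings (an assumption for brevity, not the asker's full data) to show the difference.

```python
import pandas as pd

# Shortened sample data, assumed here only for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "A complete sentence must have three things",
        "And, if there's a subject",
        "subject. Finally, the object",
        "So, you might say",
    ],
})

words = df["text"].str.split()

# Plain [:3] on a Series slices rows (the first three), not the lists inside.
rows_sliced = words[:3]

# .str[:3] applies the slice to each element, like apply(lambda x: x[:3]).
per_row = words.str[:3]

print(len(rows_sliced))
print(per_row.tolist())
```

Both .str[:3] and the apply version should give the same per-row lists; which one to use is mostly a matter of taste.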

If you also want to clean up some of the text, you can do the following:

df['first_three'] = df['text'].str.replace(",", " ")
df['first_three'] = df['first_three'].apply(lambda x : x.split()[:3])

Output

first_three
[A, complete, sentence]
[And, if, there's]
[subject., Finally, the]
[So, you, might]
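Since the question mentions joining the words back together afterwards, the whole split, slice, join sequence can be chained in one expression. This is a minimal sketch, not part of the original answer; first_n_words is a hypothetical helper name, and the two sample strings are shortened from the question's data.

```python
import pandas as pd

def first_n_words(series: pd.Series, n: int) -> pd.Series:
    """Keep the first n whitespace-separated words of each string, rejoined with spaces.

    Hypothetical helper for illustration only.
    """
    return series.str.split().str[:n].str.join(" ")

# Shortened sample data, assumed for illustration.
df = pd.DataFrame({
    "id": [1, 2],
    "text": [
        "A complete sentence must have",
        "So, you might say, Claire walks her dog.",
    ],
})

df["first_three"] = first_n_words(df["text"], 3)
print(df["first_three"].tolist())
```

Punctuation attached to a word (such as the comma in "So,") is kept here; the str.replace step shown above could be applied first if that is unwanted.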
