Pandas: split a column of semicolon-delimited strings into multiple columns

Time: 2017-08-01 17:26:21

Tags: python pandas

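I can't split a pandas Series whose values contain semicolons. Is it because I'm using the column name ('Social_Media') as an index, or because Python doesn't recognize the semicolon as a split character? Or is something wrong with my script?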

What I need as output is one column per semicolon-separated value.

# Keep only the rows where Social_Media is not NaN
df2 = df[df['Social_Media'].notnull()]
# Split on semicolons
df2['Social_Media'].apply(lambda x: x.split(';')[0])

# This is my output after the split
Timestamp                             
2017-06-01 18:10:46          Twitter;Facebook;Instagram;WhatsApp;Google+
2017-06-01 19:24:04          Twitter;Facebook;Instagram;WhatsApp;Google+
2017-06-01 19:25:21          Twitter;Facebook;Instagram;WhatsApp;Google+

1 answer:

Answer 0 (score: 1)

You can use str.split with expand=True, which returns a DataFrame with one column per piece:

df = df['Social_Media'].str.split(';', expand=True).add_prefix('name_')
print (df)
                      name_0    name_1     name_2    name_3   name_4
Timestamp                                                           
2017-06-01 18:10:46  Twitter  Facebook  Instagram  WhatsApp  Google+
2017-06-01 19:24:04  Twitter  Facebook  Instagram  WhatsApp  Google+
2017-06-01 19:25:21  Twitter  Facebook  Instagram  WhatsApp  Google+
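
As for why the attempt in the question appeared to do nothing: the semicolon and the column name are most likely not the problem. Series.apply returns a new Series rather than modifying the frame, so if the result is never assigned, printing the column again shows the unsplit strings. The [0] would also keep only the first piece. A minimal sketch, assuming sample data shaped like the question's (the frame below is made up for illustration, and first_items, parts and result are hypothetical names):

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
idx = pd.DatetimeIndex(
    ["2017-06-01 18:10:46", "2017-06-01 19:24:04", "2017-06-01 19:25:21"],
    name="Timestamp",
)
df = pd.DataFrame(
    {"Social_Media": ["Twitter;Facebook;Instagram;WhatsApp;Google+"] * 3},
    index=idx,
)

# apply returns a NEW Series; df itself is untouched unless you assign the result.
# The [0] also keeps only the first piece of each string.
first_items = df["Social_Media"].apply(lambda x: x.split(";")[0])

# expand=True yields one column per piece
parts = df["Social_Media"].str.split(";", expand=True).add_prefix("name_")

# join attaches the new columns to the original frame (aligned on the index)
result = df.join(parts)
print(result.columns.tolist())
```

Using join keeps the original Social_Media column next to the new ones; drop it afterwards if it is no longer needed.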

For column names suffixed with letters instead of numbers:

import string
L = list(string.ascii_lowercase)
names = dict(zip(range(len(L)), ['name_' + x for x in  L]))

df = df['Social_Media'].str.split(';', expand=True).rename(columns=names)
print (df)
                      name_a    name_b     name_c    name_d   name_e
Timestamp                                                           
2017-06-01 18:10:46  Twitter  Facebook  Instagram  WhatsApp  Google+
2017-06-01 19:24:04  Twitter  Facebook  Instagram  WhatsApp  Google+
2017-06-01 19:25:21  Twitter  Facebook  Instagram  WhatsApp  Google+
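
One thing to keep in mind if your real data is less uniform than the sample: when rows list different numbers of services, str.split with expand=True makes as many columns as the longest row needs and pads the shorter rows with missing values. A small sketch, assuming a made-up Series (s, parts and counts are hypothetical names, not from the question):

```python
import pandas as pd

# Hypothetical rows with differing numbers of semicolon-separated items
s = pd.Series(
    ["Twitter;Facebook", "Instagram", "WhatsApp;Google+;Twitter"],
    name="Social_Media",
)

# The widest row has three items, so three columns are created;
# shorter rows are padded with missing values
parts = s.str.split(";", expand=True).add_prefix("name_")

# Count how many items each row actually had
counts = parts.notna().sum(axis=1)

print(parts)
print(counts.tolist())
```

If you only ever want the first few pieces, str.split also accepts n= to limit the number of splits.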