Using a DataFrame's columns as named arguments to str.format()

Asked: 2016-08-24 19:30:36

Tags: python pandas string-formatting

I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'author':["Melville","Hemingway","Faulkner"],
                   'title':["Moby Dick","The Sun Also Rises","The Sound and the Fury"],
                   'subject':["whaling","bullfighting","a messed-up family"]
                   })

I know I can do this:

# produces desired output                   
("Some guy " + df['author'] + " wrote a book called " + 
   df['title'] + " that uses " + df['subject'] + 
   " as a metaphor for the human condition.")

But is it possible to write this more clearly with str.format(), something along the lines of:

# raises KeyError: 'author'
["Some guy {author} wrote a book called {title} that uses "
   "{subject} as a metaphor for the human condition.".format(x) 
      for x in df.itertuples(index=False)]

2 Answers:

Answer 0 (score: 3)

>>> ["Some guy {author} wrote a book called {title} that uses "
   "{subject} as a metaphor for the human condition.".format(**x._asdict())
      for x in df.itertuples(index=False)]

['Some guy Melville wrote a book called Moby Dick that uses whaling as a metaphor for the human condition.', 'Some guy Hemingway wrote a book called The Sun Also Rises that uses bullfighting as a metaphor for the human condition.', 'Some guy Faulkner wrote a book called The Sound and the Fury that uses a messed-up family as a metaphor for the human condition.']
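The difference between the failing attempt in the question and the version above can be seen on a single row. The following is a minimal sketch that uses a plain namedtuple as a stand-in for a row yielded by itertuples (the names Row, row and template are illustrative, not from the original post):

```python
from collections import namedtuple

# Stand-in for one row as yielded by df.itertuples(index=False).
Row = namedtuple("Row", ["author", "title", "subject"])
row = Row("Melville", "Moby Dick", "whaling")

template = "Some guy {author} wrote a book called {title}"

# Passing the tuple itself makes it positional argument 0, so the
# named field {author} has no keyword to look up and KeyError is raised.
try:
    template.format(row)
    failed = False
except KeyError:
    failed = True

# Unpacking a mapping supplies author=..., title=..., subject=... as keywords.
sentence = template.format(**row._asdict())

# Attribute access on the positional argument also works.
alt = "Some guy {0.author} wrote a book called {0.title}".format(row)
```

Here str.format() ignores the unused subject keyword, so the same unpacking works for templates that reference only some of the columns.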

Note that _asdict() is not meant to be part of the public API, so relying on it may break in future updates to pandas.

You could do this instead:
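For what it is worth, _asdict() is listed among the documented methods of collections.namedtuple. Readers who would still rather not depend on it could build the per-row dictionaries with DataFrame.to_dict instead. This is a sketch of an alternative that does not appear in the original answer; the name template is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("Some guy {author} wrote a book called {title} that uses "
            "{subject} as a metaphor for the human condition.")

# to_dict("records") returns one {column: value} dict per row,
# which can be unpacked directly into keyword arguments.
sentences = [template.format(**record) for record in df.to_dict("records")]
```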

>>> ["Some guy {} wrote a book called {} that uses "
   "{} as a metaphor for the human condition.".format(*x)
      for x in df.values]
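The positional version silently depends on the order of the columns, which may not match the order in which the keys were written in the dict, depending on the pandas and Python versions. A hedged sketch that pins the order by selecting the columns explicitly (the names cols and template are illustrative):

```python
import pandas as pd

# Columns deliberately created in a different order than the template expects.
df = pd.DataFrame({'subject': ["whaling", "bullfighting", "a messed-up family"],
                   'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"]})

template = ("Some guy {} wrote a book called {} that uses "
            "{} as a metaphor for the human condition.")

# Selecting the columns by name fixes the order of the positional arguments.
cols = ["author", "title", "subject"]
sentences = [template.format(*row) for row in df[cols].values]
```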

Answer 1 (score: 0)

You can also use DataFrame.iterrows() like this:

["The book {title} by {author} uses "
   "{subject} as a metaphor for the human condition.".format(**x) 
     for i, x in df.iterrows()]

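This is nice if you want to:

  • use named arguments, so the order of use does not have to match the order of the columns (as above)
  • avoid relying on an internal function such as _asdict()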
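The same named-argument unpacking can also be written with DataFrame.apply, which returns a Series (like the string-concatenation version in the question) rather than a list. This is a sketch of a variant not measured in the timings below; the name template is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("The book {title} by {author} uses "
            "{subject} as a metaphor for the human condition.")

# With axis=1 each row is passed to the function as a Series, which is
# unpacked into keyword arguments just as in the iterrows() version.
result = df.apply(lambda row: template.format(**row), axis=1)
```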

Timing: the fastest appears to be M. Klugerford's second solution, even if we heed the warning about caching and take the slowest run.

# example
%%timeit
 ("Some guy " + df['author'] + " wrote a book called " + 
   df['title'] + " that uses " + df['subject'] + 
   " as a metaphor for the human condition.")
# 1000 loops, best of 3: 883 µs per loop

%%timeit
    ["Some guy {author} wrote a book called {title} that uses "
       "{subject} as a metaphor for the human condition.".format(**x._asdict())
          for x in df.itertuples(index=False)]
#1000 loops, best of 3: 962 µs per loop

%%timeit
    ["Some guy {} wrote a book called {} that uses "
     "{} as a metaphor for the human condition.".format(*x)
          for x in df.values]   
#The slowest run took 5.90 times longer than the fastest. This could mean that an intermediate result is being cached.
#10000 loops, best of 3: 18.9 µs per loop

%%timeit
    from collections import OrderedDict
    ["The book {title} by {author} uses "
       "{subject} as a metaphor for the human condition.".format(**x) 
         for x in [OrderedDict(row) for i, row in df.iterrows()]]
#1000 loops, best of 3: 308 µs per loop            

%%timeit 
    ["The book {title} by {author} uses "
       "{subject} as a metaphor for the human condition.".format(**x) 
         for i, x in df.iterrows()]
#1000 loops, best of 3: 413 µs per loop         

Why the second-to-last one is faster than the last is beyond me.
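For anyone who wants to rerun the comparison outside IPython, the measurements can be approximated with the standard-library timeit module. This is only a sketch: the function names and the loop counts are arbitrary choices, and the numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

named = ("The book {title} by {author} uses "
         "{subject} as a metaphor for the human condition.")
positional = ("Some guy {} wrote a book called {} that uses "
              "{} as a metaphor for the human condition.")

def with_values():
    return [positional.format(*x) for x in df[["author", "title", "subject"]].values]

def with_itertuples():
    return [named.format(**x._asdict()) for x in df.itertuples(index=False)]

def with_iterrows():
    return [named.format(**x) for i, x in df.iterrows()]

loops = 100  # arbitrary; raise for steadier numbers
timings = {}
for fn in (with_values, with_itertuples, with_iterrows):
    # Best of 3 repeats, each running the function `loops` times.
    timings[fn.__name__] = min(timeit.repeat(fn, number=loops, repeat=3))

for name, seconds in timings.items():
    print(f"{name}: {seconds / loops * 1e6:.1f} µs per loop")
```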