How to add a new column with multiple string values to a DataFrame

Asked: 2015-04-20 12:14:29

Tags: python python-2.7 pandas

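For example, I want to add a new column named DFA that holds a number of string values, such as http://... URL links.

Basically, I want to add a new column containing multiple values, one per row.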

#!/usr/bin/python

import time
import os,sys
from os import path
import re
import sys, ast
import subprocess
import numpy as np
#from StringIO import StringIO
import pandas as pd
from IPython.display import HTML
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_colwidth',1000)
pd.set_option('display.width', 1000)
pd.set_option('display.notebook_repr_html', True)


location = "/root/madhu_test/bpstest/results/finalnss.txt"

f = pd.read_csv(location,delimiter='\t\t',skiprows=2)


cols = f.columns.tolist()
print cols
f = f.drop('BPS Profile.2', 1)
f = f.drop('BPS Profile.1', 1)
np.radians(f['Throughput'])
np.radians(f['Throughput.1'])
f.Throughput = f.Throughput.round()
f['Throughput.1'] = f['Throughput.1'].round()
f['percentage'] = ((f['Throughput.1']-f['Throughput'])/f['Throughput.1'])*100.0
f['percentage.1'] = ((f['Throughput.2']-f['Throughput'])/f['Throughput.2'])*100.0
f['Throughput.2'] = f['Throughput.2'].round()

f.percentage = f.percentage.round(1)
f['percentage.1'] = f['percentage.1'].round(1)
f['DFA'].loc = [['<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>','<a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>']]

This produces the following output:

                                                                    DFA
0    SigTestHTTP21kBin         217           219           453         0.9          52.0  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
1   SigTestHTTP21kHtml         359           364           372         1.4           3.4  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
2   SigTestHTTP21kText         380           376           378        -1.1          -0.6  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
3     NSS-HTTP21Kdelay         378           380           378         0.5           0.0  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
4          NSS-HTTPCPS       18920            75            76    -25126.7      -24821.0  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
5    SIggTestPerimeter         270           223           232       -21.1         -16.2  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
6    SIgTestDatacenter         371           373           361         0.5          -2.7  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
7        NSS-Financial           5            56            57        91.1          91.2  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
8        NSS-Education         971          1010           958         3.9          -1.4  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
9       NSS-EuroMobile         921           933           942         1.3           2.2  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>]
10        NSS-USMobile         528           542           633         2.6          16.5  [<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>, <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt"

But I need output like this:

                                                                                                                                             DFA
0    SigTestHTTP21kBin         217           219           453         0.9          52.0  <a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>
1   SigTestHTTP21kHtml         359           364           372         1.4           3.4   <a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>
2   SigTestHTTP21kText         380           376           378        -1.1          -0.6  
3     NSS-HTTP21Kdelay         378           380           378         0.5           0.0  
4          NSS-HTTPCPS       18920            75            76    -25126.7      -24821.0  
5    SIggTestPerimeter         270           223           232       -21.1         -16.2  
6    SIgTestDatacenter         371           373           361         0.5          -2.7  
7        NSS-Financial           5            56            57        91.1          91.2  
8        NSS-Education         971          1010           958         3.9          -1.4  
9       NSS-EuroMobile         921           933           942         1.3           2.2  
10        NSS-USMobile         528           542           633         2.6          16.5  

I tried:

f['DFA'].loc[0:2] = ['<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>','<a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>']

I get the following error:

 File "./some.py", line 50, in <module>
    f['DFA'].loc[0:2] = ['<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>','<a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>']
  File "/usr/local/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-i686.egg/pandas/core/frame.py", line 1780, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-i686.egg/pandas/core/frame.py", line 1787, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-i686.egg/pandas/core/generic.py", line 1068, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-i686.egg/pandas/core/internals.py", line 2849, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-i686.egg/pandas/core/index.py", line 1402, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3812)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3692)
  File "pandas/hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12299)
  File "pandas/hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12250)

1 answer:

Answer 0 (score: 0)

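If you want to assign a list of values to multiple rows, with each value in the list matched to a specific row, then the lengths of the left-hand side and the right-hand side must match: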

f['DFA'].iloc[0:2] = ['<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>','<a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>']

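This uses integer position-based indexing (iloc) to select the first two rows. The slice 0:2 excludes its end point, so it covers exactly two rows and matches the two values in the list, whereas the label-based .loc[0:2] includes the end label and covers three rows.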
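
A caveat: the traceback in the question passes through `__getitem__` and `get_loc`, which suggests the lookup of the 'DFA' column itself failed, so the column probably has to exist before any slice of it can be assigned. The following is a minimal sketch under that assumption, not a tested fix for the original script. The small frame is a hypothetical stand-in for the data read from finalnss.txt, and the names `links` and `DFA_alt` are illustrative only. It creates the column first, then assigns on the frame itself rather than on `f['DFA']`, which also avoids chained assignment.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from finalnss.txt.
f = pd.DataFrame({
    "BPS Profile": ["SigTestHTTP21kBin", "SigTestHTTP21kHtml", "SigTestHTTP21kText"],
    "Throughput": [217, 359, 380],
})

links = [
    '<a href="http://10.209.81.36/Binary_dfa_NSS_sorted_2464.txt">Binary</a>',
    '<a href="http://10.209.81.36/HTML_dfa_NSS_sorted_2464.txt">Html</a>',
]

# Create the column first so that every row has a value (an empty string here).
f["DFA"] = ""

# Assign on the frame itself. With the default 0..n-1 index, labels 0 and 1
# are the first two rows; .loc slices include the end label, so
# 0:len(links) - 1 selects exactly len(links) rows.
f.loc[0:len(links) - 1, "DFA"] = links

# Alternative: build a Series from the list. Column assignment aligns on the
# index, rows without a matching label become NaN, and fillna replaces them.
f["DFA_alt"] = pd.Series(links, index=f.index[:len(links)])
f["DFA_alt"] = f["DFA_alt"].fillna("")
```

Both variants leave the remaining rows as empty strings, which matches the desired output in the question, where only the first two rows carry a link.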