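I'm trying to edit an entire column cell by cell, changing it from values that mix an integer and a string to just the integer part. For example,
one cell looks like this: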
3001234; textTEXT TextTeXTExt.TExt
I'm using this command:
df2.columns[3] = df2.columns[3].map(lambda x: x.lstrip([5:]))
I've also tried something like this:
df2.columns[3] = df2.columns[3].split([])
This is the error I get from IPython:
AttributeError: 'unicode' object has no attribute 'map'
The actual column in the DataFrame:
0 11212; xxxxxxxxxx xxxxxxxx
1 11212; xxxxxxxxxx xxxxxxxx
2 11212; xxxxxxxxxx xxxxxxxx
3 11212; xxxxxxxxxx xxxxxxxx
8 667788; xxxxxxx xxxxxxxxxxxxx xxxxxx
9 55555; xxxxxxx xxxxxxxxxxxxx xxxxxx
10 55555; xxxxxxx xxxxxxxxxxxxx xxxxxx
11 55555; xxxxxxx xxxxxxxxxxxxx xxxxxx
12 33333; xxxxxxx xxxxxxxxxxxxx xxxxxx
13 333; xxx xxxxx @ xxx xxx 2 xxxx
14 9991; xxxx; xxxxxx xxxxx xxxx @ 2 xxx
18 1635; vvvvvvvvvvvv vvvvvv 10
19 1635; vvvvvvvvvvvv vvvvvv 10
20 1635; vvvvvvvvvvvv vvvvvv 10
21 1635; vvvvvvvvvvvv vvvvvv 10
32 1712; Cxxxx xxxxxxxx; xxx 0
33 1712; Cxxxx xxxxxxxx; xxx 0
34 1712; Cxxxx xxxxxxxx; xxx 0
35 1712; Cxxxx xxxxxxxx; xxx 0
This is the code I'm running:
import pandas as pd
# import excel file
xlsx = pd.ExcelFile("/home/PATH")
# create data frame from excel file on sheet 1
df2 = pd.read_excel(xlsx,'Sheet1')
df = pd.DataFrame({"Card": df2})
print(df.head())
df.iloc[:,0] = df.iloc[:,0].apply(lambda x: x.split(';')[0])
print(df.head())
# delete columns not relevant to us
df2.drop(df2.columns[[0,5,10,11]],inplace=True,axis=1)
Answer 0 (score: 0)
If I understand your question correctly, you can try this:
import pandas as pd
import re
df = pd.DataFrame({'col1':['3001234; textTEXT TextTeXTExt.TExt', '1005678; more text']})
print(df)
col1
0 3001234; textTEXT TextTeXTExt.TExt
1 1005678; more text
digits = df['col1'].apply(lambda x: re.findall(r'\d+', str(x)))
print(digits)
0 [3001234]
1 [1005678]
Name: col1, dtype: object
df['col1'] = digits.str.get(0).astype(int)
print(df)
col1
0 3001234
1 1005678
print(df.dtypes)
col1 int32
dtype: object
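Answer 1 (score: 0)
df2.columns[3] refers to the column name, not the column's contents. A column name (here a unicode string) has no methods such as map or apply, which is why you get the AttributeError. Use df.iloc[:, column_number] or df['column_name'] to get the column's contents.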
import pandas as pd
data = [u'11212; xxxxxxxxxx xxxxxxxx',
u'11212; xxxxxxxxxx xxxxxxxx',
u'11212; xxxxxxxxxx xxxxxxxx',
u'11212; xxxxxxxxxx xxxxxxxx',
u'667788; xxxxxxx xxxxxxxxxxxxx xxxxxx',
u'55555; xxxxxxx xxxxxxxxxxxxx xxxxxx',
u'55555; xxxxxxx xxxxxxxxxxxxx xxxxxx',
u'55555; xxxxxxx xxxxxxxxxxxxx xxxxxx',
u'33333; xxxxxxx xxxxxxxxxxxxx xxxxxx',
u'333; xxx xxxxx @ xxx xxx 2 xxxx',
u'9991; xxxx; xxxxxx xxxxx xxxx @ 2 xxx',
u'1635; vvvvvvvvvvvv vvvvvv 10',
u'1635; vvvvvvvvvvvv vvvvvv 10',
u'1635; vvvvvvvvvvvv vvvvvv 10',
u'1635; vvvvvvvvvvvv vvvvvv 10',
u'1712; Cxxxx xxxxxxxx; xxx 0',
u'1712; Cxxxx xxxxxxxx; xxx 0',
u'1712; Cxxxx xxxxxxxx; xxx 0',
u'1712; Cxxxx xxxxxxxx; xxx 0']
# make a dataframe from data as the first column
df = pd.DataFrame({'col0': data})
print(df.head())
# Here I use iloc to get the contents of the first column (index 0); in your case the index will be 3
df.iloc[:,0] = df.iloc[:,0].apply(lambda x: x.split(';')[0])
# in your case it will be
#df.iloc[:,3] = df.iloc[:,3].apply(lambda x: x.split(';')[0])
print(df.head())
Result:
col0
0 11212; xxxxxxxxxx xxxxxxxx
1 11212; xxxxxxxxxx xxxxxxxx
2 11212; xxxxxxxxxx xxxxxxxx
3 11212; xxxxxxxxxx xxxxxxxx
4 667788; xxxxxxx xxxxxxxxxxxxx xxxxxx
col0
0 11212
1 11212
2 11212
3 11212
4 667788
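To make the name-versus-contents distinction concrete, here is a minimal sketch. The four-column frame and the column names "a", "b", "c" and "card" are made up for illustration; only the cell format mirrors the question. It also converts the result to integers, since splitting alone leaves strings in the column.

```python
import pandas as pd

# Hypothetical frame: only the last column mimics the data in the question.
df = pd.DataFrame({
    "a": [1, 2],
    "b": [3, 4],
    "c": [5, 6],
    "card": ["11212; xxxxxxxxxx xxxxxxxx", "9991; xxxx; xxxxxx xxxxx"],
})

label = df.columns[3]    # the column *name*: a plain string, no map/apply
values = df.iloc[:, 3]   # the column *contents*: a pandas Series

# Keep the text before the first ';' and convert it to an integer.
df[label] = values.str.split(";").str[0].astype(int)
print(df.dtypes)
```

Using the .str accessor instead of apply with a lambda is a style choice; both should give the same values here, provided every cell starts with digits followed by a semicolon.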
Answer 2 (score: 0)
df["Col"] = df["Col"].str.extract(r'(\d+)', expand=False)
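A slightly fuller sketch of the same idea, in case some cells contain no leading number. The sample rows (including the "no number here" row) and the choice of the nullable Int64 dtype are assumptions for illustration, not something stated in the question.

```python
import pandas as pd

# Sample data; the third row is a made-up cell with no leading number.
df = pd.DataFrame({"Col": ["3001234; textTEXT TextTeXTExt.TExt",
                           "1635; vvvvvvvvvvvv vvvvvv 10",
                           "no number here"]})

# expand=False returns a Series; '^' restricts the match to leading digits,
# so trailing numbers such as the "10" in the second row are ignored.
extracted = df["Col"].str.extract(r"^\s*(\d+)", expand=False)

# Rows without a match become missing values; the nullable Int64 dtype
# keeps the other rows as integers instead of floats.
df["Col"] = pd.to_numeric(extracted).astype("Int64")
print(df)
```

If every cell is guaranteed to start with a number, a plain astype(int) on the extracted Series would likely be enough.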