Question

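I am trying to parse a CSV file for input into a SQL database, and I am having some trouble manipulating the dataframe to handle the various data types.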

My dataframe has columns like this:

 Date, ID, DataLabel, Value

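The "Value" column contains both numeric and text data. I basically want to create two new columns in the dataframe called Value_Num and Value_Text. For rows where Value is numeric, I want to copy the value into the new Value_Num column and leave Value_Text null; for rows where it is text, I want to copy it into the Value_Text column and leave Value_Num null.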

Then I want to delete the old Value column.

Answer 1

If I understand your question correctly, this should work:

import pandas as pd

# create dummy df for this example
df = pd.DataFrame(['text', '1234'] * 4, columns=['Value'])

df
  Value
0  text
1  1234
2  text
3  1234
4  text
5  1234
6  text
7  1234

# convert the numbers first; errors='coerce' turns non-numeric entries into NaN
df['Val_Num'] = pd.to_numeric(df['Value'], errors='coerce')

# then use the null values in the Val_Num column to find the text values
df['Val_Text'] = df['Value'].loc[df['Val_Num'].isnull()]

# delete the Value column
df = df.drop('Value', axis=1)

df
   Val_Num Val_Text
0      NaN     text
1   1234.0      NaN
2      NaN     text
3   1234.0      NaN
4      NaN     text
5   1234.0      NaN
6      NaN     text
7   1234.0      NaN
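
For the column names used in the question, the same idea could be wrapped in a small helper. This is only a sketch: the function name `split_value_column` and the sample rows are made up for illustration, and it assumes the values arrive as strings, as they would from a CSV file with a mixed column.

```python
import pandas as pd


def split_value_column(df, source="Value", num_col="Value_Num", text_col="Value_Text"):
    """Return a copy of df with `source` split into a numeric and a text column.

    Hypothetical helper, not part of pandas.
    """
    out = df.copy()
    # Entries that cannot be parsed as numbers become NaN
    numeric = pd.to_numeric(out[source], errors="coerce")
    out[num_col] = numeric
    # Keep the original entry only where numeric parsing failed
    out[text_col] = out[source].where(numeric.isna())
    return out.drop(columns=[source])


# Made-up sample data with the columns from the question
df = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-02", "2015-01-03"],
    "ID": [1, 2, 3],
    "DataLabel": ["One", "Two", "Three"],
    "Value": ["1.5", "Two", "3"],
})
result = split_value_column(df)
print(result)
```

Note that a row whose Value is missing to begin with ends up null in both new columns, which may or may not be what the SQL table expects.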

Answer 2

Using apply to check each value and test whether it is a number or a string is a quick way to separate them.

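import pandas as pd
import numpy as np

df = pd.DataFrame(
    [['2015-01-01', 1, 'One', 1], ['2015-01-02', 2, 'Two', 'Two']],
    columns=['Date', 'ID', 'DataLabel', 'Value'],
)
# keep values that are Python numbers; everything else becomes NaN
# (np.isreal is unreliable for strings, so test the type directly)
df['Value_Num'] = df['Value'].apply(lambda x: x if isinstance(x, (int, float)) else np.nan)
# keep values that are strings; everything else becomes NaN
df['Value_Text'] = df['Value'].apply(lambda x: x if isinstance(x, str) else np.nan)
df = df.drop('Value', axis=1)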
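
One caveat worth checking against the real data: when a column mixing numbers and text is read with `pd.read_csv`, every entry typically comes back as a string, so a type check like the one above would put all rows in the text column. The sketch below illustrates this with made-up CSV content; the variable names are illustrative only.

```python
import io

import pandas as pd

# Made-up CSV content with a mixed Value column
csv_text = (
    "Date,ID,DataLabel,Value\n"
    "2015-01-01,1,One,1\n"
    "2015-01-02,2,Two,Two\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Every entry in the mixed column is a str, including the "1"
all_strings = all(isinstance(v, str) for v in df["Value"])
print(all_strings)

# Parsing the content, rather than testing the type, still finds the number
numeric = pd.to_numeric(df["Value"], errors="coerce")
n_numeric = int(numeric.notna().sum())
print(n_numeric)
```

If `all_strings` is true for the actual file, parsing with `pd.to_numeric(..., errors="coerce")` is probably the safer way to tell the two kinds of value apart.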

Filter/update a Python dataframe by data type
