In a pandas DataFrame with the following structure:
mcve_data =
alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
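I want to check the values in the keys (alfa and beta) and in the key indexes (alfa_id and beta_id). If the length of a key value is more than 3, or if any of its comma-separated items is longer than 1 character, I want both the key value and its key index converted to a period (.).
Final expected output: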
alfa alfa_id beta beta_id
a,c 7 . .
c,d 7 d,f 9
. . c,d 9
j,k 8 d,e 9
. . n,k 11
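The condition above can be written as a small predicate. This is a minimal sketch, assuming "length of a key value" means the number of comma-separated items (the reading Answer 1 uses); `needs_period` is a hypothetical name, not from the question.

```python
def needs_period(value: str) -> bool:
    """Return True if a key value (and its index) should become '.'.

    Assumed rule: more than 3 comma-separated items,
    or any item longer than 1 character.
    """
    items = value.split(',')
    return len(items) > 3 or any(len(item) > 1 for item in items)


# Check the predicate against the alfa/beta values from the example table.
for v in ['a,c', 'c,de', 'c,d', 'd,f', 'l,mnk', 'j,k', 'd,e', 'tk,l', 'n,k']:
    print(v, needs_period(v))
```

Applied to the table, it flags exactly the cells that become periods in the expected output: c,de, l,mnk and tk,l.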
I tried to write a function like the following (but it doesn't work properly):
def check_and_convert(mcve_data):
    # pair each column with its '_id' counterpart, e.g. ('alfa', 'alfa_id')
    labels = [(l, l + '_id') for l in mcve_data.columns.values]
    def convert(lines):
        for l, id in labels:
            if len(l) > 3:
                l = '.'
                id = '.'
        return l, id
    # write this back to the file.
Any suggestions?
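A working version of this function might look like the sketch below. It is not from the original thread and assumes that "length" means the number of comma-separated items, that every key column has a matching `<key>_id` column, and that an in-memory `io.StringIO` can stand in for the real file (the question does not name it). The attempt above never changes the DataFrame because it only rebinds the loop variables `l` and `id`; the sketch assigns through `.loc` instead. `needs_period` is a hypothetical helper.

```python
import io

import pandas as pd


def needs_period(value: str) -> bool:
    # Assumed rule: more than 3 comma-separated items, or any item longer than 1 char.
    items = value.split(',')
    return len(items) > 3 or any(len(item) > 1 for item in items)


def check_and_convert(mcve_data: pd.DataFrame) -> pd.DataFrame:
    out = mcve_data.copy()
    # Key columns are the ones without the '_id' suffix, e.g. 'alfa' -> 'alfa_id'.
    keys = [c for c in out.columns if not c.endswith('_id')]
    for key in keys:
        mask = out[key].map(needs_period)
        # Replace the key value and its index in the same rows.
        out.loc[mask, [key, key + '_id']] = '.'
    return out


data = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""
# Read every column as str so the id columns can hold '.' without a dtype change.
mcve_data = pd.read_csv(io.StringIO(data), sep=r'\s+', dtype=str)
result = check_and_convert(mcve_data)
print(result)

# "Write this back to the file": a StringIO stands in for the real file here.
buffer = io.StringIO()
result.to_csv(buffer, sep=' ', index=False)
print(buffer.getvalue())
```

Returning a copy instead of modifying `mcve_data` in place keeps the original data available for comparison; pass `result` to `to_csv` with the real path to overwrite the file.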
Answer 0 (score: 2)
You can also use the str accessor to skip the inner for loop and check the length of every value in a column at once:
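# key columns are the ones without the '_id' suffix
keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    # rows whose value is longer than 3 characters get '.' in both the key and its _id column
    df.loc[df[k].str.len() > 3, [k, k + '_id']] = '.'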
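Note that `str.len() > 3` measures the whole string and never looks at individual items, so a value such as 'ab' is left unchanged even though its single item is longer than 1 character; it happens to give the right answer on the example data. The sketch below stays vectorized but checks both conditions with the str accessor. It assumes, as Answer 1 does, that "length" means the number of comma-separated items, and that items never contain commas themselves.

```python
import pandas as pd

df = pd.DataFrame({
    'alfa': ['a,c', 'c,d', 'l,mnk', 'j,k', 'tk,l'],
    'alfa_id': ['7', '7', '8', '8', '8'],
    'beta': ['c,de', 'd,f', 'c,d', 'd,e', 'n,k'],
    'beta_id': ['8', '9', '9', '9', '11'],
})

keys = [k for k in df.columns if not k.endswith('_id')]
for k in keys:
    col = df[k]
    # More than 3 items means at least 3 commas.
    too_many_items = col.str.count(',') >= 3
    # A run of 2+ non-comma characters means some item is longer than 1 character.
    long_item = col.str.contains(r'[^,]{2,}', regex=True)
    df.loc[too_many_items | long_item, [k, k + '_id']] = '.'

print(df)
```

The regex avoids splitting each string into a Python list, which keeps the whole check inside pandas' string methods.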
Answer 1 (score: 1)
You can use a for loop with iterrows(). See below.
import pandas as pd
from io import StringIO

s = """alfa alfa_id beta beta_id
a,c 7 c,de 8
c,d 7 d,f 9
l,mnk 8 c,d 9
j,k 8 d,e 9
tk,l 8 n,k 11
"""
# read every column as str so the id columns can hold '.'
df = pd.read_csv(StringIO(s), sep=r'\s+', dtype=str)

# I create a list of keys and key indexes based on the '_id' suffix
keys = [i for i in df.columns if not i.endswith('_id')]
key_ids = [i + '_id' for i in keys]

for index, row in df.iterrows():
    for k, kid in zip(keys, key_ids):
        items = row[k].split(',')
        # more than 3 items, or any item longer than 1 character
        if len(items) > 3 or any(len(i) > 1 for i in items):
            # .at sets a single cell by row label and column name
            df.at[index, kid] = '.'
            df.at[index, k] = '.'
print(df)
Result:
alfa alfa_id beta beta_id
0 a,c 7 . .
1 c,d 7 d,f 9
2 . . c,d 9
3 j,k 8 d,e 9
4 . . n,k 11