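I was advised to move from defining my own classes to pandas DataFrames, since I want to perform many operations on my data.
At the moment I have a DataFrame that looks like this: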
ID Name Recording Direction Duration Distance Path Raw
0 129 Houston Woodlands X 12.3 8 HWX.txt
1 129 Houston Woodlands Y 12.3 8 HWY.txt
2 129 Houston Woodlands Z 12.3 8 HWZ.txt
3 129 Houston Downtown X 11.8 10 HDX.txt
4 129 Houston Downtown Y 11.8 10 HDY.txt
5 129 Houston Downtown Z 11.8 10 HDZ.txt
... ... ... .. .. ... ... ...
2998 333 Chicago Downtown X 3.4 50 CDX.txt
2999 333 Chicago Downtown Y 3.4 50 CDY.txt
3000 333 Chicago Downtown Z 3.4 50 CDZ.txt
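That works for now, but after loading the files/arrays (adding columns) I want to group all the X, Y, Z rows together and, in addition, add new columns holding the results of array operations (e.g. an FFT).
In the end, I want a DataFrame that looks like this: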
ID Name Recording Duration Distance Rawx Rawy Rawz FFT-Rawx FFT-Rawy FFT-Rawz
0 129 Houston Woodlands 12.3 8 HWX.txt HWY.txt HWZ.txt FFT-HWX.txt FFT-HWY.txt FFT-HWZ.txt
1 129 Houston Downtown 11.8 10 HDX.txt HDY.txt HDZ.txt FFT-HDX.txt FFT-HDY.txt FFT-HDZ.txt
... ... ... .. ... ... ... ... ... ... ... ...
1000 333 Chicago Downtown 3.4 50 CDX.txt CDY.txt CDZ.txt FFT-CDX.txt FFT-CDY.txt FFT-CDZ.txt
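Any idea how to do this?
Unfortunately, not all cells have such a nice structure.
Instead of
HDX HDY HDZ
I may have "random names". However, I know they appear in this order:
Z first, Y second, X third. Each recording has these three signals, followed by the next recording.
I was thinking of something along these lines: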
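k = 1
for idx, row in df.iterrows():  # iterating over df directly would yield column names, not rows
    if k % 3 == 0:
        pass  # third row of the recording (X): do something
    elif k % 3 == 2:
        pass  # second row of the recording (Y): do something
    else:
        pass  # first row of the recording (Z): do something
    k += 1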
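However, I don't know whether it is possible to add an empty column to an existing DataFrame and fill it in a loop. If there is such an option, please let me know.
Answer 0 (score: 1)
I think I have a partial answer! I'm a bit confused about what you want regarding the FFT (fast Fourier transform?) and where the data comes from.
However, I've got everything else.
First, I'll make some sample data.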
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Name": [129, 129, 129, 129, 129, 129],
    "Recording": ['Houston Woodlands', 'Houston Woodlands', 'Houston Woodlands',
                  'Houston Downtown', 'Houston Downtown', 'Houston Downtown'],
    "Direction": ["X", "Y", "Z", "X", "Y", "Z"],
    "Duration": [12.3, 12.3, 12.3, 11.8, 11.8, 11.8],
    "Path_Raw": ["HWX.txt", "HWY.txt", "HWZ.txt", 'HDX.txt', 'HDY.txt', 'HDZ.txt'],
    "Distance": [8, 8, 8, 10, 10, 10],
})
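Now I'll define some new functions. I kept them separate so they are easier to customize. Basically, I call .unique() on each group and save each Path_Raw value under its own new column.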
def splitunique0(group):
    ulist = group.unique()
    return ulist[0]

def splitunique1(group):
    ulist = group.unique()
    return ulist[1]

def splitunique2(group):
    ulist = group.unique()
    return ulist[2]
# Named aggregation (pandas >= 0.25): output_column=(input_column, aggregation).
# The nested-dict renaming form of .agg() no longer works in pandas 1.0+.
new = df.groupby(["Name", "Recording"]).agg(
    Duration=("Duration", "first"),
    Distance=("Distance", "first"),
    Rawx=("Path_Raw", splitunique0),
    Rawy=("Path_Raw", splitunique1),
    Rawz=("Path_Raw", splitunique2),
)
Here is the finished DataFrame!
                        Duration  Distance     Rawx     Rawy     Rawz
Name Recording
129  Houston Downtown       11.8        10  HDX.txt  HDY.txt  HDZ.txt
     Houston Woodlands      12.3         8  HWX.txt  HWY.txt  HWZ.txt
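For the FFT part that this answer leaves open, here is a minimal sketch of one way it might be done once the rows are grouped. It assumes each file holds a 1-D signal; `load_signal` is a hypothetical stand-in that returns a synthetic sine wave instead of reading from disk (with real files something like `np.loadtxt(path)` might work, depending on the file layout). Each FFT result is stored as a whole array in an object column, and the small `wide` frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def load_signal(path, n=16):
    """Hypothetical loader: builds a synthetic sine wave instead of reading `path`."""
    cycles = {"X": 1, "Y": 2, "Z": 3}[path[-5]]  # axis letter just before ".txt"
    t = np.arange(n)
    return np.sin(2 * np.pi * cycles * t / n)

# Made-up wide frame, shaped like the grouped result above.
wide = pd.DataFrame(
    {
        "Rawx": ["HWX.txt", "HDX.txt"],
        "Rawy": ["HWY.txt", "HDY.txt"],
        "Rawz": ["HWZ.txt", "HDZ.txt"],
    },
    index=["Houston Woodlands", "Houston Downtown"],
)

# One new column per axis; every cell holds the magnitude spectrum as an array.
for axis in ["Rawx", "Rawy", "Rawz"]:
    wide["FFT-" + axis] = wide[axis].map(
        lambda path: np.abs(np.fft.rfft(load_signal(path)))
    )
```

Using `Series.map` avoids an explicit row loop and creates the new column in one step, which also covers the "add an empty column and fill it" question. Whether to store the magnitude, the complex spectrum, or a path to a saved FFT file is a design choice the question does not settle.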
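Answer 1 (score: 1)
Consider concatenating a list of pandas pivot_table results. Before concatenating, however, you must group by the common stem of the Raw values (HW.txt, HD.txt, CD.txt) using a regular expression: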
from io import StringIO
import pandas as pd
import re
df = pd.read_csv(StringIO('''
ID,Name,Recording,Direction,Duration,Distance,Path,Raw
0,129,Houston,Woodlands,X,12.3,8,HWX.txt
1,129,Houston,Woodlands,Y,12.3,8,HWY.txt
2,129,Houston,Woodlands,Z,12.3,8,HWZ.txt
3,129,Houston,Downtown,X,11.8,10,HDX.txt
4,129,Houston,Downtown,Y,11.8,10,HDY.txt
5,129,Houston,Downtown,Z,11.8,10,HDZ.txt
6,333,Chicago,Downtown,X,3.4,50,CDX.txt
7,333,Chicago,Downtown,Y,3.4,50,CDY.txt
8,333,Chicago,Downtown,Z,3.4,50,CDZ.txt'''))
# UNIQUE 'RAW' STEM GROUPINGS
grp = set([re.sub(r'X|Y|Z', '', i) for i in df['Raw'].tolist()])
dfList = []
for i in grp:
    # FILTER FOR 'RAW' VALUES THAT CONTAIN STEMS
    temp = df[df['Raw'].isin([i.replace('.txt', txt + '.txt') for txt in ['X', 'Y', 'Z']])]
    # RUN PIVOT (LONG TO WIDE)
    # NOTE: in the CSV as parsed above, the 'Duration' column holds the X/Y/Z axis labels
    temp = temp.pivot_table(values='Raw',
                            index=['Name', 'Recording', 'Direction', 'Distance', 'Path'],
                            columns=['Duration'], aggfunc='min')
    dfList.append(temp)
# CONCATENATE (STACK) DFS IN LIST
finaldf = pd.concat(dfList).reset_index()
# RENAME AND CREATE FFT COLUMNS
finaldf = finaldf.rename(columns={'X': 'Rawx', 'Y': 'Rawy', 'Z': 'Rawz'})
finaldf[['FFT-Rawx', 'FFT-Rawy', 'FFT-Rawz']] = 'FFT-' + finaldf[['Rawx', 'Rawy', 'Rawz']]
Output
# Duration  Name  Recording  Direction  Distance  Path     Rawx     Rawy     Rawz     FFT-Rawx     FFT-Rawy     FFT-Rawz
# 0          129    Houston   Downtown      11.8    10  HDX.txt  HDY.txt  HDZ.txt  FFT-HDX.txt  FFT-HDY.txt  FFT-HDZ.txt
# 1          129    Houston  Woodlands      12.3     8  HWX.txt  HWY.txt  HWZ.txt  FFT-HWX.txt  FFT-HWY.txt  FFT-HWZ.txt
# 2          333    Chicago   Downtown       3.4    50  CDX.txt  CDY.txt  CDZ.txt  FFT-CDX.txt  FFT-CDY.txt  FFT-CDZ.txt
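The regex approach relies on the file names sharing a stem. For the case raised in the question, where the names are arbitrary but the rows always come in groups of three ordered Z, Y, X, a possible sketch is to derive the recording and the axis from the row position instead. The sample data, the `record` and `axis` helper columns, and the `Path_Raw` column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: arbitrary file names, rows ordered Z, Y, X within each recording.
df = pd.DataFrame({
    "Name": [129] * 6,
    "Recording": ["Houston Woodlands"] * 3 + ["Houston Downtown"] * 3,
    "Duration": [12.3] * 3 + [11.8] * 3,
    "Distance": [8] * 3 + [10] * 3,
    "Path_Raw": ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt", "f.txt"],
})

pos = np.arange(len(df))
df["record"] = pos // 3                                   # 0, 0, 0, 1, 1, 1
df["axis"] = np.array(["Rawz", "Rawy", "Rawx"])[pos % 3]  # fixed Z, Y, X order

# Long to wide: one row per recording, one column per axis.
paths = df.pivot(index="record", columns="axis", values="Path_Raw")

# Per-recording metadata is identical on all three rows, so take the first.
meta = df.groupby("record")[["Name", "Recording", "Duration", "Distance"]].first()

result = meta.join(paths[["Rawx", "Rawy", "Rawz"]])
```

This only works if every recording really has exactly three consecutive rows; a missing or extra row would shift every label after it, so a check such as `len(df) % 3 == 0` before pivoting may be worthwhile.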