我有一个主题#s这样的文本文件(名为subjects_visit1.txt):
577
610
650
770
883
以及如下所示的Excel工作表:
Low Connectivity Subject CanEstimate Entropy High Connectivity Subject CanEstimate Entropy
0.0816764 870 TRUE 0.308933317 -0.0313064 668 TRUE 0.868862941
0.215038 577 TRUE 0.448918189 0.172506 600 TRUE 0.885315753
0.0596745 651 TRUE 0.47695019 0.268619 595 TRUE 0.896439952
0.267082 817 TRUE 0.500621849 -0.0507346 851 TRUE 0.907089718
0.18407 567 TRUE 0.508109648 0.182189 822 TRUE 0.915782923
0.0326328 731 TRUE 0.517241379 0.201325 623 TRUE 0.929279958
0.237822 625 TRUE 0.518493071 0.511613 622 TRUE 0.953520938
0.246291 913 TRUE 0.548079129 0.101731 850 TRUE 0.956564212
0.182494 619 TRUE 0.554281617 0.0195069 823 TRUE 0.958840854
-0.0321676 610 TRUE 0.55939053 0.0610047 632 TRUE 0.960237986
0.198884 655 TRUE 0.581442494 0.155816 770 TRUE 0.973656398
0.029618 631 TRUE 0.620796248 0.0703396 754 TRUE 1.012278949
0.205221 866 TRUE 0.630981714 0.19077 804 TRUE 1.023361826
-0.00397881 842 TRUE 0.658492788 0.115125 830 TRUE 1.033213695
0.193168 880 TRUE 0.665481783 -0.0440176 621 TRUE 1.035325469
0.0187139 838 TRUE 0.670966904 0.231593 603 TRUE 1.087118914
-0.0483586 829 TRUE 0.678253186 0.720004 732 TRUE 1.229303773
0.238947 634 TRUE 0.715214736 0.219465 746 TRUE 1.355378243
我想创建一个新的df,只包含我加载的文本文件中主题的数据,但是下面的代码还没有工作。我的主题列表的数据类型有问题吗?还是别的什么?
import pandas as pd
# load text file of subject #s
subject_list = open('subjects_visit2.txt', 'r')
lines = subject_list.read().split('\n')
subjs = list(lines)
newfile = pd.ExcelFile('amygdala_mPFC_data_pandas.xlsx')
df_ROI1 = newfile.parse("01")
# restrict to subject #s in text file
print df_ROI1['Subject'].isin(subjs)
df_ROI1 = df_ROI1[df_ROI1['Subject'].isin(subjs)]
答案 0 :(得分:2)
您可以使用以下内容:
In [5]: df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]})
In [6]: df
Out[6]:
A B
0 5 1
1 6 2
2 3 3
3 4 5
In [7]: df[df['A'].isin([3, 6])]
Out[7]:
A B
1 6 2
2 3 3
顺便说一句,如果你使用Notebook环境,最好使用:
df.head(n=5) # Gives you the first 5 rows of the dataframe
df.sample(n=5) # Gives you a random set of 5 rows of the dataframe
Edit1:如果执行以下操作会发生什么:
values_list = df_ROI1['Subject'].unique()
if "577" in values_list:
print ("577 is in the dataframe and is a string")
elif 577 in values_list:
print ("577 is in the dataframe and is an integer")
else:
print ("577 is NOT in the dataframe")
EDIT2:
所以你犯的错误就是给算法一个字符串而不是一个整数。
请尝试:
df_ROI1 = df_ROI1[df_ROI1['Subject'].isin([577])] # Without the quotes around 577
df_ROI1.head(n=5)
答案 1 :(得分:1)
您可能需要pip install xlrd
才能使用.xlsx文件
否则,请将您的数据保存到.csv并使用pd.read_csv()
此外,您发布的数据似乎有8列,但我认为它只有4列,对吧? 如果没有,则存在需要解决的重复变量名称的问题。
import pandas as pd
with open('subjects_visit2.txt', 'r') as infile:
# put contents into a list without the newlines
subject_list = infile.read().splitlines()
# convert subject_list to a list of integers
subject_list = [int(subject) for subject in subject_list]
# open data file and show 1st 5 rows
df = pd.read_excel('amygdala_mPFC_data_pandas.xlsx')
print(df.head())
# uses .query() which allows easy to read syntax.
# Note: The @ symbol allows access to objects not defined in the data frame
new_df = df.query('Subject in @subject_list')
print(new_df)
输出将如下所示:
Connectivity Subject CanEstimate Entropy
0 0.081676 870 True 0.308933
1 0.215038 577 True 0.448918
2 0.059674 651 True 0.476950
3 0.267082 817 True 0.500622
4 0.184070 567 True 0.508110
Connectivity Subject CanEstimate Entropy
1 0.215038 577 True 0.448918
9 -0.032168 610 True 0.559391
28 0.155816 770 True 0.973656