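I'm new to coding and Python and I think I've bitten off more than I can chew, but I'm trying to create a program that reads a txt file containing 3 columns of information, then takes those columns and puts them into lists. Then I want to create a condition that compares each value in column 3 with the rows directly above and below it. If the difference is greater than 5, the program should copy the column 1 and column 2 values from the row where that column 3 value was found and append them to a new list called spikes, which I hope I can then write to a new, separate txt file. Here is an example of the values in my txt file, "xyz_test.txt":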
98015.985 -4922343.462 101.098
98015.985 -4922343.712 101.098
98015.985 -4922343.962 101.093
98015.985 -4922344.212 101.089
98015.985 -4922344.462 108.09
98015.985 -4922344.712 101.095
98015.985 -4922344.962 101.093
98015.985 -4922345.212 101.083
98015.985 -4922345.462 101.081
So far, this is what I've managed to figure out:
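import csv

listxy = []   # (x, y) pairs, one per row
listz = []    # z values (column 3), one per row
spikes = []

# open in text mode with newline='' as the csv module expects (Python 3)
with open('xyz_test.txt', newline='') as f:
    for row in csv.reader(f, delimiter='\t'):
        listxy.append((row[0], row[1]))
        listz.append(row[2])   # note: still a string, e.g. '101.098'
        print(row[2])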
The result I get is:
101.098
101.098
101.093
101.089
108.09
101.095
101.093
101.083
101.081
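Now I'm trying to write a condition that finds a number in the list whose difference from the numbers directly above and below it is greater than 5, but I keep getting the following errors: "not all arguments converted during string formatting" and "cannot concatenate 'str' and 'int' objects".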
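Both messages are the TypeErrors Python raises when a string is combined with a number via `+` or `%`. That most likely happens here because csv.reader returns every field as text, so listz holds values like "101.098" rather than floats. A minimal sketch, assuming the condition is "higher than both neighbours by more than 5"; the helper is_spike and its threshold parameter are illustrative names, not from the question:

```python
# Values read by csv.reader are strings such as "101.089", not numbers.
z_text = ["101.089", "108.09", "101.095"]

# Arithmetic on the raw strings fails:
#   "108.09" + 5  -> TypeError (cannot concatenate str and int)
#   "108.09" % 5  -> TypeError: not all arguments converted during string formatting
# Converting each value with float() first makes the comparison work.
z = [float(v) for v in z_text]


def is_spike(prev, cur, nxt, threshold=5.0):
    """Return True if cur is higher than both neighbours by more than threshold."""
    return cur - prev > threshold and cur - nxt > threshold


print(is_spike(z[0], z[1], z[2]))  # True: 108.09 is about 7 above each neighbour
```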
Can anyone help me with this?
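Thanks to everyone for the help, I learned a lot. I adapted the code to my needs and this is what I ended up with. I'm still tweaking it (I have to create some classification values and loop through several txt files), but so far: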
import pandas as pd

# Uncomment to widen the console display of large DataFrames
# pd.set_option('display.max_rows', 5000000)
# pd.set_option('display.max_columns', 50)
# pd.set_option('display.width', 10000)

limit = 3    # number of invalid entries allowed before exiting
tries = 0
cols = ['____x____ ', '____y____ ', '____z____']
terrain = "1_tile_test.txt"

while True:
    print("----------------------------------------------------")
    spikewell = float(input("Please Enter Parameters: "))
    tries += 1
    if spikewell > 50:
        print("Parameters past limit (50)")
    elif spikewell < 0:
        print("Parameters can't be negative")
    else:
        print("Parameters are set")
        print(spikewell)
        print("Searching files")
        print("----------------------------------------------------")
        df = pd.read_csv(terrain, sep=r'\s+', names=cols)
        z = df['____z____']
        # z.shift(1) is the previous z value, z.shift(-1) the next one
        # spikes: more than spikewell above both neighbours
        spikes = df.loc[(z - z.shift(1) > spikewell) & (z - z.shift(-1) > spikewell)]
        # wells: more than spikewell below both neighbours
        wells = df.loc[(z.shift(1) - z > spikewell) & (z.shift(-1) - z > spikewell)]
        # save spikes and wells as tab-separated text
        spikes[cols].to_csv('spikes.txt', sep='\t', index=False)
        wells[cols].to_csv('wells.txt', sep='\t', index=False)
        print("----------------------------------------------------")
        print('Search completed')
        break
    # reached only after an invalid entry
    print("----------------------------------------------------")
    print(tries)
    if tries >= limit:
        print("Entered incorrectly too many times.....Exiting")
        print("----------------------------------------------------")
        break
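For the "loop through several txt files" step mentioned above, here is a minimal sketch assuming every tile is a whitespace-separated x y z file whose name matches a glob pattern. The helper names find_spikes_and_wells and process_tiles, the short column names, and the <name>_spikes.txt / <name>_wells.txt output naming are all hypothetical:

```python
import glob
import os

import pandas as pd

COLS = ['x', 'y', 'z']  # hypothetical short column names


def find_spikes_and_wells(df, threshold):
    """Return (spikes, wells) for one tile by comparing each z with its neighbours."""
    z = df['z']
    prev_diff = z - z.shift(1)    # z minus the previous row's z (NaN for the first row)
    next_diff = z - z.shift(-1)   # z minus the next row's z (NaN for the last row)
    spikes = df[(prev_diff > threshold) & (next_diff > threshold)]
    wells = df[(prev_diff < -threshold) & (next_diff < -threshold)]
    return spikes, wells


def process_tiles(pattern, threshold):
    """Search every file matching pattern and write <name>_spikes.txt and <name>_wells.txt."""
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path, sep=r'\s+', names=COLS)
        spikes, wells = find_spikes_and_wells(df, threshold)
        stem = os.path.splitext(path)[0]
        spikes.to_csv(stem + '_spikes.txt', sep='\t', index=False)
        wells.to_csv(stem + '_wells.txt', sep='\t', index=False)
        print(path, len(spikes), 'spikes,', len(wells), 'wells')


# Example call (hypothetical file pattern):
# process_tiles('*_tile_test.txt', 5.0)
```

Splitting the search into a function that takes a DataFrame keeps it testable without any files on disk; the file loop only handles reading and writing.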
Answer 0 (score: 1)
Here is an example:
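import csv

def is_spike(three):
    # three consecutive rows; index 2 holds the z value (as a string)
    first, second, third = three
    return abs(float(first[2]) - float(second[2])) > 5 and abs(float(second[2]) - float(third[2])) > 5

with open("yourcsvfile.csv", newline="") as csvfile:
    # set the delimiter to match your file, e.g. '\t' for tab-separated data
    reader = csv.reader(csvfile, delimiter='\t')
    rows = list(reader)

# every run of three consecutive rows
threes = zip(rows, rows[1:], rows[2:])
spikes = [three for three in threes if is_spike(three)]
print(spikes)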
Output (the middle row is the "spike"):
[(['98015.985', '-4922344.212', '101.089'], ['98015.985', '-4922344.462', '108.09'], ['98015.985', '-4922344.712', '101.095'])]
How it works:
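First, we read all the rows with the csv module, which splits them into fields for us. Make sure the delimiter is set correctly. You could also split the lines manually, but this is more general.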
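Second, we zip the rows into all consecutive "threes" (runs of three rows) and use the is_spike function to check whether each one forms a "spike", which is simple.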
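To see what the zip step produces, here is a small sketch on plain floats (the list values are taken from the question's sample data). Note that because is_spike uses abs(), a sudden dip of more than 5 would be reported as well as a peak:

```python
values = [101.098, 101.098, 101.093, 101.089, 108.09, 101.095, 101.093]

# zip stops at its shortest input, so this yields every run of three
# consecutive values: (v0, v1, v2), (v1, v2, v3), ...
threes = list(zip(values, values[1:], values[2:]))

# Keep windows whose middle value differs from both neighbours by more than 5
spike_windows = [t for t in threes if abs(t[0] - t[1]) > 5 and abs(t[1] - t[2]) > 5]

print(len(threes))     # 5
print(spike_windows)   # [(101.089, 108.09, 101.095)]
```

Seven values give five windows; the first and last values never sit in the middle of a window, so they can never be flagged.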
Answer 1 (score: 0)
You may want to take a closer look at the pandas library.
Input data (for testing purposes I added a row with col3 == 111.110):
98015.985 -4922343.462 101.098
98015.985 -4922343.712 101.098
98015.985 -4922343.962 101.093
98015.985 -4922344.212 101.089
98015.985 -4922344.462 108.09
98015.985 -4922344.712 101.095
98015.985 -4922344.962 101.093
98015.985 -4922345.212 101.083
98015.985 -4922344.462 111.110
98015.985 -4922345.462 101.081
Code:
from __future__ import print_function
import pandas as pd
df = pd.read_csv('data.csv', sep=r'\s+', names=['col1','col2','col3'])
# print the original DataFrame (for testing)
print(df)
# get the spikes' coordinates
# df['col3'].shift(1) - previous value of the 'col3' column
# df['col3'].shift(-1) - next value of the 'col3' column
spikes = df.loc[(df['col3'] - df['col3'].shift(1) > 5) & (df['col3'] - df['col3'].shift(-1) > 5)]
# print and save spikes
print(spikes[['col1', 'col2']])
spikes[['col1', 'col2']].to_csv('spikes.csv', sep='\t', index=False)
Output:
col1 col2 col3
0 98015.985 -4922343.462 101.098
1 98015.985 -4922343.712 101.098
2 98015.985 -4922343.962 101.093
3 98015.985 -4922344.212 101.089
4 98015.985 -4922344.462 108.090
5 98015.985 -4922344.712 101.095
6 98015.985 -4922344.962 101.093
7 98015.985 -4922345.212 101.083
8 98015.985 -4922344.462 111.110
9 98015.985 -4922345.462 101.081
col1 col2
4 98015.985 -4922344.462
8 98015.985 -4922344.462
spikes.csv:
col1 col2
98015.985 -4922344.462
98015.985 -4922344.462
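The key to this answer is Series.shift. A minimal sketch on three values from the sample data shows what shift(1) and shift(-1) line up against each row and why the edge rows are never flagged:

```python
import pandas as pd

z = pd.Series([101.089, 108.09, 101.095])

# shift(1) moves values down one row: row i now holds the value of row i-1.
# shift(-1) moves values up one row: row i now holds the value of row i+1.
# Rows with no neighbour get NaN, and a comparison with NaN is False,
# so the first and last rows can never be flagged as spikes.
prev_z = z.shift(1)
next_z = z.shift(-1)

mask = (z - prev_z > 5) & (z - next_z > 5)
print(prev_z.tolist())   # [nan, 101.089, 108.09]
print(mask.tolist())     # [False, True, False]
```

Passing that boolean mask to df.loc, as the answer does, keeps only the rows where it is True.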