My data looks like this:
Region X2012 X2013 X2014 X2015 X2016 X2017
1 1 10 11 12 13 14 15
2 2 NA 17 14 NA 23 NA
3 3 12 18 18 NA 23 NA
4 4 NA NA 15 28 NA 38
5 5 14 18.5 16 27 25 39
6 6 15 NA 17 27.5 NA 39
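The numbers themselves don't matter. What I want is the difference between the earliest and the latest observation in each row, stored in a new column, like this: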
Region Diff
1 (15 - 10) = 5
2 (23 - 17) = 6
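and so on. The column should hold only the final result, not the subtraction. Ideally I would just subtract the 2012 column from the 2017 column, but the first observation in a row can fall in any column, and so can the last, so I'm not sure how to compute the difference.
A dplyr solution would be ideal, but any solution is appreciated.
Answer 0 (score: 5)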
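Define a function that subtracts the first element of its vector argument from the last one, omitting NAs, and apply it to each row.
# last non-NA value minus first non-NA value of a vector
lastMinusFirst <- function(x, y = na.omit(x)) tail(y, 1) - y[1]
# apply the helper to every row, skipping the Region column
transform(DF, diff = apply(DF[-1], 1, lastMinusFirst))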
giving:
Region X2012 X2013 X2014 X2015 X2016 X2017 diff
1 1 10 11.0 12 13.0 14 15 5
2 2 NA 17.0 14 NA 23 NA 6
3 3 12 18.0 18 NA 23 NA 11
4 4 NA NA 15 28.0 NA 38 23
5 5 14 18.5 16 27.0 25 39 25
6 6 15 NA 17 27.5 NA 39 24
The input in reproducible form:
Lines <- "Region X2012 X2013 X2014 X2015 X2016 X2017
1 1 10 11 12 13 14 15
2 2 NA 17 14 NA 23 NA
3 3 12 18 18 NA 23 NA
4 4 NA NA 15 28 NA 38
5 5 14 18.5 16 27 25 39
6 6 15 NA 17 27.5 NA 39"
DF <- read.table(text = Lines)
Fixed.
Answer 1 (score: 2)
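A caveat on the helper above: apply() returns a plain vector only when every row yields exactly one value. For a row with no observations at all, na.omit() leaves an empty vector, tail(y, 1) - y[1] evaluates to numeric(0), and apply() falls back to returning a list. A minimal guarded sketch (lastMinusFirstSafe is a hypothetical name, not part of the original answer) returns NA for such rows instead:

```r
# Hypothetical guarded variant of lastMinusFirst (not from the original answer):
# returns NA when a row has no non-NA values, so apply() gets one value per row.
lastMinusFirstSafe <- function(x) {
  y <- x[!is.na(x)]                     # observed values, in column order
  if (length(y) == 0) return(NA_real_)  # no observations in this row
  y[length(y)] - y[1]                   # last observation minus first observation
}

DF <- data.frame(
  Region = 1:3,
  X2012 = c(10, NA, NA), X2013 = c(11, 17, NA), X2014 = c(12, 14, NA),
  X2015 = c(13, NA, NA), X2016 = c(14, 23, NA), X2017 = c(15, NA, NA)
)

res <- transform(DF, diff = apply(DF[-1], 1, lastMinusFirstSafe))
res
```

The third row here is entirely NA, so its diff comes out as NA while the other rows get 5 and 6.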
A tidyverse answer.
This answer modifies G. Grothendieck's function and uses Jenny Bryan's pmap method for row-wise calculations from the purrr package.
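The code for this answer did not survive in this copy. A minimal sketch of the idea, assuming the dplyr and purrr packages and the lastMinusFirst helper from the previous answer: purrr::pmap_dbl() calls a function once per row with each column passed as an argument, and c(...) collects those arguments back into a vector. This is an illustration, not the answer's original code.

```r
library(dplyr)
library(purrr)

# Helper from the previous answer: last non-NA value minus first non-NA value.
lastMinusFirst <- function(x, y = na.omit(x)) tail(y, 1) - y[1]

DF <- data.frame(
  Region = 1:2,
  X2012 = c(10, NA), X2013 = c(11, 17), X2014 = c(12, 14),
  X2015 = c(13, NA), X2016 = c(14, 23), X2017 = c(15, NA)
)

# pmap_dbl() iterates over rows, passing each year column as a named
# argument; c(...) gathers them back into one numeric vector.
result <- DF %>%
  mutate(diff = pmap_dbl(select(DF, -Region), function(...) lastMinusFirst(c(...))))
result
```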
Answer 2 (score: 1)
We can use max.col, passing "first" and "last" as its ties.method argument, to find the columns holding the first and last non-NA values in each row. We then subtract the first non-NA value in each row from the last one.
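# TRUE wherever a year column holds an observed (non-NA) value
new_df <- !is.na(df[-1])
# value in the last TRUE column of each row minus value in the first TRUE column
df$diff <- df[-1][cbind(seq_len(nrow(new_df)), max.col(new_df, ties.method = "last"))] -
           df[-1][cbind(seq_len(nrow(new_df)), max.col(new_df, ties.method = "first"))]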
df
# Region X2012 X2013 X2014 X2015 X2016 X2017 diff
#1 1 10 11.0 12 13.0 14 15 5
#2 2 NA 17.0 14 NA 23 NA 6
#3 3 12 18.0 18 NA 23 NA 11
#4 4 NA NA 15 28.0 NA 38 23
#5 5 14 18.5 16 27.0 25 39 25
#6 6 15 NA 17 27.5 NA 39 24
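To see what the two max.col() calls return, here is a sketch on two rows taken from the question, stored in a plain matrix (vals, observed, first_col, last_col and row_diff are names chosen for this illustration). Since TRUE is the maximum of each logical row, ties.method = "first" picks the column of the first observed value and "last" picks the column of the last one; a two-column index matrix then pulls out those values.

```r
# Two rows taken from the question: row 2 (with gaps) and row 1 (complete).
vals <- rbind(
  c(NA, 17, 14, NA, 23, NA),
  c(10, 11, 12, 13, 14, 15)
)
observed <- !is.na(vals)  # TRUE where a value exists

# TRUE is each row's maximum, so ties.method decides which TRUE is reported.
first_col <- max.col(observed, ties.method = "first")  # c(2, 1)
last_col  <- max.col(observed, ties.method = "last")   # c(5, 6)

# Index with (row, column) pairs to fetch the last and first observed values.
rows <- seq_len(nrow(vals))
row_diff <- vals[cbind(rows, last_col)] - vals[cbind(rows, first_col)]
row_diff  # c(6, 5)
```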
A tidyverse answer could be to convert the data to long format with gather, dropping the NA values, and then, for each Region, subtract the first value from the last value.
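A minimal sketch of those steps, assuming the question's data sits in a data frame df with a Region column and that dplyr and tidyr are installed (gather() still works but is superseded by pivot_longer() in current tidyr). The names diffs and year are illustrative choices:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Region = 1:2,
  X2012 = c(10, NA), X2013 = c(11, 17), X2014 = c(12, 14),
  X2015 = c(13, NA), X2016 = c(14, 23), X2017 = c(15, NA)
)

diffs <- df %>%
  gather(key = "year", value = "value", -Region, na.rm = TRUE) %>%  # long format, NAs dropped
  arrange(Region, year) %>%                     # years in order within each Region
  group_by(Region) %>%
  summarise(diff = last(value) - first(value))  # last observation minus first

# Attach the differences back to the wide data.
df %>% left_join(diffs, by = "Region")
```

Because the NA values are dropped before grouping, first() and last() land on the earliest and latest observed years without any extra filtering.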