Python - DataFrame, add a column

Time: 2017-10-07 06:00:32

Tags: python

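I have a DataFrame with 100 rows. I want to assign a unique number to each of these rows (not the index, but a number based on business logic). I have a method that generates a unique key (number), but I can't get it to assign a value to each row individually. Need some help.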

The DataFrame data is as follows:

customer_key
825486
457347
641996
1006860
1078894

The method that assigns the unique ID is:

def getuniqid(data):
    from time import time
    skey_list = []
    for row in data.count()-1:
            skey_list.append(int(time()*10000000))
            return skey_list

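I want to be able to assign a unique number to every individual row. (There is business logic behind generating the unique number; for now I am just using this simple int(time).)

Any help is appreciated.

Thanks

Bala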

2 answers:

Answer 0 (score: 2)

I think you need to loop over a range created from the length of the DataFrame, and put the return outside the for loop (after the loop finishes, not inside it):

def getuniqid(data):
    from time import time
    skey_list = []
    # one key per row: range(len(data)) yields 0 .. len(data)-1
    for row in range(len(data)):
        skey_list.append(int(time()*10000000))
    # return only after the loop has collected a key for every row
    return skey_list

data['new'] = getuniqid(data)

Or you can loop over one of the DataFrame's columns:
def getuniqid(data):
    from time import time
    skey_list = []
    for row in data['customer_key']:
            skey_list.append(int(time()*10000000))
    return skey_list

data['new'] = getuniqid(data)
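To see why the function in the question does not work, its two problems can be reproduced in isolation. This is a minimal sketch using the sample data from the question; the helper names ids_return_inside and ids_return_after are made up for this demonstration.

```python
import pandas as pd

data = pd.DataFrame({"customer_key": [825486, 457347, 641996, 1006860, 1078894]})

# DataFrame.count() returns a Series (non-null count per column), not an int,
# so "for row in data.count()-1" iterates over that Series' values: one per column
counts = data.count()
print(type(counts).__name__)          # Series
print((data.count() - 1).tolist())    # [4]

# A return inside the for loop exits on the first pass, giving a one-item list
def ids_return_inside(n):
    out = []
    for i in range(n):
        out.append(i)
        return out
    return out

# A return after the loop runs the body once per row
def ids_return_after(n):
    out = []
    for i in range(n):
        out.append(i)
    return out

print(ids_return_inside(len(data)))   # [0]
print(ids_return_after(len(data)))    # [0, 1, 2, 3, 4]
```

Separately, calling int(time()*10000000) repeatedly in a tight loop can return the same value on platforms where the system clock has coarse resolution, so these keys are not guaranteed to be unique; data['new'].is_unique is a quick way to check.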

Answer 1 (score: 1)

Maybe something like this:

import time
import pandas as pd
from io import StringIO

string = u"""customer_key
825486
457347
641996
1006860
1078894"""

df = pd.read_csv(StringIO(string))

millisecondsnow = int(round(time.time() * 1000))
df["key"] = [millisecondsnow + i for i in range(len(df))]

Output:

    customer_key    key
0   825486  1507368278082
1   457347  1507368278083
2   641996  1507368278084
3   1006860 1507368278085
4   1078894 1507368278086
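A variation on the same idea, assuming NumPy is available (it is not used in the original answer): take the timestamp once and add 0, 1, 2, ... with np.arange instead of a Python list comprehension. This is a sketch, not a definitive implementation; is_unique confirms there are no duplicates within the batch.

```python
import time

import numpy as np
import pandas as pd

df = pd.DataFrame({"customer_key": [825486, 457347, 641996, 1006860, 1078894]})

# One timestamp (milliseconds since the epoch) taken once for the whole batch
base = int(round(time.time() * 1000))

# Add 0, 1, 2, ... as a single vectorized operation
df["key"] = base + np.arange(len(df), dtype=np.int64)

# Consecutive offsets from a single base are distinct within this batch
print(df["key"].is_unique)  # True
```

The keys are unique only within one batch: a second batch started fewer than len(df) milliseconds after the first begins inside the first batch's range, so its keys can collide with it.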

Or use keys generated by a library (uuid):

import uuid
from io import StringIO
import pandas as pd

string = u"""customer_key
825486
457347
641996
1006860
1078894"""

df = pd.read_csv(StringIO(string))

df["key"] = [uuid.uuid4() for _ in range(len(df))]
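uuid.uuid4() returns uuid.UUID objects. If the key has to be written to a CSV file or database, or treated as a number, converting it may be convenient. A sketch under that assumption; the key_int column name is illustrative only:

```python
import uuid

import pandas as pd

df = pd.DataFrame({"customer_key": [825486, 457347, 641996, 1006860, 1078894]})

# str() gives the canonical 36-character text form of each UUID
df["key"] = [str(uuid.uuid4()) for _ in range(len(df))]

# UUID.int is the same value as a 128-bit integer; it is too large for int64/uint64,
# so pandas stores this column with dtype object (plain Python ints)
df["key_int"] = [uuid.UUID(k).int for k in df["key"]]

print(df["key"].is_unique)
print(df["key_int"].dtype)  # object
```

The text form is usually the easier one to store; the integer form is there only if the business logic needs arithmetic on the key.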