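I have a DataFrame with 100 rows. I want to assign a unique number to each of these rows (not the index, but a number based on business logic). I have a method that generates a unique key (a number), but I cannot get it to assign a value to each individual row. I need some help.
The DataFrame data is as follows: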
customer_key
825486
457347
641996
1006860
1078894
The method for assigning the unique ID is:
def getuniqid(data):
    from time import time
    skey_list = []
    for row in data.count()-1:
        skey_list.append(int(time()*10000000))
        return skey_list
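I would like to be able to assign a unique number to every individual row. (There is business logic behind generating the unique number; for now I am just using this simple int(time) as a placeholder.)
Any help is appreciated.
Thanks,
Bala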
Answer 0: (score: 2)
I think you need to loop over a range created from the length of df, and return only after the for loop has finished:
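def getuniqid(data):
    from time import time
    skey_list = []
    # Loop once per row of the DataFrame
    for row in range(len(data)):
        skey_list.append(int(time()*10000000))
    # Return only after the loop has finished
    return skey_list

data['new'] = getuniqid(data)

Alternatively, you can loop over one of the DataFrame's columns: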
def getuniqid(data):
    from time import time
    skey_list = []
    for row in data['customer_key']:
        skey_list.append(int(time()*10000000))
    return skey_list

data['new'] = getuniqid(data)
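One caveat: a timestamp read inside a tight loop can repeat if the loop runs faster than the clock resolution, so it may be worth checking the keys before assigning them. Here is a minimal sketch of that idea. The names `assign_keys` and `make_key` are hypothetical, and the lambda is only a placeholder for the real business logic, which the question does not show.

```python
import pandas as pd

def assign_keys(data, make_key):
    """Return a copy of data with a 'new' column holding one key per row.

    make_key is a hypothetical stand-in for the business logic; it is
    called with the row position (0, 1, 2, ...) and must return a key.
    """
    keys = [make_key(i) for i in range(len(data))]
    # Refuse to assign if the generator produced any duplicate
    if len(set(keys)) != len(keys):
        raise ValueError("generated keys are not unique")
    out = data.copy()
    out['new'] = keys
    return out

data = pd.DataFrame({'customer_key': [825486, 457347, 641996, 1006860, 1078894]})
# Placeholder rule (an assumption): a fixed base plus the row position
result = assign_keys(data, lambda i: 1000000 + i)
```

Passing the generator in as a function keeps the looping and the uniqueness check in one place, so the placeholder can be swapped for the real rule later.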
Answer 1: (score: 1)
Maybe something like this:
import time
import pandas as pd
from io import StringIO
string = u"""customer_key
825486
457347
641996
1006860
1078894"""
df = pd.read_csv(StringIO(string))
millisecondsnow = int(round(time.time() * 1000))
df["key"] = [millisecondsnow + i for i in range(len(df))]
Output:
   customer_key            key
0        825486  1507368278082
1        457347  1507368278083
2        641996  1507368278084
3       1006860  1507368278085
4       1078894  1507368278086
Or take the keys from a library that generates them:
import uuid
from io import StringIO  # needed for StringIO(string) below
import pandas as pd
string = u"""customer_key
825486
457347
641996
1006860
1078894"""
df = pd.read_csv(StringIO(string))
df["key"] = [uuid.uuid4() for _ in range(len(df))]
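Note that `uuid.uuid4()` returns `UUID` objects, not numbers. If the key has to be numeric, as the question asks, one option is the `int` attribute of each UUID, which gives its value as a 128-bit Python integer. This is a sketch under that assumption. These integers are normally too large for a 64-bit integer column, so pandas will usually store them as plain Python objects.

```python
import uuid
import pandas as pd

df = pd.DataFrame({'customer_key': [825486, 457347, 641996, 1006860, 1078894]})

# UUID.int is the UUID's value as a (up to) 128-bit Python integer
df["key"] = [uuid.uuid4().int for _ in range(len(df))]

# Sanity check: every row received a different key
all_unique = df["key"].is_unique
```

If the keys must fit in a 64-bit integer column, the timestamp-plus-offset version above is probably the better fit.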