Question

I have a pyspark dataframe `df` with two existing columns, `name` and `birthdate`, and I want to overwrite their values with random values.

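For column `name`, I want a string of fixed length (e.g. 10) made of a random collection of letters. The string should be randomized per row, so that not every row gets the same string.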

For column `birthdate`, I want a string in the format `YYYY-MM-DD`. I want every row to get a random value between `1960-01-01` and …

How can I achieve this?

Answer 1

You can create a random string with

size = 10  # desired string length, e.g. 10 as in the question
''.join(random.choice(string.ascii_lowercase) for _ in range(size))

and a random date with

import calendar
year, month = random.randint(1960, 2018), random.randint(1, 12)
day = random.randint(1, calendar.monthrange(year, month)[1])  # real length of that month, leap years included
str_date = f"{year:04d}-{month:02d}-{day:02d}"  # zero-padded YYYY-MM-DD

Don't forget `import random` and `import string`.
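The month-by-month approach picks each month with equal probability, so individual days in short months such as February come up slightly more often than days in 31-day months. If a date that is uniform over the whole range is preferred, one option is to draw a random day offset from the start date. A minimal sketch using only the standard library; the function name `random_date_str` and the end date `2018-12-31` are illustrative assumptions, not part of the question:

```python
import random
from datetime import date, timedelta


def random_date_str(start, end):
    """Return a uniformly random date in [start, end] as a zero-padded 'YYYY-MM-DD' string."""
    # Number of whole days between the bounds; randint includes both ends.
    span = (end - start).days
    return (start + timedelta(days=random.randint(0, span))).isoformat()


# Hypothetical end date matching the 2018 upper year used above.
print(random_date_str(date(1960, 1, 1), date(2018, 12, 31)))
```

Because the offset is added to a real `date`, the result is always a valid calendar date and `isoformat()` handles the zero padding.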

To hold the new values, create a `numpy` array with one row per column to overwrite and one column per DataFrame row:

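import numpy as np
n = df.count()  # number of rows in the Spark DataFrame
arr = np.empty((2, n), dtype=object)  # object dtype so the cells can hold Python strings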

Then fill it with the right values in a loop:

import calendar
for y in range(arr.shape[1]):
    arr[0, y] = ''.join(random.choice(string.ascii_lowercase) for _ in range(size))  # name
    year, month = random.randint(1960, 2018), random.randint(1, 12)
    day = random.randint(1, calendar.monthrange(year, month)[1])
    arr[1, y] = f"{year:04d}-{month:02d}-{day:02d}"  # birthdate
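The loop fills an array on the driver, but nothing attaches those values to `df`. A common way to overwrite the columns directly in Spark is a non-deterministic Python UDF applied with `withColumn`. The sketch below assumes both columns are strings and uses a hypothetical end date of `2018-12-31`; the helper names `random_name`, `random_birthdate` and `overwrite_with_random` are illustrative. PySpark is imported inside `overwrite_with_random` only so the pure-Python helpers can be tried without Spark installed.

```python
import random
import string
from datetime import date, timedelta

# Bounds: the start comes from the question; the end date is a hypothetical assumption.
START = date(1960, 1, 1)
END = date(2018, 12, 31)


def random_name(size=10):
    """Random string of `size` lowercase ASCII letters."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(size))


def random_birthdate(start=START, end=END):
    """Uniformly random date in [start, end], formatted as YYYY-MM-DD."""
    offset = random.randint(0, (end - start).days)
    return (start + timedelta(days=offset)).isoformat()


def overwrite_with_random(df, size=10):
    """Return `df` with `name` and `birthdate` replaced by per-row random strings."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # asNondeterministic() marks the UDFs so the optimizer does not assume
    # repeated calls return the same value.
    name_udf = F.udf(lambda: random_name(size), StringType()).asNondeterministic()
    date_udf = F.udf(lambda: random_birthdate(), StringType()).asNondeterministic()
    # withColumn with an existing column name replaces that column in place.
    return df.withColumn("name", name_udf()).withColumn("birthdate", date_udf())
```

Usage, assuming an existing `df`: `df = overwrite_with_random(df, size=10)` followed by `df.show()`. Python UDFs send every row through a Python worker, so on large tables Spark's built-in functions such as `rand()` are usually faster.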
