Question

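I have a set of tweets that I pull from the Twitter API using R. I save them in .CSV files, one per day. Some days I forget to run the script, so I might end up with something like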

file2015-08-24 file2015-08-22

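I want to be more organized. Because I missed 8-23, the tweets from 8-23 are stored only in file 2015-08-24. I want to create a new .CSV, move every tweet whose "created_at" time is 08-23 into the 08-23 file, and leave the tweets created on 08-24 in the 08-24 file.

Moving the tweets by date works fine. However, when I transfer tweets to the new file, my CSV file gets corrupted. Some strange comma interaction is happening.

Here is an example! On July 31, Hillary Clinton posted a tweet. In my .CSV file, the "text" column is stored as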

TEXT

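RT @TheBriefing2016: Phasing out Medicare, repealing the ACA, and building barriers to voting means only some people get a "right to rise."

h&amp;

Note that the above is all one cell. I know the h&amp; appears on a second line with a lot of white space in between, but in the CSV this is a single cell. However, when I rewrite this row to a new CSV file, I get....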

TEXT

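RT @TheBriefing2016: Phasing out Medicare, repealing the ACA, and building barriers to voting means only some people get a right to rise.

Then my CSV jumps to the next row and puts this in the first column: h&amp;"

After that the data keeps filling the columns as normal, but because we have jumped to a new row, the values are obviously in the "wrong" columns, starting from column 1.

If I open the same two files in Notepad++ to inspect the CSV... In Notepad++, the tweet's first few columns look normal, and then the "text" column sits on two different lines.

The first line is:
"RT @TheBriefing2016: Phasing out Medicare, repealing the ACA, and building barriers to voting means only some people get a ""right to rise.""
The second line of "text":
h&amp;",

When I open the file I rewrote, it is also on two lines:
"RT @TheBriefing2016: Phasing out Medicare, repealing the ACA, and building barriers to voting means only some people get a "right to rise."
h&amp;",

I'm not sure what makes it display correctly in the original file but not here. This is not the only tweet that does this. Some others with quotation marks do it too. I feel like it is breaking out of the quotes.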
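To make the quoting suspicion concrete, here is a minimal sketch that writes one row containing embedded quotation marks and a line break in two ways. The data frame, the file names, and the variable names are hypothetical and not taken from my real data. It relies on the documented defaults: `write.table()` uses `qmethod = "escape"` (a backslash before each embedded quote), while `qmethod = "double"` writes each embedded quote as two quotes, which is the form the Notepad++ view of my original file shows.

```r
# Hypothetical one-row data frame: the "text" cell holds quotes and a line break.
tweets <- data.frame(
  id = 1L,
  text = "only some get a \"right to rise.\"\nh&amp;",
  stringsAsFactors = FALSE
)

escaped_file <- tempfile(fileext = ".csv")
doubled_file <- tempfile(fileext = ".csv")

# Default qmethod = "escape": embedded quotes are written as \"
write.table(tweets, file = escaped_file, sep = ",", row.names = FALSE)

# qmethod = "double": embedded quotes are written as ""
write.table(tweets, file = doubled_file, sep = ",", row.names = FALSE,
            qmethod = "double")

# Raw lines, for comparing the two quoting styles side by side
escaped_raw <- readLines(escaped_file)
doubled_raw <- readLines(doubled_file)

# Read the doubled-quote file back and compare the cell with the original
round_trip <- read.csv(doubled_file, stringsAsFactors = FALSE)
same_text <- identical(round_trip$text, tweets$text)
```

If `same_text` is TRUE for the doubled-quote file, the difference between the two raw outputs would be a plausible place to look for the broken rows. `write.csv()` fixes `qmethod` to `"double"`, so it may be worth trying in place of `write.table()`.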

Here is the code I use to move tweets from one file to another.

for(curFile in filenames){
## Read in the file
info = read.csv(curFile, header=TRUE, sep=",")

## Updated DF will hold what is in our original file, MINUS the rows that are getting removed.
updatedDF = info
## Get the file date
fileDate = curFile
fileDate = substr(fileDate, 78, 300)
#fileDate=substr(fileDate,85,300)
fileDate = substr(fileDate, 0, 10)

## Get the header from the file
header = names(info)

## Figure out how many rows of data we have
## This is the number of tweets we have in this data file
numTweets = dim(info)[1]

## For every tweet, starting with tweet #1, up to the last tweet (numTweets)
for( x in 1:numTweets) {
    ## Get the tweets date
    ## We want to get this as a VECTOR so we can do character / string manipulations on it
    tweetDateLine = as.vector(info[x, "created_at"])

    ### To get the date from the tweet, we need to do some editing on the string
    year = substr(tweetDateLine, nchar(tweetDateLine)-3, 300)
    monthDay = substr(tweetDateLine, 5, 10)

    ### Strip the white space from these
    year = gsub(" ", "", year)
    monthDay = gsub(" ", "", monthDay)

    ### Put them together for a cohesive MMMDDYYYY
    tweetDate = paste(monthDay, year, sep="")

    ### Finally convert this to YYYY-MM-DD format like our original date has as extracted from the file name
    tweetDate = as.Date(tweetDate, "%B%d%Y")

    ### Now we can compare
    ### Make a boolean variable. If it is TRUE they are the same
    isTheSame = (fileDate == tweetDate)

    ### If the date of the tweet and the date of the file are the same...
    if(isTheSame){
        ### Skip to the next tweet
        next
    } ## if(isTheSame){

    ### If the date of the tweet and the date of the file are not the same...
    else{
        ### See if a file exists for the date of that tweet. 
        ### First, construct the file name with the path + the date + .csv
        potentialFileName = paste(path, tweetDate, ".csv", sep="")

        ### Next, see if it exists!
        fileExists = file.exists(potentialFileName)

        ### If the file already exists...
        if(fileExists){

            ### Now we need to add the data
            ### To get row "x" of the data...
            entireRow = info[x,]

            ### Now append the row to that file
            cat(sprintf("Writing tweet to file!\n"))
            write.table(rbind(entireRow),file=potentialFileName,row.names=FALSE,col.names=FALSE,sep=",",append=TRUE)                

            ### Delete this line from the original file
            updatedDF = updatedDF[updatedDF$created_at != tweetDateLine, ]
        } ##if(fileExists){

        ### If the file does not already exist
        else{


            ### Create the file
            cat(sprintf("Creating file for date : %s \n", tweetDate))
            file.create(potentialFileName)

            ### Add the header line
            cat(sprintf("Inserting header!\n"))
            write.table(rbind(header), file=potentialFileName, row.names=FALSE, col.names=FALSE, sep=",")

            ### Now we need to add the data
            ### To get row "x" of the data...
            entireRow = info[x,]

            ### Now append the row to that file
            cat(sprintf("Writing tweet to file!\n"))
            write.table(rbind(entireRow),file=potentialFileName,row.names=FALSE,col.names=FALSE,sep=",",append=TRUE)

            ### Delete this line from the original file
            updatedDF = updatedDF[updatedDF$created_at != tweetDateLine,]
        } ## else{

    } ## else{

}##for( x in 1:numTweets) {

# Now we must take the updatedDF, which contains the original CSV minus the deleted lines
# And write it back to the original file
# Start with replacing the header
cat(sprintf("Inserting header!\n"))
write.table(rbind(header), file=curFile, row.names=FALSE, col.names=FALSE, sep=",")

# Now print the dataframe back
cat(sprintf("Inserting dataframe!\n"))
write.table(updatedDF, file=curFile, row.names=FALSE, col.names = FALSE, sep=",", append=TRUE)

} ## for(curFile in filenames){
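For comparison, the same move-by-date idea can be sketched without the row-by-row loop. This is only a sketch under assumptions: `tweet_day()` and `split_by_day()` are hypothetical helper names, the sample rows are invented, "created_at" is assumed to look like "Sun Aug 23 10:00:00 +0000 2015" (month and day in characters 5 to 10, year in the last four), and month names are assumed to parse in the current locale. It writes with `qmethod = "double"` so embedded quotes are doubled rather than backslash-escaped.

```r
# Hypothetical helper: turn a Twitter-style timestamp into a Date.
# Assumes an English (or C) locale so that "%b" matches "Aug".
tweet_day <- function(created_at) {
  month_day <- substr(created_at, 5, 10)
  year <- substr(created_at, nchar(created_at) - 3, nchar(created_at))
  as.Date(paste(month_day, year), format = "%b %d %Y")
}

# Hypothetical helper: write each day's rows to <out_dir>/<YYYY-MM-DD>.csv,
# creating the file with a header or appending without one.
split_by_day <- function(info, out_dir) {
  days_chr <- as.character(tweet_day(as.character(info$created_at)))
  valid_days <- unique(days_chr[!is.na(days_chr)])
  for (day in valid_days) {
    out_file <- file.path(out_dir, paste0(day, ".csv"))
    rows <- info[which(days_chr == day), , drop = FALSE]
    is_new <- !file.exists(out_file)
    write.table(rows, file = out_file, sep = ",", row.names = FALSE,
                col.names = is_new, append = !is_new, qmethod = "double")
  }
  invisible(valid_days)
}

# Invented sample data, only to exercise the helpers
info <- data.frame(
  created_at = c("Sun Aug 23 10:00:00 +0000 2015",
                 "Mon Aug 24 09:30:00 +0000 2015"),
  text = c("first \"quoted\" tweet", "second tweet"),
  stringsAsFactors = FALSE
)
out_dir <- tempfile("tweets_")
dir.create(out_dir)
written <- split_by_day(info, out_dir)
```

Unlike the loop above, this sketch does not rewrite the source file; it only shows the grouping and the quoting choice.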

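To help everyone further: http://imgur.com/a/9xwx5 shows my view of the original in Excel / NPP, and then the view after moving the tweet to a new file.

If it also helps, the tweet that triggers this (well, one of them; there are several) is a retweet of this tweet: https://twitter.com/TheBriefing2016/status/627212836339453952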

R - Writing from one CSV to another CSV - strange comma interactions

0 answers: