Question

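I'm completely lost on the following. I have a test DataFrame filled with tweets and metadata. Under certain conditions (say, I want to select all retweets), I want to copy a row and write it to a new CSV.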

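The problem is that I don't understand how to select rows in Pandas. I've read the documentation, but it still confuses me. I tried .loc and .ix, but I feel like I'm doing it wrong. So my idea was to add row numbers and then use a counter and .ix to index on those row numbers. Since my index is an integer, I thought this might work: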

selectRow = file_df.ix[counter,:]

Except it doesn't. Any tips on how to select an entire row? I'm probably missing something very simple.

Full code:

# Script that takes tweets, selects the retweets, and prints each whole row to a new file.

import pandas as pd
import string

print("Loading file & initializing variables.")

# load file
file_df = pd.read_csv("Desktop/tweetsamples.csv", delimiter=";")

#declare stuff we need to use
output_df = pd.DataFrame()
rowToCopy = pd.Series()
selectRow = pd.Series()
withoutPuncSeries = pd.Series()
counter = 0
retweet = False
username = ""

print("Working.. Please be patient.")

# define for loop which checks if there is a retweet in the tweet

content = file_df["header"] 

splitContent = [content.str.split()] #initialize list
for wordsLists in splitContent:
    counter = counter + 1
    for wordsList in wordsLists:
        if wordsList[0] == "RT":
            retweet = True
            username = wordsList[1]
            withoutPunctuation = "" #initialize/reset placeholder string
            for char in username: #we want to get rid of potential interpunction errors behind the username, so we loop through the string
                if char != "@": #we don't want to have the @
                    if char == "_" or char not in string.punctuation: #only desired characters ('_' is a valid char in an username)
                        withoutPunctuation = withoutPunctuation + char.lower() #add to placeholder string
            print "Found retweet from:", withoutPunctuation
            withoutPuncSeries = [withoutPunctuation]
            selectRow = file_df.ix[counter,:]

    rowToCopy = [selectRow, withoutPuncSeries]
    output_df = output_df.append(rowToCopy) 
    rowToCopy = pd.Series() #reset
    withoutPuncSeries = pd.Series()

output_df.to_csv("Desktop/retweet test.csv", sep=";")

print("Done.")

Answer 1

You can select a single row with df.iloc[row] or a range of rows with df.iloc[startrow:endrow]. In your case, the extra comma seems to be causing problems.
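A minimal sketch of positional selection with `.iloc`, using a small made-up DataFrame in place of the tweet file (the `likes` column is an assumption for illustration). Note that `.ix`, used in the question, was deprecated and then removed in pandas 1.0; `.iloc` selects by position and `.loc` by index label.

```python
import pandas as pd

# Hypothetical toy data standing in for the tweet file
df = pd.DataFrame({
    "header": ["RT @alice: hello", "just a tweet", "RT @bob_1: hi"],
    "likes": [3, 5, 7],
})

one_row = df.iloc[1]          # a single row by position, returned as a Series
same_row = df.iloc[1, :]      # the same row; ":" means "all columns"
some_rows = df.iloc[0:2]      # rows 0 and 1 (end position excluded), as a DataFrame
one_row_df = df.iloc[[1]]     # a list of positions keeps a one-row DataFrame

# .loc works on index labels instead; label slices include the end label
by_label = df.loc[0:1]        # labels 0 and 1, so also two rows here

print(one_row["header"])      # just a tweet
print(one_row_df.to_csv(sep=";", index=False))
```

Passing the one-row DataFrame (double brackets) to `to_csv` writes it with its column headers, which is usually what you want when copying a row into a new file.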

Answer 2

If you want to select rows based on a condition, something like this should work:

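def my_function(header):
    # header is a whole string, so header[0] is only its first character ('R');
    # compare the first word instead (or use whatever your condition is)
    words = str(header).split()
    return len(words) > 0 and words[0] == 'RT'


df_new = df[df['header'].apply(my_function)]
df_new.to_csv('../only_rt.csv')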
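A self-contained sketch of the same `apply`-based filter applied to the question's task, under assumptions: a toy DataFrame replaces `tweetsamples.csv`, the helper names `is_retweet` and `clean_username` are hypothetical, and the username cleanup mirrors the punctuation filter from the question's loop. Because the boolean mask selects whole rows, no counter or row numbers are needed.

```python
import string

import pandas as pd

# Hypothetical toy data; in the question this would come from
# pd.read_csv("Desktop/tweetsamples.csv", delimiter=";")
file_df = pd.DataFrame({
    "header": ["RT @alice: hello", "just a tweet", "RT @Bob_1: hi", ""],
    "likes": [3, 5, 7, 0],
})


def is_retweet(header):
    # Compare the first word; header[0] alone would be a single character
    words = str(header).split()
    return len(words) > 0 and words[0] == "RT"


def clean_username(header):
    # The second word is the "@user:" token; drop punctuation (including "@")
    # but keep "_", and lowercase, as the question's loop does
    words = str(header).split()
    raw = words[1] if len(words) > 1 else ""
    return "".join(c.lower() for c in raw if c == "_" or c not in string.punctuation)


# apply() yields a boolean Series; indexing with it keeps the matching rows whole
retweets = file_df[file_df["header"].apply(is_retweet)].copy()
retweets["username"] = retweets["header"].apply(clean_username)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = retweets.to_csv(sep=";")
print(csv_text)
```

To write a file as in the question, pass a path instead, e.g. `retweets.to_csv("Desktop/retweet test.csv", sep=";")`.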

Answer 3

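I think what you are looking for is boolean masking, although the structure of your data isn't very clear. pandas has many functions that operate on strings through the .str accessor, such as contains, startswith, and so on: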

retweet_df = file_df[file_df['header'].str.contains('RT') & ....]

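A boolean mask can combine multiple conditions with the logical operators & (and), | (or), and ~ (not). Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators such as == or >.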
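A sketch of combining masks, assuming a hypothetical numeric `likes` column next to `header`. It also shows a caveat: `str.contains('RT')` matches "RT" anywhere in the text (for example inside "ART"), whereas `str.startswith('RT ')` only matches a leading "RT ".

```python
import pandas as pd

# Hypothetical toy data; "likes" is an assumed extra metadata column
file_df = pd.DataFrame({
    "header": ["RT @alice: hello", "ART show tonight", "RT @bob: hi", "no retweet here"],
    "likes": [3, 5, 12, 20],
})

contains_rt = file_df["header"].str.contains("RT")    # also matches "ART"
starts_rt = file_df["header"].str.startswith("RT ")   # stricter: only a leading "RT "
popular = file_df["likes"] > 10

popular_retweets = file_df[starts_rt & popular]       # and
retweet_or_popular = file_df[starts_rt | popular]     # or
not_retweets = file_df[~starts_rt]                    # not

# Written inline, the comparison needs its own parentheses
inline = file_df[file_df["header"].str.startswith("RT ") & (file_df["likes"] > 10)]
```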

Python | Select rows in a pandas DataFrame