How to add a row with the sum of all columns to a pandas DataFrame and sort by highest value

Asked: 2018-03-26 22:47:19

Tags: python python-3.x pandas dataframe

I have this large piece of code:

# inputs (10 articles; 8 abstracts are defined below)
abst1 = "This letter to the editor of Planning Magazine was written in response to an article published in the June 2016 issue of the magazine entitled Research You Can Use-The five most-cited planning researchers. To this day the letter has remained unaddressed by the editor. Therefore, I have taken the liberty to submit it to the Journal of Health & Medical Informatics, since some of the so called most-cited authors have also published on health-related topics, and the points I make in my letter below are valid across multiple disciplinary fields."
abst2 = "Citizens perceptions in this analysis suggest a number of barriers to health-care access and utilization across the eight African countries studied. On average, across the 8 countries, Afrobarometer fieldworkers did not find health facilities in about 40% of all the enumeration areas (EA) included in the survey. Generally, across the 8 countries, about 34% of the respondents reported that they could not have contact for health care when there needed it. Among those who accessed health care during the previous year, a cumulative 31% found it difficult or very difficult to get the care they needed. Averagely, only 14% of all the respondents surveyed indicated that they obtained their needed medical care  right way implying that a significant proportion of people wasted longer time prior to receiving their health care needs. Close to half (46%) of citizens in the eight countries noted that their governments were performing fairly badly or very badly as far as improving basic health services were concerned with only 10% of the citizens said their government were performing very well. In all the 8 countries tracked since 2005, negative evaluations of their governments have increased by about 13 percentage points over the past decade. Therefore, governments in the African sub-region need to enhance their efforts to promote accessibility to basic health care services which is a fundamental human right enshrined in the Alma-Ata Declaration."
abst3 = "Word of mouth publicity has been casting substantial impact on hospitals business. This has become even more impactful with increasing use of online rating and reviews. A lower average rating can potentially affect the hospitals business negatively. Average rating gets considerably lowered with customers giving least rating to a hospital. This study attempts at identifying components that leads a customer to give least rating to a hospital. The study analyses 669 descriptive reviews accompanying a rating by qualitatively analysing and grouping them in component of dissatisfactions (CoD). Each CoD was then tested for their association with least online rating to identify significant ones. Out of 5 CoD, 3 were found significant (Medical Care, Conduct and Money making attitude) while remaining 2 were not (System and facilities and Expensiveness). Amongst CoD that were found significant, no significant difference was found in between them in their strength of association with least online rating."
abst4 = "Healthcare applications of mobile phones are steadily gaining popularity over last few years. With the increasing penetration of mobile networks in remote rural villages in India, mobile phones are becoming an important tool for enhancing doctor-patient communication. The mobile technology is increasingly enhancing functionalities of handheld devices, smart phones and PDAs, which are potentially replacing the use of PC, based alternatives while supporting mobility needs of patients and medical practitioners. The study looks at the facilitators of them Health program (mobile based reporting by Front Line Workers in public health) in Saharsa district of Bihar state in India. Data was collected in July 2017. 109 FLWs were contacted for the study. The results tell us that education plays an important role in mobile based reporting system. Age of the respondents does not relate to the mobile phone based reporting system. The willingness to use the mobile phone for reporting is at 80% taken overall. Across age group the worry regarding the lack of hard copy of data is very less and stood at overall 40% level. Respondents from all age group indicated that the timing of the reporting was appropriate and it stood at 85%. The logistic regression tells us that appropriateness of the reporting time increases by 4.1 times, when the FLW receives the needed service from the mobile platform. Effectiveness of the work of the FLW would decrease by 89% if she is worried about the lack of hard copy of the data. Willingness to continue the mobile phone for reporting would increase by 4.34 times, for those FLW, who are recommending the program to continue. Appropriateness of the reporting time increases by 5.26 times, when the FLW recommends continuing the program. The mobile based reporting system generates the real time data. The same can be used to create dashboards for the Sustainable Development Goals (SDG). The tracking can be created at state or national level. Even the district level dashboards can be created. The data from the front line workers will get used on real time basis. The action on certain aspect of the program can be very quick."
abst5 = "Context: Hypoventilation and apnea after epidural morphine is a serious concern after surgery and an issue in chronic pain. A low dose of naloxone added to morphine can prevent this complication. Objective: To determine that the low dose of naloxone added to epidural morphine analgesic could change the effect of this opioid in chronic low back pain. In addition, we evaluate its effect on respiratory function and patient satisfaction. Patients: Twenty-seven adults suffering from chronic low back pain (LBP) who were candidates for epidural injection treatment. Intervention: This was a randomized double-blind, uniform crossover, controlled clinical trial. The patients were treated with mixture of morphine-bupivacaine and mixture of morphine-bupivacaine-naloxone. Main outcome measure: The primary goals were to evaluate pain intensity and respiratory function after epidural injection of morphine or morphine combined with naloxone. Secondary end-points were the incidence and the side effects (pruritus, nausea, vomiting, and urinary retention) of neuraxial injection of morphine or morphine combined with naloxone for 14 days after each epidural injection. Results: There was no significant difference between morphine and morphine combined with naloxone on mean peripheral capillary oxygen saturation (SpO2m), the lowest peripheral capillary oxygen saturation (SpO2), and the respiratory disturbance index (RDI). Morphine combined with naloxone seemed to decrease pain more than morphine alone, but the result was not significant (p=0.2116). In the group that received morphine and naloxone, pain decreased sooner by half from baseline pain (at day 2 versus at day 6) than the other group. Vomiting, pruritus, and urinary retention were seen with no significant difference in both groups. Conclusion: We conclude that epidural administration of naloxone can preserve the analgesic effect of morphine in treatment of chronic LBP. Naloxone does not have any effects on respiratory function. It reduces itching, nausea, and pruritus after epidural injection of morphine. We cannot be certain whether this is the ideal dose or whether any changes in the doses might produce fewer side effects without interfering with analgesia."
abst6 = "The author attempts to postulate a new volumetric complaints and adverse events trending method for holistic risk based statistical trending using a new control chart which is based on the theory of a U chart, as it pertains to medical device failure related adverse events. Mathematical rationale is provided for various control chart parameters like subgroup size and subgroup frequency and correlations to existing literature have been made to justify conclusions. Also, the article features discussion on false alarms and how to use Minitab to minimize them while monitoring incoming complaint variability using a U chart and consequently, the J chart."
abst7 = "Interactive health games (IHG) are fun, experimental, challenging as well as powerful tools that have the potential to change the patients behaviour, attitudes and improve their health. IHG involve well-designed intent challenges and follow roles moving toward a goal. In the last decade, a wide variety of IHG has been developed. The classification of these games is based on game subject, health subject and player subject. Along with the potential future utilization of IHG, there are limitations to a wider application such as the disparity in the players cognitive abilities and the need to identify the kinds of learning and training support."
abst8 = "Current Virtual Reality (VR) applications in healthcare demonstrate potential abilities to address cognitive, psychological, motor, functional impairments and opportunities for training and education of clinical practitioners. Bearing in mind the overall wellness of their communities, healthcare officials had supported the idea of incorporating modern technology by increasing the budget shares and arranging for an access to advanced equipment and professional expertise. Clinicians are becoming more interested in applying VR simulation into their research and clinical trials because of the encouraging feedback published in the medical literature across a wide range of clinical health conditions. Numerous published articles propose novel concepts on applications VR technologies and their potential on disease prevention and management. Finally, the ability of sharing data collected by VR simulation systems through communication networks and electronic health records make it more attractive for the reason that it plays a role in decision making for specific case studies and distance learning."

# split the strings into lists of words
arabst1 = abst1.split(" ")
arabst2 = abst2.split(" ")
arabst3 = abst3.split(" ")
arabst4 = abst4.split(" ")
arabst5 = abst5.split(" ")
arabst6 = abst6.split(" ")
arabst7 = abst7.split(" ")
arabst8 = abst8.split(" ")

# remove stop words
from nltk.corpus import stopwords
swarabst1 = [word for word in arabst1 if word not in stopwords.words('english')]
swarabst2 = [word for word in arabst2 if word not in stopwords.words('english')]
swarabst3 = [word for word in arabst3 if word not in stopwords.words('english')]
swarabst4 = [word for word in arabst4 if word not in stopwords.words('english')]
swarabst5 = [word for word in arabst5 if word not in stopwords.words('english')]
swarabst6 = [word for word in arabst6 if word not in stopwords.words('english')]
swarabst7 = [word for word in arabst7 if word not in stopwords.words('english')]
swarabst8 = [word for word in arabst8 if word not in stopwords.words('english')]

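# stem the words with the Porter stemmer
# note: the loops below read arabstN (the unfiltered tokens), not swarabstN,
# so the stop-word removal above does not affect the result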
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
ps = PorterStemmer()

starabst1 = [""]
for w in arabst1:
    starabst1.append(ps.stem(w))

starabst2 = [""]
for w in arabst2:
    starabst2.append(ps.stem(w))

starabst3 = [""]
for w in arabst3:
    starabst3.append(ps.stem(w))

starabst4 = [""]
for w in arabst4:
    starabst4.append(ps.stem(w))

starabst5 = [""]
for w in arabst5:
    starabst5.append(ps.stem(w))

starabst6 = [""]
for w in arabst6:
    starabst6.append(ps.stem(w))

starabst7 = [""]
for w in arabst7:
    starabst7.append(ps.stem(w))

starabst8 = [""]
for w in arabst8:
    starabst8.append(ps.stem(w))

# compute tf-idf
# merge all lists of stemmed words into one vocabulary
uniao = set(starabst1).union(starabst2, starabst3, starabst4, starabst5, starabst6, starabst7, starabst8)

# build one word-count dictionary per abstract
dicionario1 = dict.fromkeys(uniao, 0)
for word in starabst1:
    dicionario1[word]+=1

dicionario2 = dict.fromkeys(uniao, 0)
for word in starabst2:
    dicionario2[word]+=1

dicionario3 = dict.fromkeys(uniao, 0)
for word in starabst3:
    dicionario3[word]+=1

dicionario4 = dict.fromkeys(uniao, 0)
for word in starabst4:
    dicionario4[word]+=1

dicionario5 = dict.fromkeys(uniao, 0)
for word in starabst5:
    dicionario5[word]+=1

dicionario6 = dict.fromkeys(uniao, 0)
for word in starabst6:
    dicionario6[word]+=1

dicionario7 = dict.fromkeys(uniao, 0)
for word in starabst7:
    dicionario7[word]+=1

dicionario8 = dict.fromkeys(uniao, 0)
for word in starabst8:
    dicionario8[word]+=1

# function that computes the TF (term frequency)

def calcularTF(dicionario, array):
    tfdicionario = {}
    arrayCont = len(array)
    for word, count in dicionario.items():
        tfdicionario[word] = count / float(arrayCont)
    return tfdicionario

# compute the TF for every dictionary

tfdicionario1 = calcularTF(dicionario1, starabst1)
tfdicionario2 = calcularTF(dicionario2, starabst2)
tfdicionario3 = calcularTF(dicionario3, starabst3)
tfdicionario4 = calcularTF(dicionario4, starabst4)
tfdicionario5 = calcularTF(dicionario5, starabst5)
tfdicionario6 = calcularTF(dicionario6, starabst6)
tfdicionario7 = calcularTF(dicionario7, starabst7)
tfdicionario8 = calcularTF(dicionario8, starabst8)


# compute the IDF (inverse document frequency)
def calcularIDF(texto):
    import math
    dicionarioidf = {}
    N = len(texto)

    dicionarioidf = dict.fromkeys(texto[0].keys(),0)
    for doc in texto:
        for word, val in doc.items():
            if val > 0:
                dicionarioidf[word] += 1

    for word, val in dicionarioidf.items():
        dicionarioidf[word] = math.log(N/float(val))

    return dicionarioidf

idfs = calcularIDF([dicionario1,dicionario2,dicionario3,dicionario4,dicionario5,dicionario6,dicionario7,dicionario8])

# compute the TF-IDF
def calcularTFIDF(tf, idf):
    tfidf = {}
    for word, val in tf.items():
        tfidf[word] = val * idf[word]

    return tfidf

tfidfdicionario1 = calcularTFIDF(tfdicionario1, idfs)
tfidfdicionario2 = calcularTFIDF(tfdicionario2, idfs)
tfidfdicionario3 = calcularTFIDF(tfdicionario3, idfs)
tfidfdicionario4 = calcularTFIDF(tfdicionario4, idfs)
tfidfdicionario5 = calcularTFIDF(tfdicionario5, idfs)
tfidfdicionario6 = calcularTFIDF(tfdicionario6, idfs)
tfidfdicionario7 = calcularTFIDF(tfdicionario7, idfs)
tfidfdicionario8 = calcularTFIDF(tfdicionario8, idfs)

# show a table with the results
import pandas as pd
pd.DataFrame([tfidfdicionario1,tfidfdicionario2,tfidfdicionario3,tfidfdicionario4,tfidfdicionario5,tfidfdicionario6,tfidfdicionario7,tfidfdicionario8])

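As the final result, I get something like this: [image: Final output]

What I need is to sort the rows by highest value and to add a row at the bottom with the sum of all column values, so that I can identify the most common words.

This is my first time using Python.

Thanks, everyone.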

It keeps saying my post is mostly code, so I am writing some filler text here; the actual question is just above.
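For reference, the per-abstract TF-IDF steps in the question can be sketched as a single function over a list of token lists. This is only a sketch using the same formulas as the code above (term count divided by document length, times the natural log of N over document frequency). The name `tf_idf` and the tiny sample `docs` are hypothetical, and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per tokenised document.

    Assumes every document in `docs` is a non-empty list of tokens.
    """
    counts = [Counter(doc) for doc in docs]
    vocabulary = set().union(*counts)
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for c in counts for term in c)
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    # Term frequency (count / document length) times IDF, for every term.
    return [
        {term: (c[term] / len(doc)) * idf[term] for term in vocabulary}
        for c, doc in zip(counts, docs)
    ]


# Hypothetical, already-stemmed sample documents.
docs = [["rate", "hospit", "rate"], ["morphin", "pain"], ["hospit", "game"]]
scores = tf_idf(docs)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(scores)` in the same way as the eight `tfidfdicionario` dictionaries.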

1 Answer:

Answer 0 (score: 0)

Assign the output DataFrame to a name, df, and then try:

# Create a row of column-wise sums
df.loc['colsum'] = df.sum(axis=0)

# Sort columns in descending order of sum
df.sort_values('colsum', axis=1, ascending=False, inplace=True)

# Sort rows in descending order, comparing the highest-sum columns first
df.sort_values(df.columns.tolist(), axis=0, ascending=False, inplace=True)

# View the top few words; note the DF index retains the original row labels.
# To discard those labels (including 'colsum'), run df.reset_index(drop=True, inplace=True)
df.iloc[:, :7]

            rate   morphin    letter     mobil    chart      game     least
colsum  0.095764  0.083178  0.068553  0.064798  0.06116  0.060566  0.054722
2       0.095764  0.000000  0.000000  0.000000  0.00000  0.000000  0.054722
4       0.000000  0.083178  0.000000  0.000000  0.00000  0.000000  0.000000
0       0.000000  0.000000  0.068553  0.000000  0.00000  0.000000  0.000000
3       0.000000  0.000000  0.000000  0.064798  0.00000  0.000000  0.000000
5       0.000000  0.000000  0.000000  0.000000  0.06116  0.000000  0.000000
6       0.000000  0.000000  0.000000  0.000000  0.00000  0.060566  0.000000
1       0.000000  0.000000  0.000000  0.000000  0.00000  0.000000  0.000000
7       0.000000  0.000000  0.000000  0.000000  0.00000  0.000000  0.000000
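Because the rows are sorted in descending order after the `colsum` row is added, that row ends up at the top, whereas the question asks for it at the bottom. One possible variation, sketched here with a small made-up DataFrame (the values are illustrative and not taken from the question's output), is to sort first and append the totals row last:

```python
import pandas as pd

# Hypothetical TF-IDF scores: one row per document, one column per word.
df = pd.DataFrame(
    [
        {"rate": 0.10, "game": 0.00, "chart": 0.02},
        {"rate": 0.00, "game": 0.06, "chart": 0.00},
        {"rate": 0.05, "game": 0.00, "chart": 0.01},
    ]
)

# Column totals, largest first; reorder the columns to match.
totals = df.sum(axis=0).sort_values(ascending=False)
df = df[totals.index]

# Sort the document rows in descending order, highest-sum column first.
df = df.sort_values(list(df.columns), ascending=False)

# Append the totals last so that the row stays at the bottom.
df.loc["colsum"] = totals

print(df)
```

With this ordering the leftmost columns are the words with the highest summed score, and the last row holds the sums.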