Problem merging TSV files with pandas

Time: 2020-02-15 01:42:41

Tags: python pandas csv dataframe

I'm running into a problem with pandas where it simply fails to merge certain rows. For example, when I try to merge the following two excerpts:

Haggai  1:1 In the second year of Darius the king, in the sixth month, in the first day of the month, the Word of Yahweh came by Haggai, the prophet, to Zerubbabel, the son of Shealtiel, governor of Judah, and to Joshua, the son of Jehozadak, the high priest, saying,
Haggai  1:2 "This is what Yahweh of Hosts says: These people say, 'The time hasn't yet come, the time for Yahweh's house to be built.'"
Haggai  1:3 Then the Word of Yahweh came by Haggai, the prophet, saying,
Haggai  1:4 "Is it a time for you yourselves to dwell in your paneled houses, while this house lies waste?
Haggai  1:5 Now therefore this is what Yahweh of Hosts says: Consider your ways.
Haggai  1:6 You have sown much, and bring in little. You eat, but you don't have enough. You drink, but you aren't filled with drink. You clothe yourselves, but no one is warm, and he who earns wages earns wages to put them into a bag with holes in it."

Haggai  1:1 ΕΝ τῷ δευτέρῳ ἔτει ἐπὶ Δαρίου τοῦ βασιλέως ἐν τῷ μηνὶ τῷ ἕκτῳ μιᾷ τοῦ μηνὸς ἐγένετο λόγος Κυρίου ἐν χειρὶ Ἀγγαίου τοῦ προφήτου λέγων Εἰπὸν πρὸς Ζοροβάβελ τὸν τοῦ Σαλαθιὴλ ἐκ φυλῆς Ἰούδα καὶ πρὸς Ἰησοῦν τὸν τοῦ Ἰωσεδὲκ τὸν ἱερέα τὸν μέγαν λέγων
Haggai  1:2 Τάδε λέγει Κύριος Παντοκράτωρ λέγων Ὁ λαὸς οὗτος λέγουσιν Οὐχ ἤκει ὁ καιρὸς τοῦ οἰκοδομῆσαι τὸν οἶκον Κυρίου.
Haggai  1:3 καὶ ἐγένετο λόγος Κυρίου ἐν χειρὶ Ἀγγαίου τοῦ προφήτου λέγων
Haggai  1:4 Εἰ καιρὸς μέν ὑμῖν ἐστιν τοῦ οἰκεῖν ἐν οἴκοις ὑμῶν κοιλοστάθμοις, ὁ δὲ οἶκος ὑμῶν ἐξηρήμωται;
Haggai  1:5 καὶ νῦν τάδε λέγει Κύριος Παντοκράτωρ Τάξατε δὴ τὰς καρδίας ὑμῶν εἰς τὰς ὁδοὺς ὑμῶν·
Haggai  1:6 ἐσπείρατε πολλὰ καὶ εἰσηνέγκατε ὀλίγα, ἐφάγετε καὶ οὐκ εἰς πλησμονήν, ἐπίετε καὶ οὐκ εἰς μέθην, περιεβάλεσθε καὶ οὐκ ἐθερμάνθητε ἐν αὐτοῖς, καὶ ὁ τοὺς μισθοὺς συνάγων συνήγαγεν εἰς δεσμὸν τετρυπημένον.

I get:

21  Haggai  1:1 ΕΝ τῷ δευτέρῳ ἔτει ἐπὶ Δαρίου τοῦ βασιλέως ἐν τῷ μηνὶ τῷ ἕκτῳ μιᾷ τοῦ μηνὸς ἐγένετο λόγος Κυρίου ἐν χειρὶ Ἀγγαίου τοῦ προφήτου λέγων Εἰπὸν πρὸς Ζοροβάβελ τὸν τοῦ Σαλαθιὴλ ἐκ φυλῆς Ἰούδα καὶ πρὸς Ἰησοῦν τὸν τοῦ Ἰωσεδὲκ τὸν ἱερέα τὸν μέγαν λέγων In the second year of Darius the king, in the sixth month, in the first day of the month, the Word of Yahweh came by Haggai, the prophet, to Zerubbabel, the son of Shealtiel, governor of Judah, and to Joshua, the son of Jehozadak, the high priest, saying,
22  Haggai  1:2 Τάδε λέγει Κύριος Παντοκράτωρ λέγων Ὁ λαὸς οὗτος λέγουσιν Οὐχ ἤκει ὁ καιρὸς τοῦ οἰκοδομῆσαι τὸν οἶκον Κυρίου.   This is what Yahweh of Hosts says: These people say, 'The time hasn't yet come, the time for Yahweh's house to be built.'
23  Haggai  1:3 καὶ ἐγένετο λόγος Κυρίου ἐν χειρὶ Ἀγγαίου τοῦ προφήτου λέγων    Then the Word of Yahweh came by Haggai, the prophet, saying,
24  Haggai  1:4 Εἰ καιρὸς μέν ὑμῖν ἐστιν τοῦ οἰκεῖν ἐν οἴκοις ὑμῶν κοιλοστάθμοις, ὁ δὲ οἶκος ὑμῶν ἐξηρήμωται;   "Is it a time for you yourselves to dwell in your paneled houses, while this house lies waste?
Haggai  1:5 Now therefore this is what Yahweh of Hosts says: Consider your ways.
Haggai  1:6 You have sown much, and bring in little. You eat, but you don't have enough. You drink, but you aren't filled with drink. You clothe yourselves, but no one is warm, and he who earns wages earns wages to put them into a bag with holes in it."

where Haggai 1:5 and Haggai 1:6 are clearly not merged correctly.

The code I am using is:

import pandas as pd

df1 = pd.read_table('greekBible.txt')
df2 = pd.read_table('englishBible.txt')

df3 = pd.merge(df1, df2, on=['Book', 'Chapter:Verse'])

df3.to_csv('test.txt', sep="\t")

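Keep in mind that this is only a small excerpt. Also, the two Bibles are not perfectly aligned: some entries in one are absent from the other, and vice versa. As I understand it, though, that shouldn't be a problem.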

Any help with this problem would be greatly appreciated!

1 answer:

Answer 0: (score: 1)

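There appear to be missing quotation marks in the file, possibly because of the way the text was split into lines. It is most obvious on the Haggai 1:4 line: "Is it a time for you yourselves to dwell in your paneled houses, while this house lies waste? The opening quote on that line is not closed until the end of Haggai 1:6, so the parser reads everything in between, line breaks and tabs included, as a single field. As a result, 1:5 and 1:6 never become rows of their own.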
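One possible way around this, sketched below, is to tell pandas not to treat quote characters specially by passing `quoting=csv.QUOTE_NONE` when reading. The sample data here is a shortened, hypothetical stand-in for the real files. The `English` and `Greek` column names and the placeholder verse texts are assumptions, since the question only shows the `Book` and `Chapter:Verse` headers.

```python
import csv
import io

import pandas as pd

# Hypothetical, shortened stand-in for englishBible.txt. The opening quote
# on 1:4 is only closed at the end of 1:6, as in the question's excerpt.
english_tsv = (
    "Book\tChapter:Verse\tEnglish\n"
    "Haggai\t1:3\tThen the Word came, saying,\n"
    "Haggai\t1:4\t\"Is it a time to dwell in paneled houses?\n"
    "Haggai\t1:5\tConsider your ways.\n"
    "Haggai\t1:6\tA bag with holes in it.\"\n"
)

# Default quoting: the quoted field swallows the lines for 1:5 and 1:6.
default_df = pd.read_csv(io.StringIO(english_tsv), sep="\t")

# QUOTE_NONE: quote characters are kept as ordinary text, one row per line.
english_df = pd.read_csv(
    io.StringIO(english_tsv), sep="\t", quoting=csv.QUOTE_NONE
)

# Placeholder Greek rows (assumed content) keyed the same way.
greek_df = pd.DataFrame(
    {
        "Book": ["Haggai"] * 4,
        "Chapter:Verse": ["1:3", "1:4", "1:5", "1:6"],
        "Greek": ["verse 3", "verse 4", "verse 5", "verse 6"],
    }
)

merged = pd.merge(greek_df, english_df, on=["Book", "Chapter:Verse"])

print(len(default_df), "rows with default quoting")
print(len(english_df), "rows with QUOTE_NONE")
print(merged[["Chapter:Verse", "Greek"]])
```

With this option the quotation marks stay in the verse text as literal characters. For the real files, the same argument would be passed to both `pd.read_table` calls in the question.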