When does using swapcase twice not return the same answer?

Time: 2013-12-20 05:32:20

Tags: python

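The Python documentation for str.swapcase() says:

  Note that it is not necessarily true that s.swapcase().swapcase() == s.

I guess this has something to do with Unicode; however, I was unable to produce a string that changes after two applications of swapcase(). What kind of string fails to come back unchanged?

Here is what I tested (text obtained here):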

>>> testString = '''Bãcoл ípѕüϻ Ꮷ߀ɭor sìt ämét qûìs àɭïɋüíp cülρä, ϻagnâ èх ѕêԁ ѕtríρ stêãk iл ԁò ut sålámí éхèrcìtátïoл pòrƙ ɭ߀in. Téԉԁërɭ߀ín tùrkèϒ ѕáûsáɢè лùɭɭå pɑrïátûr, ƃáll típ âԁiρïѕicïԉǥ ɑᏧ c߀ԉsêquät ϻâgлã véлïsoл. Míлím àutë ѵ߀ɭüρtåte mòɭɭít tri-tíρ dèsêrùԉt. Occãècát vëԉis߀ԉ êХ eiùѕm߀d séᏧ láborüϻ pòrƙ lòïл àliɋûå ìлcíԁìԁúԉt. Sed còmϻ߀Ꮷ߀ յoɰl offícíä pòrƙ ƅèɭly témρòr lâƅòrùϻ tâiɭ sρårê ríbs toлǥue ϻêátɭòáf måɢnä.

Kièɭbàѕã in còлѕêctêtur ѵëлíàϻ pâríɑtùr p߀rk ɭ߀in êxêrcìtâtiòл älìɋúíρ câρicolɑ ρork tòлɢüê düis ԁ߀ɭoré rêpréhéԉᏧérït. Tènԁèrloiԉ ëх rèρréհeԉԁérït fûgíãt ädipìsiciԉg gr߀ünᏧ roúлd, ƅaɭɭ típ հàϻƃûrǥèr ѕɦòùlder ɭåb߀rûϻ têmρor ríƃêyë. Eѕsè hàϻ ѵëԉiam, åɭíɋùɑ ìrüre ρòrƙ cɦop ԁò ԁ߀ɭoré frânkfürter nülla påsträϻí sàusàgè sèᏧ. Eӽcêptêür ѕëd t-b߀лë հɑϻ, esѕë ut ɭàƅoríѕ ƃáll tíρ nostrúԁ sհ߀üldêr ïn shòrt ríƅs ρástrámï. Essé hamƅûrǥër ɭäƅòré, fatƃàcƙ teԉderlòïn sհ߀rt rïbs ρròìdént riƅêye ɭab߀rum. Nullɑ türԁùcƙèn л߀n, sρarè rìƅs eӽceρteur ádïρìѕìcïԉǥ êt ѕɦort ɭòin dolorë änïm dêѕêrùлt. Sհäлƙlè cúpïԁätát pork lòïn méåtbäll, ԉ߀strud réprèհéԉԁêrìt ɦɑϻburǥêr ѕâɭɑϻí Ꮷol߀rè ɑd lêberƙãs.

Boûdiл toлǥuê c߀ԉsèqûåt eà rümρ ƅálɭ tíρ ѕρâré rìbѕ ín pròiᏧent dûiѕ ϻíлïm èíuѕmòᏧ c߀rԉêᏧ ƃèèf ƅɑc߀л d߀lorè. Cornèd ƅëèf drûmsticƙ cùlpa, éлïm baɭɭ tìp ϻéatbâlɭ lab߀rê tri-tïp vënisoԉ ǥroùԉԁ ròùлԁ հɑm iл èä bãcòn. Eѕѕé ìᏧ ѕúԉt, sհoùldér ƙïeɭƃäѕà ãԁiρisïcïԉɢ ɦaϻbûrgêr út ԁòɭ߀re fåtbäcƙ ԁ߀ɭòr äлïm trï-típ. EíùsϻòᏧ nülɭã läbòruϻ лíѕi êxcéptèúr. Occåécåt Ꮷüíѕ ԁèserüлt toԉǥue ϳ߀wɭ. Rèρréɦëԉԁêrit áɭïqúíp fûǥiàt tùrkey véniãϻ qüìѕ.'''
>>> testString.swapcase().swapcase() == testString
True

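3 answers:

Answer 0 (score: 19)

This happens when more than one letter is a lowercase form of the same uppercase letter.

For example, the micro sign µ (U+00B5) and the Greek letter mu μ (U+03BC):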

>>> u'\xb5'.swapcase()
u'\u039c'
>>> u'\u03bc'.swapcase()
u'\u039c'

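These are two different characters, but they share the same uppercase letter. That means str.swapcase() maps both of them to the same character. Applying it again cannot (and does not) give back both letters.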

>>> u'\xb5'.swapcase().swapcase()
u'\u03bc'
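
The same collision can be sketched in Python 3 (the `u''` prefix above is Python 2 syntax). This is only an illustration; the variable names are mine, and it assumes the interpreter's `str` methods follow the Unicode case mappings:

```python
micro = "\u00b5"   # µ MICRO SIGN
mu = "\u03bc"      # μ GREEK SMALL LETTER MU

# Both lowercase characters map to the same uppercase letter, U+039C.
assert micro.upper() == mu.upper() == "\u039c"

# The uppercase letter can map back to only one of them.
assert "\u039c".lower() == mu

# So the round trip preserves mu but turns the micro sign into mu.
round_trip_ok = {ch: ch.swapcase().swapcase() == ch for ch in (micro, mu)}
print(round_trip_ok)
```

The mapping from lowercase to uppercase is many-to-one here, so no inverse can recover both inputs.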

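Answer 1 (score: 9)

While Volatility gave an example where the uppercase of mu and the uppercase of the micro sign resolve to the same Unicode code point, here is another interesting case where applying swapcase twice gives a different answer: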

>>> 'ß'.swapcase().swapcase()
'ss'

Confused? The German consonant ß (pronounced [s]) becomes SS after one application of swapcase, and then ss after the second application.
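
The change in length is what makes this case irreversible. A small Python 3 sketch, assuming an interpreter that applies multi-character case mappings (Python 3.3 or later, as far as I know); the variable names are only for illustration:

```python
s = "\u00df"              # ß LATIN SMALL LETTER SHARP S
once = s.swapcase()       # uppercasing expands to two characters
twice = once.swapcase()   # each 'S' is lowercased independently

# One character in, two characters out; the original ß is not recoverable.
lengths = [len(s), len(once), len(twice)]
print(once, twice, lengths)
```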

Here is the whole list of such characters (→ stands for one swapcase):

µ (0xb5) → Μ (0x39c) → μ (0x3bc) → Μ (0x39c)
ß (0xdf) → SS (0x5353) → ss (0x7373) → SS (0x5353)
İ (0x130) → i̇ (0x69307) → İ (0x49307) → i̇ (0x69307)
ı (0x131) → I (0x49) → i (0x69) → I (0x49)
ʼn (0x149) → ʼN (0x2bc4e) → ʼn (0x2bc6e) → ʼN (0x2bc4e)
ſ (0x17f) → S (0x53) → s (0x73) → S (0x53)
ǰ (0x1f0) → J̌ (0x4a30c) → ǰ (0x6a30c) → J̌ (0x4a30c)
ͅ (0x345) → Ι (0x399) → ι (0x3b9) → Ι (0x399)
ΐ (0x390) → Ϊ́ (0x399308301) → ΐ (0x3b9308301) → Ϊ́ (0x399308301)
ΰ (0x3b0) → Ϋ́ (0x3a5308301) → ΰ (0x3c5308301) → Ϋ́ (0x3a5308301)
ς (0x3c2) → Σ (0x3a3) → σ (0x3c3) → Σ (0x3a3)
ϐ (0x3d0) → Β (0x392) → β (0x3b2) → Β (0x392)
ϑ (0x3d1) → Θ (0x398) → θ (0x3b8) → Θ (0x398)
ϕ (0x3d5) → Φ (0x3a6) → φ (0x3c6) → Φ (0x3a6)
ϖ (0x3d6) → Π (0x3a0) → π (0x3c0) → Π (0x3a0)
ϰ (0x3f0) → Κ (0x39a) → κ (0x3ba) → Κ (0x39a)
ϱ (0x3f1) → Ρ (0x3a1) → ρ (0x3c1) → Ρ (0x3a1)
ϴ (0x3f4) → θ (0x3b8) → Θ (0x398) → θ (0x3b8)
ϵ (0x3f5) → Ε (0x395) → ε (0x3b5) → Ε (0x395)
և (0x587) → ԵՒ (0x535552) → եւ (0x565582) → ԵՒ (0x535552)
ẖ (0x1e96) → H̱ (0x48331) → ẖ (0x68331) → H̱ (0x48331)
ẗ (0x1e97) → T̈ (0x54308) → ẗ (0x74308) → T̈ (0x54308)
ẘ (0x1e98) → W̊ (0x5730a) → ẘ (0x7730a) → W̊ (0x5730a)
ẙ (0x1e99) → Y̊ (0x5930a) → ẙ (0x7930a) → Y̊ (0x5930a)
ẚ (0x1e9a) → Aʾ (0x412be) → aʾ (0x612be) → Aʾ (0x412be)
ẛ (0x1e9b) → Ṡ (0x1e60) → ṡ (0x1e61) → Ṡ (0x1e60)
ẞ (0x1e9e) → ß (0xdf) → SS (0x5353) → ss (0x7373) → SS (0x5353)
ὐ (0x1f50) → Υ̓ (0x3a5313) → ὐ (0x3c5313) → Υ̓ (0x3a5313)
ὒ (0x1f52) → Υ̓̀ (0x3a5313300) → ὒ (0x3c5313300) → Υ̓̀ (0x3a5313300)
ὔ (0x1f54) → Υ̓́ (0x3a5313301) → ὔ (0x3c5313301) → Υ̓́ (0x3a5313301)
ὖ (0x1f56) → Υ̓͂ (0x3a5313342) → ὖ (0x3c5313342) → Υ̓͂ (0x3a5313342)
ᾀ (0x1f80) → ἈΙ (0x1f08399) → ἀι (0x1f003b9) → ἈΙ (0x1f08399)
ᾁ (0x1f81) → ἉΙ (0x1f09399) → ἁι (0x1f013b9) → ἉΙ (0x1f09399)
ᾂ (0x1f82) → ἊΙ (0x1f0a399) → ἂι (0x1f023b9) → ἊΙ (0x1f0a399)
ᾃ (0x1f83) → ἋΙ (0x1f0b399) → ἃι (0x1f033b9) → ἋΙ (0x1f0b399)
ᾄ (0x1f84) → ἌΙ (0x1f0c399) → ἄι (0x1f043b9) → ἌΙ (0x1f0c399)
ᾅ (0x1f85) → ἍΙ (0x1f0d399) → ἅι (0x1f053b9) → ἍΙ (0x1f0d399)
ᾆ (0x1f86) → ἎΙ (0x1f0e399) → ἆι (0x1f063b9) → ἎΙ (0x1f0e399)
ᾇ (0x1f87) → ἏΙ (0x1f0f399) → ἇι (0x1f073b9) → ἏΙ (0x1f0f399)
ᾐ (0x1f90) → ἨΙ (0x1f28399) → ἠι (0x1f203b9) → ἨΙ (0x1f28399)
ᾑ (0x1f91) → ἩΙ (0x1f29399) → ἡι (0x1f213b9) → ἩΙ (0x1f29399)
ᾒ (0x1f92) → ἪΙ (0x1f2a399) → ἢι (0x1f223b9) → ἪΙ (0x1f2a399)
ᾓ (0x1f93) → ἫΙ (0x1f2b399) → ἣι (0x1f233b9) → ἫΙ (0x1f2b399)
ᾔ (0x1f94) → ἬΙ (0x1f2c399) → ἤι (0x1f243b9) → ἬΙ (0x1f2c399)
ᾕ (0x1f95) → ἭΙ (0x1f2d399) → ἥι (0x1f253b9) → ἭΙ (0x1f2d399)
ᾖ (0x1f96) → ἮΙ (0x1f2e399) → ἦι (0x1f263b9) → ἮΙ (0x1f2e399)
ᾗ (0x1f97) → ἯΙ (0x1f2f399) → ἧι (0x1f273b9) → ἯΙ (0x1f2f399)
ᾠ (0x1fa0) → ὨΙ (0x1f68399) → ὠι (0x1f603b9) → ὨΙ (0x1f68399)
ᾡ (0x1fa1) → ὩΙ (0x1f69399) → ὡι (0x1f613b9) → ὩΙ (0x1f69399)
ᾢ (0x1fa2) → ὪΙ (0x1f6a399) → ὢι (0x1f623b9) → ὪΙ (0x1f6a399)
ᾣ (0x1fa3) → ὫΙ (0x1f6b399) → ὣι (0x1f633b9) → ὫΙ (0x1f6b399)
ᾤ (0x1fa4) → ὬΙ (0x1f6c399) → ὤι (0x1f643b9) → ὬΙ (0x1f6c399)
ᾥ (0x1fa5) → ὭΙ (0x1f6d399) → ὥι (0x1f653b9) → ὭΙ (0x1f6d399)
ᾦ (0x1fa6) → ὮΙ (0x1f6e399) → ὦι (0x1f663b9) → ὮΙ (0x1f6e399)
ᾧ (0x1fa7) → ὯΙ (0x1f6f399) → ὧι (0x1f673b9) → ὯΙ (0x1f6f399)
ᾲ (0x1fb2) → ᾺΙ (0x1fba399) → ὰι (0x1f703b9) → ᾺΙ (0x1fba399)
ᾳ (0x1fb3) → ΑΙ (0x391399) → αι (0x3b13b9) → ΑΙ (0x391399)
ᾴ (0x1fb4) → ΆΙ (0x386399) → άι (0x3ac3b9) → ΆΙ (0x386399)
ᾶ (0x1fb6) → Α͂ (0x391342) → ᾶ (0x3b1342) → Α͂ (0x391342)
ᾷ (0x1fb7) → Α͂Ι (0x391342399) → ᾶι (0x3b13423b9) → Α͂Ι (0x391342399)
ι (0x1fbe) → Ι (0x399) → ι (0x3b9) → Ι (0x399)
ῂ (0x1fc2) → ῊΙ (0x1fca399) → ὴι (0x1f743b9) → ῊΙ (0x1fca399)
ῃ (0x1fc3) → ΗΙ (0x397399) → ηι (0x3b73b9) → ΗΙ (0x397399)
ῄ (0x1fc4) → ΉΙ (0x389399) → ήι (0x3ae3b9) → ΉΙ (0x389399)
ῆ (0x1fc6) → Η͂ (0x397342) → ῆ (0x3b7342) → Η͂ (0x397342)
ῇ (0x1fc7) → Η͂Ι (0x397342399) → ῆι (0x3b73423b9) → Η͂Ι (0x397342399)
ῒ (0x1fd2) → Ϊ̀ (0x399308300) → ῒ (0x3b9308300) → Ϊ̀ (0x399308300)
ΐ (0x1fd3) → Ϊ́ (0x399308301) → ΐ (0x3b9308301) → Ϊ́ (0x399308301)
ῖ (0x1fd6) → Ι͂ (0x399342) → ῖ (0x3b9342) → Ι͂ (0x399342)
ῗ (0x1fd7) → Ϊ͂ (0x399308342) → ῗ (0x3b9308342) → Ϊ͂ (0x399308342)
ῢ (0x1fe2) → Ϋ̀ (0x3a5308300) → ῢ (0x3c5308300) → Ϋ̀ (0x3a5308300)
ΰ (0x1fe3) → Ϋ́ (0x3a5308301) → ΰ (0x3c5308301) → Ϋ́ (0x3a5308301)
ῤ (0x1fe4) → Ρ̓ (0x3a1313) → ῤ (0x3c1313) → Ρ̓ (0x3a1313)
ῦ (0x1fe6) → Υ͂ (0x3a5342) → ῦ (0x3c5342) → Υ͂ (0x3a5342)
ῧ (0x1fe7) → Ϋ͂ (0x3a5308342) → ῧ (0x3c5308342) → Ϋ͂ (0x3a5308342)
ῲ (0x1ff2) → ῺΙ (0x1ffa399) → ὼι (0x1f7c3b9) → ῺΙ (0x1ffa399)
ῳ (0x1ff3) → ΩΙ (0x3a9399) → ωι (0x3c93b9) → ΩΙ (0x3a9399)
ῴ (0x1ff4) → ΏΙ (0x38f399) → ώι (0x3ce3b9) → ΏΙ (0x38f399)
ῶ (0x1ff6) → Ω͂ (0x3a9342) → ῶ (0x3c9342) → Ω͂ (0x3a9342)
ῷ (0x1ff7) → Ω͂Ι (0x3a9342399) → ῶι (0x3c93423b9) → Ω͂Ι (0x3a9342399)
Ω (0x2126) → ω (0x3c9) → Ω (0x3a9) → ω (0x3c9)
K (0x212a) → k (0x6b) → K (0x4b) → k (0x6b)
Å (0x212b) → å (0xe5) → Å (0xc5) → å (0xe5)
ff (0xfb00) → FF (0x4646) → ff (0x6666) → FF (0x4646)
fi (0xfb01) → FI (0x4649) → fi (0x6669) → FI (0x4649)
fl (0xfb02) → FL (0x464c) → fl (0x666c) → FL (0x464c)
ffi (0xfb03) → FFI (0x464649) → ffi (0x666669) → FFI (0x464649)
ffl (0xfb04) → FFL (0x46464c) → ffl (0x66666c) → FFL (0x46464c)
ſt (0xfb05) → ST (0x5354) → st (0x7374) → ST (0x5354)
st (0xfb06) → ST (0x5354) → st (0x7374) → ST (0x5354)
ﬓ (0xfb13) → ՄՆ (0x544546) → մն (0x574576) → ՄՆ (0x544546)
ﬔ (0xfb14) → ՄԵ (0x544535) → մե (0x574565) → ՄԵ (0x544535)
ﬕ (0xfb15) → ՄԻ (0x54453b) → մի (0x57456b) → ՄԻ (0x54453b)
ﬖ (0xfb16) → ՎՆ (0x54e546) → վն (0x57e576) → ՎՆ (0x54e546)
ﬗ (0xfb17) → ՄԽ (0x54453d) → մխ (0x57456d) → ՄԽ (0x54453d)
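
Lines in the format above could be produced with something like the following Python 3 sketch. It is not necessarily how the list was actually generated; `hex_label` and `swapcase_chain` are hypothetical helper names, and the hex label is assumed to be the hex code points of the string concatenated together, as the entries above suggest:

```python
def hex_label(s):
    """Concatenate the hex code points of s, e.g. 'SS' -> '0x5353'."""
    return "0x" + "".join(format(ord(c), "x") for c in s)


def swapcase_chain(ch, steps=3):
    """Format ch followed by `steps` successive swapcase results."""
    parts = []
    current = ch
    for _ in range(steps + 1):
        parts.append("{} ({})".format(current, hex_label(current)))
        current = current.swapcase()
    return " → ".join(parts)


print(swapcase_chain("\u00df"))  # ß
print(swapcase_chain("\u00b5"))  # µ
```

A character belongs in the list whenever the third entry of its chain differs from the first.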

Answer 2 (score: 8)

I tried this:

v = lambda x: x.swapcase().swapcase() == x
[unichr(x) for x in range(10000) if not v(unichr(x))]

The result is:

[u'\xb5', u'\u0130', u'\u0131', u'\u017f', u'\u03c2', u'\u03d0', u'\u03d1', u'\u03d5', u'\u03d6', u'\u03f0', u'\u03f1', u'\u03f4', u'\u03f5', u'\u1e9b', u'\u1e9e', u'\u1f80', u'\u1f81', u'\u1f82', u'\u1f83', u'\u1f84', u'\u1f85', u'\u1f86', u'\u1f87', u'\u1f90', u'\u1f91', u'\u1f92', u'\u1f93', u'\u1f94', u'\u1f95', u'\u1f96', u'\u1f97', u'\u1fa0', u'\u1fa1', u'\u1fa2', u'\u1fa3', u'\u1fa4', u'\u1fa5', u'\u1fa6', u'\u1fa7', u'\u1fb3', u'\u1fbe', u'\u1fc3', u'\u1ff3', u'\u2126', u'\u212a', u'\u212b']
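
The snippet above is Python 2 (`unichr`). A rough Python 3 equivalent, with `round_trip_fails` as a hypothetical helper name, might look like this. The result may differ from the list above: that list omits ß, probably because Python 2 applies only one-to-one case mappings, whereas on Python 3 I would expect ß and other characters with multi-character mappings to appear as well:

```python
def round_trip_fails(ch):
    """True if applying swapcase twice does not give back ch."""
    return ch.swapcase().swapcase() != ch


# Same range as the original snippet: code points 0 through 9999.
failures = [chr(cp) for cp in range(10000) if round_trip_fails(chr(cp))]
print([hex(ord(ch)) for ch in failures])
```

The exact contents depend on the Unicode version bundled with the interpreter, so the list is best treated as a snapshot rather than a fixed set.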