python - How to correct the misencoded string?
I am reading MP3 metadata. The ID3 tag is read in as Unicode, but it is in fact GBK-encoded. How can I correct this in Python?
    audio = EasyID3(name)
    title = audio["title"][0]
    print title
    print repr(title)
This prints:
    µ±Äã¹Âµ¥Äã»áÏëÆðË
    u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
But in fact it should be GBK-encoded Chinese:
    当你孤单你会想起谁
It looks like the string was decoded to Unicode using the wrong encoding (Latin-1).
You need to encode it back to a byte string and then decode it to Unicode using the correct encoding.
    title = u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
    print title.encode('latin-1').decode('gbk')
    当你孤单你会想起谁
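For readers on Python 3, where str is already Unicode and print is a function, the same round trip can be sketched as below. The helper name fix_mojibake and its fall-back behaviour are assumptions for illustration only; they are not part of the original answer.

```python
# Python 3 sketch. fix_mojibake is a hypothetical helper name.
def fix_mojibake(text, wrong="latin-1", right="gbk"):
    """Undo a decode that used the wrong codec.

    Returns the text unchanged if the round trip is not possible.
    """
    try:
        # Step 1: recover the original bytes. Step 2: decode them properly.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside `wrong`, or the recovered
        # bytes are not valid in `right`. Assume it was already correct.
        return text


title = '\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
fixed = fix_mojibake(title)
print(fixed)  # 当你孤单你会想起谁
```

Latin-1 maps every code point below 256 to the byte with the same value, which is why the encode step recovers the original bytes exactly. A string that was decoded correctly in the first place (for example one that already contains Chinese characters) cannot be encoded as Latin-1, so the helper returns it untouched.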