python - how to correct a misencoded string?

July 15, 2014

I am reading MP3 metadata. The ID3 tag is read in as Unicode, but the underlying bytes are in fact GBK-encoded. How can I correct this in Python?

  audio = EasyID3(name)
  title = audio["title"][0]
  print title
  print repr(title)

  µ±Äã¹Âµ¥Äã»áÏëÆðË­
  u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'

But in fact this string should be in GBK:

  当你孤单你会想起谁

It seems that the string has been decoded to Unicode using the wrong encoding (Latin-1).

You need to encode it back to a byte string with that same encoding (Latin-1), which recovers the original bytes, and then decode those bytes to Unicode with the right encoding (GBK).

  title = u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
  print title.encode('latin-1').decode('gbk')

  当你孤单你会想起谁
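For anyone on Python 3, the same round trip can be wrapped in a small helper. This is only a sketch: the function name `fix_mojibake` and its `wrong`/`right` parameters are made up for illustration, and it assumes the text was mis-decoded as Latin-1 and is really GBK, as in the question.

```python
def fix_mojibake(text, wrong="latin-1", right="gbk"):
    """Undo a wrong decode: recover the raw bytes, then decode them properly.

    `wrong` is the encoding assumed to have been used by mistake and
    `right` is the encoding the bytes were really in (both are assumptions
    the caller has to supply; nothing here detects them).
    """
    try:
        # Latin-1 maps code points 0-255 one-to-one onto bytes 0-255,
        # so encoding with it gives back the original byte string.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in the assumed way; leave it alone.
        return text


garbled = "\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad"
fixed = fix_mojibake(garbled)
print(fixed)  # 当你孤单你会想起谁
```

The fallback means a title that is already correct Chinese text is returned unchanged, because it cannot be encoded as Latin-1 in the first place. Plain ASCII titles also survive, since ASCII bytes decode to the same characters in GBK.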
