Python: convert RTF file to unicode?

April 15, 2010

I am trying to convert the lines of an RTF file into a series of Unicode strings and then run a regex match on those lines. (I need them as Unicode so that I can output them to another file.)

However, my regex match is not working, I think because the lines are not being converted to Unicode.

This is my code:

  import re

  usefulLines = []
  textData = {}

  # Regex pattern for an entry in the DB (such as SUF 76,22): for us it is
  # enough to match three upper-case characters followed by a space.
  entryPattern = r'^([A-Z]{3})[\s].*$'

  f = open('textbase_1a.rtf', 'rU')
  fileLines = f.readlines()

  # Get the matching line numbers, and save them in usefulLines
  for i, line in enumerate(fileLines):
      #line = line.decode('utf-16be')  # This causes an error: I do not really know what encoding the RTF file is in...
      line = line.decode('mac_roman')
      print line
      if re.match(entryPattern, line):
          # Retrieve the following lines, all the way until we get a blank line
          print "match: " + str(i)
          usefulLines.append(i)

At the moment it prints all the lines but never prints a match, even though some lines should match. In addition, for some reason the lines are printed with '\par' at the end, and when I try to write them to an output file they look very strange.

Part of the problem is that I do not know which encoding to specify.

If I use entryPattern = '^.*$' I do get matches.

Can someone help?


You have not decoded the RTF file. RTF files are not just simple text files. For example, a file containing "äöü" contains this:

  {\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
  {\*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\f0\fs20\'e4\'f6\'fc\par
  }

When the file is opened in a text editor, you can see that the characters "äöü" are encoded as Windows-1252, as declared at the beginning of the file (äöü = 0xE4 0xF6 0xFC).
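As a quick check of those byte values, here is a minimal Python 3 sketch (the variable names are only illustrative) that decodes the three bytes with the cp1252 codec:

```python
# The three bytes that the RTF hex escapes e4, f6 and fc stand for.
raw = bytes([0xE4, 0xF6, 0xFC])

# Decode them with the code page declared in the RTF header (1252).
text = raw.decode("cp1252")

print(text)
```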

To read the RTF you first need something that converts RTF to text.
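To illustrate one small part of that conversion, the sketch below (Python 3) only replaces the \'hh hex escapes with the characters they encode, using the code page declared by \ansicpg. It is not an RTF-to-text converter: control words such as \par are left in place. The function name decode_hex_escapes and the default_codepage parameter are hypothetical, and the sketch assumes a single-byte code page that Python knows under the name "cp" plus the declared number.

```python
import re

def decode_hex_escapes(rtf_source, default_codepage="cp1252"):
    # Use the code page declared via \ansicpgN when the header has one.
    declared = re.search(r"\\ansicpg(\d+)", rtf_source)
    codec = "cp" + declared.group(1) if declared else default_codepage

    def replace(match):
        # Each escape carries a single byte written as two hex digits.
        return bytes([int(match.group(1), 16)]).decode(codec)

    # Match a backslash, an apostrophe and exactly two hex digits.
    return re.sub(r"\\'([0-9a-fA-F]{2})", replace, rtf_source)

sample = r"{\rtf1\ansi\ansicpg1252\pard\f0\fs20\'e4\'f6\'fc\par}"
decoded = decode_hex_escapes(sample)
print(decoded)
```

After this step the escaped bytes are real Unicode characters, but the remaining RTF markup still has to be stripped before a line-based regex can match the text.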
