Python thinks a 3000-line text file is one line long?
I have a very long text file that I am trying to process using Python.
However, the following code:
    for line in open('textbase.txt', 'r'):
        print 'hello world'
outputs only the following:
    hello world
It seems as if Python thinks the file is only one line long, even though it is several thousand lines long when viewed in a text editor. Checking it with the file command on the command line gives:
    $ file textbase.txt
    textbase.txt: Big-endian UTF-16 Unicode English text, with CR line terminators
Is something wrong? Do I need to change the line terminators?

According to the documentation for open(), you should add a U to the mode:
    open('textbase.txt', 'Ur')
This enables "universal newlines", which normalizes them to \n in the strings it gives you.
However, the correct thing to do is to decode the UTF-16BE into Unicode objects first, before translating the newlines. Otherwise, a chance 0x0d byte could mistakenly be turned into a 0x0a, resulting in

    UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 12: truncated data
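To make the failure mode concrete, here is a small Python 3 sketch (the code in this thread is Python 2). The helper translate_newlines_bytewise is hypothetical and only imitates newline translation applied to raw bytes; the two sample characters were picked because their UTF-16BE encodings happen to contain 0x0d.

```python
# Python 3 sketch: newline translation applied to raw UTF-16BE bytes.

def translate_newlines_bytewise(raw):
    """Imitate universal-newline translation on undecoded bytes (the wrong order)."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# U+010D encodes to the bytes 01 0D; that 0x0D is not a line ending.
raw_char = "\u010d".encode("utf-16-be")
silently_changed = translate_newlines_bytewise(raw_char).decode("utf-16-be")
# silently_changed is now U+010A, a different character.

# U+0D0A encodes to 0D 0A, which looks like CRLF and collapses to one byte,
# leaving an odd number of bytes that can no longer be decoded.
raw_text = "a\u0d0ab".encode("utf-16-be")
try:
    translate_newlines_bytewise(raw_text).decode("utf-16-be")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding first and translating afterwards keeps every character intact.
decoded_first = "a\u0d0ab\rc".encode("utf-16-be").decode("utf-16-be")
lines = decoded_first.replace("\r\n", "\n").replace("\r", "\n").split("\n")
```

The first case is the nastier one, since the text is changed without any error being raised.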
Python's codecs module provides an open function that can decode Unicode and handle newlines at the same time.
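A minimal sketch of reading such a file with codecs.open, written for Python 3. The sample file and its contents are made up to stand in for textbase.txt, and the mode is plain 'r' because recent Python 3 releases no longer accept the U flag.

```python
import codecs
import os
import tempfile

# Hypothetical sample standing in for textbase.txt: UTF-16BE, CR-only endings.
sample = "first line\rsecond line\rthird line\r"
path = os.path.join(tempfile.mkdtemp(), "textbase_sample.txt")
with open(path, "wb") as handle:
    handle.write(sample.encode("utf-16-be"))

# Iteration yields one decoded string per line. Each line still carries its
# trailing "\r", so it is stripped here.
with codecs.open(path, "r", "utf-16-be") as reader:
    lines = [line.rstrip("\r\n") for line in reader]

print(len(lines))
```

On Python 3 the built-in open with an encoding argument is the usual alternative and also decodes before it translates newlines.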
If the file has a byte order mark (BOM) and you specify 'utf-16', then it detects the endianness and hides the BOM for you. If it does not (since the BOM is optional), then the decoder will just go ahead and use your system's endianness, which probably won't be good.
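The difference between the codec names can be checked on in-memory bytes. This is a Python 3 sketch with made-up sample text; the behaviour without a BOM is what CPython does, as far as I know.

```python
import codecs
import sys

text = "first\rsecond\r"
be_bytes = text.encode("utf-16-be")          # big-endian data, no BOM
bom_bytes = codecs.BOM_UTF16_BE + be_bytes   # BOM followed by the same data

# With a BOM, the generic codec picks the right byte order and drops the BOM.
detected = bom_bytes.decode("utf-16")

# Without a BOM, the generic codec falls back to the machine's native order,
# so on a little-endian machine this big-endian data comes out as garbage.
native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
guessed = be_bytes.decode("utf-16")

# An explicit byte order decodes correctly but leaves the BOM in as U+FEFF.
explicit = bom_bytes.decode("utf-16-be")
```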
Specifying the endianness yourself (with 'utf-16be') will not hide the BOM, so you might want to use this hack:
    import codecs
    firstline = True
    for line in codecs.open('textbase.txt', 'Ur', 'utf-16be'):
        if firstline:
            firstline = False
            # Drop the BOM, which the explicit codec leaves in place
            line = line.lstrip(u'\ufeff')
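The same idea can be wrapped in a generator. This is only a sketch for Python 3: iter_utf16be_lines is a hypothetical helper name, not something codecs provides, and the sample file is made up for the demonstration.

```python
import codecs
import os
import tempfile

def iter_utf16be_lines(path):
    """Yield decoded lines, dropping a leading BOM from the first one.

    Hypothetical helper, not part of the codecs module.
    """
    with codecs.open(path, "r", "utf-16-be") as reader:
        for index, line in enumerate(reader):
            if index == 0:
                # The explicit codec keeps the BOM, so remove it by hand.
                line = line.lstrip("\ufeff")
            yield line.rstrip("\r\n")

# Sample file with a BOM and CR-only line endings.
path = os.path.join(tempfile.mkdtemp(), "with_bom.txt")
with open(path, "wb") as handle:
    handle.write(codecs.BOM_UTF16_BE + "alpha\rbeta\r".encode("utf-16-be"))

lines = list(iter_utf16be_lines(path))
```

Files without a BOM pass through unchanged, since lstrip only removes U+FEFF when it is actually there.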