Java file-parsing toolkit design: quick file encoding sanity check
(Disclaimer: I looked through many posts here before asking and found them particularly useful; I only have one
Hi all,
I have an internal Java product that I built to load data files into a database (i.e. an ETL tool). It has pre-rolled stages for XSLT transformation and for things like pattern replacement within the original file. The input files can be of any format, either flat data files or XML data files, and you configure the stages required for the particular data feed you are loading.
Until now I have ignored the issue of file encoding (a mistake, I know), because everything was working fine (in the main). However, I am now running into file encoding problems. To cut a long story short, because of the way the stages can be configured together, I need to detect the encoding of the input file and create a Java Reader object with that encoding. I wanted to do a quick sanity check with you folks before plunging into something I can't claim to fully understand:
- Adopt a standard file encoding of UTF-16
- Use the input file's own encoding, sniffing it where necessary
- Use the Apache Commons IO library to create a standard reader and writer for all stages (am I right in thinking it has no encoding-sniffing API?)
Do you see any pitfalls, or have any extra wisdom to offer on the approach outlined above?
Also, is there any way I can retain backward compatibility with existing data that was loaded under the current approach, where the Java runtime simply defaults to windows-1252?
--James
Option 1 strikes me as breaking backward compatibility (certainly in the long run), although it is "the right way" (the right-way option usually does break backward compatibility), perhaps with the additional thought of whether UTF-8 would be a better choice.
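To make option 1 concrete, here is a minimal sketch of a stage that re-encodes a file from a known source encoding into one standard encoding. The class and method names (Transcoder, transcode) are made up for illustration and are not part of any library; it assumes the source encoding is already known when the stage runs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: normalises a file to one standard encoding.
final class Transcoder {

    /** Decodes {@code in} using {@code source} and writes it to {@code out} using {@code target}. */
    static void transcode(Path in, Charset source, Path out, Charset target) throws IOException {
        // Files.newBufferedReader throws on malformed input rather than silently
        // substituting characters, so a wrong source charset tends to fail fast.
        try (Reader reader = Files.newBufferedReader(in, source);
             Writer writer = Files.newBufferedWriter(out, target)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("feed", ".txt");
        Path out = Files.createTempFile("feed", ".utf8");
        // "caf" followed by e-acute, encoded as windows-1252 (single byte 0xE9).
        Files.write(in, "caf\u00e9".getBytes(Charset.forName("windows-1252")));
        transcode(in, Charset.forName("windows-1252"), out, StandardCharsets.UTF_8);
        System.out.println(Files.readAllBytes(out).length + " bytes written");
    }
}
```

Swapping StandardCharsets.UTF_8 for StandardCharsets.UTF_16 gives the UTF-16 variant from the question; every later stage can then open its Reader with that one charset.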
Sniffing the encoding strikes me as reasonable if you have a limited, known set of encodings and you have tested that your sniffer identifies each of them correctly.
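As a rough illustration of sniffing against a small, known set, the sketch below first looks for a byte-order mark and then tries a strict decode with each candidate in turn. EncodingSniffer and its methods are hypothetical names, not an existing API. It assumes the whole file (or a representative prefix ending on a character boundary) fits in memory, and it ignores UTF-32, whose little-endian BOM also begins with FF FE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical sniffer for a limited, known set of encodings.
final class EncodingSniffer {

    /** Returns the charset implied by a leading UTF-8 or UTF-16 byte-order mark, if present. */
    static Optional<Charset> fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    /** True if the bytes decode under {@code cs} with no malformed or unmappable input. */
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * BOM first, then the first candidate that decodes cleanly, else the fallback.
     * Order matters: single-byte encodings accept almost any input, so list
     * strict encodings such as UTF-8 before them.
     */
    static Charset sniff(byte[] bytes, List<Charset> candidates, Charset fallback) {
        Optional<Charset> bom = fromBom(bytes);
        if (bom.isPresent()) {
            return bom.get();
        }
        for (Charset cs : candidates) {
            if (decodesCleanly(bytes, cs)) {
                return cs;
            }
        }
        return fallback;
    }
}
```

A clean decode only shows that the bytes are valid in that encoding, not that the encoding is the intended one, which is why testing the sniffer against real samples of every expected encoding matters.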
Another option here is to use some form of metadata that tells your code the data was supplied in the standard UTF-16 encoding, so it can behave accordingly; otherwise, convert the data to UTF-16 before moving on.
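A sketch of the metadata idea, under the assumption that each feed's configuration is available as a java.util.Properties object with an optional "encoding" key. That key, the FeedEncoding class, and its methods are invented for illustration. Feeds that declare nothing fall back to windows-1252, which is one way to keep existing data loading as it does today.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: resolves a feed's encoding from its configuration metadata.
final class FeedEncoding {

    /** Assumed legacy behaviour: feeds without metadata were read as windows-1252. */
    static final Charset LEGACY_DEFAULT = Charset.forName("windows-1252");

    /** The standard encoding every stage expects after normalisation. */
    static final Charset STANDARD = StandardCharsets.UTF_16;

    /**
     * Returns the charset named by the (hypothetical) "encoding" property, or the
     * legacy default when it is absent or blank. Charset.forName throws an
     * unchecked exception for an unknown or illegal name.
     */
    static Charset declaredCharset(Properties feedConfig) {
        String name = feedConfig.getProperty("encoding");
        if (name == null || name.trim().isEmpty()) {
            return LEGACY_DEFAULT;
        }
        return Charset.forName(name.trim());
    }

    /** True when the feed must be converted before the standard stages can read it. */
    static boolean needsConversion(Properties feedConfig) {
        return !declaredCharset(feedConfig).equals(STANDARD);
    }

    /** Opens a Reader over the raw input using the declared (or legacy default) charset. */
    static Reader openReader(InputStream in, Properties feedConfig) {
        return new BufferedReader(new InputStreamReader(in, declaredCharset(feedConfig)));
    }
}
```

Naming the charset explicitly everywhere, instead of relying on the platform default, also means the tool behaves the same on machines whose default is not windows-1252.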