.net - Parsing Documents with a DSL


I am trying to come up with a way to parse about one million formal documents (for argument's sake, say they are thesis documents). They are not all standardized, but they are quite close: they consist of titles, sections, paragraphs, and so on. There are subtle differences to handle. For example, in English a title is labeled "Title", but in French it is "Titre".

To that end, the best approach I can think of is to create an EBNF grammar that covers all the possible variants, for example title := "Title" | "Titre".
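To make the idea of a rule with several spellings concrete, here is a minimal sketch in C# that expresses one such rule as a case-insensitive regular expression instead of a full grammar. The class name `HeadingRule`, the method `TryMatchTitle`, the optional punctuation after the keyword, and the French spelling "Titre" are all assumptions made for illustration; a real grammar would carry many more rules.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingRule
{
    // Hypothetical rule, mirroring an EBNF production such as
    //   title := "Title" | "Titre"
    // The keyword is matched case-insensitively; an optional ':', '.' or '-'
    // may follow it, and the rest of the line is captured as the title text.
    private static readonly Regex Title = new Regex(
        @"^\s*(?:title|titre)\b\s*[:.\-]?\s*(?<text>.*)$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true and the captured text when the line starts with a title keyword.
    public static bool TryMatchTitle(string line, out string text)
    {
        Match m = Title.Match(line);
        text = m.Success ? m.Groups["text"].Value.Trim() : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string[] lines = { "TITLE: Parsing at Scale", "Titre : Analyse de documents", "Abstract" };
        foreach (string line in lines)
        {
            Console.WriteLine(TryMatchTitle(line, out string text)
                ? "title -> " + text
                : "no match: " + line);
        }
    }
}
```

Adding another language then means adding one more alternative to the pattern, which is the same maintenance step as adding an alternative to the EBNF rule.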

I am not too worried about coming up with the EBNF. My main concern is how to actually do the parsing. I have looked at ANTLR, OSLO, Irony and many others, but I do not have the expertise to decide whether any of them is a good fit for my task.

So, my questions to you:

  1. Which DSL tool would you recommend for parsing documents at this scale?
  2. Which DSL tool is the most precise at parsing? We have to define rules for uppercase and lowercase. What about Roman numerals, and numbers written in a foreign language (French)?
  3. Is there a process or algorithm I am not considering that you believe is better than a DSL? (Writing one from scratch is an option, but I want to get working quickly.)
  4. Has anyone tried combining DSL parsing with learning algorithms (genetic algorithms, neural networks) to add intelligence?
  5. Would you use these DSL tools in a production environment?
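Whichever tool is chosen, the numbering concern in question 2 usually comes down to normalizing section labels before or after parsing. The following is a rough sketch in C#, not tied to any particular DSL tool: the class `Numerals`, its `Parse` method, and the small table of French number words are hypothetical and deliberately incomplete.

```csharp
using System;
using System.Collections.Generic;

public static class Numerals
{
    private static readonly Dictionary<char, int> Roman = new Dictionary<char, int>
    {
        ['I'] = 1, ['V'] = 5, ['X'] = 10, ['L'] = 50, ['C'] = 100, ['D'] = 500, ['M'] = 1000
    };

    // Illustrative subset only; a real table would cover many more words.
    private static readonly Dictionary<string, int> FrenchWords =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            ["un"] = 1, ["deux"] = 2, ["trois"] = 3, ["quatre"] = 4, ["cinq"] = 5
        };

    // Tries digits first, then French number words, then Roman numerals.
    // Returns null when the token is none of these.
    public static int? Parse(string token)
    {
        token = token.Trim();
        if (token.Length == 0) return null;
        if (int.TryParse(token, out int n)) return n;
        if (FrenchWords.TryGetValue(token, out int w)) return w;
        return ParseRoman(token.ToUpperInvariant());
    }

    // Simple subtractive-notation reader. It does not reject malformed
    // numerals such as "IIII" or "IC"; add validation if that matters.
    private static int? ParseRoman(string s)
    {
        int total = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (!Roman.TryGetValue(s[i], out int value)) return null;
            bool smallerThanNext = i + 1 < s.Length
                && Roman.TryGetValue(s[i + 1], out int next)
                && value < next;
            total += smallerThanNext ? -value : value;
        }
        return total;
    }

    public static void Main()
    {
        foreach (string token in new[] { "12", "xiv", "Trois", "Chapter" })
        {
            int? value = Parse(token);
            Console.WriteLine(token + " -> " + (value.HasValue ? value.Value.ToString() : "not a number"));
        }
    }
}
```

One caveat with this approach: ordinary words made up only of Roman letters (for example "mix") would also be read as numerals, so a check like this is best applied only in positions where the grammar already expects a number.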

My development platform of choice is C#. Ideally I would like to integrate the DSL tool into code so that we can use it from our existing app.


I came across a tool that is not exactly what I need, but its source code is available, so I can look through it and adapt it to generate what I need.

