Screen scraping with Python -


Is there a screen scraping library in Python that offers JavaScript support?

I need JavaScript support for Pycurl , and Java HtmlUnit .

Ideally I want to be able to do everything from Python, but I have not come into any libraries which will allow me to do this. Do they exist?

There are several options while working with static HTML, which cover other responses. However, if you need JavaScript support and want to stay in Python then I recommend render webpage (including javascript) and as a result the HTML is checking. For example:

  import signal imported from PyQt4.QtCore by PyQt4.QtGui import * QWebPage class render (QWebPage) from PyQt4.QtWebKit import: def __init __ (auto, url): self .app = QApplication (sys.argv) QWebPage .__ init __ (self) self.html = no sign. Signal (signal sign, signal SIG_DFL) self.connect (self, signal ('loadFinished (bool)', auto_finished_loading) self.mainframe (). Load (url) self.app.exec_ () def _finished_loading (auto, result): self.html = self.mainFrame (). ToHtml () self.app.quit () if __name__ == '__main__': try: url = sys.argv [1] except index: print 'Usage:% s url'% sys.argv [0] and: javascript_html = Render (URL) .html  

Comments

Popular posts from this blog

sql - dynamically varied number of conditions in the 'where' statement using LINQ -

asp.net mvc - Dynamically Generated Ajax.BeginForm -

Debug on symbian -