HTML requested by Python is not identical to the HTML in the browser

by: chatzich, 8 years ago


Hello, I have a problem retrieving HTML from a site. Here is the code I use, but unfortunately the HTML it prints is not the HTML displayed by my browser:
import requests

# url is assumed to be defined earlier as the address of the page to fetch
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
obj = requests.get(url, headers=headers)
print(obj.text)
Can I trick the site's server into giving me the right HTML?






It depends, really. What is the HTML being returned to you? What's different?

Is the HTML missing content that JavaScript generates, or are you being served a different page because you've been detected as a bot?

If it's just an issue of JavaScript not being executed (tables not being populated, often even plain text missing, etc.), then look into: https://pythonprogramming.net/javascript-dynamic-scraping-parsing-beautiful-soup-tutorial/
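
One rough way to tell the two cases apart is to check whether text you can see in the browser appears anywhere in the raw response. This is only a sketch: the helper name missing_markers, the sample HTML, and the marker strings are all made up for illustration.

```python
def missing_markers(html, markers):
    """Return the markers that do not appear in the raw HTML text."""
    return [m for m in markers if m not in html]


# Hypothetical raw response: the table is empty because a script fills it in later.
raw_html = """
<html><body>
  <h1>Prices</h1>
  <table id="prices"></table>
  <script src="/static/load_prices.js"></script>
</body></html>
"""

# Strings copied by hand from what the browser displays (made up for this example).
visible_in_browser = ["Prices", "42.50 EUR"]

print(missing_markers(raw_html, visible_in_browser))  # ['42.50 EUR']
```

With requests you would pass obj.text as the first argument. If only the dynamic values are missing, JavaScript is the likely cause; if the whole page is different, bot detection is more likely.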

-Harrison 8 years ago



Thank you for your help, Harrison. I went through the tutorial and managed to retrieve the right HTML. Here is my proposed Qt code:

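import sys

# Assumption: PyQt4 imports, as in the linked tutorial; adjust for your Qt binding.
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage


class Client(QWebPage):
    def __init__(self):
        # A QApplication must exist before any QWebPage is created.
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.html = ""
        # Separate flag, so the loadFinished signal is not overwritten.
        self.loading = False

    def getUrl(self, url):
        self.loading = True
        self.mainFrame().load(QUrl(url))
        # Process Qt events until on_page_load clears the flag.
        while self.loading:
            self.app.processEvents()

    def on_page_load(self):
        self.loading = False


browser = Client()
browser.getUrl("https://www.pythonprogramming.net")
source = browser.mainFrame().toHtml()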


P.S. How can I find you on LinkedIn?

-chatzich 8 years ago
Last edited 8 years ago
