Tuesday, March 5, 2019

Linux + Python: parsing a dynamic JS site

1) Install Google Chrome on the server
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update && sudo apt -y install google-chrome-stable
google-chrome --version
Google Chrome 72.0.3626.121
2) Install the chromedriver release that supports your Chrome version (Google Chrome 72.0.3626.121 in my case)
Download the archive and unzip it so the binary ends up at /usr/local/bin/chromedriver
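Step 2 only works if the driver fits the installed browser. Below is a small sketch that reads both version strings and compares their major numbers; the helper names are hypothetical. The comparison assumes Chrome 73 or newer, where ChromeDriver's version numbers follow Chrome's. For Chrome 72 and older (as in this post) drivers were numbered 2.x and each release covered a range of Chrome versions, so there you have to look the pairing up in the ChromeDriver release notes instead.

```python
import re
import shutil
import subprocess

# First dotted number in the output, e.g. "72.0.3626.121" or "2.46.628388".
_VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def parse_version(text):
    """Return the first version number in `text` as a tuple of ints, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(binary):
    """Run `<binary> --version` and parse it; None if the binary is not found."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


def versions_match(chrome, driver):
    """Compare major versions.

    Assumption: Chrome 73+, where ChromeDriver is numbered like Chrome.
    Legacy 2.x drivers (Chrome 72 and older) will always report False here.
    """
    return chrome is not None and driver is not None and chrome[0] == driver[0]
```

On the server you would call `versions_match(installed_version("google-chrome"), installed_version("/usr/local/bin/chromedriver"))`.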

3) Start Chrome with special options to avoid errors on a headless server

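from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()

options.add_argument("--headless")               # the server has no display
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")             # required when Chrome runs as root
options.add_argument("--disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")  # /dev/shm is often too small in containers

# Selenium 4 style; on Selenium 3 (current when this post was written) use
# webdriver.Chrome('/usr/local/bin/chromedriver', options=options) instead.
browser = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=options)

try:
    browser.get('http://<IP-address>:3000/home')
    # ... other actions
    generated_html = browser.page_source
finally:
    browser.quit()  # always shut down Chrome and chromedriver, even on errors
print(generated_html)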
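On a JavaScript-driven page, `browser.get()` returning does not guarantee that Angular has already filled in the `ng-binding` elements, so reading `page_source` right away may give an empty or placeholder value (the `data-counter="counterup"` attribute also hints that the number may be animated). Here is a minimal standard-library polling sketch that works with any object exposing a `page_source` attribute; `wait_for_page_source` and `block_height_rendered` are hypothetical names, and the regex readiness check is an assumption tied to the markup in step 4. Selenium's own `WebDriverWait` (in `selenium.webdriver.support.ui`) covers the same need if you prefer to wait on elements rather than raw HTML.

```python
import re
import time

# Hypothetical readiness test for this page: the blockHeight span contains digits.
_BLOCK_HEIGHT_RE = re.compile(
    r'data-value="stats\.blockHeight"[^>]*>\s*\d[\d,]*\s*</span>'
)


def block_height_rendered(html):
    """True once the stats.blockHeight span holds a number such as 1,812,970."""
    return _BLOCK_HEIGHT_RE.search(html) is not None


def wait_for_page_source(browser, is_ready, timeout=10.0, poll=0.5):
    """Re-read browser.page_source until is_ready(html) is true.

    `browser` is anything with a `page_source` attribute (e.g. a Selenium
    WebDriver). Returns the first HTML that passes the check and raises
    TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = browser.page_source
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} s")
        time.sleep(poll)
```

Used in place of the direct read: `generated_html = wait_for_page_source(browser, block_height_rendered)`.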

4) Extract the value I need, 1,812,970, from this markup:

<div class="number">
    <h3>
        <small>#</small>
            <span data-counter="counterup" data-value="stats.blockHeight" class="ng-binding">1,812,970</span>
    </h3>
    <small>Block Height</small>
</div>

from bs4 import BeautifulSoup

soup = BeautifulSoup(generated_html, 'html.parser')

# first <div class="number"> in the body
div_block = soup.body.find('div', attrs={'class': 'number'})

# find_all searches nested tags by default, so it reaches the <span> inside <h3>
for span in div_block.find_all('span'):
    print(span.get_text())

1,812,970
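The value arrives as the string "1,812,970". The sketch below goes straight to the span by its `data-value` attribute (less fragile than taking the first `div.number` if the page has several stat blocks) and turns the text into an integer. It uses only the standard library's `html.parser`; with BeautifulSoup the equivalent lookup is `soup.find('span', attrs={'data-value': 'stats.blockHeight'})`. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class DataValueText(HTMLParser):
    """Collect the text of the first element whose data-value attribute matches."""

    def __init__(self, data_value):
        super().__init__()
        self.data_value = data_value
        self.chunks = []
        self.done = False
        self._depth = 0  # > 0 while inside the matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested tag inside the match
        elif not self.done and dict(attrs).get("data-value") == self.data_value:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.done = True  # stop after the first match

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def extract_number(html, data_value="stats.blockHeight"):
    """Return the element's text as an int ("1,812,970" -> 1812970), or None."""
    parser = DataValueText(data_value)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return int(text.replace(",", "")) if text else None
```

With the markup from step 4, `extract_number(generated_html)` gives `1812970`.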
