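Pyppeteer is a Python library, an unofficial port of Puppeteer, that provides an API for controlling Chromium, which it runs headless by default. A headless browser is a web browser without a graphical user interface (GUI); it runs in the background, which makes it well suited to automated testing, web scraping, and other automation tasks.
「Installation」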
pip install pyppeteer
1.「Fetching Baidu search results」
import asyncio
from pyppeteer import launch

async def main(wd):
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://www.baidu.com')
    await page.type('input[name="wd"]', wd)
    await page.keyboard.press('Enter')  # simulate pressing Enter
    await asyncio.sleep(5)  # give the results page time to load
    search_results = await page.evaluate('''() => {
        const results = [];
        const links = document.querySelectorAll('#content_left h3 a');
        links.forEach(link => {
            results.push({
                title: link.innerText,
                url: link.getAttribute('href')
            });
        });
        return results;
    }''')
    for result in search_results:
        print(result['title'], "\n" + result['url'])
    await browser.close()

# Run the search
asyncio.run(main("python"))  # search keyword
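The search example above sleeps for a fixed five seconds and leaves the browser running if any step raises. Below is a minimal sketch of a more defensive variant, assuming the `#content_left` container used above appears once the results have rendered. `with_browser` and `collect_links` are hypothetical helper names, not part of Pyppeteer, and the launcher is passed in as a parameter so the cleanup logic can be tried without a real browser.

```python
import asyncio


async def with_browser(launch_fn, task):
    """Launch a browser, run `task(browser)`, and always close the browser.

    `launch_fn` is expected to be pyppeteer.launch; taking it as a
    parameter is an assumption of this sketch, made for testability.
    """
    browser = await launch_fn()
    try:
        return await task(browser)
    finally:
        # Runs even if `task` raises, so no Chromium process is left behind.
        await browser.close()


async def collect_links(browser, wd):
    """Search Baidu for `wd` and return a list of {'title', 'url'} dicts."""
    page = await browser.newPage()
    await page.goto('https://www.baidu.com')
    await page.type('input[name="wd"]', wd)
    await page.keyboard.press('Enter')
    # Wait for the results container instead of sleeping a fixed 5 seconds.
    await page.waitForSelector('#content_left', {'timeout': 10000})
    return await page.evaluate('''() =>
        Array.from(document.querySelectorAll('#content_left h3 a')).map(
            a => ({title: a.innerText, url: a.getAttribute('href')}))''')


# Usage (requires pyppeteer and its Chromium download):
#   from pyppeteer import launch
#   results = asyncio.run(
#       with_browser(launch, lambda b: collect_links(b, "python")))
```

Waiting on a selector returns as soon as the results are present, and it raises a timeout error instead of silently scraping an empty page when they never arrive.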
2.「Capturing a page screenshot」
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://baidu.com')
    # Save a PNG of the current viewport
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.run(main())
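`page.screenshot()` accepts more options than `path`. As an illustration, the sketch below builds an options dict that captures the full scrollable page and picks the image type from the file extension. `screenshot_options` is a hypothetical helper, not a Pyppeteer function; the keys `type`, `fullPage`, and `quality` are options that Pyppeteer documents for `page.screenshot()`.

```python
from pathlib import Path


def screenshot_options(path, full_page=True, quality=None):
    """Build the options dict passed to page.screenshot()."""
    suffix = Path(path).suffix.lower()
    if suffix in ('.jpg', '.jpeg'):
        image_type = 'jpeg'
    elif suffix == '.png':
        image_type = 'png'
    else:
        raise ValueError(f'unsupported screenshot extension: {suffix!r}')
    options = {'path': str(path), 'type': image_type, 'fullPage': full_page}
    if quality is not None:
        # The quality setting is not applicable to PNG images.
        if image_type != 'jpeg':
            raise ValueError('quality only applies to JPEG screenshots')
        options['quality'] = quality
    return options
```

Inside a script like the one above, it would be used as `await page.screenshot(screenshot_options('example.png'))`.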
3.「Generating a PDF of a page」
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    # PDF generation only works when Chromium runs headless (the default)
    await page.pdf({'path': 'example.pdf', 'format': 'A4'})
    await browser.close()

asyncio.run(main())
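Besides `path` and `format`, `page.pdf()` also takes layout options. The sketch below shows one way to assemble them, assuming uniform margins given in millimetres. `pdf_options` is a hypothetical helper; `landscape`, `printBackground`, and `margin` are option keys from Pyppeteer's `page.pdf()` documentation.

```python
def pdf_options(path, paper='A4', margin_mm=10, landscape=False):
    """Build the options dict passed to page.pdf()."""
    margin = f'{margin_mm}mm'  # margins are strings with a unit
    return {
        'path': path,
        'format': paper,
        'landscape': landscape,
        # Include CSS background colours and images in the output.
        'printBackground': True,
        'margin': {'top': margin, 'right': margin,
                   'bottom': margin, 'left': margin},
    }
```

In the script above, this would replace the literal dict: `await page.pdf(pdf_options('example.pdf'))`.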
4.「Handling JavaScript-heavy pages」
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com/js-heavy-page')
    # Wait until the element has been rendered
    await page.waitForSelector('#loaded-element')
    # Run JavaScript in the page and return the element's text
    data = await page.evaluate('''() => {
        const data = document.querySelector('#loaded-element').textContent;
        return data;
    }''')
    print(data)
    await browser.close()

asyncio.run(main())
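If the element never appears, `waitForSelector` raises a timeout error once its timeout expires. The following sketch wraps the wait and falls back to a default value. `text_or_default` is a hypothetical helper, and the sketch relies on my understanding that `pyppeteer.errors.TimeoutError` is a subclass of `asyncio.TimeoutError`; treat that as an assumption to verify against your installed version.

```python
import asyncio


async def text_or_default(page, selector, timeout_ms=5000, default=None):
    """Return the text of `selector`, or `default` if it never appears."""
    try:
        await page.waitForSelector(selector, {'timeout': timeout_ms})
    except asyncio.TimeoutError:
        # Assumed to also catch pyppeteer.errors.TimeoutError (see lead-in).
        return default
    # Pass the selector as an argument instead of hard-coding it in the JS.
    return await page.evaluate(
        '(sel) => document.querySelector(sel).textContent', selector)
```

Called as `await text_or_default(page, '#loaded-element', default='')`, it lets the script continue with an empty value instead of crashing on a slow page.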
5.「Intercepting network requests」
import asyncio
from pyppeteer import launch

async def handle_request(request):
    # The 'request' event passes a single Request object to the handler
    if request.url.startswith('https://example.com'):
        # Add a custom header, then let the request proceed
        headers = dict(request.headers)
        headers['X-Custom-Header'] = 'MyValue'
        await request.continue_({'headers': headers})
    else:
        # Every intercepted request must be continued or aborted
        await request.continue_()

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.setRequestInterception(True)
    # Schedule the coroutine handler from a plain callback
    page.on('request', lambda req: asyncio.ensure_future(handle_request(req)))
    await page.goto('https://example.com')
    await browser.close()

asyncio.run(main())
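Interception is also commonly used to skip resources a scraper does not need. Here is a minimal sketch that aborts requests by resource type. `block_heavy_resources` and the `BLOCKED_TYPES` set are illustrative choices rather than Pyppeteer defaults; `request.resourceType`, `request.abort()`, and `request.continue_()` are part of Pyppeteer's `Request` class.

```python
import asyncio

# Resource types to skip; adjust to what the target page really needs.
BLOCKED_TYPES = {'image', 'stylesheet', 'font', 'media'}


async def block_heavy_resources(request):
    """Abort requests for blocked resource types and continue the rest."""
    if request.resourceType in BLOCKED_TYPES:
        await request.abort()
    else:
        await request.continue_()
```

After `await page.setRequestInterception(True)`, it can be registered with `page.on('request', lambda req: asyncio.ensure_future(block_heavy_resources(req)))`.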
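Originally published on the WeChat official account python学霸 under the title "This Is the Ceiling of Browser Simulation in Python; Selenium Is at Most the Little Brother!"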