China's Spring Festival Travel Rush! Scraping China's Passenger Traffic Volume for the Past 3 Years with Python

1. Scraping Target

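The target is the monthly passenger traffic volume (旅客运输量) listed under the transportation (交通运输) category of the National Bureau of Statistics data portal at https://data.stats.gov.cn/easyquery.htm, for the most recent 36 months.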

2. Complete Source Code

#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
#
# Copyright (c) 2024 愤怒的it, All Rights Reserved.
# FileName : code.py
# Date : 2024.01.29
# Author : 愤怒的it
# Version : 1.0.0
# Note : Follow the WeChat official account 【愤怒的it男】
#
#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

from lxml import etree
import csv
from prettytable import PrettyTable
from playwright.sync_api import sync_playwright

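def getData():
    # Drive a headless Chromium browser through the site's menus
    with sync_playwright() as p:
        browser_type = p.chromium
        browser = browser_type.launch()
        page = browser.new_page(ignore_https_errors=True)
        page.goto('https://data.stats.gov.cn/easyquery.htm')
        # Open 交通运输 (transportation), then 旅客运输量 (passenger traffic volume)
        page.get_by_text("交通运输").click()
        page.get_by_text("旅客运输量").click()
        # Open the time-range dropdown and choose 最近36个月 (last 36 months)
        page.locator("//div[@class='dtHead']").click()
        page.get_by_text("最近36个月").click()
        # Parse the rendered page with lxml
        html = etree.HTML(page.content())
        # Header cells: the label of the indicator column followed by the month labels
        thead = html.xpath("//table[@class='public_table table_column']/thead/tr/th/strong/text()")
        tbody = []
        trs = html.xpath("//table[@class='public_table table_column']/tbody/tr")
        for tr in trs:
            row = []
            # The first two text nodes after the link in the first cell are joined
            # with '|' to form the row label, e.g. 客运量|当期值(万人)
            index1 = tr.xpath("td[1]/a/following::text()[1]")[0]
            index2 = tr.xpath("td[1]/a/following::text()[2]")[0]
            index = str(index1) + '|' + str(index2)
            row.append(index)
            # The right-aligned cells hold the monthly values
            row = row + tr.xpath("td[@align='right']/text()")
            tbody.append(row)
        browser.close()
    return (thead, tbody)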
   
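def printData(thead, tbody):
    table = PrettyTable()
    # First column: the month labels (thead[0] labels the indicator column and is skipped)
    table.add_column("时间",thead[1:])
    # Rows 0, 4, 8, 12 and 16 are the 当期值 (current-period value) rows of the five indicators
    for index,row in enumerate(tbody):
        if index in [0,4,8,12,16]:
            table.add_column(row[0],row[1:])
    print(table)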

def saveData(thead, tbody):
    # Write the header row and every scraped row to a UTF-8 CSV file
    with open('旅客运输量.csv', 'w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(thead)
        writer.writerows(tbody)

def main():
    print('Collecting data, please wait...')
    result = getData()
    printData(result[0], result[1])
    saveData(result[0], result[1])

if __name__ == "__main__":
    main()
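
The CSV file written by saveData keeps every row the page shows, while printData displays only the rows at indices 0, 4, 8, 12 and 16. The following is a minimal sketch of loading the file back and applying the same selection. It assumes the layout produced by saveData (one header line, then one line per table row); `load_current_rows` and its `step` and `count` parameters are hypothetical names that do not appear in the original script.

```python
import csv

def load_current_rows(path, step=4, count=5):
    """Read the CSV and keep rows 0, step, 2*step, ... (at most `count` rows)."""
    with open(path, encoding='utf-8', newline='') as file:
        reader = csv.reader(file)
        header = next(reader)   # first line is the header row written by saveData
        rows = list(reader)
    # With step=4 and count=5 this matches printData's `index in [0,4,8,12,16]`
    selected = rows[::step][:count]
    return header, selected
```

Calling `load_current_rows('旅客运输量.csv')` after the script has run would return the header and the five current-value rows, under the assumption that the site still lists four rows per indicator.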

3. Scraping Results

Column labels are printed as the site provides them: 时间 is the month, and each 当期值(万人) column is the current-period value, in units of 10,000 persons, for total (客运量), railway (铁路), highway (公路), waterway (水运), and civil aviation (民航) passenger traffic.

+------------+---------------------+-------------------------+-------------------------+-------------------------+-------------------------+
|    时间    | 客运量|当期值(万人) | 铁路客运量|当期值(万人) | 公路客运量|当期值(万人) | 水运客运量|当期值(万人) | 民航客运量|当期值(万人) |
+------------+---------------------+-------------------------+-------------------------+-------------------------+-------------------------+
| 2023年12月 |        73965        |          28986          |          38640          |           1280          |           5059          |
| 2023年11月 |        74495        |          27974          |          39967          |           1655          |           4899          |
| 2023年10月 |        85182        |          35169          |          41888          |           2520          |           5605          |
|  2023年9月 |        79730        |          32499          |          39980          |           1902          |           5349          |
|  2023年8月 |        94474        |          42860          |          41625          |           3593          |           6396          |
|  2023年7月 |        90993        |          40919          |          40798          |           3034          |           6243          |
|  2023年6月 |        79073        |          32682          |          39041          |           2039          |           5312          |
|  2023年5月 |        79323        |          32560          |          39141          |           2453          |           5170          |
|  2023年4月 |        78400        |          32900          |          38300          |           2200          |           5000          |
|  2023年3月 |        70595        |          28038          |          36127          |           1861          |           4569          |
|  2023年2月 |        65872        |          27297          |          32551          |           1704          |           4320          |
|  2023年1月 |        58327        |          23609          |          29187          |           1554          |           3977          |
| 2022年12月 |        36123        |           9913          |          23771          |           568           |           1871          |
| 2022年11月 |        33481        |           7374          |          24146          |           704           |           1257          |
| 2022年10月 |        42794        |          11896          |          28147          |           1159          |           1592          |
|  2022年9月 |        48413        |          14752          |          30700          |           950           |           2010          |
|  2022年8月 |        59623        |          21732          |          32890          |           1770          |           3230          |
|  2022年7月 |        61900        |          22615          |          34184          |           1701          |           3400          |
|  2022年6月 |        51605        |          16736          |          31637          |           1032          |           2200          |
|  2022年5月 |        37782        |           9225          |          26686          |           664           |           1207          |
|  2022年4月 |        32506        |           5691          |          25595          |           432           |           788           |
|  2022年3月 |        42108        |          10145          |          29759          |           667           |           1537          |
|  2022年2月 |        56582        |          19052          |          33328          |           1073          |           3129          |
|  2022年1月 |        55534        |          17874          |          33801          |           908           |           2951          |
| 2021年12月 |        57633        |          16498          |          37587          |           839           |           2710          |
| 2021年11月 |        54039        |          14396          |          36521          |           969           |           2153          |
| 2021年10月 |        72751        |          24767          |          42429          |           1669          |           3886          |
|  2021年9月 |        67930        |          22711          |          40505          |           1102          |           3612          |
|  2021年8月 |        54904        |          15409          |          36243          |           1012          |           2241          |
|  2021年7月 |        83398        |          30847          |          45695          |           1944          |           4912          |
|  2021年6月 |        75138        |          24607          |          44895          |           1507          |           4128          |
|  2021年5月 |        84868        |          29415          |          48202          |           2146          |           5104          |
|  2021年4月 |        80639        |          28110          |          45605          |           1814          |           5109          |
|  2021年3月 |        74544        |          24376          |          44087          |           1299          |           4782          |
|  2021年2月 |        57293        |          14318          |          39397          |           1184          |           2395          |
|  2021年1月 |        67112        |          15715          |          47527          |           852           |           3018          |
+------------+---------------------+-------------------------+-------------------------+-------------------------+-------------------------+
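
As a worked example of reading these figures, the sketch below compares total passenger traffic for January and February combined across the three years. The January plus February window is an assumption of this sketch (the Spring Festival usually falls in late January or February), not something the original article defines; the values are copied from the total passenger traffic column of the table above, and `jan_feb_total` and `yoy_change` are hypothetical helper names.

```python
# Total passenger traffic (客运量, in 10,000 persons) copied from the table above,
# keyed by (year, month)
totals = {
    (2021, 1): 67112, (2021, 2): 57293,
    (2022, 1): 55534, (2022, 2): 56582,
    (2023, 1): 58327, (2023, 2): 65872,
}

def jan_feb_total(year):
    """Sum of the January and February totals for one year."""
    return totals[(year, 1)] + totals[(year, 2)]

def yoy_change(current, previous):
    """Year-over-year change in percent."""
    return (current - previous) / previous * 100

for year in (2022, 2023):
    change = yoy_change(jan_feb_total(year), jan_feb_total(year - 1))
    print(f'{year} Jan+Feb: {jan_feb_total(year)} ({change:+.1f}% year over year)')
```

On these numbers the combined figure drops by roughly a tenth from 2021 to 2022 and recovers by a similar proportion in 2023.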

4. Disclaimer

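All source code in this article was written for personal study and originally published on the WeChat official account 愤怒的it男. It may be used only for lawful purposes such as learning and researching computer technology. Using this source code for any activity that violates the laws and regulations of your country (or region) is prohibited; if I find any such violation, I will report it to the internet supervision authorities.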

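Originally published on the WeChat official account 愤怒的it男: China's Spring Festival Travel Rush! Scraping China's Passenger Traffic Volume for the Past 3 Years with Python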


Compiled by 极客之音. Article link: https://www.bmabk.com/index.php/post/234395.html
