About the calamine library: my DMs blew up yesterday, so here's a bonus post today

葫芦侠五楼 • March 26, 2024, 8:46 PM • Python • 1217 reads

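After yesterday's article went out, a lot of readers messaged Bubu asking how to actually use the calamine library, so today's bonus post runs through it hands-on!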

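A quick recap of calamine: if you look it up on GitHub, you will see that it is actually a Rust library that supports a whole range of Excel formats (xls, xlsx, xlsm, xlsb, xla, xlam). Python bindings are available through the python-calamine package, but it currently cannot write Excel files; it is read-only.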
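As a small illustration of the format list above, here is a minimal sketch that checks a file name against those extensions before trying to read it. The helper name `is_supported_excel` and the case-insensitive matching are my own assumptions, not part of calamine.

```python
from pathlib import Path

# Extensions taken from the list in this article.
# Assumption: extensions are compared case-insensitively.
SUPPORTED_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".xla", ".xlam"}


def is_supported_excel(path: str) -> bool:
    """Return True if the file extension is one of the formats listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only looks at the file name; it does not open the file or validate its contents.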

Next, installation. Since the python-calamine package exists, we can install it with pip (here using the Tsinghua PyPI mirror):

pip install python-calamine -i https://pypi.tuna.tsinghua.edu.cn/simple/
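To confirm the install worked, one option is to check whether the module can be found. Note that the pip package is named python-calamine while the import name is python_calamine. The helper `is_installed` below is a hypothetical name used only for this sketch.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore, unlike the pip package name.
print("python_calamine installed:", is_installed("python_calamine"))
```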

Let's try running a first example:

from python_calamine import CalamineWorkbook
workbook = CalamineWorkbook.from_path("file.xlsx")
print(workbook.sheet_names)
print(workbook.get_sheet_by_name("Sheet1").to_python())

It reads successfully!
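`to_python()` returns the sheet as a list of rows, each row being a list of cell values. A minimal sketch of separating the header from the data follows; it assumes the first row is the header, and the names `split_header` and `read_sheet` are made up for this example.

```python
from typing import Any


def split_header(rows: list[list[Any]]) -> tuple[list[str], list[list[Any]]]:
    """Split sheet rows into (header names, data rows).

    Assumption: the first row of the sheet is the header.
    """
    if not rows:
        return [], []
    header = [str(cell) for cell in rows[0]]
    return header, rows[1:]


def read_sheet(path: str, sheet_name: str) -> tuple[list[str], list[list[Any]]]:
    """Read one sheet by name and split off its header row."""
    # Imported lazily so split_header() can be used without the package installed.
    from python_calamine import CalamineWorkbook

    workbook = CalamineWorkbook.from_path(path)
    return split_header(workbook.get_sheet_by_name(sheet_name).to_python())
```

For the file above, this would be called as `header, data = read_sheet("file.xlsx", "Sheet1")`.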

Now let's try something bigger.

I prepared a dataset of 500,000 rows × 12 columns (.xlsx).

The data is shown in the figure below.

Reading this much data took only about 7 seconds, which honestly surprised me! The overall read time is shown in the figure below.
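If you want to measure the read time on your own machine, a small timing helper is one way to do it. This is a sketch, not the exact measurement used for the figure; `timed` is a hypothetical helper name.

```python
import time
from typing import Any, Callable


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call func and return (result, elapsed seconds)."""
    # perf_counter() is a high-resolution clock meant for measuring intervals.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, wrapping the read in a lambda, `rows, seconds = timed(lambda: CalamineWorkbook.from_path("file.xlsx").get_sheet_by_index(0).to_python())`, times only the read itself and leaves out any printing. Your numbers will depend on hardware and on the file.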

If you are interested, you can try running the code below; adapt the details (file paths and so on) to your own setup.

import os
import time
from typing import Iterator, IO

import python_calamine

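def iter_excel_calamine(file: IO[bytes]) -> Iterator[dict[str, object]]:
    """Yield each data row of the first sheet as a dict keyed by the header row."""
    workbook = python_calamine.CalamineWorkbook.from_filelike(file)  # type: ignore[arg-type]
    # to_python() returns the whole sheet as a list of rows
    rows = iter(workbook.get_sheet_by_index(0).to_python())
    # The first row is treated as the header
    headers = list(map(str, next(rows)))
    for row in rows:
        yield dict(zip(headers, row))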

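# perf_counter() is a monotonic clock intended for measuring elapsed time
t1 = time.perf_counter()

# Open the first file found in the ./数据源/ directory
data_dir = './数据源/'
with open(os.path.join(data_dir, os.listdir(data_dir)[0]), 'rb') as f:
    rows = iter_excel_calamine(f)

    # Note: printing every row is included in the measured time
    for r in rows:
        print(r.keys(), r.values())
        # row = next(rows)
        # print(row)
t2 = time.perf_counter()
print(t2 - t1)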
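Printing all 500,000 rows floods the terminal, so for a quick sanity check it can be handy to look at only the first few records. Here is a minimal sketch using `itertools.islice`; the helper name `preview` and the default of 5 records are my own choices.

```python
from itertools import islice
from typing import Any, Iterable


def preview(records: Iterable[Any], n: int = 5) -> list[Any]:
    """Return at most the first n items from any iterable or generator."""
    return list(islice(records, n))
```

Inside the `with` block above, `preview(iter_excel_calamine(f), 3)` would return the first three row dicts. Keep in mind that `to_python()` has already loaded the whole sheet by then, so this limits the output, not the read itself.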