How to Use Google's Gemini Large Model in PHP to Recognize CAPTCHAs for Web Crawlers

Introduction

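Image CAPTCHAs make automated data collection noticeably harder for crawlers. An image CAPTCHA is a security mechanism meant to stop automated tools such as crawlers from excessively accessing or abusing a site's resources. The visitor must read the characters shown in an image, or perform a specific action, to prove they are a real human user.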

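Gemini is an AI model released on December 6, 2023 by Google DeepMind, the AI lab under Google's parent company Alphabet. It is multimodal: it can handle five kinds of information (text, images, audio, video and code). It can also understand and generate high-quality code in mainstream programming languages such as PHP, Python, Java and C++. Google states that it has undergone comprehensive safety evaluations.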

Create the project

composer create-project workerman/webman webman20240312

Install dependencies

composer require google-gemini-php/client

Gemini PHP is a community-maintained PHP API client for interacting with the Gemini AI API. The package requires PHP 8.1+.

Project URL: https://github.com/google-gemini-php/client

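If your project does not already include a PSR-18 HTTP client, make sure the php-http/discovery Composer plugin is allowed to run, or install a client manually, for example Guzzle: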

composer require guzzlehttp/guzzle

Testing the text model

<?php
/**
 * @desc Recognize CAPTCHAs in PHP with Google's Gemini large model
 * @author Tinywan(ShaoBo Wan)
 * @email 756684177@qq.com
 * @date 2024/3/13 23:13
 */
declare(strict_types=1);

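// Resolve the autoloader relative to this file rather than the current working directory
require_once __DIR__ . '/../vendor/autoload.php';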

$apiKey = 'AIzaSyAPxxxxxxxxxxxxxxx_uEpw';
$client = Gemini::client($apiKey);
$result = $client->geminiPro()->generateContent('What is the PHP language?');

echo $result->text() . PHP_EOL;

Output

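PHP (full name: PHP: Hypertext Preprocessor) is a general-purpose, high-level scripting language that is especially well suited to web development.

**Features:**

* **Open source and free:** PHP is free and open source; anyone can use or modify it.
* **Cross-platform:** PHP runs on Windows, Linux, Unix and many other platforms.
* **Easy to learn:** PHP has a simple, easy-to-understand syntax, so even beginners can pick it up quickly.
* **Widely used:** PHP is one of the most popular languages for web development and is used by major sites such as WordPress, Facebook and Wikipedia.
* **Modular:** PHP provides a large number of modules that let developers extend its functionality easily.
* **Dynamic typing:** PHP uses a dynamic type system; variable types are assigned at runtime.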
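The test script above hard-codes the API key in the source file, which makes it easy to leak through version control. Below is a minimal sketch of reading the key from an environment variable instead. The variable name `GEMINI_API_KEY` and the helper `geminiApiKeyFromEnv()` are hypothetical choices for illustration. They are not part of the google-gemini-php/client package.

```php
<?php
declare(strict_types=1);

/**
 * Read the Gemini API key from an environment variable.
 * GEMINI_API_KEY is a hypothetical variable name chosen for this sketch.
 */
function geminiApiKeyFromEnv(string $name = 'GEMINI_API_KEY'): string
{
    $key = getenv($name);
    // getenv() returns false when the variable is not set; also reject blank values
    if ($key === false || trim($key) === '') {
        throw new RuntimeException("Environment variable {$name} is not set");
    }
    return trim($key);
}

// Hypothetical usage with the client installed above:
// $client = Gemini::client(geminiApiKeyFromEnv());
```

The same string can be passed to `->withApiKey()` when building the client through `Gemini::factory()`.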

Testing the vision model

Reading the raw text of a CAPTCHA image

CAPTCHA: captcha01.jpg

Reference code

<?php
/**
 * @desc Recognize CAPTCHAs in PHP with Google's Gemini large model
 * @author Tinywan(ShaoBo Wan)
 * @email 756684177@qq.com
 * @date 2024/3/13 23:13
 */
declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use Gemini\Data\Blob;
use Gemini\Enums\MimeType;
use GuzzleHttp\Client as GuzzleClient;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$apiKey = 'AIzaSyAPLiuNxxxxxxxxxxxxxxx_uEpw';

// Keep the Guzzle client in its own variable so it is not confused with the Gemini client below
$httpClient = new GuzzleClient([]);

$client = Gemini::factory()
    ->withApiKey($apiKey)
    // Third-party proxy endpoint used by the author; remove this line to use the client's default Google endpoint
    ->withBaseUrl('https://gemini.ailard.com/v1/')
    ->withHttpClient($httpClient)
    // Custom stream handler: send streaming requests through the same Guzzle client
    ->withStreamHandler(fn(RequestInterface $request): ResponseInterface => $httpClient->send($request, [
        'stream' => true,
    ]))
    ->make();

// Send the prompt together with the JPEG image, base64-encoded
$result = $client
    ->geminiProVision()
    ->generateContent([
        'I will provide you with an image CAPTCHA, please recognize the content inside the CAPTCHA and output the text',
        new Blob(
            mimeType: MimeType::IMAGE_JPEG,
            data: base64_encode(
                file_get_contents(__DIR__ . '/captcha01.jpg')
            )
        )
    ]);

echo $result->text() . PHP_EOL;

Recognition output

The content inside the CAPTCHA is "AXBV".
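The model answers with a free-form sentence, so a crawler still has to pull the code out before submitting it. Below is a minimal sketch. It assumes the code consists of ASCII letters and digits, and that the model either replies with the bare code or wraps it in straight or curly double quotes, as in the reply above. `extractCaptchaCode()` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Extract the CAPTCHA code from a model reply such as
 * 'The content inside the CAPTCHA is "AXBV".'
 * Returns null when the reply does not match either expected shape.
 */
function extractCaptchaCode(string $reply): ?string
{
    $reply = trim($reply);
    // Case 1: the model replied with the bare code, e.g. "AXBV"
    if (preg_match('/^[A-Za-z0-9]+$/', $reply) === 1) {
        return $reply;
    }
    // Case 2: the code is quoted inside a sentence; take the last quoted run
    if (preg_match_all('/["“”]([A-Za-z0-9]+)["“”]/u', $reply, $matches) > 0) {
        return $matches[1][count($matches[1]) - 1];
    }
    // Anything else is ambiguous
    return null;
}
```

Returning `null` for ambiguous replies lets the crawler fetch a fresh CAPTCHA instead of submitting a guess.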

Computing the result of an arithmetic CAPTCHA image

CAPTCHA: captcha02.png
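Unlike captcha01.jpg, this CAPTCHA is a PNG, so the reference code that follows switches to the `IMAGE_PNG` MIME type. Rather than keeping one script per format, the MIME type can be detected from the file's leading "magic" bytes. Below is a minimal sketch that covers only JPEG and PNG. `detectImageMime()` and `buildImagePart()` are hypothetical helpers. The commented usage also assumes the package's `MimeType` enum is string-backed with values such as `'image/png'`.

```php
<?php
declare(strict_types=1);

/**
 * Detect JPEG or PNG from the leading bytes of an image.
 * Only these two formats are handled in this sketch.
 */
function detectImageMime(string $bytes): string
{
    // JPEG files start with FF D8 FF
    if (str_starts_with($bytes, "\xFF\xD8\xFF")) {
        return 'image/jpeg';
    }
    // PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A
    if (str_starts_with($bytes, "\x89PNG\r\n\x1A\n")) {
        return 'image/png';
    }
    throw new InvalidArgumentException('Unsupported image format: only JPEG and PNG are handled');
}

/**
 * Return the MIME type string and the base64 payload for raw image bytes.
 *
 * @return array{mimeType: string, data: string}
 */
function buildImagePart(string $bytes): array
{
    return [
        'mimeType' => detectImageMime($bytes),
        'data' => base64_encode($bytes),
    ];
}

// Hypothetical usage, ASSUMING MimeType is a string-backed enum:
// $part = buildImagePart(file_get_contents(__DIR__ . '/captcha02.png'));
// $blob = new \Gemini\Data\Blob(
//     mimeType: \Gemini\Enums\MimeType::from($part['mimeType']),
//     data: $part['data'],
// );
```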

Reference code

<?php
/**
 * @desc Get the computed result of an arithmetic CAPTCHA image
 * @author Tinywan(ShaoBo Wan)
 * @email 756684177@qq.com
 * @date 2024/3/13 23:13
 */
declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use Gemini\Data\Blob;
use Gemini\Enums\MimeType;
use GuzzleHttp\Client as GuzzleClient;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$apiKey = 'AIzaSyAPLiuNxxxxxxxxxxxxxxx_uEpw';

// Keep the Guzzle client in its own variable so it is not confused with the Gemini client below
$httpClient = new GuzzleClient([]);

$client = Gemini::factory()
    ->withApiKey($apiKey)
    // Third-party proxy endpoint used by the author; remove this line to use the client's default Google endpoint
    ->withBaseUrl('https://gemini.ailard.com/v1/')
    ->withHttpClient($httpClient)
    // Custom stream handler: send streaming requests through the same Guzzle client
    ->withStreamHandler(fn(RequestInterface $request): ResponseInterface => $httpClient->send($request, [
        'stream' => true,
    ]))
    ->make();

// Send the prompt together with the PNG image, base64-encoded
$result = $client
    ->geminiProVision()
    ->generateContent([
        'I will provide you with an image CAPTCHA, please recognize the content inside the CAPTCHA and output the text',
        new Blob(
            mimeType: MimeType::IMAGE_PNG,
            data: base64_encode(
                file_get_contents(__DIR__ . '/captcha02.png')
            )
        )
    ]);

echo $result->text() . PHP_EOL;
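The prompt above only asks Gemini to transcribe the image, so for an arithmetic CAPTCHA the reply will most likely be the expression rather than its value. Even when the model does attempt the arithmetic, recomputing the answer locally is safer. Below is a minimal sketch. It assumes the CAPTCHA holds a single binary operation on non-negative integers, for example something like `3 + 5 = ?`. `solveArithmeticCaptcha()` is a hypothetical helper, and it deliberately avoids `eval()`.

```php
<?php
declare(strict_types=1);

/**
 * Compute the answer to a simple arithmetic CAPTCHA from transcribed text.
 * Finds the first "<int> <operator> <int>" in the text.
 * Supported operators: + - − * × x X / ÷
 */
function solveArithmeticCaptcha(string $text): ?int
{
    if (preg_match('/([0-9]+)\s*([-−+*\/×xX÷])\s*([0-9]+)/u', $text, $m) !== 1) {
        return null; // no expression found
    }
    $a = (int) $m[1];
    $b = (int) $m[3];

    return match ($m[2]) {
        '+' => $a + $b,
        '-', '−' => $a - $b,
        '*', '×', 'x', 'X' => $a * $b,
        // Only accept exact integer division; anything else is treated as unrecognized
        '/', '÷' => ($b !== 0 && $a % $b === 0) ? intdiv($a, $b) : null,
    };
}

// Hypothetical usage with the script above:
// $answer = solveArithmeticCaptcha($result->text());
```

Division that does not come out even returns `null`, on the assumption that such CAPTCHAs only use whole-number answers.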