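ChatGPT | CoT and ReAct in Prompts

When writing prompts, we often get answers that are inaccurate or that cannot draw on external knowledge. This article explains how CoT and ReAct work and how they improve the accuracy of large language models.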

Part 1: CoT (Chain of Thought)

1. What is CoT

A standard prompt:

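Question: Roger has 5 tennis balls. He buys 2 cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer: The answer is 11.
Question: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have now?

# Output
Answer: The answer is 29.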

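A CoT prompt:

Question: Roger has 5 tennis balls. He buys 2 cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer: Roger started with 5 tennis balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Question: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have now?

# Output
The cafeteria had 23 apples.
They used 20 to make lunch.
They bought 6 more.
How many apples do they have now?
23 - 20 + 6 = 9
The answer is 9.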

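(Asked via ERNIE Bot, 文心一言)

CoT works by putting intermediate reasoning steps into the demonstrations in the prompt and then asking a similar question. This makes the model's reasoning explicit and yields higher accuracy than asking directly. Comparing the two prompts above, the CoT prompt produces the correct answer (9), while the standard prompt produces a wrong one (29).
Jason Wei, Xuezhi Wang et al. proposed the CoT method in https://arxiv.org/abs/2201.11903, showing that intermediate reasoning steps elicit complex reasoning ability, and reported experimental data: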

Figure: CoT comparison data
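
The only difference between the two prompts above is whether the demonstration answer spells out its intermediate steps. A minimal sketch of that prompt construction follows. The `Demo` class and the `build_prompt` helper are hypothetical names introduced here for illustration, not part of any library, and the resulting text would still have to be sent to a model of your choice.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    question: str
    rationale: str  # intermediate reasoning steps
    answer: str


def build_prompt(demos, question, use_cot):
    """Assemble a few-shot prompt; include rationales only when use_cot is True."""
    parts = []
    for d in demos:
        if use_cot:
            body = f"{d.rationale} The answer is {d.answer}."
        else:
            body = f"The answer is {d.answer}."
        parts.append(f"Question: {d.question}\nAnswer: {body}")
    # The test question is left open so the model continues after "Answer:".
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)


demo = Demo(
    question="Roger has 5 tennis balls. He buys 2 cans of tennis balls. "
             "Each can has 3 tennis balls. How many tennis balls does he have now?",
    rationale="Roger started with 5 tennis balls. 2 cans of 3 tennis balls "
              "each is 6 tennis balls. 5 + 6 = 11.",
    answer="11",
)
question = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
            "bought 6 more, how many apples do they have now?")

standard_prompt = build_prompt([demo], question, use_cot=False)
cot_prompt = build_prompt([demo], question, use_cot=True)
print(cot_prompt)
```

Keeping the rationale as a separate field makes it easy to switch between the two prompt styles and compare the model's answers on the same question.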

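2. When can CoT be used

CoT does little for small models: the model needs at least about 10B parameters for it to have any effect, and around 100B for the effect to be pronounced. So if you are training a private model and it is a large one, you can try supplying reasoning steps in the training data.
CoT pays off most on complex problems. Large models can often already reason through simple problems directly, so complex problems show its value better.
CoT has limitations: the reasoning path it produces is not guaranteed to be entirely correct, so the person asking must judge it for themselves.

3. Building a simple AutoCoT

There is an open-source project for CoT (https://github.com/amazon-science/auto-cot) that focuses on making GPT-3 (not GPT-3.5) smarter. If you are interested, take a look at the source. I converted the project's prompts to Chinese and ran the demo; the output is as follows: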

========== manual_cot
2023/09/26 22:33:22
*****************************
Test Question:
一名服务员有14个客人要服务。如果有3个客人离开了，他又得到了另外39个客人，那么他现在总共有多少个客人
*****************************
Prompted Input:
问题: 林地里有15棵树。林地工人今天将在林地里种树。完成后，林地里将有21棵树。林地工人今天种了多少棵树？
答案: 原本有15棵树。然后种了一些树后，变成了21棵。所以一定种了21-15=6棵树。 这个答案是 6.
问题: 如果停车场里有3辆车，又有2辆车到达，那么停车场里有多少辆车？
答案: 原本有3辆车。又到来了2辆车。3+2=5辆车。 这个答案是 5.
Q: 一名服务员有14个客人要服务。如果有3个客人离开了，他又得到了另外39个客人，那么他现在总共有多少个客人
A:
*****************************
Output:
原本有14个客人。又来了39个客人。14+39=53个客人。 这个答案是 53.
*****************************
========== auto_cot
2023/09/26 22:33:26
*****************************
Test Question:
一名服务员有14个客人要服务。如果有3个客人离开了，他又得到了另外39个客人，那么他现在总共有多少个客人
*****************************
Prompted Input:
问题: Wendy在Facebook上传了45张照片。她将27张照片放在一个相册中，将其余的照片放在了9个不同的相册中。每个相册有多少张照片？
答案: 让我们逐步思考。首先，我们知道Wendy总共上传了45张照片。其次，我们知道Wendy将27张照片放在一个相册中。这意味着Wendy将剩余的18张照片放在了9个不同的相册中。这意味着每个相册都有2张照片。 这个答案是 2.
问题: 在万圣节，Katie和她的妹妹合并了他们收到的糖果。Katie有8块糖果，而她的妹妹有23块糖果。如果他们在第一天晚上吃了8块糖果，那么他们还剩下多少块糖果？
答案: 让我们逐步思考。Katie和她的妹妹一共有8+23=31块糖果。如果他们在第一天晚上吃了8块糖果，那么他们还剩下23块糖果。 这个答案是 23.
Q: 一名服务员有14个客人要服务。如果有3个客人离开了，他又得到了另外39个客人，那么他现在总共有多少个客人
A: 让我们一步一步思考.
*****************************
Output:
他最初有14个客人. 然后3个客人离开了, 所以他只剩下11个客人. 然后他又得到了39个客人, 所以他现在总共有50个客人. 这个答案是 50.
*****************************
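
In this run the two methods disagree: manual_cot answers 53 (it ignores the 3 guests who left), while auto_cot answers 50. A short sketch of how such outputs could be checked automatically is given below. `extract_answer` is a hypothetical helper written for this article, not a function from the auto-cot project; it assumes the answer is the first number after the last occurrence of the trigger phrase "这个答案是".

```python
import re

ANSWER_TRIGGER = "这个答案是"  # the answer trigger used in the prompts above


def extract_answer(output, trigger=ANSWER_TRIGGER):
    """Return the first number after the last occurrence of the trigger, or None."""
    _, found, tail = output.rpartition(trigger)
    if not found:
        return None
    match = re.search(r"-?\d+(?:\.\d+)?", tail)
    return match.group(0) if match else None


gold = 14 - 3 + 39  # 3 guests leave, 39 more arrive

manual_out = "原本有14个客人。又来了39个客人。14+39=53个客人。 这个答案是 53."
auto_out = ("他最初有14个客人. 然后3个客人离开了, 所以他只剩下11个客人. "
            "然后他又得到了39个客人, 所以他现在总共有50个客人. 这个答案是 50.")

for name, out in [("manual_cot", manual_out), ("auto_cot", auto_out)]:
    pred = extract_answer(out)
    verdict = "correct" if pred is not None and int(pred) == gold else "wrong"
    print(name, pred, verdict)
# prints:
# manual_cot 53 wrong
# auto_cot 50 correct
```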

The code is as follows:

# -*- coding: utf-8 -*-

import argparse
import time
import json
import datetime
import requests

TOKEN = "{your TOKEN}"

multiarith_auto_json = {
    "demo": [
        {
            "question": "问题: Wendy在Facebook上传了45张照片。她将27张照片放在一个相册中，将其余的照片放在了9个不同的相册中。每个相册有多少张照片？\n答案:",
            "rationale": "让我们逐步思考。首先，我们知道Wendy总共上传了45张照片。其次，我们知道Wendy将27张照片放在一个相册中。这意味着Wendy将剩余的18张照片放在了9个不同的相册中。这意味着每个相册都有2张照片。",
            "pred_ans": "2",
            "gold_ans": "2"
        },
        {
            "question": "问题: 在万圣节，Katie和她的妹妹合并了他们收到的糖果。Katie有8块糖果，而她的妹妹有23块糖果。如果他们在第一天晚上吃了8块糖果，那么他们还剩下多少块糖果？\n答案:",
            "rationale": "让我们逐步思考。Katie和她的妹妹一共有8+23=31块糖果。如果他们在第一天晚上吃了8块糖果，那么他们还剩下23块糖果。",
            "pred_ans": "23",
            "gold_ans": "23"
        }
    ]
}

multiarith_manual_json = {
    "demo": [
        {
            "question": "问题: 林地里有15棵树。林地工人今天将在林地里种树。完成后，林地里将有21棵树。林地工人今天种了多少棵树？\n答案:",
            "rationale": "原本有15棵树。然后种了一些树后，变成了21棵。所以一定种了21-15=6棵树。",
            "pred_ans": "6"
        },
        {
            "question": "问题: 如果停车场里有3辆车，又有2辆车到达，那么停车场里有多少辆车？\n答案:",
            "rationale": "原本有3辆车。又到来了2辆车。3+2=5辆车。",
            "pred_ans": "5"
        }
    ]
}


def request_api(prompt, engine, max_tokens, temperature, stop, top_p, frequency_penalty, presence_penalty):
    url = "https://{your proxy address}/v1/completions"
    headers = {
        "Content-Type": "application/json",
        "QPilot-ID": "246",
        "Authorization": "Bearer " + TOKEN,
    }
    params = {
        "model": engine,
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": stop,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
    result = requests.post(url, data=json.dumps(params),
                           headers=headers)
    # print("result: ", str(result.text))
    return result.json()


def decoder_for_gpt3(args, input, max_length):
    # GPT-3 API allows each users execute the API within 60 times in a minute ...
    # time.sleep(1)
    time.sleep(args.api_time_interval)
    # https://beta.openai.com/account/api-keys
    # openai.api_key = "[Your OpenAI API Key]"

    # Specify engine ...
    # Instruct GPT3
    if args.model == "gpt3":
        engine = "text-ada-001"
    elif args.model == "gpt3-medium":
        engine = "text-babbage-001"
    elif args.model == "gpt3-large":
        engine = "text-curie-001"
    elif args.model == "gpt3-xl":
        engine = "text-davinci-002"
    elif args.model == "text-davinci-001":
        engine = "text-davinci-001"
    elif args.model == "code-davinci-002":
        engine = "code-davinci-002"
    else:
        raise ValueError("model is not properly defined ...")

    if ("few_shot" in args.method or "auto" in args.method) and engine == "code-davinci-002":
        response = request_api(
            engine=engine,
            prompt=input,
            max_tokens=max_length,
            temperature=args.temperature,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=["\n"]
        )
    else:
        response = request_api(
            engine=engine,
            prompt=input,
            max_tokens=max_length,
            temperature=args.temperature,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=None
        )
    return response["choices"][0]["text"]


def create_demo_text(args, cot_flag):
    x, z, y = [], [], []
    json_data = args.demo_content["demo"]
    for line in json_data:
        x.append(line["question"])
        z.append(line["rationale"])
        y.append(line["pred_ans"])

    index_list = list(range(len(x)))

    demo_text = ""
    for i in index_list:
        if cot_flag:
            demo_text += x[i] + " " + z[i] + " " + \
                args.direct_answer_trigger_for_fewshot + " " + y[i] + ".\n\n"
        else:
            demo_text += x[i] + " " + \
                args.direct_answer_trigger_for_fewshot + " " + y[i] + ".\n\n"
    return demo_text


class Decoder():
    def __init__(self):
        pass

    def decode(self, args, input, max_length):
        response = decoder_for_gpt3(args, input, max_length)
        return response


def cot(method, question):
    args = parse_arguments()
    decoder = Decoder()

    args.method = method
    if args.method != "zero_shot_cot":
        if args.method == "auto_cot":
            args.demo_content = multiarith_auto_json
        else:
            args.demo_content = multiarith_manual_json
        demo = create_demo_text(args, cot_flag=True)
    else:
        demo = None

    x = "Q: " + question + "\n" + "A:"
    print('*****************************')
    print("Test Question:")
    print(question)
    print('*****************************')

    if args.method == "zero_shot":
        x = x + " " + args.direct_answer_trigger_for_zeroshot
    elif args.method == "zero_shot_cot":
        x = x + " " + args.cot_trigger
    elif args.method == "manual_cot":
        x = demo + x
    elif args.method == "auto_cot":
        x = demo + x + " " + args.cot_trigger
    else:
        raise ValueError("method is not properly defined ...")

    print("Prompted Input:")
    print(x.replace("\n\n", "\n").strip())
    print('*****************************')

    max_length = args.max_length_cot if "cot" in args.method else args.max_length_direct
    z = decoder.decode(args, x, max_length)
    z = z.replace("\n\n", "\n").replace("\n", "").strip()
    if args.method == "zero_shot_cot":
        z2 = x + z + " " + args.direct_answer_trigger_for_zeroshot_cot
        max_length = args.max_length_direct
        pred = decoder.decode(args, z2, max_length)
        print("Output:")
        print(z + " " + args.direct_answer_trigger_for_zeroshot_cot + " " + pred)
        print('*****************************')
    else:
        pred = z
        print("Output:")
        print(pred)
        print('*****************************')


def parse_arguments():
    parser = argparse.ArgumentParser(description="Zero-shot-CoT")

    parser.add_argument("--max_num_worker", type=int, default=0,
                        help="maximum number of workers for dataloader")
    parser.add_argument(
        "--model", type=str, default="gpt3-xl", help="model used for decoding. Note that 'gpt3' are the smallest models."
    )
    parser.add_argument(
        "--method", type=str, default="auto_cot", choices=["zero_shot", "zero_shot_cot", "few_shot", "few_shot_cot", "manual_cot", "auto_cot"], help="method"
    )
    parser.add_argument(
        "--cot_trigger_no", type=int, default=1, help="A trigger sentence that elicits a model to execute chain of thought"
    )
    parser.add_argument(
        "--max_length_cot", type=int, default=256, help="maximum length of output tokens by model for reasoning extraction"
    )
    parser.add_argument(
        "--max_length_direct", type=int, default=32, help="maximum length of output tokens by model for answer extraction"
    )
    parser.add_argument(
        "--api_time_interval", type=float, default=1.0, help=""
    )
    parser.add_argument(
        "--temperature", type=float, default=0, help=""
    )
    args = parser.parse_args()
    args.direct_answer_trigger_for_fewshot = "这个答案是"
    args.direct_answer_trigger_for_zeroshot = "这个答案是"
    args.direct_answer_trigger_for_zeroshot_cot = "这个答案是"
    args.cot_trigger = "让我们一步一步思考."
    return args


if __name__ == "__main__":
    ans = "一名服务员有14个客人要服务。如果有3个客人离开了，他又得到了另外39个客人，那么他现在总共有多少个客人"
    print("========== manual_cot")
    cot("manual_cot", ans)
    print("========== auto_cot")
    cot("auto_cot", ans)
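
The script above only exercises manual_cot and auto_cot, but its zero_shot_cot branch follows a two-stage pattern: one call elicits the reasoning, and a second call appends the answer trigger to extract the final answer. The sketch below isolates that flow. `zero_shot_cot` and `fake_llm` are hypothetical names, and `fake_llm` returns canned text in place of a real completion call so the example runs offline.

```python
COT_TRIGGER = "让我们一步一步思考."
ANSWER_TRIGGER = "这个答案是"


def zero_shot_cot(question, llm):
    """Two-stage zero-shot CoT: first elicit reasoning, then extract the answer."""
    # Stage 1: ask for step-by-step reasoning.
    reasoning_prompt = "Q: " + question + "\n" + "A: " + COT_TRIGGER
    reasoning = llm(reasoning_prompt).strip()
    # Stage 2: append the reasoning and the answer trigger, then ask again.
    answer_prompt = reasoning_prompt + reasoning + " " + ANSWER_TRIGGER
    answer = llm(answer_prompt).strip()
    return reasoning, answer


def fake_llm(prompt):
    """Hypothetical stand-in for a real completion call; returns canned text."""
    if prompt.endswith(ANSWER_TRIGGER):
        return " 50."
    return " 14 - 3 = 11. 11 + 39 = 50."


reasoning, answer = zero_shot_cot(
    "一名服务员有14个客人要服务。如果有3个客人离开了，他又得到了另外39个客人，那么他现在总共有多少个客人",
    fake_llm,
)
print(reasoning, ANSWER_TRIGGER, answer)
```

Passing the model in as a callable keeps the prompt logic testable without network access; a real run would replace `fake_llm` with a wrapper around the completion endpoint.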

Part 2: ReAct (Reasoning + Acting)

1. What is ReAct

Before explaining ReAct, let's look at an example:

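Question What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1 I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1 Search [Colorado orogeny]
Observation 1 The Colorado orogeny was an episode of mountain building in Colorado and surrounding areas.
Thought 2 It does not mention the eastern sector. So I need to look up eastern sector.
Action 2 Lookup [eastern sector]
Observation 2 (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3 The eastern sector of the Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3 Search [High Plains]
Observation 3 High Plains refers to one of two distinct land regions.
Thought 4 I need to search High Plains (United States).
Action 4 Search [High Plains (United States)]
Observation 4 The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5 High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5 Finish [1,800 to 7,000 ft]
...

(Here, Search means using a GPT plugin to look things up in a search engine.)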

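Shunyu Yao, Jeffrey Zhao et al. proposed ReAct in https://arxiv.org/abs/2210.03629: having LLMs generate reasoning traces and task-specific actions in an interleaved manner. On language and decision-making tasks it outperforms several state-of-the-art baselines, while also improving the human interpretability and trustworthiness of LLMs.
ReAct combines the CoT described above with tools, so that the reasoning process draws on both the model's internal knowledge and external information. The paper's results are shown in the figure below. Comparing different prompting methods on HotpotQA (a knowledge-intensive reasoning task, such as question answering) and Fever (fact verification), ReAct generally outperforms Act (actions only). ReAct also beats CoT on Fever, but lags behind CoT on HotpotQA. The paper gives a detailed error analysis of the method; in summary:

CoT suffers from fact hallucination
ReAct's structural constraints reduce its flexibility in formulating reasoning steps
ReAct depends heavily on the information it retrieves

Source: https://arxiv.org/abs/2210.03629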

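2. Verifying ReAct

Using LangChain's existing toolchain, let's test one question: "珠穆拉玛峰旁边的西北方向的山峰叫什么名字？有谁成功登顶这座山峰？请用中文回答" ("What is the name of the peak just northwest of Mount Everest? Who has successfully summited it? Please answer in Chinese"). First, let's ask GPT-3.5:

Figure: GPT-3.5's answer to the ReAct test question

Judging from the result above, the answer clearly misses the point, so let's try ReAct (using text-davinci-003). The code is as follows: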

# Upgrade or install the required libraries
# pip install --upgrade openai
# pip install --upgrade langchain
# pip install google-search-results

import os
from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent

os.environ["OPENAI_API_KEY"] = "{your TOKEN}"
os.environ["OPENAI_API_BASE"] = "{your proxy address}"
# https://serpapi.com/manage-api-key
os.environ["SERPAPI_API_KEY"] = "{key obtained by registering at serpapi.com}"
llm = OpenAI(model_name="text-davinci-003", temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("珠穆拉玛峰旁边的西北方向的山峰叫什么名字？有谁成功登顶这座山峰？请用中文回答")

The process and result it outputs:

> Entering new AgentExecutor chain...
 I need to find out who successfully climbed the mountain and then calculate their age cubed
Action: Search
Action Input: "who successfully climbed the mountain near Mount Everest northwest"
Observation: Edmund Hillary (left) and Sherpa Tenzing Norgay reached the 29,035-foot summit of Everest on May 29, 1953, becoming the first people to stand atop the world's highest mountain.
Thought: I need to find out Edmund Hillary's age
Action: Search
Action Input: "Edmund Hillary age"
Observation: 88 years
Thought: I need to calculate 88 cubed
Action: Calculator
Action Input: 88^3
Observation: Answer: 681472

Thought: I now know the final answer
Final Answer: 意蒙德·希拉里（Edmund Hillary）成功登顶过山峰珠穆拉玛峰旁边的西北方向的山峰，他的年龄的3次方是681472。

> Finished chain

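As the trace shows, ReAct lets the model combine its own reasoning with other tools such as search and a calculator, which may yield more capable behavior. In this particular run, though, the final answer is still off: the agent reports who climbed Everest itself and computes an age cubed that the question never asked for.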

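3. How is ReAct implemented?

In LangChain's source code, the ReAct implementation is fairly simple and is still based on a prompt template. Let's look at the Python code under langchain/agents/react: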

from langchain.prompts.prompt import PromptTemplate

EXAMPLES = [
    """Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action: Search[Colorado orogeny]
Observation: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought: It does not mention the eastern sector. So I need to look up eastern sector.
Action: Lookup[eastern sector]
Observation: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action: Search[High Plains]
Observation: High Plains refers to one of two distinct land regions
Thought: I need to instead search High Plains (United States).
Action: Search[High Plains (United States)]
Observation: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action: Finish[1,800 to 7,000 ft]""",
    """Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
Thought: The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.
Action: Search[Milhouse]
Observation: Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.
Thought: The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".
Action: Lookup[named after]
Observation: (Result 1 / 1) Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.
Thought: Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.
Action: Finish[Richard Nixon]""",
    """Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?
Thought: I need to search Adam Clayton Powell and The Saimaa Gesture, and find which documentary is about Finnish rock groups.
Action: Search[Adam Clayton Powell]
Observation: Could not find [Adam Clayton Powell]. Similar: [’Adam Clayton Powell III’, ’Seventh Avenue (Manhattan)’, ’Adam Clayton Powell Jr. State Office Building’, ’Isabel Washington Powell’, ’Adam Powell’, ’Adam Clayton Powell (film)’, ’Giancarlo Esposito’].
Thought: To find the documentary, I can search Adam Clayton Powell (film).
Action: Search[Adam Clayton Powell (film)]
Observation: Adam Clayton Powell is a 1989 American documentary film directed by Richard Kilberg. The film is about the rise and fall of influential African-American politician Adam Clayton Powell Jr.[3][4] It was later aired as part of the PBS series The American Experience.
Thought: Adam Clayton Powell (film) is a documentary about an African-American politician, not Finnish rock groups. So the documentary about Finnish rock groups must instead be The Saimaa Gesture.
Action: Finish[The Saimaa Gesture]""",
    """Question: What profession does Nicholas Ray and Elia Kazan have in common?
Thought: I need to search Nicholas Ray and Elia Kazan, find their professions, then find the profession they have in common.
Action: Search[Nicholas Ray]
Observation: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 - June 16, 1979) was an American film director, screenwriter, and actor best known for the 1955 film Rebel Without a Cause.
Thought: Professions of Nicholas Ray are director, screenwriter, and actor. I need to search Elia Kazan next and find his professions.
Action: Search[Elia Kazan]
Observation: Elia Kazan was an American film and theatre director, producer, screenwriter and actor.
Thought: Professions of Elia Kazan are director, producer, screenwriter, and actor. So profession Nicholas Ray and Elia Kazan have in common is director, screenwriter, and actor.
Action: Finish[director, screenwriter, actor]""",
    """Question: Which magazine was started first Arthur’s Magazine or First for Women?
Thought: I need to search Arthur’s Magazine and First for Women, and find which was started first.
Action: Search[Arthur’s Magazine]
Observation: Arthur’s Magazine (1844-1846) was an American literary periodical published in Philadelphia in the 19th century.
Thought: Arthur’s Magazine was started in 1844. I need to search First for Women next.
Action: Search[First for Women]
Observation: First for Women is a woman’s magazine published by Bauer Media Group in the USA.[1] The magazine was started in 1989.
Thought: First for Women was started in 1989. 1844 (Arthur’s Magazine) < 1989 (First for Women), so Arthur’s Magazine was started first.
Action: Finish[Arthur’s Magazine]""",
    """Question: Were Pavel Urysohn and Leonid Levin known for the same type of work?
Thought: I need to search Pavel Urysohn and Leonid Levin, find their types of work, then find if they are the same.
Action: Search[Pavel Urysohn]
Observation: Pavel Samuilovich Urysohn (February 3, 1898 - August 17, 1924) was a Soviet mathematician who is best known for his contributions in dimension theory.
Thought: Pavel Urysohn is a mathematician. I need to search Leonid Levin next and find its type of work.
Action: Search[Leonid Levin]
Observation: Leonid Anatolievich Levin is a Soviet-American mathematician and computer scientist.
Thought: Leonid Levin is a mathematician and computer scientist. So Pavel Urysohn and Leonid Levin have the same type of work.
Action: Finish[yes]""",
]
SUFFIX = """\nQuestion: {input}
{agent_scratchpad}"""

WIKI_PROMPT = PromptTemplate.from_examples(
    EXAMPLES, SUFFIX, ["input", "agent_scratchpad"]
)
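
Roughly speaking, the resulting template is the few-shot examples followed by the suffix, with `{input}` and `{agent_scratchpad}` filled in on every step. The pure-Python sketch below mimics that assembly without importing LangChain. `render_prompt` is a hypothetical helper, the blank-line separator is an assumption about the default behavior, and the single example is shortened for illustration.

```python
def render_prompt(examples, suffix, question, scratchpad, separator="\n\n"):
    """Join few-shot examples and the suffix, then fill in the two variables.

    Note: str.format is used, so literal braces inside examples would need escaping.
    """
    template = separator.join(list(examples) + [suffix])
    return template.format(input=question, agent_scratchpad=scratchpad)


# A shortened, illustrative example (not one of the real few-shot examples).
examples = [
    "Question: Which magazine was started first?\n"
    "Thought: I need to search both magazines.\n"
    "Action: Finish[Arthur's Magazine]",
]
suffix = "\nQuestion: {input}\n{agent_scratchpad}"

prompt = render_prompt(
    examples,
    suffix,
    question="Who is Milhouse named after?",
    scratchpad="Thought: I need to search Milhouse.\nAction: Search[Milhouse]\n",
)
print(prompt)
```

The scratchpad grows with each Thought/Action/Observation round, so the model always sees its own earlier steps when producing the next one.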

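The implementation flow is as follows:
(1) Feed examples through the prompt, such as the template above: Thought: xxx Action: Search[xxx];
(2) The LLM follows the same pattern as the template, thinking step by step in CoT fashion and fetching external knowledge along the way;
(3) Finally, Action: Finish returns the final result and the run ends.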
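
The three steps above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not LangChain's actual agent code: `react_loop`, `search`, and `scripted_llm` are hypothetical names, the "search engine" is a dictionary, and the model is a scripted stand-in so the example runs offline.

```python
import re

# Matches lines such as "Action: Search[Milhouse]" or "Action: Finish[yes]".
ACTION_RE = re.compile(r"Action: (\w+)\[(.*?)\]")


def react_loop(question, llm, tools, max_steps=5):
    """Alternate model steps and tool calls until the model emits Finish[...]."""
    scratchpad = ""
    for _ in range(max_steps):
        step = llm(question, scratchpad)  # expected: "Thought: ...\nAction: Name[arg]"
        scratchpad += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # the model did not follow the template
        name, arg = match.groups()
        if name == "Finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown action {name}"
        scratchpad += f"Observation: {observation}\n"
    return None


# Toy stand-ins (hypothetical): a dictionary "search engine" and a scripted model.
PAGES = {"Milhouse": "Milhouse was named after U.S. president Richard Nixon."}


def search(query):
    return PAGES.get(query, f"Could not find [{query}].")


def scripted_llm(question, scratchpad):
    if "Observation:" not in scratchpad:
        return "Thought: I need to search Milhouse.\nAction: Search[Milhouse]"
    return "Thought: The answer is Richard Nixon.\nAction: Finish[Richard Nixon]"


result = react_loop("Who is Milhouse named after?", scripted_llm, {"Search": search})
print(result)  # prints: Richard Nixon
```

The `max_steps` cap is a design choice for the sketch: it keeps a model that never emits Finish from looping forever.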