Paper link: https://openreview.net/pdf?id=22pyNMuIoa

After reading

The core idea impressed me: it treats prompt engineering as a strategic planning problem and applies MCTS to it.

I think I need to read the Appendix as well.

Background

Strategic planning: the process by which an organization defines its strategy or direction and decides how to allocate resources to reach its strategic goals.

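Monte Carlo Tree Search (MCTS): a tree search method and one way of solving an MDP. Consider plain tree search in tic-tac-toe first. It expands every possible move from the current position into a tree, judges how good each move is, and then picks the best one.

In games with an enormous number of possible moves, such as chess or Go, exploring every possibility is practically impossible. MCTS is the tree search algorithm designed to overcome this limitation.

MCTS does not explore every line of play from a given state to the end of the game. Instead it uses Monte Carlo simulation: it plays random moves and runs the game to the end once (a playout). The outcome of that playout estimates how good the starting move was.

MDP (Markov Decision Process): a model of sequential decision making as a probabilistic graph. It consists of states, actions, transition probabilities between states, and rewards.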
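The sketch below makes the playout idea concrete. It shows flat Monte Carlo move evaluation for tic-tac-toe in minimal Python, scoring each legal move by the fraction of uniformly random playouts that `player` goes on to win. This covers only the simulation ingredient of MCTS. A full MCTS also grows a search tree and chooses which branch to simulate with UCT. All function names here are illustrative.

```python
import random
from typing import Dict, List, Optional

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if a player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board: List[str], to_move: str, rng: random.Random) -> Optional[str]:
    """Play uniformly random moves to the end; return the winner, or None for a draw."""
    board = board[:]
    while True:
        w = winner(board)
        if w is not None:
            return w
        empty = [i for i, cell in enumerate(board) if cell == " "]
        if not empty:
            return None
        board[rng.choice(empty)] = to_move
        to_move = "O" if to_move == "X" else "X"

def evaluate_moves(board: List[str], player: str, n_playouts: int = 200,
                   seed: int = 0) -> Dict[int, float]:
    """Score each legal move by the fraction of random playouts that `player` wins."""
    rng = random.Random(seed)
    opponent = "O" if player == "X" else "X"
    scores = {}
    for move in [i for i, cell in enumerate(board) if cell == " "]:
        child = board[:]
        child[move] = player
        wins = sum(random_playout(child, opponent, rng) == player for _ in range(n_playouts))
        scores[move] = wins / n_playouts
    return scores

board = ["X", "X", " ",
         "O", "O", " ",
         " ", " ", " "]
scores = evaluate_moves(board, "X")
print(max(scores, key=scores.get), scores[2])  # 2 1.0
```

Completing the top row (cell 2) wins in every playout. The other moves only have estimated win rates, and their values depend on the random seed.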

Abstract & Introduction

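Highly effective, task-specific prompts generally have to be written by experts. Those experts must understand both the LLM's instincts and the intricate details of the target task in order to compose effective prompts.

Automating the creation of such expert-level prompts is very difficult.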

Problems

Existing prompt optimization methods tend to overlook the depth of domain knowledge.
They struggle to search the vast space of expert-level prompts efficiently.
Good prompt design relies on an ad hoc, trial-and-error process of human-model interaction.

The paper therefore proposes PromptAgent → it treats prompt optimization as a strategic planning problem and tries to reach the goal with Monte Carlo Tree Search.

The method reflects on the model's errors and generates constructive feedback about them. This lets it bring in expert-level insights and in-depth instructions.

The ideal prompt comes from a human expert, for whom both domain knowledge and intuition about LLMs are essential.

Because of this complexity, expert-level prompt engineering on API-based LLMs is challenging.

Existing automatic prompt optimization methods overlooked that prompt engineering is a human-in-the-loop application, in which a human corrects errors and integrates essential domain knowledge.

Methodology

Given a base LLM $B$ and a target task $T$, the goal is to produce an optimized natural-language prompt $P^{T}$.

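In domains that require specialized knowledge, the gap between novice and expert prompt engineers matters a great deal. The goal is therefore to refine $P^{T}$ automatically, with minimal human intervention.

Problem formulation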

The process starts from an initial prompt $P_0$ and optimizes it using a small training set.

The target task $T$ is given as $(Q, A) = \{(q_i, a_i)\}_{i=1}^{N}$, where $q_i$ and $a_i$ are the input/output pair of each sample.

We also have a measure function $R$ (e.g., accuracy).

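The goal of prompt optimization is to find the prompt $P^{*}$ that maximizes $R$:

$$P^{*} = \arg\max_{P \in S} \sum_{i=1}^{N} R\big(p_B(a_i \mid q_i, P)\big)$$

Here $p_B(a_i \mid q_i, P)$ is the base model $B$'s output for input $q_i$ under prompt $P$, scored by $R$ against the gold answer $a_i$. $S$ denotes the sample space of natural-language prompts.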
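Below is a minimal sketch of this objective. It assumes exact-match accuracy as $R$ and uses a small finite candidate list in place of the full prompt space $S$. Averaging instead of summing over samples does not change the argmax. `toy_base_model`, `accuracy_reward`, `best_prompt`, and the candidate prompts are hypothetical stand-ins, not the paper's code.

```python
from typing import Callable, Sequence, Tuple

BaseModel = Callable[[str, str], str]  # (prompt, question) -> answer

def accuracy_reward(base_model: BaseModel, prompt: str,
                    train_set: Sequence[Tuple[str, str]]) -> float:
    """R as accuracy: fraction of (q_i, a_i) pairs answered exactly right under `prompt`."""
    correct = sum(base_model(prompt, q) == a for q, a in train_set)
    return correct / len(train_set)

def best_prompt(candidates: Sequence[str], base_model: BaseModel,
                train_set: Sequence[Tuple[str, str]]) -> str:
    """argmax over a small finite candidate set, standing in for the full prompt space S."""
    return max(candidates, key=lambda p: accuracy_reward(base_model, p, train_set))

def toy_base_model(prompt: str, question: str) -> str:
    """Hypothetical stand-in for B: it uppercases only if the prompt says so."""
    return question.upper() if "uppercase" in prompt else question

train_set = [("abc", "ABC"), ("hi", "HI")]
candidates = ["Repeat the input.", "Return the input in uppercase."]
print(best_prompt(candidates, toy_base_model, train_set))  # Return the input in uppercase.
```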

PromptAgent Framework Design

A state is represented by a prompt.

Simply put, an action is the move from the current prompt to the next prompt. As the figure shows, this means revising the prompt to reflect error feedback. Generating that error feedback is also part of the action.

PromptAgent (framework): base model → generate answers → collect errors and generate error feedback (LLM_2) → generate a new prompt (LLM_2) → generate answers again with the base model, and evaluate performance with the reward function.
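A single action (one state transition) from the pipeline above can be sketched as follows. This is a hypothetical illustration, not the paper's code. `optimizer_llm` stands in for the second LLM (LLM_2) that writes the error feedback and the revised prompt. The meta-prompt wording is invented here, and the paper's actual templates differ. Both models are replaced by deterministic stubs so the example runs offline.

```python
from typing import Callable, List, Sequence, Tuple

LLM = Callable[[str], str]
BaseModel = Callable[[str, str], str]  # (prompt, question) -> answer

def collect_errors(base_model: BaseModel, prompt: str,
                   batch: Sequence[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Run the base model on the batch and keep (question, gold, prediction) for misses."""
    errors = []
    for question, gold in batch:
        pred = base_model(prompt, question)
        if pred != gold:
            errors.append((question, gold, pred))
    return errors

def transition(prompt: str, batch: Sequence[Tuple[str, str]],
               base_model: BaseModel, optimizer_llm: LLM) -> Tuple[str, str]:
    """One action: errors -> error feedback -> revised prompt (the next state)."""
    errors = collect_errors(base_model, prompt, batch)
    if not errors:
        return prompt, ""  # nothing to fix; stay in the same state
    error_text = "\n".join(f"Q: {q}\nGold: {g}\nModel: {p}" for q, g, p in errors)
    feedback = optimizer_llm(f"The prompt was:\n{prompt}\nIt failed on:\n{error_text}\n"
                             "Explain what is wrong with the prompt.")
    new_prompt = optimizer_llm(f"Prompt:\n{prompt}\nFeedback:\n{feedback}\n"
                               "Rewrite the prompt to fix these issues. Return only the prompt.")
    return new_prompt, feedback

# Deterministic stubs so the sketch runs offline (both hypothetical).
def toy_base_model(prompt: str, question: str) -> str:
    return question.upper() if "uppercase" in prompt else question

def stub_optimizer(message: str) -> str:
    if message.startswith("Prompt:"):
        return "Return the input in uppercase."
    return "The prompt never asks for uppercase output."

new_prompt, feedback = transition("Repeat the input.", [("abc", "ABC")],
                                  toy_base_model, stub_optimizer)
print(new_prompt)  # Return the input in uppercase.
```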

Strategic Planning for Prompt Optimization

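MCTS serves as the strategic planning method for picking the best prompt, and it proceeds as shown in the figure on the left. Selection, expansion, simulation, and back-propagation repeat until a predefined number of iterations is reached → the prompt with the highest reward trace is then selected.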

Selection

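Every iteration starts from the root node $s_0$ and descends the tree. At each level it picks the child prompt that looks most promising, i.e., the one with the highest UCT score (its reward plus a bonus for rarely visited nodes).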

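The next node is chosen with the Upper Confidence bounds applied to Trees (UCT) algorithm:

$$a_t^{*} = \arg\max_{a'_t \in A(s_t)} \left( Q(s_t, a'_t) + c \sqrt{\frac{\ln N(s_t)}{N\big(\mathrm{ch}(s_t, a'_t)\big)}} \right)$$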

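In this formula:
- $A(s_t)$ is the action set for node $s_t$.
- $N(s_t)$ is the number of times node $s_t$ has been visited.
- $\mathrm{ch}(s_t, a'_t)$ is the child node reached from $s_t$ by applying action $a'_t$.
- $Q(s_t, a'_t)$ is the estimated (average future) reward of taking $a'_t$ at $s_t$.
- $c$ is a constant that balances exploration (the square-root term) against exploitation (the $Q$ term).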
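The UCT rule can be written as a small Python sketch. Children are plain dicts holding an average reward `q` and a visit count `n`. The default `c = 1.0` and the dict layout are assumptions for illustration, not values from the paper.

```python
import math
from typing import Dict, List

def uct_score(q: float, parent_visits: int, child_visits: int, c: float = 1.0) -> float:
    """Q(s_t, a') + c * sqrt(ln N(s_t) / N(ch(s_t, a')))."""
    if child_visits == 0:
        return math.inf  # always try an unvisited child first
    return q + c * math.sqrt(math.log(parent_visits) / child_visits)

def select_child(children: List[Dict], parent_visits: int, c: float = 1.0) -> Dict:
    """Return the child with the highest UCT score."""
    return max(children, key=lambda ch: uct_score(ch["q"], parent_visits, ch["n"], c))

children = [{"name": "A", "q": 0.8, "n": 10},   # well explored, high reward
            {"name": "B", "q": 0.6, "n": 1}]    # barely explored
print(select_child(children, parent_visits=11)["name"])         # B
print(select_child(children, parent_visits=11, c=0.0)["name"])  # A
```

With c = 1.0, the exploration bonus lets the rarely visited B win (about 2.15 vs 1.29). Setting c = 0 reduces selection to pure exploitation of Q.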

Expansion

A new node (prompt) is created through action generation.

Simulation is applied right after the node is created.

Simulation

Starting from the expanded prompt, a playout policy rolls forward to a terminal state and computes the future reward.
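Here is a minimal sketch of the simulation step, assuming a greedy playout policy. At each depth it generates `width` candidate prompts, scores them, and moves to the best one until it reaches the depth limit. `toy_expand` and `toy_reward` are hypothetical stand-ins for the optimizer LLM and the held-out reward: they append hint sentences and reward the "useful" ones. The paper's playout policy may differ in details.

```python
from typing import Callable, List, Tuple

HINTS = ["Think step by step.", "Count only the items asked about.",
         "Answer with a single number.", "Be creative."]
USEFUL = set(HINTS[:3])  # hypothetical: the first three hints help, the last does not

def toy_expand(prompt: str, width: int) -> List[str]:
    """Hypothetical action generator: each child appends one not-yet-used hint."""
    unused = [h for h in HINTS if h not in prompt]
    return [f"{prompt} {h}" for h in unused[:width]]

def toy_reward(prompt: str) -> float:
    """Hypothetical reward: fraction of useful hints present in the prompt."""
    return sum(h in prompt for h in USEFUL) / len(USEFUL)

def simulate(prompt: str, depth: int, max_depth: int,
             expand_fn: Callable[[str, int], List[str]],
             reward_fn: Callable[[str], float],
             width: int = 2) -> List[Tuple[str, float]]:
    """Greedy playout: expand, move to the best-reward child, repeat until max_depth."""
    trajectory = []
    while depth < max_depth:
        children = expand_fn(prompt, width)
        if not children:  # nothing left to try: treat as terminal
            break
        prompt, reward = max(((p, reward_fn(p)) for p in children), key=lambda pr: pr[1])
        trajectory.append((prompt, reward))
        depth += 1
    return trajectory

traj = simulate("Answer the question.", 0, 3, toy_expand, toy_reward)
for step, (p, r) in enumerate(traj, 1):
    print(step, round(r, 2), p)
```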

Back-propagation

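When the maximum depth is reached, rewards are back-propagated along the path from that node back up to the root. Back-propagation here does not mean updating model parameters. It means updating each node's reward value, which later guides node selection and expansion.

So the prompt used for the final evaluation is the one at the node with the highest reward.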
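Back-propagation can be sketched as updating running statistics along the selected path. This minimal version assumes each node stores the reward of its own prompt. Every node on the root-to-leaf path records the sum of rewards from itself down to the leaf. Its Q value is the average of those sums over all playouts that passed through it. `Node` and `backpropagate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    prompt: str
    reward: float  # reward of this node's prompt, e.g. accuracy on a held-out batch
    returns: List[float] = field(default_factory=list)  # one cumulative return per playout

    @property
    def visits(self) -> int:
        return len(self.returns)

    @property
    def q(self) -> float:
        """Average cumulative future reward over all playouts through this node."""
        return sum(self.returns) / len(self.returns) if self.returns else 0.0

def backpropagate(path: List[Node]) -> None:
    """Walk the root-to-leaf path backwards, accumulating rewards from the leaf upward."""
    future = 0.0
    for node in reversed(path):
        future += node.reward
        node.returns.append(future)

root, mid = Node("P0", 0.5), Node("P1", 0.25)
backpropagate([root, mid, Node("P2", 0.125)])  # first playout
backpropagate([root, mid, Node("P3", 0.5)])    # second playout through the same prefix
print(root.q, mid.q)  # 1.0625 0.5625
```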
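One way to read the final selection rule combines the "highest reward trace" from the MCTS overview with the highest-reward node described here. First pick the explored path with the highest average reward, then return the best-scoring prompt on that path. The sketch below assumes paths are stored as lists of (prompt, reward) pairs. It is an interpretation, not the paper's code, and `pick_final_prompt` is a hypothetical name.

```python
from statistics import mean
from typing import List, Tuple

Path = List[Tuple[str, float]]  # root-to-leaf list of (prompt, reward)

def pick_final_prompt(paths: List[Path]) -> str:
    """Choose the path with the highest average reward, then its best prompt."""
    best_path = max(paths, key=lambda path: mean(r for _, r in path))
    best_prompt, _ = max(best_path, key=lambda pr: pr[1])
    return best_prompt

paths = [[("P0", 0.5), ("P1", 0.6)],
         [("P0", 0.5), ("P2", 0.4), ("P3", 0.9)]]
print(pick_final_prompt(paths))  # P3
```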

Experiments

Dataset

Geometry

Input : This SVG path element <path d="M 55.57,80.69 L 57.38,65.80 M 57.38,65.80 L 48.90,57.46 M 48.90,57.46 L 45.58,47.78 M 45.58,47.78 L 53.25,36.07 L 66.29,48.90 L 78.69,61.09 L 55.57,80.69"/> draws a
Options: (A) circle (B) heptagon (C) hexagon (D) kite (E) line (F) octagon (G) pentagon (H) rectangle (I) sector (J) triangle

Target : B
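As a quick sanity check of the target label, one can count the distinct vertices of the SVG path. The path closes on its starting point and visits 7 distinct points, which matches (B) heptagon. This vertex-counting heuristic is only an illustration. It could not, for example, separate a kite from a rectangle, and it assumes no collinear or repeated interior points.

```python
import re

d = ("M 55.57,80.69 L 57.38,65.80 M 57.38,65.80 L 48.90,57.46 "
     "M 48.90,57.46 L 45.58,47.78 M 45.58,47.78 L 53.25,36.07 "
     "L 66.29,48.90 L 78.69,61.09 L 55.57,80.69")

# Every "x,y" pair in the path data, whatever M/L command precedes it.
points = re.findall(r"(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)", d)
vertices = list(dict.fromkeys(points))  # drop repeats, keep first-seen order
closed = points[0] == points[-1]        # the path returns to its starting point

print(len(vertices), closed)  # 7 True
```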

Object Counting

Input : I have a flute, a piano, a trombone, four stoves, a violin, an accordion, a clarinet, a drum, two lamps, and a trumpet. How many musical instruments do I have?

Target : 8
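Once the non-instruments are filtered out, the expected answer is simple arithmetic: the four stoves and two lamps are distractors. The item counts below were parsed by hand from the question, and the instrument set is a hypothetical lookup table.

```python
# Item counts parsed by hand from the question.
items = {"flute": 1, "piano": 1, "trombone": 1, "stove": 4, "violin": 1,
         "accordion": 1, "clarinet": 1, "drum": 1, "lamp": 2, "trumpet": 1}
# Hypothetical lookup: which of those items are musical instruments.
instruments = {"flute", "piano", "trombone", "violin", "accordion",
               "clarinet", "drum", "trumpet"}

total = sum(n for name, n in items.items() if name in instruments)
print(total)  # 8
```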

Causal Judgement

Input : How would a typical person answer each of the following questions about causation? A machine is set up in such a way that it will short circuit if both the black wire and the red wire touch the battery at the same time. The machine will not short circuit if just one of these wires touches the battery. The black wire is designated as the one that is supposed to touch the battery, while the red wire is supposed to remain in some other part of the machine. One day, the black wire and the red wire both end up touching the battery at the same time. There is a short circuit. Did the black wire cause the short circuit? Options: - Yes - No

Target : No

BIG-Bench is a benchmark designed to evaluate the capabilities of large language models (LLMs) across a wide variety of tasks. The three examples above come from its BIG-Bench Hard (BBH) subset.

Baseline

Prompts generated by PromptAgent
Human prompts
CoT prompts
Prompt optimization methods (GPT Agent and Automatic Prompt Engineer, hereafter APE)

Model

Base model: GPT-3.5, GPT-4, PaLM 2
Optimization model: ChatGPT Plugins (GPT-4)

Result

GPT Agent has planning and self-reflection abilities. However, it rewrites the prompt only once, which limits how much of the prompt space it explores.

APE explores with Monte Carlo search, but it lacks planning and error-based reflection.

(*Planning: a plan for moving toward a goal. PromptAgent's aim was to solve prompt optimization as a strategic planning problem. Planning is a methodical process used to navigate the complex space of potential prompts to identify those that yield the best performance for a given task.)

BBH tasks often require strictly formatted solutions that step-by-step CoT reasoning can readily induce. In other words, they call for formatted, CoT-style solution processes.

Domain-specific and NLP tasks: these require both domain knowledge and intuition about LLM prompt engineering.

PromptAgent raised prompts to expert level and narrowed the gap between novice and experienced prompt engineers. (The end of prompt engineering as a job...?)

Conclusion

Domain-specific knowledge can be built into the newly generated prompts.
It opens a path to using prompt engineering to tap the strong task-understanding abilities of state-of-the-art large language models.

Limitation

The authors point out that in highly specialized domains PromptAgent's domain knowledge is limited, which may pose new difficulties.

Ideas for overcoming this

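First, although it goes beyond the scope of the current paper, the authors suggest several strategies for making PromptAgent more adaptable in specialized domains where GPT-4 is not an ideal optimizer.

Adapting GPT-4 to a specific domain with expert-knowledge prompts
Supplementing domain knowledge with retrieval techniques
Implementing a quality-control mechanism for error feedback
Integrating a hybrid optimizer that combines a general LLM with a domain-specialized LLM
Using domain-specific guidance from experts during optimization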

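Second, for sensitive domains, data augmentation with synthetic or anonymized datasets is another promising option. It can strengthen an LLM's domain expertise while still complying with privacy standards.

Third, future work should refine these strategies to ease PromptAgent's current limitations and broaden the scope and impact of expert-level prompting applications.

In short, the authors propose two directions. One is a range of strategies that make PromptAgent more adaptable to specialized domains. The other is strengthening LLMs' domain knowledge while staying mindful of privacy, so that expert-level prompts can be used more widely.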


[Paper Review] PROMPTAGENT: STRATEGIC PLANNING WITH LARGE LANGUAGE MODELS ENABLES EXPERT-LEVEL PROMPT OPTIMIZATION
