๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๐Ÿค–AI/๐Ÿค— HuggingFace

Trying Out the HuggingFace Pipeline

by ์ฝ”ํ‘ธ๋ณด์ด 2024. 12. 12.
๋ฐ˜์‘ํ˜•

Hugging Face์˜ Transformers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋Š” ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ(NLP)๋ฅผ ๊ฐ„๋‹จํ•˜๊ฒŒ ๊ตฌํ˜„ ํ•  ์ˆ˜ ์žˆ๋Š” ๋‹ค์–‘ํ•œ ๋„๊ตฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฒˆ ๊ธ€์—์„œ๋Š” Colab ํ™˜๊ฒฝ์—์„œ transformers` ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ pipeline ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ NLP ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœ ํ•ด๋ณด๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.

(Image: Hugging Face Transformers pipelines)



1. Introduction to Pipeline

pipeline์€ Hugging Face ํ•™์Šต ์‹œ ๊ฐ€์žฅ ์ฒ˜์Œ ๋ณด๊ฒŒ ๋˜๋Š” ์ธํ„ฐํŽ˜์ด์Šค๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ…์ŠคํŠธ ๋ถ„๋ฅ˜, ์งˆ๋ฌธ ๋‹ต๋ณ€, ํ…์ŠคํŠธ ์ƒ์„ฑ ๋“ฑ ์—ฌ๋Ÿฌ NLP ์ž‘์—…์„ ์†์‰ฝ๊ฒŒ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์–ด์„œ, Hugging Face๋ฅผ ์ฒ˜์Œ ์‹œ์ž‘ํ•œ๋‹ค๋ฉด pipeline ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋ฉด ์ข‹์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์ด๋ฒˆ ์‹ค์Šต์—์„œ๋Š” pipeline์— ๋Œ€ํ•œ ์ฝ”๋“œ๋ฅผ ์‚ดํŽด๋ณด๊ณ , ๊ฐ ์ฝ”๋“œ์˜ ๋‚ด์šฉ์— ๋Œ€ํ•ด์„œ ๋ง์”€๋“œ๋ฆฌ๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ธ€์ด ๋‹ค๋ฃจ๋Š” ๋‚ด์šฉ์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

- ๊ฐ์ • ๋ถ„์„: ํ…์ŠคํŠธ๊ฐ€ ๊ธ์ •์ ์ธ์ง€ ๋ถ€์ •์ ์ธ์ง€ ํŒ๋ณ„
- ํ…์ŠคํŠธ ์ƒ์„ฑ: ์ฃผ์–ด์ง„ ํ”„๋กฌํ”„ํŠธ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ƒˆ๋กœ์šด ํ…์ŠคํŠธ ์ƒ์„ฑ
- ์งˆ๋ฌธ ๋‹ต๋ณ€: ํ…์ŠคํŠธ์—์„œ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์„ ์ฐพ๊ธฐ
- ๋ฒˆ์—ญ: ํ…์ŠคํŠธ๋ฅผ ๋‹ค๋ฅธ ์–ธ์–ด๋กœ ๋ฒˆ์—ญ

- ์š”์•ฝ: ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์„ ์š”์•ฝํ•˜๊ธฐ

 

๊ทธ ์™ธ์—๋„ ๋‹ค์–‘ํ•œ ์ž‘์—…(task) ์œ ํ˜•์„ pipeline์ด ์ œ๊ณตํ•˜๊ณ  ์žˆ์ง€๋งŒ, ์ด๋ฒˆ ๊ธ€์—์„œ๋Š” ์šฐ์„ ์ ์œผ๋กœ ์œ„ ๋‚ด์šฉ๋งŒ ๋‹ค๋ค„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.


2. ์ค€๋น„ ๋‹จ๊ณ„: ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜

Colab์—์„œ๋Š” ์…ธ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•˜์—ฌ ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•˜์—ฌ `transformers` ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์ด ๊ธ€์„ ์ž‘์„ฑํ•˜๋Š” ์‹œ์ ์— Colab ์— ๊ธฐ๋ณธ์ ์œผ๋กœ transformers๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ๋„ ์žˆ์—ˆ๋˜ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ, ์„ค์น˜๊ฐ€ ์ด๋ฏธ ๋˜์–ด ์žˆ๋Š” ์ƒํƒœ๋ผ๋ฉด ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ Colab ์—์„œ ์‹คํ–‰ํ•˜์‹œ๋ฉด ์ด๋ฏธ ์„ค์น˜๊ฐ€ ๋˜์–ด ์žˆ๋‹ค๋Š” ๋กœ๊ทธ๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋  ๊ฒƒ ์ž…๋‹ˆ๋‹ค.

!pip install transformers
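If you want to double-check the installation, a minimal sketch like the following (just importing the library and printing its version) is enough:

import transformers

# Print the installed version to confirm the library is importable
print(transformers.__version__)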



3. Pipeline์„ ์‚ฌ์šฉํ•œ NLP ์ž‘์—…

3.1 ๊ฐ์ • ๋ถ„์„
๋จผ์ €, ๊ฐ์ • ๋ถ„์„ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. "pipeline" ๊ฐ์ฒด๋ฅผ ์ƒ์„ฑํ•˜๊ณ  "sentiment-analysis" ์ž‘์—… ์œ ํ˜•์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.

from transformers import pipeline

classifier = pipeline("sentiment-analysis")

# ๋‹จ์ผ ๋ฌธ์žฅ ๋ถ„์„
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)

# ์—ฌ๋Ÿฌ ๋ฌธ์žฅ ๋ถ„์„
results = classifier([
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!"
])
print(results)
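As a side note, when no model is given, the pipeline falls back to a default checkpoint for the task. If you want to pin the model explicitly, you can pass one by name; the checkpoint below is the widely used English sentiment model and is only an example, not something the post prescribes:

# Explicitly specifying a checkpoint instead of relying on the task default
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)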


์ถœ๋ ฅ์€ ์˜ˆ์ธก๋œ ๊ฐ์ •(๊ธ์ •/๋ถ€์ •)๊ณผ ์‹ ๋ขฐ๋„ ์ ์ˆ˜๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.



3.2 ํ…์ŠคํŠธ ์ƒ์„ฑ
ํ•œ๊ตญ์–ด๋ฅผ ์ง€์›ํ•˜๋Š” ๋ชจ๋ธ์„ pipeline์— ์ง€์ •ํ•˜๋ฉด ํ•œ๊ตญ์–ด๋„ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

์ƒ์„ฑ ์˜ˆ์ œ๋Š” ํ•œ๊ตญ์–ด๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ด์„œ ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ ํŒŒ๋ผ๋ฏธํ‹ฐ๋กœ "skt/kogpt2-base-v2" ๋ชจ๋ธ์„ ์ ์šฉํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. pipeline์˜ task๋ฅผ "text-generation"์œผ๋กœ ์ง€์ •ํ•œ ํ›„์— pipeline ๊ฐ์ฒด์— ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ž…๋ ฅํ•˜๋ฉด, ์ž…๋ ฅ๋œ ํ”„๋กฌํ”„ํŠธ์— ์ด์–ด์งˆ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

from transformers import pipeline

generator = pipeline("text-generation", model='skt/kogpt2-base-v2')

# ๊ฐ„๋‹จํ•œ ํ…์ŠคํŠธ ์ƒ์„ฑ
result = generator("์—ฌ๋Ÿฌ๋ถ„ ๋งŒ๋‚˜์„œ ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค. ํ—ˆ๊น…ํŽ˜์ด์Šค ๊ฐ•์˜๋ฅผ ์‹œ์ž‘", max_length=30, num_return_sequences=2)
print(result)

 

  • max_length=30์€ ์ƒ์„ฑ๋œ ๋ฌธ์žฅ์˜ ์ตœ๋Œ€ ๊ธธ์ด๋ฅผ 30 ํ† ํฐ์œผ๋กœ ์ œํ•œ ํ•˜๋Š” ์—ญํ• ์„,
  • num_return_sequences=2๋Š” ๊ฐ ์ž…๋ ฅ์— ๋Œ€ํ•ด 2๊ฐœ์˜ ๋‹ค๋ฅธ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก ์ง€์ •ํ•˜๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.

์ถœ๋ ฅ์€ ํ™•๋ฅ ์— ๊ธฐ๋ฐ˜ํ•˜๊ธฐ์— ๋งค๋ฒˆ ์กฐ๊ธˆ์‹ ๋‹ฌ๋ผ์งˆ ์ˆ˜๋„ ์žˆ์ง€๋งŒ, ์•„๋ž˜์™€ ๊ฐ™์€ ํ˜•์‹์œผ๋กœ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

[{'generated_text': '์—ฌ๋Ÿฌ๋ถ„ ๋งŒ๋‚˜์„œ ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค. ํ—ˆ๊น…ํŽ˜์ด์Šค ๊ฐ•์˜๋ฅผ ์‹œ์ž‘ํ•˜๋ฉฐ...'},
 {'generated_text': '์—ฌ๋Ÿฌ๋ถ„ ๋งŒ๋‚˜์„œ ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค. ํ—ˆ๊น…ํŽ˜์ด์Šค ๊ฐ•์˜๋ฅผ ์ค€๋น„ํ–ˆ์Šต๋‹ˆ๋‹ค.'}]



3.3 ์งˆ๋ฌธ ๋‹ต๋ณ€
pipeline์ด ์ง€์›ํ•˜๋Š” ๊ธฐ๋Šฅ ์ค‘์—๋Š” ์งˆ๋ฌธ-๋‹ต๋ณ€ ๊ธฐ๋Šฅ๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•œ๊ตญ์–ด ์งˆ๋ฌธ-๋‹ต๋ณ€์„ ์œ„ํ•ด "klue/roberta-base" ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ฃผ์–ด์ง„ ์ปจํ…์ŠคํŠธ์—์„œ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์„ ์ถ”์ถœํ•ด ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="klue/roberta-base")

context = """
๋ฃจํŠธ๋น„ํžˆ ํŒ ๋ฒ ํ† ๋ฒค์€ ๋…์ผ์˜ ์„œ์–‘ ๊ณ ์ „ ์Œ์•… ์ž‘๊ณก๊ฐ€์ด์ž ํ”ผ์•„๋‹ˆ์ŠคํŠธ์ด๋‹ค. ๋…์ผ์˜ ๋ณธ์—์„œ ํƒœ์–ด๋‚ฌ์œผ๋ฉฐ, ์„ฑ์ธ์ด ๋œ ์ดํ›„ ๊ฑฐ์˜ ์˜ค์ŠคํŠธ๋ฆฌ์•„ ๋นˆ์—์„œ ์‚ด์•˜๋‹ค.
"""

result = qa_pipeline(
    question="๋ฒ ํ† ๋ฒค์ด ํƒœ์–ด๋‚œ ๊ณณ์€ ์–ด๋””์ธ๊ฐ€์š”?",
    context=context
)
print(result)

 

pipeline ํ•จ์ˆ˜์˜ ์ž‘์—… ์œ ํ˜•์œผ๋กœ "question-answering"์„ ์ง€์ •ํ•˜๊ณ , klue/roberta-base๋ผ๋Š” ํ•œ๊ตญ์–ด ์งˆ๋ฌธ-๋‹ต๋ณ€์— ํŠนํ™”๋œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. context ๋ณ€์ˆ˜์—๋Š” ์งˆ๋ฌธ์— ๋‹ตํ•˜๊ธฐ ์œ„ํ•œ ๋ฌธ๋งฅ ์ •๋ณด๋ฅผ ์ž‘์„ฑํ•˜๊ณ , question์— ํ•ด๋‹นํ•˜๋Š” ์งˆ๋ฌธ์˜ ๋‹ต๋ณ€์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ๋Š” ๋ชจ๋ธ์ด ์˜ˆ์ธกํ•œ ๋‹ต๋ณ€๊ณผ ๊ทธ ์‹ ๋ขฐ๋„ ์ ์ˆ˜๋กœ ๋ฐ˜ํ™˜๋ฉ๋‹ˆ๋‹ค.

{
  'score': 0.98,  # the model's prediction confidence
  'start': 47,    # start index of the answer within context
  'end': 49,      # end index of the answer within context
  'answer': '본'  # the answer predicted by the model
}

๊ฒฐ๊ณผ ๊ฐ’์ด ์œ„์™€ ๋™์ผํ•˜๊ฒŒ ๋‚˜์˜ค์ง€๋Š” ์•Š์„ ํ™•๋ฅ ์ด ๋†’์ง€๋งŒ ๋งŒ์•ฝ ์ž˜ ๋‚˜์˜จ๋‹ค๋ฉด ์œ„์™€ ๊ฐ™์€ ํ˜•ํƒœ์˜ ๋‹ต๋ณ€์ด ๋‚˜์˜ฌ ๊ฒƒ ์ž…๋‹ˆ๋‹ค. 

 


3.4 ๋ฒˆ์—ญ
ํ•œ๊ตญ์–ด-์˜์–ด ๋ฒˆ์—ญ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ…์ŠคํŠธ๋ฅผ ๋ฒˆ์—ญํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. 

์ด ์ž‘์—…์€ ๋ณด์‹œ๋‹ค์‹ถ์ด ๊ต‰์žฅํžˆ ์ง๊ด€์  ์ž…๋‹ˆ๋‹ค. 'translation'์œผ๋กœ ์ž‘์—…์„ ์ง€์ •ํ•˜๊ณ , ํ•œ๊ตญ์–ด-์˜์–ด ๋ฒˆ์—ญ์ด ๊ฐ€๋Šฅํ•œ ๋ชจ๋ธ์„ ์ง€์ •ํ•˜๋ฉด ๋ฒˆ์—ญ ์ž‘์—…์ด ๊ฐ€๋Šฅ ํ•ฉ๋‹ˆ๋‹ค.

from transformers import pipeline

translator = pipeline("translation", model="circulus/kobart-trans-ko-en-v2")
result = translator("์˜ค๋Š˜ ์ ์‹ฌ์œผ๋กœ ์Šคํ…Œ์ดํฌ๋ฅผ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค.")
print(result)
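The translation pipeline returns a list of dictionaries with a 'translation_text' key; the exact wording below is only illustrative:

[{'translation_text': 'I had steak for lunch today.'}]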



3.5 ํ…์ŠคํŠธ ์š”์•ฝ
๊ธด ํ…์ŠคํŠธ๋ฅผ ์š”์•ฝํ•˜๋Š” ์ž‘์—…์€ "summarization"์„ pipeline์— ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๋„˜๊ฒจ์ฃผ๋ฉด ๊ฐ€๋Šฅ ํ•ฉ๋‹ˆ๋‹ค. ๋ง ๊ทธ๋Œ€๋กœ ์š”์•ฝ ๊ธฐ๋Šฅ์œผ๋กœ ์ฃผ์–ด์ง„ text๋ฅผ ์š”์•ฝํ•œ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•ด ์ค๋‹ˆ๋‹ค.

from transformers import pipeline

summarizer = pipeline("summarization")

text = """
America has changed dramatically during recent years. Not only has the number of
graduates in traditional engineering disciplines such as mechanical, civil, electrical,
chemical, and aeronautical engineering declined, but in most of the premier American
universities engineering curricula now concentrate on and encourage largely the study
of engineering science.
"""

result = summarizer(text)
print(result)
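The output is a list of dictionaries with a 'summary_text' key. If you want tighter control over the summary length, the pipeline also accepts length arguments that are forwarded to generation; this is a minimal sketch with arbitrary values:

# Constrain the summary to roughly 20-60 tokens and disable sampling
result = summarizer(text, max_length=60, min_length=20, do_sample=False)
print(result)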


4. ์ง€์›๋˜๋Š” ์ž‘์—… ์œ ํ˜•

`pipeline`์€ ๊ทธ ์™ธ์—๋„ ๋‹ค์–‘ํ•œ ์ž‘์—…์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. HuggingFace์˜ pipeline์ด ๊ธฐ๋ณธ์ ์œผ๋กœ ์ง€์›ํ•˜๋Š” ์ž‘์—… ์œ ํ˜•์€ ์•„๋ž˜์˜ ์ฝ”๋“œ๋กœ ํ™•์ธ ๊ฐ€๋Šฅ ํ•ฉ๋‹ˆ๋‹ค.

from transformers.pipelines import SUPPORTED_TASKS

print("Available tasks in Hugging Face pipeline:")
for task in SUPPORTED_TASKS:
    print(f"- {task}")



์ด ๊ธ€์—์„œ๋Š” Hugging Face์˜ `pipeline` ๊ธฐ๋Šฅ์„ ํ™œ์šฉํ•˜์—ฌ Colab์—์„œ ๋‹ค์–‘ํ•œ NLP ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์‚ดํŽด๋ณด์•˜์Šต๋‹ˆ๋‹ค. `transformers` ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋Š” NLP ์ž‘์—…์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค๋‹ˆ๋‹ค. ๊ณ„์†ํ•ด์„œ ์‹ค์Šต ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด HuggingFace ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉ๋ฒ•์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋‚˜๊ฐ€๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.  ๐Ÿ˜€

 

๋ฐ˜์‘ํ˜•