Question 1

AI (Artificial Intelligence)

Accepted Answer

AI is an umbrella term for software that reproduces human cognitive abilities. In practice today, most AI work refers to LLM-based systems — ChatGPT, Claude, Gemini. Enterprise value typically comes from automation, customer support, and decision support.

Question 2

LLM (Large Language Model)

Accepted Answer

An LLM is a neural network, typically with billions of parameters, trained on up to trillions of tokens of text. Examples: GPT-4, Claude, Llama. It is a pattern generator rather than a knowledge database, so reliable enterprise use usually means combining it with RAG or fine-tuning.

Question 3

RAG (Retrieval-Augmented Generation)

Accepted Answer

RAG is the standard approach for grounding LLMs in your proprietary data. Steps: 1) embed documents, 2) store in vector DB, 3) retrieve top-k for each query, 4) send with the prompt. RAG produces more accurate, current, and citable responses than prompt engineering alone.
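
The four steps can be sketched end to end in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector DB, and the sample documents are made up.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-2: embed the documents and keep them in an in-memory "vector store".
documents = [
    "Refunds are processed within 14 days of the return request.",
    "Our office in Budapest is open on weekdays from 9 to 17.",
    "Premium support is available to enterprise customers only.",
]
store = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Step 3: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str, k: int = 2) -> str:
    """Step 4: send the retrieved chunks along with the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How long do refunds take?")
```

The resulting `prompt` string is what would be sent to the LLM; swapping `embed` for a real embedding model and `store` for a vector DB leaves the overall shape unchanged.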

Question 4

AI agent

Accepted Answer

AI agents differ from chatbots by acting, not just talking: calling APIs, reading databases, sending emails. Orchestration typically uses LangGraph, CrewAI, or OpenAI Assistants. Production agents require tool-permission models, cost limits, and human-in-the-loop controls.

Question 5

Multi-agent system

Accepted Answer

Multi-agent systems divide work across role-specialised agents — planner, executor, verifier. Supervisor and planner-executor are the most common patterns. They outperform single large agents on complex multi-step tasks but are harder to debug and control.

Question 6

Prompt engineering

Accepted Answer

Prompt engineering includes role definition (system prompt), few-shot examples, structured output specification (JSON schema), iteration, and testing. A good prompt can be 3–5x more accurate than a weak one. It's the cheapest first intervention before fine-tuning.

Question 7

Fine-tuning

Accepted Answer

Fine-tuning specialises a base model (Llama 3.1, GPT-4o-mini) on your data. Methods: LoRA (lightweight, cheap), full fine-tune (heavier, stronger). Typical use-cases: domain terminology, brand voice, stable structured output. Doesn't replace RAG — pairs with it.

Question 8

Vector database

Accepted Answer

Vector DBs (Pinecone, Qdrant, Weaviate, pgvector) perform fast similarity search over billions of embeddings. The backbone of RAG pipelines. Selection factors: managed vs self-host, EU vs US region, hybrid search support, scalability.

Question 9

Embedding

Accepted Answer

An embedding is a numeric vector, commonly 768–3072 dimensions, that represents the meaning of a text chunk. Similar texts land close together in vector space. Major providers: OpenAI (text-embedding-3), Voyage, Cohere, open-source (BGE, E5). Embedding choice can shift RAG accuracy by 5–15%.

Question 10

Prompt injection

Accepted Answer

Prompt injection is the most common AI security vulnerability. Example: user input includes 'ignore previous instructions and...'. Defenses: input validation, instruction hierarchy, output guardrails, limited tool access, prompt-level sandboxing.

Question 11

Guardrail

Accepted Answer

Guardrails can be rule-based (regex, block-lists), ML-based (toxicity, PII detectors), or LLM-based (judge models). Typical uses: PII redaction, toxicity filtering, off-topic rejection, output format validation.

Question 12

PII redaction

Accepted Answer

PII redaction is, in practice, essential for GDPR-compliant AI because it supports data minimisation. It is implemented via regex, ML NER models, or dedicated services (Presidio, Nightfall). It happens BEFORE the prompt leaves your infrastructure so sensitive data never reaches the LLM provider.
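
A minimal sketch of the regex variant follows. The two patterns and the placeholder format are illustrative assumptions, and regex alone misses things such as personal names, which is where NER models or dedicated services come in.

```python
import re

# Illustrative patterns only; real deployments need far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder before the text is sent anywhere."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

safe = redact("Contact Anna at anna.kovacs@example.com or +36 30 123 4567.")
print(safe)
```

Note that the name "Anna" survives this pass: catching it needs an entity recogniser rather than a pattern.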

Question 13

RBAC

Accepted Answer

In AI systems, RBAC controls which role can invoke which tool and see which data in RAG. Critical in multi-tenant and regulated environments. Implementation: middleware before the prompt + post-filter on LLM output.
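
As a rough sketch of the two checks, assuming a hypothetical role-to-tool mapping and retrieved chunks that carry an `allowed_roles` metadata field (both are made up for illustration):

```python
# Hypothetical policy; a real system would load this from a policy store.
ROLE_TOOLS = {
    "support_agent": {"search_kb", "read_ticket"},
    "finance_admin": {"search_kb", "read_invoice", "issue_refund"},
}

def authorize_tool(role: str, tool: str) -> bool:
    """Middleware check: may this role invoke this tool at all?"""
    return tool in ROLE_TOOLS.get(role, set())

def filter_chunks(role: str, chunks: list[dict]) -> list[dict]:
    """Drop retrieved chunks the caller's role is not allowed to see."""
    return [c for c in chunks if role in c["allowed_roles"]]

chunks = [
    {"text": "Public FAQ entry", "allowed_roles": {"support_agent", "finance_admin"}},
    {"text": "Q3 revenue breakdown", "allowed_roles": {"finance_admin"}},
]
visible = filter_chunks("support_agent", chunks)
```

Filtering chunks before they enter the prompt is the safer of the two places to enforce data access, since the model never sees what the role may not read.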

Question 14

Voice agent

Accepted Answer

Voice agents combine speech-to-text (Deepgram, Whisper), LLM, and text-to-speech (ElevenLabs, Cartesia) layers. Typical platforms: Vapi, LiveKit, Retell. Latency is critical — the full cycle must be under ~500ms for natural conversation.

Question 15

Context window

Accepted Answer

The context window is the maximum number of tokens a model can process in one call, input and output combined. GPT-4 Turbo: 128k tokens. Claude Sonnet 4.6: 1M tokens. Gemini 2.5 Pro: 1M tokens. Larger windows fit more documents but cost more and slow responses. Prompt caching (Anthropic, OpenAI) can cut repeated-prompt input cost by up to 90%.

Question 16

Hallucination

Accepted Answer

Hallucination stems from LLMs being probabilistic pattern generators, not knowledge stores. Mitigations: RAG (source-bound answers), citation tracking, fact-check layers, human review. Models such as GPT-4 and Claude Sonnet 4.6 hallucinate less than their predecessors, but the problem can't be reduced to zero, so critical use-cases always need human-in-the-loop.

Question 17

Token

Accepted Answer

LLMs count in tokens. 1000 tokens ≈ 700 English words or ~500 Hungarian words (Hungarian is more inflected). Pricing is per-token: ~$3/1M input, ~$15/1M output for Claude Sonnet in 2026.
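
The arithmetic can be sketched as follows, using only the rough ratios and prices quoted above. Real token counts depend on the tokenizer, so treat the output as an estimate.

```python
# Figures taken from the text above; all are approximations.
WORDS_PER_1K_TOKENS = {"en": 700, "hu": 500}
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0

def estimate_tokens(words: int, lang: str = "en") -> int:
    """Convert a word count to an approximate token count."""
    return round(words * 1000 / WORDS_PER_1K_TOKENS[lang])

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the per-million-token prices above."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

en_tokens = estimate_tokens(1400, "en")  # 2000
hu_tokens = estimate_tokens(1400, "hu")  # 2800
cost = call_cost(en_tokens, 500)         # about 0.0135 USD
```

The same 1,400-word document costs about 40% more input tokens in Hungarian than in English under these ratios.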

Question 18

MCP (Model Context Protocol)

Accepted Answer

MCP is an open protocol, introduced by Anthropic, that lets a tool server written once serve multiple LLM clients (Claude Desktop, Claude Code, your agent). It became a de facto industry standard in 2025 and is an alternative to bespoke, per-application function-calling integrations.

Question 19

Context engineering

Accepted Answer

Context engineering is the evolution of prompt engineering: systematically assembling what goes into the LLM context (system prompt, few-shot, RAG chunks, tool defs, prior conversation). Especially important with long-context models.

Question 20

AI security

Accepted Answer

AI security has four layers: input validation (prompt injection), output guardrails (PII, toxicity), access control (RBAC, tool permissions), and audit (logging, monitoring). Regulated sectors require additional compliance (DORA, MDR, GDPR).

Question 21

AI automation

Accepted Answer

AI automation goes beyond classic RPA: the LLM can make context-aware decisions, not just run scripts. Common use-cases: multilingual customer support, product description generation, email triage, financial reporting.

Question 22

DORA

Accepted Answer

DORA (the Digital Operational Resilience Act) has applied EU-wide since 17 January 2025: financial entities, including banks running AI systems, must have incident reporting, risk management, and vendor-management processes. Budapest AI firms can serve such clients given full documentation and audit trails.

Question 23

GDPR

Accepted Answer

GDPR is the foundational EU privacy law. For AI: lawful basis for processing, data subject rights, DPIA for high-risk processing, and cross-border data transfer rules. Hungarian enforcement body: NAIH.

Question 24

Generative AI

Accepted Answer

Generative AI generates new output, not just classification or prediction. Main families: LLMs (text), diffusion (image, video), TTS (audio), code models. Enterprise adoption has grown exponentially since 2023.

Question 25

Model distillation

Accepted Answer

Distillation trains a smaller student model on the outputs of a larger teacher model. Typical result on the target task: roughly 80–90% of the teacher's quality at about 10% of the cost, with around 5x faster responses. OpenAI, Anthropic, and Google all offer distillation workflows.

Question 26

AI evaluation

Accepted Answer

AI eval requires a custom suite: not just loss or generic accuracy, but real business metrics. Tools: LangSmith, Langfuse, Promptfoo, Ragas. Always A/B test against the base model before production.

Question 27

Few-shot prompting

Accepted Answer

Few-shot prompting shows 1–5 input-output examples so the LLM copies the style. Often more effective than fine-tuning, especially for stable formats (JSON, XML) or specific tones (brand voice, legal style).

Question 28

Vibe coding

Accepted Answer

Vibe coding refers to AI-assisted development with Cursor, Claude Code, or similar — often 30–70% of production developer time in 2026. The question isn't whether to adopt, but which workflow to use.

Question 29

AI compliance

Accepted Answer

The EU has three main layers: GDPR (personal data), DORA (financial resilience), and the EU AI Act (high-risk AI requirements, with most obligations applying from August 2026). Hungary adds oversight by NAIH and MNB vendor-management rules.

Question 30

Chunking

Accepted Answer

Chunking is the step where source documents are broken into manageable pieces before embedding. Size matters: too small and context is lost, too large and relevance gets diluted. Paragraph-based chunking usually beats a fixed token boundary because it respects the document's semantic structure. Typical size: 200–500 tokens with 10–20% overlap.
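
A minimal sketch of paragraph-based chunking with overlap follows. It assumes whitespace-separated words as a stand-in for tokens and blank lines as paragraph boundaries; a real pipeline would count tokens with the model's tokenizer.

```python
def chunk_paragraphs(text: str, max_tokens: int = 300, overlap: int = 45) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_tokens words.

    Each new chunk starts with the last `overlap` words of the previous one.
    A single paragraph longer than max_tokens is kept whole rather than split.
    """
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks: list[list[str]] = []
    current: list[str] = []
    for words in paragraphs:
        if current and len(current) + len(words) > max_tokens:
            chunks.append(current)
            current = current[-overlap:] if overlap else []
        current = current + words
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]

sample = "a1 a2 a3 a4\n\nb1 b2 b3 b4\n\nc1 c2 c3 c4"
pieces = chunk_paragraphs(sample, max_tokens=8, overlap=2)
```

With the tiny sample, the second chunk begins with `b3 b4`, the tail of the first, which is the overlap that keeps context from being cut at the boundary.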

Question 31

Reranking

Accepted Answer

Reranking is the second retrieval stage in RAG: the top-20 results from vector search are re-scored by a dedicated reranker model (Cohere Rerank, BGE-reranker), and only the best 3–5 reach the LLM. It typically improves relevance 3–4×, making it one of the cheapest quality wins in a RAG system.

Question 32

Hybrid search

Accepted Answer

Hybrid search unites the strengths of semantic vector search (meaning-based) and classic BM25 full-text search (exact keyword match). Vector search handles paraphrases well; BM25 handles exact codes, names, and numbers. Production RAG almost always uses hybrid search because the two together cover each other's blind spots.
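
The text does not say how the two result lists are merged. One widely used option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the method any particular product uses; the document IDs are made up, and `k = 60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each document scores 1 / (k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic search results, best first
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # keyword search results, best first
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that appear in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever, which is the "covering each other's blind spots" effect in miniature.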

Question 33

System prompt

Accepted Answer

The system prompt is the first, constant layer of an LLM call: it sets the model's role, tone, constraints, and output format. Written well, it reduces hallucination and stabilises output. Never put a secret or API key in a system prompt — assume it can be extracted.

Question 34

Function calling (tool use)

Accepted Answer

Function calling is what turns a chatbot into an agent: instead of only generating text, the model emits a structured request naming which tool to call with which parameters. Your code runs the function and the result is returned to the LLM. It is the foundation of every AI agent. MCP standardises how such tools are exposed to models.
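
The application side of that loop can be sketched without any provider SDK. The JSON shape of the tool call and the `get_order_status` tool are hypothetical; each provider has its own request and response format.

```python
import json

def get_order_status(order_id: str) -> dict:
    """Hypothetical tool; a real one would query an order system."""
    return {"order_id": order_id, "status": "shipped"}

# Registry of tools the model is allowed to call.
TOOLS = {"get_order_status": get_order_status}

def run_tool_call(tool_call_json: str) -> str:
    """Run the tool the model asked for and return the result as JSON for the next LLM turn."""
    call = json.loads(tool_call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(TOOLS[name](**args))

# What a model's structured tool request might look like (assumed format).
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A-1001"}}'
tool_result = run_tool_call(model_output)
```

The key point is that the model only proposes the call; your code decides whether and how to execute it.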

Question 35

Human-in-the-loop

Accepted Answer

Human-in-the-loop (HITL) means an AI system pauses and asks for human approval before an irreversible or high-stakes action — sending an email, making a payment, deleting data. It is mandatory for production agents on any destructive or customer-facing output. HITL is the key balance between autonomy and safety.
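
A minimal approval gate might look like the sketch below. The action names and the `approve` callback are illustrative assumptions; in practice the callback would open a ticket, a Slack prompt, or a review UI and wait for a person.

```python
from typing import Callable

# Actions treated as irreversible or high-stakes (illustrative list).
HIGH_STAKES = {"send_email", "make_payment", "delete_data"}

def execute_action(action: str, run: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run low-risk actions directly; pause high-stakes ones for human approval."""
    if action in HIGH_STAKES and not approve(action):
        return f"{action}: rejected by reviewer"
    return run()

# A reviewer who declines everything, for demonstration.
result = execute_action("make_payment", run=lambda: "payment sent", approve=lambda a: False)
lookup = execute_action("search_kb", run=lambda: "3 articles found", approve=lambda a: False)
```

Here the payment is blocked while the read-only lookup runs without asking anyone, which keeps the human workload proportional to the risk.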

Question 36

Temperature

Accepted Answer

Temperature controls how random the model's token sampling is. It ranges from 0 to 1 (or 2, depending on the provider). Low (0–0.3): near-deterministic, focused output, ideal for extraction, classification, and structured output. High (0.7–1): more creative and varied, suited to marketing copy or ideation. For regulated use-cases, always use a low temperature.
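
The effect is easiest to see in the softmax that turns model scores (logits) into token probabilities. The sketch below uses the textbook formulation with made-up logits; how each provider treats temperature 0 internally is an assumption (here it is handled as greedy selection).

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means always pick the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                        # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)   # top token gets over 99% of the mass
hot = softmax_with_temperature(logits, 1.0)    # top token gets roughly 63%
```

At low temperature almost all probability collapses onto the top token, which is why output becomes repeatable; at higher temperature the alternatives keep a real chance of being sampled.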

Question 37

LoRA / QLoRA

Accepted Answer

LoRA (Low-Rank Adaptation) freezes the base model and trains small low-rank adapter matrices instead, at a fraction of the memory and cost. QLoRA goes further: it trains those adapters on top of a 4-bit quantised base model, so even a 70B-class model can be fine-tuned on a single high-memory GPU. Most enterprise fine-tunes today are LoRA-based, and 500–2,000 good examples are often enough.
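
The saving is simple arithmetic. For one weight matrix of shape d_out × d_in, LoRA trains two matrices of shapes d_out × r and r × d_in instead. The 4096 × 4096 layer and rank 8 below are illustrative values, not figures from any specific model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two adapter matrices: (d_out x rank) + (rank x d_in)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                                # 16,777,216 weights in the frozen matrix
lora = lora_trainable_params(4096, 4096, rank=8)  # 65,536 trainable adapter weights
ratio = lora / full                               # 0.00390625, i.e. about 0.4%
```

Training under half a percent of the weights per adapted layer is what makes the memory and cost reduction possible.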

Question 38

Inference

Accepted Answer

Inference is 'running' the model — as opposed to training, which is teaching it. Inference cost is what you pay on every API call, and it compounds in production. Ways to cut inference cost: a smaller or fine-tuned model, prompt caching, batch processing, and matching the right model to the right task.

Question 39

Chain-of-thought

Accepted Answer

Chain-of-thought (CoT) prompting asks the model to show its reasoning before the final answer. On complex, multi-step tasks (maths, logic, planning) it meaningfully improves accuracy. Modern reasoning models (o3, Claude reasoning mode) do this built-in.

Question 40

EU AI Act

Accepted Answer

The EU AI Act classifies AI systems by risk: prohibited, high-risk, limited, minimal. High-risk systems (e.g. hiring, credit scoring, healthcare) face strict documentation, transparency, and human-oversight requirements, with most obligations applying from August 2026. Alongside GDPR and DORA, it is the third major EU compliance layer.

Question 41

Jailbreak

Accepted Answer

A jailbreak is a prompt technique that gets the model to break its built-in constraints — often via role-play ('pretend you are…') or a hypothetical frame. It is a cousin of prompt injection. Defence: a guardrail layer that evaluates intent before the main model runs, plus output filtering.
