Computer systems that perform tasks typically requiring human intelligence — language, vision, reasoning.
AI is an umbrella term for software that reproduces human cognitive abilities. In practice today, most AI work refers to LLM-based systems — ChatGPT, Claude, Gemini. Enterprise value typically comes from automation, customer support, and decision support.
Neural network trained on massive text corpora to generate natural-language responses.
An LLM is a multi-billion-parameter neural network trained on trillions of tokens. Examples: GPT-4, Claude, Llama. Not a knowledge database but a pattern generator — must be combined with RAG or fine-tuning for reliable enterprise use.
Architecture where relevant document chunks are retrieved via vector search and injected into the prompt.
RAG is the standard approach for grounding LLMs in your proprietary data. Steps: 1) embed documents, 2) store in vector DB, 3) retrieve top-k for each query, 4) send with the prompt. RAG produces more accurate, current, and citable responses than prompt engineering alone.
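The four steps above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production pipeline: `embed` is a hypothetical stand-in that counts words (a real system would call an embedding model), a Python list stands in for the vector DB, and the sample documents are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-2: embed the documents and store them (a list stands in for the vector DB).
documents = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is open Monday to Friday from 9 to 17.",
    "Premium support is available for enterprise customers.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Step 3: return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Step 4: inject the retrieved chunks into the prompt sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many days do refunds take?")
```

The string in `prompt` is what would be sent to the model; swapping `embed` and `index` for a real embedding model and vector DB leaves the overall flow unchanged.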
Autonomous LLM-driven system that calls tools, makes decisions, and completes tasks.
AI agents differ from chatbots by acting, not just talking: calling APIs, reading databases, sending emails. Orchestration typically uses LangGraph, CrewAI, or OpenAI Assistants. Production agents require tool-permission models, cost limits, and human-in-the-loop controls.
Multiple specialised AI agents collaborating on a shared task.
Multi-agent systems divide work across role-specialised agents such as a planner, an executor, and a verifier. Supervisor and planner-executor are the most common patterns. They can outperform a single large agent on complex multi-step tasks but are harder to debug and control.
Deliberate design of the instruction given to an LLM to produce the desired output.
Prompt engineering includes role definition (system prompt), few-shot examples, structured output specification (JSON schema), iteration, and testing. A good prompt can be 3–5x more accurate than a weak one. It's the cheapest first intervention before fine-tuning.
Further training of a pre-trained LLM on your own data for a specific task.
Fine-tuning specialises a base model (Llama 3.1, GPT-4o-mini) on your data. Methods: LoRA (lightweight, cheap), full fine-tune (heavier, stronger). Typical use-cases: domain terminology, brand voice, stable structured output. Doesn't replace RAG — pairs with it.
Database storing embedding vectors with fast similarity search.
Vector DBs (Pinecone, Qdrant, Weaviate, pgvector) perform fast similarity search over billions of embeddings. The backbone of RAG pipelines. Selection factors: managed vs self-host, EU vs US region, hybrid search support, scalability.
Numerical vector representation of text that preserves meaning.
An embedding is a 768–3072 dimensional vector representing the meaning of a text chunk. Similar texts land close together in vector space. Major providers: OpenAI (text-embedding-3), Voyage, Cohere, open-source (BGE, E5). Embedding choice can shift RAG accuracy 5–15%.
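"Close together in vector space" is usually measured with cosine similarity. A minimal sketch follows; the 4-dimensional vectors are made up for illustration, whereas real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two dense vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up vectors: "invoice" and "bill" are related, "holiday" is not.
invoice = [0.9, 0.1, 0.0, 0.2]
bill = [0.8, 0.2, 0.1, 0.3]
holiday = [0.0, 0.9, 0.8, 0.1]

close = cosine_similarity(invoice, bill)     # high score for related meanings
far = cosine_similarity(invoice, holiday)    # low score for unrelated meanings
```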
Malicious input that overrides the LLM's original instruction.
Prompt injection is one of the most common AI security vulnerabilities and ranks first in the OWASP Top 10 for LLM Applications. Example: user input includes 'ignore previous instructions and...'. Defences: input validation, instruction hierarchy, output guardrails, limited tool access, prompt-level sandboxing.
Input- or output-checking layer that prevents undesired AI behaviour.
Guardrails can be rule-based (regex, block-lists), ML-based (toxicity, PII detectors), or LLM-based (judge models). Typical uses: PII redaction, toxicity filtering, off-topic rejection, output format validation.
Removing personal data (names, emails, IDs) before sending a prompt to an LLM.
PII redaction is mandatory for GDPR-compliant AI. Implemented via regex, ML NER models, or dedicated services (Presidio, Nightfall). Happens BEFORE the prompt leaves your infrastructure so sensitive data never reaches the LLM provider.
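A regex-based redaction pass might look like the sketch below. The two patterns are deliberately simple and illustrative; they are assumptions, not a complete PII rule set, and names such as "Anna" are not caught without an NER model.

```python
import re

# Illustrative patterns only; real deployments need locale-specific rules and NER for names.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# Run this inside your own infrastructure, before the prompt is sent to the provider.
safe_prompt = redact("Contact Anna at anna.kovacs@example.com or +36 30 123 4567.")
```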
Role-Based Access Control — governing tool access and data visibility per user role.
In AI systems, RBAC controls which role can invoke which tool and see which data in RAG. Critical in multi-tenant and regulated environments. Implementation: middleware before the prompt + post-filter on LLM output.
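As a rough sketch of both checks, the middleware can be a permission lookup that runs before any tool call, and the RAG side a filter on retrieved chunks. The role names, tool names, and the `allowed_roles` field are hypothetical.

```python
# Hypothetical roles and tool names, for illustration only.
ROLE_TOOLS = {
    "support_agent": {"search_kb", "read_ticket"},
    "finance": {"search_kb", "read_invoice", "issue_refund"},
}

class ToolNotAllowed(PermissionError):
    """Raised when a role tries to call a tool it has no permission for."""

def authorize_tool(role: str, tool: str) -> None:
    """Middleware check that runs before a tool call reaches the executor."""
    if tool not in ROLE_TOOLS.get(role, set()):
        raise ToolNotAllowed(f"role {role!r} may not call {tool!r}")

def filter_chunks(role: str, chunks: list[dict]) -> list[dict]:
    """Drop retrieved RAG chunks whose access list does not include the role."""
    return [c for c in chunks if role in c["allowed_roles"]]

chunks = [
    {"text": "Public FAQ", "allowed_roles": {"support_agent", "finance"}},
    {"text": "Q3 revenue", "allowed_roles": {"finance"}},
]
visible = filter_chunks("support_agent", chunks)  # only the public FAQ remains
```

Filtering chunks before they enter the prompt is safer than relying on the output post-filter alone, because the model never sees data the role is not entitled to.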
Real-time voice AI system that converses and invokes tools.
Voice agents combine speech-to-text (Deepgram, Whisper), LLM, and text-to-speech (ElevenLabs, Cartesia) layers. Typical platforms: Vapi, LiveKit, Retell. Latency is critical — the full cycle must be under ~500ms for natural conversation.
The maximum number of tokens an LLM can process at once.
The context window covers input and output combined. GPT-4 Turbo and GPT-4o: 128k tokens. Claude Sonnet 4.6: 1M tokens. Gemini 2.5 Pro: 1M tokens. Larger windows fit more documents but cost more and slow responses. Context caching (Anthropic, OpenAI) can cut the cost of repeated prompt prefixes by up to 90%.
When an LLM confidently generates false information.
Hallucination stems from LLMs being probabilistic pattern generators, not knowledge stores. Mitigations: RAG (source-bound answers), citation tracking, fact-check layers, human review. Models such as GPT-4 and Claude Sonnet 4.6 hallucinate less than their predecessors, but the problem can't be reduced to zero, so critical use-cases always need human-in-the-loop.
The unit of text an LLM processes, roughly 0.7 English words.
LLMs count in tokens. 1000 tokens ≈ 700 English words or ~500 Hungarian words (Hungarian is more inflected). Pricing is per-token: ~$3/1M input, ~$15/1M output for Claude Sonnet in 2026.
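The per-token arithmetic can be worked through as follows. The prices and the 0.7 words-per-token ratio are the approximate figures quoted in this entry, not an official price list, so treat the result as an order-of-magnitude estimate.

```python
# Approximate figures from this entry (USD per 1M tokens); check the provider's current prices.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00
WORDS_PER_TOKEN_EN = 0.7  # rough English ratio

def estimate_tokens(word_count: int) -> int:
    """Estimate the token count of an English text from its word count."""
    return round(word_count / WORDS_PER_TOKEN_EN)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call, given input and output token counts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

tokens_in = estimate_tokens(1400)      # a 1,400-word English prompt, about 2,000 tokens
cost = estimate_cost(tokens_in, 500)   # plus a 500-token answer, about $0.0135
```

Output tokens cost five times as much as input tokens at these rates, so capping answer length is often the quickest saving.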
Anthropic-developed standard for tool communication between LLMs and external services.
MCP (Model Context Protocol) lets a tool server written once serve multiple LLM clients (Claude Desktop, Claude Code, your agent). It became a de facto industry standard in 2025. Alternative to bespoke function calling.
Deliberate design of the LLM's context — not just prompt, but the whole input stack.
Context engineering is the evolution of prompt engineering: systematically assembling what goes into the LLM context (system prompt, few-shot, RAG chunks, tool defs, prior conversation). Especially important with long-context models.
Protecting AI systems from prompt injection, data leakage, and other attacks.
AI security has four layers: input validation (prompt injection), output guardrails (PII, toxicity), access control (RBAC, tool permissions), and audit (logging, monitoring). Regulated sectors require additional compliance (DORA, MDR, GDPR).
AI-driven automation of business processes — support, document processing, email.
AI automation goes beyond classic RPA: the LLM can make context-aware decisions, not just run scripts. Common use-cases: multilingual customer support, product description generation, email triage, financial reporting.
EU Digital Operational Resilience Act governing financial firms' IT and AI systems.
DORA has applied EU-wide since 17 January 2025: banking AI systems must have incident reporting, risk management, and vendor-management processes. Budapest AI firms can serve such clients given full documentation and audit trails.
EU General Data Protection Regulation governing personal data processing.
GDPR is the foundational EU privacy law. For AI: lawful basis for processing, data subject rights, DPIA for high-risk processing, and cross-border data transfer rules. Hungarian enforcement body: NAIH.
AI that creates new content — text, image, audio, code.
Generative AI produces new output rather than only classifying or predicting. Main families: LLMs (text), diffusion (image, video), TTS (audio), code models. Enterprise adoption has grown rapidly since 2023.
Transferring a large model's 'knowledge' to a smaller, faster model.
Distillation trains a smaller student model on the outputs of a larger teacher model. Indicative result: roughly 80–90% of the teacher's quality at about 10% of the cost, with around 5x faster responses. OpenAI, Anthropic, and Google all offer distillation workflows.
Measuring AI system performance — accuracy, speed, cost, toxicity.
AI eval requires a custom suite: not just loss or generic accuracy, but real business metrics. Tools: LangSmith, Langfuse, Promptfoo, Ragas. Always A/B test against the base model before production.
Including a few examples in the prompt to guide the pattern the LLM follows.
Few-shot prompting shows 1–5 input-output examples so the LLM copies the style. Often more effective than fine-tuning, especially for stable formats (JSON, XML) or specific tones (brand voice, legal style).
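Assembling a few-shot prompt is mostly string formatting. The sketch below uses a hypothetical `build_few_shot_prompt` helper and invented sentiment examples; the `Input:` / `Output:` labels are one common convention, not a required format.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    parts = [instruction, ""]
    for source, target in examples:
        parts += [f"Input: {source}", f"Output: {target}", ""]
    # End on an open "Output:" so the model completes it in the demonstrated format.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

examples = [
    ("The delivery was late again.", '{"sentiment": "negative"}'),
    ("Great service, thank you!", '{"sentiment": "positive"}'),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each input and answer in JSON.",
    examples,
    "The app keeps crashing.",
)
```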
LLM-driven iterative coding where the developer describes intent and AI generates code.
Vibe coding refers to AI-assisted development with Cursor, Claude Code, or similar — often 30–70% of production developer time in 2026. The question isn't whether to adopt, but which workflow to use.
AI systems meeting legal, privacy, and ethical requirements.
The EU has three main layers: GDPR (personal data), DORA (financial resilience), and the EU AI Act (high-risk AI requirements, with most obligations applying from August 2026). Hungary adds NAIH oversight and MNB vendor-management rules.
Splitting documents into smaller, searchable pieces for a RAG pipeline.
Chunking is the step where source documents are broken into manageable pieces before embedding. Size matters: too small and context is lost, too large and relevance gets diluted. Paragraph-based chunking usually beats a fixed token boundary because it respects the document's semantic structure. Typical size: 200–500 tokens with 10–20% overlap.
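A paragraph-based chunker can be sketched as below. It makes two simplifying assumptions: tokens are approximated by whitespace-separated words, and overlap is expressed as whole paragraphs carried over rather than as a 10–20% token window.

```python
def chunk_paragraphs(text: str, max_tokens: int = 300, overlap: int = 1) -> list[str]:
    """Pack whole paragraphs into chunks, starting a new chunk when adding a
    paragraph would exceed max_tokens (approximated here as words).

    The last `overlap` paragraphs of a chunk are repeated at the start of the
    next one. A paragraph longer than max_tokens is kept whole, not split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        size = sum(len(p.split()) for p in current) + len(para.split())
        if current and size > max_tokens:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

sample = "a b c\n\nd e f\n\ng h i"
chunks = chunk_paragraphs(sample, max_tokens=6, overlap=1)
# The middle paragraph appears in both chunks, preserving context across the boundary.
```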
Re-ordering vector-search results with a more precise model.
Reranking is the second retrieval stage in RAG: the top-20 results from vector search are re-scored by a dedicated reranker model (Cohere Rerank, BGE-reranker), and only the best 3–5 reach the LLM. It can improve relevance by as much as 3–4×, making it one of the cheapest quality wins in a RAG system.
Combining vector search with keyword search (BM25).
Hybrid search unites the strengths of semantic vector search (meaning-based) and classic BM25 full-text search (exact keyword match). Vector search handles paraphrases well; BM25 handles exact codes, names, and numbers. Production RAG almost always uses hybrid search because the two together cover each other's blind spots.
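The entry does not say how the two result lists are merged. One widely used option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the method this glossary prescribes. The document IDs and both hit lists are invented, and `k = 60` is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists for the query "error code E-4012 on login".
vector_hits = ["doc_login_issues", "doc_password_reset", "doc_e4012"]
bm25_hits = ["doc_e4012", "doc_login_issues", "doc_release_notes"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that appear in both lists rise to the top, which is how the exact-match hit for the error code ends up ahead of results that only one retriever found.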
The base instruction that defines an LLM's role and behaviour.
The system prompt is the first, constant layer of an LLM call: it sets the model's role, tone, constraints, and output format. Written well, it reduces hallucination and stabilises output. Never put a secret or API key in a system prompt — assume it can be extracted.
An LLM's ability to invoke external functions and APIs in a structured way.
Function calling is what turns a chatbot into an agent: instead of generating text, the model decides which tool to call with which parameters. Your code runs the function and the result is returned to the LLM. It is the foundation of every AI agent. MCP is the standardised form of function calling.
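The loop described above (the model picks a tool, your code runs it, the result goes back) can be sketched without any provider SDK. The tool, its return value, and the JSON shape of `model_output` are all hypothetical; the real wire format differs per provider.

```python
import json

def get_order_status(order_id: str) -> dict:
    """Hypothetical tool; a real one would query an order system."""
    return {"order_id": order_id, "status": "shipped"}

# Registry of functions the model is allowed to call.
TOOLS = {"get_order_status": get_order_status}

def execute_tool_call(raw_call: str) -> str:
    """Parse a model-emitted tool call, run the matching function, return JSON for the model."""
    call = json.loads(raw_call)
    name, arguments = call["name"], call["arguments"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(TOOLS[name](**arguments))

# Stand-in for what a model might emit instead of plain text.
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A-1001"}}'
tool_result = execute_tool_call(model_output)  # sent back to the LLM as the tool's answer
```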
A human approval point at the critical steps of an AI workflow.
Human-in-the-loop (HITL) means an AI system pauses and asks for human approval before an irreversible or high-stakes action — sending an email, making a payment, deleting data. It is mandatory for production agents on any destructive or customer-facing output. HITL is the key balance between autonomy and safety.
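In code, the approval point is a gate in front of the action executor. The sketch below is illustrative: the action names are hypothetical, and `cautious_reviewer` stands in for a human decision that in practice would arrive through a review UI or queue.

```python
from typing import Callable

# Hypothetical action names; which actions count as high-stakes is a policy decision.
REQUIRES_APPROVAL = {"send_email", "make_payment", "delete_data"}

def run_action(action: str, payload: dict, approve: Callable[[str, dict], bool]) -> str:
    """Execute low-risk actions directly; ask the approver before high-stakes ones."""
    if action in REQUIRES_APPROVAL and not approve(action, payload):
        return f"{action}: rejected by reviewer"
    return f"{action}: executed"

def cautious_reviewer(action: str, payload: dict) -> bool:
    """Stand-in for a human decision; approves payments up to 100 only."""
    return action != "make_payment" or payload.get("amount", 0) <= 100

results = [
    run_action("search_kb", {"query": "refund policy"}, cautious_reviewer),
    run_action("make_payment", {"amount": 5000}, cautious_reviewer),
]
```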
The parameter that controls randomness in an LLM's output.
Temperature typically ranges from 0 to 1, or 0 to 2 with some providers. Low (0–0.3): near-deterministic, focused output, ideal for extraction, classification, and structured output. High (0.7–1): more creative and varied, for marketing copy or ideation. For regulated use-cases, always use a low temperature.
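Mechanically, temperature divides the model's raw token scores (logits) before they are turned into probabilities. The sketch below shows the effect on three made-up logits; it illustrates the sampling maths only and is not any provider's implementation.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 here; APIs typically special-case 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)  # top token gets over 99% of the mass
varied = softmax_with_temperature(logits, 1.0)   # top token gets roughly 63%
```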
An efficient fine-tuning method that trains only a small fraction of weights.
LoRA (Low-Rank Adaptation) trains small adapter matrices instead of the full model, at a fraction of the memory and cost. QLoRA goes further: it trains the adapters on top of a quantised (4-bit) base model, so even a 70B model can be fine-tuned on a single high-memory GPU. Most enterprise fine-tunes today are LoRA-based, and 500–2,000 examples are often enough.
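The saving comes from replacing a full weight update with the product of two thin matrices. The arithmetic for a single layer is worked through below; the 4096 x 4096 size and rank 8 are assumed values chosen for illustration.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates the whole d_out x d_in matrix W. LoRA freezes W and
    trains B (d_out x rank) and A (rank x d_in), so the learned update is B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora

# Assumed sizes for illustration: a 4096 x 4096 projection with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
fraction = lora / full  # 65,536 of 16,777,216 weights, i.e. 1/256 (about 0.4%)
```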
The process of a trained model generating an answer for an input.
Inference is 'running' the model — as opposed to training, which is teaching it. Inference cost is what you pay on every API call, and it compounds in production. Ways to cut inference cost: a smaller or fine-tuned model, prompt caching, batch processing, and matching the right model to the right task.
Prompting an LLM to reason step by step for a more accurate answer.
Chain-of-thought (CoT) prompting asks the model to show its reasoning before the final answer. On complex, multi-step tasks (maths, logic, planning) it meaningfully improves accuracy. Modern reasoning models (o3, Claude's extended thinking mode) have this built in.
The EU's comprehensive AI regulation, structured by risk level.
The EU AI Act classifies AI systems by risk: prohibited, high-risk, limited, minimal. High-risk systems (e.g. hiring, credit scoring, healthcare) face strict documentation, transparency, and human-oversight requirements, most of which apply from August 2026. Alongside GDPR and DORA, it is the third major EU compliance layer.
Bypassing an LLM's safety constraints with a manipulative prompt.
A jailbreak is a prompt technique that gets the model to break its built-in constraints — often via role-play ('pretend you are…') or a hypothetical frame. It is a cousin of prompt injection. Defence: a guardrail layer that evaluates intent before the main model runs, plus output filtering.