Introduction to Large Language Models
Resume Atilla Özgür
- polyglot programmer
- database developer
- build engineer
- researcher
Resume Atilla Özgür Professional
- Started programming in 1991, high school
- Graduated in Electrical Engineering in 2003 from the best technical university in Turkey, METU
- 22 years of Professional Software Development experience
- 6 years of Project Management and Team Leading experience
- 7 years of Database Administration experience
- 6 years of AI and optimization Algorithm development for Steel Factories
- 1.5 years of GenAI-LLM experience
Resume Atilla Özgür Professional
- Worked with different web application development platforms and Database Systems
- I have numerous Microsoft certifications (MCPD, MCSD, MCT)
- I am certified in Oracle (OCA 11g) and SQL Server (2000-2008) Databases.
- Worked with a lot of different programming languages professionally and academically
- C#
- Java
- Python
- Visual Basic
- JavaScript
- SQL
- R
Resume Atilla Özgür Academic
- Bachelor's degree in Electrical Engineering in 2003 from the best technical university in Turkey, Middle East Technical University (METU)
- Master of Science in Computer Engineering in 2008 from Atilim University, Turkey
- PhD in Electrical Engineering in 2017 from Baskent University, Turkey
- My thesis was about machine learning, optimization, and intrusion detection systems
- I used Python, MATLAB, Groovy, and Weka in my thesis
- My first post-doc work was about machine learning and optimization for steel production systems.
- My second (and current) post-doc work at Constructor University is about LLMs for requirement management/contract handling
Education
University | Department | Degree | Start | End |
---|---|---|---|---|
Middle East Technical University | Electrical Engineering | Bachelor | 1995 | 2003 |
Atılım University | Computer Engineering | Master | 2004 | 2007 |
Middle East Technical University | Medical Informatics (Incomplete) | Master | 2005 | 2008 |
Başkent University | Electrical Engineering | PhD | 2007 | 2017 |
Employment History
Title | Company | Start | End |
---|---|---|---|
Postdoctoral Researcher | Constructor University Bremen - Mathematics and Logistics | 04/2024 | - |
Adjunct Professor (Part time) | Ankara Science University | 09/2024 | 02/2025 |
Senior Developer | SMS Digital | 02/2022 | 03/2024 |
Adjunct Professor (Part time) | Constructor University | 09/2022 | 02/2024 |
Assistant Professor | Ankara Yıldırım Beyazıd University - Computer Engineering | 06/2021 | 02/2022 |
Postdoctoral Researcher | Jacobs University Bremen - Mathematics and Logistics | 01/2018 | 01/2022 |
Database Administrator | Turkish Labor Agency | 07/2011 | 12/2017 |
Software Developer and Build/DevOps Engineer | Milsoft | 05/2010 | 07/2011 |
Senior Software Developer | Tilda Telekom | 10/2009 | 05/2010 |
Software Trainer | Freelance | 06/2009 | 10/2009 |
Software Project Manager | Turksat | 07/2008 | 06/2009 |
Software Project Manager | Simetri | 09/2006 | 04/2008 |
Software Trainer | Netsoft | 10/2005 | 12/2006 |
Software Developer | Kale Yazılım | 10/2004 | 10/2005 |
Software Developer | Veripark | 06/2003 | 10/2004 |
Broader AI
Course contents for LLM
Course book
How Large Language Models (LLMs) work
LLMs are built using supervised learning. They repeatedly predict the next word using the previous words.
My favorite food is a Döner with spicy sauce
Input A | Output |
---|---|
My | favorite |
My favorite | food |
My favorite food | is |
My favorite food is | a |
My favorite food is a | Döner |
My favorite food is a Döner | with |
My favorite food is a Döner with | spicy |
My favorite food is a Döner with spicy | sauce |
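A minimal sketch of how these input/output training pairs can be generated from the example sentence, splitting on words for illustration (real LLMs split text into sub-word tokens instead):

```python
# Build (context, next word) training pairs from one sentence.
# Word-level split for illustration only; real LLMs use sub-word tokens.
sentence = "My favorite food is a Döner with spicy sauce"
words = sentence.split()

pairs = [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]
for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```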
How Large Language Models (LLMs) work 2
Stochastic Parrots
- the term stochastic parrot is a disparaging metaphor, introduced by Emily M. Bender and colleagues in a 2021 paper, that frames large language models as systems that statistically mimic text without real understanding.
Neural networks
Source Andrej Karpathy: 1hr Talk Intro to Large Language Models
Single Neuron
Different activation functions
- Sigmoid, widely used in old NN papers, is not shown here
- GELU, the GPT activation function, is also not shown here (both are sketched in code below)
- More than 20 activation functions exist
- More activation functions are proposed continuously
source: A scalable species-based genetic algorithm for reinforcement learning problems
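Since sigmoid and GELU are not in the figure, here is a small sketch of both next to ReLU; the GELU below uses the tanh approximation popular in GPT-style models:

```python
# Plain-Python sketch of three activation functions (no NN framework needed).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def gelu(x):
    # tanh approximation of GELU, as used in GPT-style models
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.3f}  gelu={gelu(x):.3f}")
```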
Neural networks internals
Modern Architectures Derived From GPT
LLM-visualization nano-gpt
LLM-visualization gpt2
LLM-visualization gpt3
GPT2 sizes
(table of GPT-2 model sizes: d_model and context size per variant)
Source GPT2 Article: Language Models are Unsupervised Multitask Learners
GPT3 sizes
LLM-EvolutionTree
GPT Original architecture
GPT2-architecture
Modern Architectures Derived From GPT
Modern Architectures Derived From GPT: Gemma2
Modern Architectures Derived From GPT: Llama3
Modern Architectures Derived From GPT: Mixtral
Modern Architectures Derived From GPT: Phi3
GPT2 vs Llama 1
Llama 1 vs Llama 2
Gemma 3 vs Qwen 3
The Big LLM Architecture Comparison
- Read more in book author’s blog post 2025-07-19
LLM model size vs capabilities
Model Size (parameters) | Capabilities | Application |
---|---|---|
1B | Pattern matching, basic world knowledge | Hotel sentiment analysis |
10B | Greater world knowledge, can follow basic instructions | Simple order chatbot |
100B+ | Rich world knowledge. Complex reasoning | Brainstorming partner |
pre-train LLM
Dumb language model 1 (Before pre-train)
- When an LLM first starts training, it randomly generates output.
- Therefore, the probability of any word being generated is \(\frac{1}{\text{vocabulary size}}\)
- GPT2 vocabulary size is 50257
- At the start of pre-training, we expect the loss value to be close to \(-\ln(1/50257) \approx 10.82\)
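A quick sketch of where these numbers come from: the cross-entropy loss of a model that assigns uniform probability to every token in its vocabulary.

```python
# Expected initial loss of an untrained ("dumb") model that predicts every token
# with equal probability p = 1 / vocab_size.
import math

for vocab_size in (50257, 100):          # GPT-2 vocabulary vs. a toy vocabulary
    p = 1.0 / vocab_size
    print(f"vocab={vocab_size:6d}  -ln(p) = {-math.log(p):.2f}")
# prints about 10.82 for GPT-2 and 4.60 for the toy vocabulary, matching the diagram below.
```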
Dumb language model 2 (Before pre-train)
```mermaid
flowchart TD
  LM["Dumb Language Model Vocab Size GPT2"] -->|"Randomly generates output"| P
  P["$$p=\frac{1}{50257}$$"]
  P --> L1["$$ L = -\ln(p) $$"]
  L1 --> L2["10.82"]
  LM2["Dumb Language Model Vocab Size 100"] -->|"Randomly generates output"| P2
  P2("$$p=\frac{1}{100}$$")
  P2 --> L21("$$ L = -\ln(p) $$")
  L21 --> L22("$$ 4.60 $$")
```
- Here, we have two different vocab sizes, 50257 and 100
- Correspondingly, we will start with two different losses
Training LLMs Example llama2
Training is like compressing terabytes of text
Pre-Train step 1 dataset
Pre-Train step 2 tokenization
Pre-Train step 3 neural network training
Pre-Train step 4 inference
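A minimal sketch of step 2 (tokenization), assuming the tiktoken library and its GPT-2 encoding; the printed ids are what the neural network in step 3 actually sees.

```python
# Tokenize a sentence with the GPT-2 vocabulary (50257 tokens).
import tiktoken

enc = tiktoken.get_encoding("gpt2")
print(enc.n_vocab)                                        # 50257
ids = enc.encode("My favorite food is a Döner with spicy sauce")
print(ids)                                                # token ids fed to the model
print(enc.decode(ids))                                    # decoding reverses the mapping
```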
Pretrain Datasets
When training LLMs, data quality matters
- diverse data
- harmful speech
- biases
- cleaning data
- deduplication (remove duplicates)
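A toy sketch of the deduplication step, dropping exact duplicates by hashing a normalized version of each document; real pipelines also use fuzzy matching such as MinHash (a general assumption about practice, not a specific pipeline):

```python
# Exact-duplicate removal by hashing normalized documents.
import hashlib

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def deduplicate(docs):
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["Hello   world", "hello world", "Something else"]
print(deduplicate(docs))    # the second document is dropped after normalization
```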
Datasets used to train GPT-3
Datasets LLama 2
- Our training corpus includes a new mix of data from publicly available sources, which does not include data from Meta’s products or services.
- We made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals.
- We trained on 2 trillion tokens of data as this provides a good performance–cost trade-off,
- up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations.
Copyrighted Works Problem
- Meta staff torrented nearly 82TB of pirated books for AI training — court records reveal copyright violations
- OpenAI has been sued by novelists as far back as June 2023 for using their books to train its large language models,
- with The New York Times following suit in December.
- Nvidia has also been on the receiving end of a lawsuit filed by writers for using 196,640 books to train its NeMo model, which has since been taken down.
- A former Nvidia employee blew the whistle on the company in August of last year, saying that it scraped more than 426 thousand hours of videos daily for use in AI training.
- More recently, OpenAI is investigating if DeepSeek illegally obtained data from ChatGPT, which just shows how ironic things can get.
- source
Using Copyrighted Works in LLMs according to www.copyright.com
LLMs use massive amounts of textual works—many of which are protected by copyright.
To do this, LLMs make copies of the works they rely on, which involves copyright in several ways, such as:
Using copyright-protected material in the training datasets of LLMs without permission can result in the creation of unauthorized copies: copies generated during the training process and copies in the form of representations of the training data embedded within the LLM after training. This creates potential copyright liability.
Outputs—the material generated by AI systems like LLMs—may create copyright liability if they are the same or too similar to one of the copyrighted works used as an input unless there is an appropriate copyright exception or limitation.
Datasets
- Common Crawl (over 300 billion pages spanning 18 years), with variations
- StarCoder: almost 800 GB of code examples
- FineWeb: 18.5T tokens
- arXiv (academic papers)
Datasets Hugging face FineWeb
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl.
The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large scale data processing library.
https://huggingface.co/datasets/HuggingFaceFW/fineweb
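A sketch of streaming a few FineWeb documents with the Hugging Face datasets library (streaming avoids downloading the full multi-terabyte dataset); the `text` field name comes from the dataset card linked above.

```python
# Stream a small sample from FineWeb instead of downloading all of it.
from datasets import load_dataset

fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
for i, example in enumerate(fineweb):
    print(example["text"][:100])      # first 100 characters of the cleaned web text
    if i >= 2:
        break
```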
Pre training different models
source Sebastian Raschka: LLMs: A Journey Through Time and Architecture
Pre-training costs of different models
Model | Release year | Reported training GPU-hours | Reported GPU fleet / type | Reported training time | Estimated compute cost (USD) | Notes |
---|---|---|---|---|---|---|
GPT-2 (1.5B) — original | 2019 | Not publicly disclosed | Not publicly disclosed | N/A | Not disclosed | OpenAI did not publish training time or cost. |
GPT-2 (repro, modern hardware) | 2024 | ≈192 GPU-hrs (8×H100 for ~24h) | 8× H100 (80GB) | ≈24 hours | ≈$600–$800 | Community reproduction, not the 2019 original run. |
GPT-3 (175B) | 2020 | Not disclosed | V100-era | “Multiple weeks” (reports) | ≈$0.5M–$4.6M (est.) | Cost varies widely by assumptions; compute ≈3.14e23 FLOPs from paper. |
GPT-4 | 2023 | Undisclosed | Est. ~25k A100s (unofficial) | Est. ~90–100 days (unofficial) | > $100M (various estimates) | OpenAI hasn’t disclosed; figures are outside estimates. |
Llama 2 (7B–70B) | 2023 | ≈3.3M A100-80GB GPU-hrs | NVIDIA A100 80GB | Depends on cluster size | ≈$5M–$8M (at $1.5–$2.5/GPU-hr) | From Meta paper; cost uses common rental ranges. |
Llama 3.1 (405B) | 2024 | ≈30.84M H100 GPU-hrs | ~24k H100 (reports) | ≈54 days (reported) | ≈$62M–$93M (at $2–$3/GPU-hr) | GPU-hrs from engineering report; time from news coverage. |
Llama 4 Scout (17B) | 2025 | ≈7.38M H100-80GB GPU-hrs | H100 80GB | Not disclosed | ≈$15M–$22M (at $2–$3/GPU-hr) | From NVIDIA/Meta model card summary. |
Claude 3.7 Sonnet | 2025 | Undisclosed (<1e26 FLOPs claim) | Not disclosed | Not disclosed | “Few tens of millions” (company guidance via press) | Anthropic hasn’t published full training details. |
Grok 2 | 2024 | Not disclosed | ≈20,000 H100 (per Musk) | N/A (hours unknown) | Not disclosed | GPU count stated publicly; no official hours. |
Grok 3 | 2025 | Not disclosed | Claims: 100k–200k H100s | N/A (claims vary) | Not disclosed | Numbers are public claims/reports; not independently verified. |
DeepSeek-V3 | 2024 | ≈2.788M H800 GPU-hrs | NVIDIA H800 | Depends on cluster size | ≈$5.6M (@ $2/GPU-hr) | From DeepSeek paper and analyses. |
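A back-of-the-envelope check of how the cost estimates in the table follow from reported GPU-hours multiplied by assumed rental rates:

```python
# Estimated cost = GPU-hours x hourly rental rate (rates are assumptions, as in the table).
def cost_range(gpu_hours, low_rate, high_rate):
    return gpu_hours * low_rate, gpu_hours * high_rate

print(cost_range(3.3e6, 1.5, 2.5))      # Llama 2:     roughly $5M to $8M
print(cost_range(30.84e6, 2.0, 3.0))    # Llama 3.1:   roughly $62M to $93M
print(cost_range(2.788e6, 2.0, 2.0))    # DeepSeek-V3: roughly $5.6M
```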
Sam-Altman-1million-GPU
Pre-train speed up
pre-training speed up
- We will not cover this topic
- GPU parallelization
- learning rates
- optimizers
- quantization (32 bit → 16 bit)
- and others, active research
Parallelization
- Data parallelization
- Tensor parallelization (split matrix multiplication to multiple GPUs)
- Pipeline parallelization (transformer layers to multiple GPUs)
- Model parallelization
pre-training tricks examples
We will not cover this topic
- Let’s make it fast. GPUs, mixed precision, 1000ms
- Tensor Cores, timing the code, TF32 precision, 333ms
- float16, gradient scalers, bfloat16, 300ms
- torch.compile, Python overhead, kernel fusion, 130ms
- flash attention, 96ms
- nice/ugly numbers. vocab size 50257 → 50304, 93ms
- hyperparameters, AdamW, gradient clipping
- learning rate scheduler: warmup + cosine decay
- batch size schedule, weight decay, FusedAdamW, 90ms
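A minimal PyTorch sketch of a few of the tricks above (TF32 matmuls, torch.compile, bfloat16 autocast, flash attention via scaled_dot_product_attention); the tiny model and tensor shapes are illustrative, not GPT-2:

```python
# Assumes PyTorch >= 2.0; runs on CPU, but the speed-ups mainly pay off on a GPU.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_float32_matmul_precision("high")        # allow TF32 matmuls on Ampere+ GPUs

model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768)).to(device)
model = torch.compile(model)                      # kernel fusion, less Python overhead

x = torch.randn(4, 256, 768, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    y = model(x)                                  # mixed-precision forward pass

# Flash attention kernels are used automatically where available:
q = k = v = torch.randn(4, 12, 256, 64, device=device)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(y.shape, out.shape)
```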
After Pretrain
next word prediction becomes powerful
neural network dreams documents
Psychology of base model
base models are not assistants 1
base models are not assistants 2
knowledge of self
human text generation vs llm text generation 1
human text generation vs llm text generation 2
Swiss cheese model of LLM
Hallucinations
Models can’t count
how to train your ChatGPT
Andrej Karpathy 1hr Talk Intro to Large Language Models
Supervised Fine Tuning (Instruct GPT)
training the assistant
datasets conversation 1
datasets conversation 2
datasets conversation 3
LLama3 SFT datasets
Deepseek may have used OpenAI ChatGPT output
- OpenAI says it has proof DeepSeek used its technology to develop its AI model
- OpenAI Hit With Wave of Mockery for Crying That Someone Stole Its Work Without Permission to Build a Competing Product
- “You can’t steal from us! We stole it fair and square!”
SFT: after fine tuning you have an assistant
After fine tuning Emergent behavior
Course contents for LLM
LLM Arena
LLM Arena overview
LLM Arena copilot
LLM Arena text-to-video and image-to-video
LLM Arena image edit and search
LLM Arena vision and text-to-image
LLM Arena text and webdev
Foundational/base models
Foundational/base models: ollama models popular
Foundational/base models: ollama models instruct
Foundational/base models: hugging face models
Foundational/base models: litgpt models 1
Foundational/base models: litgpt models 2
LLM how to use OpenAI API
LLM how to use OpenAI API Answer
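A minimal sketch of calling the OpenAI API from Python (a possible answer, not necessarily the one on the slide); it assumes the openai package (>= 1.0), an OPENAI_API_KEY environment variable, and an illustrative model name:

```python
# Ask a chat model one question via the OpenAI chat completions endpoint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",                                   # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a token is in one sentence."},
    ],
)
print(response.choices[0].message.content)
```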
LLM models: closed source vs open source
Closed source models | Open source models |
---|---|
cloud servers | can run on your device, on premises, PC etc |
easy to use in applications | |
larger/more powerful models | full control of your models |
relatively inexpensive | full control over data/privacy/access |
Well known open source LLM models
Language Model Name | Params (B) | Context Length | Licence |
---|---|---|---|
open_llama_3b, open_llama_7b, open_llama_13b | 3,7,13 | 2048 | Apache 2.0 |
phi-2 2.7B | 2.7 | 2048 | MIT |
Gemma 2B, Gemma 7B, | 2,7 | 8192 | Free with usage restrictions |
Grok-1 | 314 | 8192 | Apache 2.0 |
Mixtral-8x22B | 141 | 64k | Apache 2.0 |
Llama-3-8B, Llama-3-70B | 8,70 | 8192 | Meta Llama 3 Community License |
Open source LLM model lists
Other open source models can be found at the following links
RAG (Retrieval augmented generation)
Model Cutoff times
Model | Release Date | Training Data Cutoff | Notes / Source |
---|---|---|---|
GPT-2 | Introduced in February 2019 | Unknown; no official specification (The Verge, Forbes) | No authoritative public record about GPT-2 cutoff—likely sometime before 2019. |
GPT-3 | May 29, 2020 (publication); June 11, 2020 (API beta) (Wikipedia, eInfochips) | Unknown; no explicit records publicly available | Similar to GPT-2, no specific cutoff date publicly provided. (Wikipedia) |
GPT-4 | March 14, 2023 (initial release) (OpenAI, Forbes) | Uncertain—sources vary between September 2021 and April 2023 | The base GPT-4’s cutoff is debated; “up to Sep 2021” from forum vs. April 2023 from other sources (OpenAI Community, otterly.ai, WIRED, Wikipedia, eDiscovery Today by Doug Austin) |
Llama 2 | July 2023 (TechTarget, lunabot.ai) | Pretraining until September 2022; some tuning up to July 2023 | Clearly documented in Meta’s model card and GitHub (Hugging Face, GitHub, llama-2.ai, Prompthub) |
Llama 3 | April 18, 2024 (Wikipedia, Amity Solutions) | August 2024 (Google Cloud) | Wikipedia notes that Llama 3’s knowledge cutoff was August 2024 (Wikipedia) |
Llama 4 | GA release for “Maverick” version on April 29, 2025 (Google Cloud) | August 2024 (Google Cloud) | Wikipedia again lists August 2024 as Llama 4’s cutoff (Wikipedia) |
Claude 4 | Released May 22–23, 2025 (Opus 4 & Sonnet 4) (Anthropic, PromptLayer, Wikipedia) | March 2025 (Anthropic, Wikipedia) | Anthropic’s help center states Claude Opus 4 and Sonnet 4 are trained with data through March 2025 (Anthropic Help Center) |
Grok 4 | Released July 9, 2025 (Wikipedia, Built In, Indiatimes) | Unspecified (no public info found) | No reliable public information found on Grok’s cutoff. |
DeepSeek-R1 | Released January 20, 2025 (WIRED) | Approximate; presumed around that timeframe | Based on third-party sources, not official — treat as approximate (otterly.ai, allmo.ai) |
RAG Why needed
Like all machine learning models, LLMs depend on statistical patterns in their training data.
For example, an LLM trained in 2024 will not be able to answer questions about the Trump presidency in the USA starting in 2025.
Retrieval-Augmented Generation (RAG), introduced by Facebook researchers, addresses this limitation by connecting LLMs to up-to-date data sources.
These sources could be news articles, a company's internal knowledge base, or databases like Wikipedia.
RAG Workflow
When a new prompt comes to the LLM system, documents similar to this prompt are searched for in the database.
Most of the time a vector database is used for fast response times.
Then, the LLM uses this context-enriched prompt to give more up-to-date answers.
RAG makes LLM outputs more reliable using factual databases.
As in the previous example of the new USA presidency starting in 2025, RAG enables LLMs to use the latest information even if their training data is older.
RAG could also be adapted to specific domains using relevant databases.
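A minimal RAG sketch of these ideas, using TF-IDF retrieval from scikit-learn instead of a real vector database; the documents and query are made up for illustration, and the augmented prompt would then be sent to the LLM as on the previous slides:

```python
# Retrieve the most similar documents and build a context-enriched prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "2025-01-20: The new US president was inaugurated in Washington.",
    "Company policy: refunds are processed within 14 days.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

def retrieve(query, k=1):
    # A vector database would replace this brute-force similarity search in production.
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

query = "Who became US president in 2025?"
context = "\n".join(retrieve(query))
augmented_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(augmented_prompt)
```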
Open Source models and weights
Why open-source AI became an American national priority
When President Trump released the U.S. AI Action Plan last week, many were surprised to see “encourage open-source and open-weight AI” as one of the administration’s top priorities.
The White House has elevated what was once a highly technical topic into an urgent national concern — and a key strategy to winning the AI race against China.
China’s emphasis on open source, also highlighted in its own Action Plan released shortly after the U.S., makes the open-source race imperative.
And the global soft power that comes with more open models from China makes their recent leadership even more notable.
Sam Altman launches GPT-oss, OpenAI’s first open-weight AI language model in over 5 years
America's AI Action Plan: Open Source
- Encourage Open-Source and Open-Weight AI
- Open-source and open-weight AI models are made freely available by developers for anyone in the world to download and modify.
- Models distributed this way have unique value for innovation because startups can use them flexibly without being dependent on a closed model provider.
- They also benefit commercial and government adoption of AI because many businesses and governments have sensitive data that they cannot send to closed model vendors.
- And they are essential for academic research, which often relies on access to the weights and training data of a model to perform scientifically rigorous experiments.
- We need to ensure America has leading open models founded on American values.
- Open-source and open-weight models could become global standards in some areas of business and in academic research worldwide.
- For that reason, they also have geostrategic value. While the decision of whether and how to release an open or closed model is fundamentally up to the developer, the Federal government should create a supportive environment for open models.
China Global AI Governance Action Plan
Updated: JULY 26, 2025 23:55
We need to promote the development of an open-source compliance system, clarify and implement the technical safety guidelines for open-source communities, and promote the open sharing of development resources such as technical documentation and API documentation.
We need to strengthen the open-source ecosystem by enhancing compatibility, adaptation, and inter-connectivity between upstream and downstream products, and enable the open flow of non-sensitive technology resources.
Fine tuning (alignment) different techniques
fine tuning different models
- SFT (Supervised Fine Tuning)
- RLHF (Reinforcement Learning Human Feedback)
- PPO (Proximal Policy Optimization)
- DPO (Direct Preference Optimization)
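As a concrete example of one of these techniques, here is a sketch of the DPO loss for a single preference pair; it assumes per-sequence log-probabilities are already available from the policy and a frozen reference model, and beta is the usual DPO hyperparameter:

```python
# DPO loss for one (chosen, rejected) pair: -log sigmoid(beta * reward margin).
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy prefers the chosen answer a bit more than the reference does.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-12.5, ref_logp_rejected=-14.0))
```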
GPT Assistant training pipeline
Fine tuning thoughts
Reinforcement Learning Human Feedback (RLHF)
Reinforcement Learning Human Feedback (RLHF) Idea
Reinforcement Learning Human Feedback (RLHF) Why 1
RLHF Why 2: generate vs discriminate
RLHF train 1 reward model
RLHF in unverifiable domains
RLHF in unverifiable domains reward model
RLHF training
RLHF upside
RLHF downside
LLM Training: RLHF and Its Alternatives
Read more from Sebastian Raschka
Modern transformer-based LLMs, such as ChatGPT or Llama 2, undergo a 3-step training procedure:
- Pretraining
- Supervised finetuning
- Alignment
…
No Moat leaked Google paper May 4, 2023
Google “We Have No Moat, And Neither Does OpenAI”
We have no secret sauce. Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.
People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. We should consider where our value add really is.
Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.
Google “We Have No Moat, And Neither Does OpenAI” Key points
- Retraining models from scratch is the hard path
- Large models aren’t more capable in the long run if we can iterate faster on small models
- Data quality scales better than data size
- Directly Competing With Open Source Is a Losing Proposition
- Individuals are not constrained by licenses to the same degree as corporations
- Being your own customer means you understand the use case
- Owning the Ecosystem: Letting Open Source Work for Us
Google “We Have No Moat, And Neither Does OpenAI” History
- Feb 24, 2023 – LLaMA is Launched
- March 12, 2023 – Language models on a Toaster
- March 13, 2023 – Fine Tuning on a Laptop
- March 28, 2023 – Open Source GPT-3
- April 3, 2023 – Real Humans Can’t Tell the Difference Between a 13B Open Model and ChatGPT
- April 15, 2023 – Open Source RLHF at ChatGPT Levels
Present
We do not understand even small language models
We can engineer and build LLMs.
These LLMs work astonishingly well, in practice.
But we do not understand how LLMs work theoretically.
Even for small language models with 100x fewer parameters, our theoretical knowledge is lacking.
LLM Scaling Laws
- Hypothesis: The performance of an LLM is a function of
- N — the number of parameters in the network (weights and biases)
- D — the amount of text we train on
- Many LLM companies insinuated that AGI would be reached with LLM scaling
- With the release of GPT-5, we see that this hypothesis does not hold (one published scaling-law fit is sketched below for reference)
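For reference, one published form of such a scaling law is the Chinchilla parametric fit (Hoffmann et al., 2022), which models loss as a function of N and D; the constants below are the values reported in that paper and are illustrative only:

```python
# Chinchilla-style scaling law: L(N, D) = E + A / N**alpha + B / D**beta.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28   # reported fit (illustrative)

def predicted_loss(N, D):
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

print(predicted_loss(70e9, 1.4e12))    # roughly Chinchilla-70B scale
print(predicted_loss(7e9, 2e12))       # roughly Llama-2-7B scale
```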
AGI is not impossible but we are not there yet
Road to AGI: More paradigms are needed
- AGI with only LLM/NN approach is impossible.
- Different approaches are needed
Multi modal LLM
Tool Usage
- RAG
- Web Search
- Code backends
- Plugins
Tool Usage example: Python interpreter
- Usage of Python and other interpreters is increasing
- Gemini and ChatGPT already use a Python interpreter for some tasks, even when you do not ask for it.
- ChatGPT 5 does not explicitly show its tool usage.
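A toy sketch of why delegating to an interpreter helps: counting-style questions that trip up LLMs are trivial once the model emits code and a symbolic tool runs it. The helper and the "model-generated" snippet below are made up for illustration:

```python
# Hypothetical tool loop: the model writes a Python expression, the tool evaluates it.
def run_tool(code: str) -> str:
    # Real systems run this in a sandboxed interpreter; here we eval a bare expression.
    return str(eval(code, {"__builtins__": {}}, {}))

# Snippet the model might emit for: "How many r's are in 'strawberry'?"
model_generated_code = "'strawberry'.count('r')"
print(run_tool(model_generated_code))   # -> 3
```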
Tool usage is a Neurosymbolic approach
Obviously, combining a code interpreter (which is a symbolic system of enormous complexity) with an LLM is neurosymbolic.
AlphaGo was neurosymbolic as well.
These are universally accepted definitions
read more How o3 and Grok 4 Accidentally Vindicated Neurosymbolic AI
Small Language Models
- Small Language Models are becoming more popular
- especially for defined limited tasks like agents
- read Small Language Models are the Future of Agentic AI
Security
OWASP TOP 10 for LLM applications
1. Prompt Injection
2. Insecure Output Handling
3. Training Data Poisoning
4. Model Denial of Service (Unbounded Consumption)
5. Supply Chain Vulnerabilities
6. Sensitive Information Disclosure
7. Insecure Plugin Design
8. Excessive Agency
9. Overreliance
10. Model Theft
Recent example
- Nx: an AI-first build platform; they use vibe coding.
NX created a feature for checking pull request formatting using Claude Code.
This feature puts the subject line of a GitHub PR into bash without sanitizing it.
Somebody noticed this security hole, and it was patched.
Unfortunately, the hole remained in a branch, which allows running GitHub Actions.
On 24 August, someone submitted a pull request to NX with exploit code in it. The NX project used NX to automatically test the exploit, like it does all pull requests — by running it!
The NX CI thus handed the attacker NX’s official GitHub key and its publishing key for NPM.
So on 26 August, the attacker added malware to NX, and pushed the malwared versions as official releases!
The malware stole a lot of people’s login keys and, apparently, their crypto wallets.
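A minimal illustration of the vulnerability class behind this incident (command injection from an untrusted PR title into a shell), not the actual Nx code:

```python
# Interpolating untrusted text into a shell command lets an attacker run arbitrary commands.
import subprocess

pr_title = 'Fix typo"; touch /tmp/pwned; echo "'     # attacker-controlled PR title

# Vulnerable: the title becomes part of the shell command line.
subprocess.run(f'echo "Checking PR: {pr_title}"', shell=True)   # also runs `touch /tmp/pwned`

# Safer: no shell; the title is passed as a plain argument and never interpreted.
subprocess.run(["echo", f"Checking PR: {pr_title}"])
```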
Recent example: Prompt Injection Example
- **Supabase MCP**
- Supabase MCP can leak your entire SQL database
- LLMs are often used to process data according to pre-defined instructions.
- The system prompt, user instructions, and the data context are provided to the LLM as text.
- The attacker begins by opening a new support ticket and submitting a carefully crafted message.
- The body of the message includes both a friendly question and a very explicit instruction block addressed directly to the Cursor agent:
Recent example: RAG exploit
- **Office 365 Copilot**
Aim Security discovered “EchoLeak”, a vulnerability chain that exploits design flaws typical of RAG Copilots, allowing attackers to automatically exfiltrate any data from M365 Copilot’s context, without relying on specific user behavior.
With an external email, data could be leaked
Against LLM Hype
Current services are highly subsidized.
The Magnificent 7 stocks — NVIDIA, Microsoft, Alphabet (Google), Apple, Meta, Tesla and Amazon — make up around 35% of the value of the US stock market, and of that, NVIDIA’s market value makes up about 19% of the Magnificent 7.
No profit in AI business
Current services are highly subsidized.
Hallucinations
- Hallucinations and other problems are due to how LLMs work
- LLMs are always stochastic machines/parrots
PhD Levels vs House Cat
- LLMs do not have world state models
- They are not at the PhD level
- Meta’s A.I. Chief Yann LeCun Explains Why a House Cat Is Smarter Than The Best A.I.
- “A cat can remember, can understand the physical world, can plan complex actions, can do some level of reasoning—actually much better than the biggest LLMs.”
Sources
Andrej Karpathy
Sebastian Raschka
Alexander Rush
https://tech.cornell.edu/people/alexander-rush/ - Large Language Models in 5 Formulas