Büyük dil modelleri giriş

Resume Atilla Özgür

  • polyglot programmer
  • database developer
  • build engineer
  • researcher

Resume Atilla Özgür Professional

  • Started programming in 1991, high school
  • Graduated in Electrical Engineering in 2003 from best Technical University in Turkey METU
  • 22 years of Professional Software Development experience
  • 6 years is Project Management and Team Leading experience
  • 7 years of Database Administration experience
  • 6 years of AI and optimization Algorithm development for Steel Factories
  • 1.5 years of GenAI-LLM experience

Resume Atilla Özgür Professional

  • Worked with different web application development platforms and Database Systems
  • I have numerous Microsoft certifications (MCPD,MCSD,MCT)
  • I am certified in Oracle (OCA 11g) and SQL Server (2000-2008) Databases.
  • Worked with a lot of different programming languages professionally and academically
    • C#
    • Java
    • Python
    • Visual Basic
    • javascript
    • SQL
    • R

Resume Atilla Özgür Academic

  • Bachelor Degree in Electrical Engineering in 2003 from best Technical University in Turkey, Middle of Technical University
  • Master of Science in Computer Engineering in 2008 from Atilim University, Turkey
  • PhD in Electrical Engineering in 2017 from Baskent University, Turkey
  • My thesis was was about machine learning, optimization and intrusion detection systems
  • I used python, matlab, groovy, weka in my thesis
  • My first post-doc work was about machine learning and optimization for steel production systems.
  • My second,current, post-doc work in Constructor University is about LLMs for requirement management/contract handling

Education

University Department Degree Start End
Middle East Technical University Electrical Engineering Bachelor 1995 2003
Atılım University Computer Engineering Master 2004 2007
Middle East Technical University Medical Informatics (Incomplete) Master 2005 2008
Başkent University Electrical Engineering PHD 2007 2017

Employment History

Title Company Start End
Postdoctoral Researcher Constructor University Bremen - Mathematics and Logistics 04/2024 -
Adjunct Professor (Part time) Ankara Science University 09-2024 02-2025
Senior Developer SMS Digital 02-2022 03-2024
Adjunct Professor (Part time) Constructor University 09-2022 02-2024
Assistant Professor Ankara Yıldırım Beyazıd University - Computer Engineering 06/2021 02/2022
Postdoctoral Researcher Jacobs University Bremen - Mathematics and Logistics 01/2018 01/2022
Database Administrator Turkish Labor Agency 07/2011 12/2017
Software Developer and Build/DevOps Engineer Milsoft 05/2010 07/2011
Senior Software Developer Tilda Telekom 10/2009 05/2010
Software Trainer Freelance 06/2009 10/2009
Software Project Manager Turksat 07/2008 06/2009
Software Project Manager Simetri 09/2006 04/2008
Software Trainer Netsoft 10/2005 12/2006
Software Developer Kale Yazılım 10/2004 10/2005
Software Developer Veripark 06/2003 10/2004

Broader AI

Course contents for LLM

Course book

How Large Language Models (LLMs) work

LLMs are built using supervised learning. They repeatedly predict next words using previous words.

My favorite food is a Döner with spicy souse

Input A Output
My favorite
My favorite food
My favorite food is
My favorite food is a
My favorite food is a Döner
My favorite food is a Döner with
My favorite food is a Döner with spicy
My favorite food is a Döner with spicy souse

How Large Language Models (LLMs) work 2

source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Stochastic Parrots

  • the term stochastic parrot is a disparaging metaphor, introduced by Emily M. Bender and colleagues in a 2021 paper, that frames large language models as systems that statistically mimic text without real understanding.

Neural networks

Source Andrej Karpathy: 1hr Talk Intro to Large Language Models

Single Neuron

source

Different activation functions

  • Sigmoid, widely used in old NN papers, is not here
  • GELU, GPT activation function, is not here also
  • More than 20 activation functions exist
  • More activation functions are proposed continuously

source: A scalable species-based genetic algorithm for reinforcement learning problems

Neural networks internals

source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Modern Architectures Derived From GPT

LLM-visualization nano-gpt

source: bbycroft.net/llm

LLM-visualization gpt2

source: bbycroft.net/llm

LLM-visualization gpt3

source: bbycroft.net/llm

GPT2 sizes

GPT3 sizes

LLM-EvolutionTree

source

GPT Original architecture

GPT2-architecture

Modern Architectures Derived From GPT

Modern Architectures Derived From GPT: Gemma2

Modern Architectures Derived From GPT: Llama3

Modern Architectures Derived From GPT: Mixtral

Modern Architectures Derived From GPT: Phi3

GPT2 vs Llama 1

Llama 1 vs Llama 2

Gemma 3 vs Qwen 3

The Big LLM Architecture Comparison

LLM model size vs capabilities

Model Size (parameters) Capabilities Application
1B pattern matching. Basic world knowledge hotel sentiment analysis
10B Greater world knowledge. Can follow basic instruction Simple order chatbot
100B+ Rich world knowledge. Complex reasoning Brainstorming partner

pre-train LLM

Dump language model 1 (Before pre-train)

flowchart TD
    LM["Dump Language Model GPT2"] -->|"Randomly generates output "| P
    P("$$p=\frac{1}{50257}$$")
    P --> L1("$$ L = -ln(p) $$")
    L1 --> L2("$$ 10.82  $$")
Figure 1
  • When LLM first start training, it randomly generates output.
  • Therefore, probability of any word coming is \(\frac{1}{vocabulary size}\)
  • GPT2 vocabulary size is 50257
  • at the start of the pre-training, we expect to see similar number, 10.82, for loss value

Dump language model 2 (Before pre-train)

flowchart TD
    LM["Dump Language Model Vocab Size GPT2"] -->|"Randomly generates output "| P
    P["$$p=\frac{1}{50257}$$"]
    P --> L1["$$ L = -ln(p) $$"]
    L1 --> L2["10.82  "]
    
    LM2["Dump Language Model Vocab Size 100"] -->|"Randomly generates output "| P2
    P2("$$p=\frac{1}{100}$$")
    P2 --> L21("$$ L = -ln(p) $$")
    L21 --> L22("$$ 4.60  $$")
Figure 2
  • here, we have to different vocab sizes, 50257 and 100
  • correspondingly, we will start with two different losses

Training LLMs Example llama2

Training is like compression of the terabytes of text

llama2 training

Video: 1hr Talk Intro to Large Language Models

Pre-Train step 1 dataset

source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Pre-Train step 2 tokenization

source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Pre-Train step 3 neural network training

source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Pre-Train step 4 inference

source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Pretrain Datasets

When training LLMs, Data quality matters

  • diverse data
    • harmful speech
    • biases
  • cleaning data
  • deduplication (remove duplicates)

Datasets used to train GPT-3

Datasets LLama 2

  • Our training corpus includes a new mix of data from publicly available sources, which does not include data from Meta’s products or services.
  • We made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals.
  • We trained on 2 trillion tokens of data as this provides a good performance–cost trade-off,
  • up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations.

Copyrighted Works Problem

  • Meta staff torrented nearly 82TB of pirated books for AI training — court records reveal copyright violations
  • OpenAI has been sued by novelists as far back as June 2023 for using their books to train its large language models,
  • with The New York Times following suit in December. - Nvidia has also been on the receiving end of a lawsuit filed by writers for using 196,640 books to train its NeMo model, which has since been taken down.
  • A former Nvidia employee blew the whistle on the company in August of last year, saying that it scraped more than 426 thousand hours of videos daily for use in AI training.
  • More recently, OpenAI is investigating if DeepSeek illegally obtained data from ChatGPT, which just shows how ironic things can get.
  • source

Using Copyrighted Works in LLMs according to www.copyright.com

  • LLMs use massive amounts of textual works—many of which are protected by copyright.

  • To do this, LLMs make copies of the works they rely on, which involves copyright in several ways, such as:

  • Using copyright-protected material in the training datasets of LLMs without permission can result in the creation of unauthorized copies: copies generated during the training process and copies in the form of representations of the training data embedded within the LLM after training. This creates potential copyright liability.

  • Outputs—the material generated by AI systems like LLMs—may create copyright liability if they are the same or too similar to one of the copyrighted works used as an input unless there is an appropriate copyright exception or limitation.

  • source

Datasets

  • Common crawl (Over 300 billion pages spanning 18 years) with variations
  • Star coder almost 800 GB Code examples
  • fine web 18.5T tokens
  • Arxiv (Academic papers)

Datasets Hugging face FineWeb

  • The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated english web data from CommonCrawl.

  • The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large scale data processing library.

  • https://huggingface.co/datasets/HuggingFaceFW/fineweb

Pre training different models

source Sebastian Raschka: LLMs: A Journey Through Time and Architecture

Pre-training costs of different models

Model Release year Reported training GPU-hours Reported GPU fleet / type Reported training time Estimated compute cost (USD) Notes
GPT-2 (1.5B) — original 2019 Not publicly disclosed Not publicly disclosed N/A OpenAI did not publish training time or cost.
GPT-2 (repro, modern hardware) 2024 ≈192 GPU-hrs (8×H100 for ~24h) 8× H100 (80GB) ≈24 hours ≈$600–$800 Community reproduction. not the 2019 original run.
GPT-3 (175B) 2020 V100-era compute ≈3.14e23 FLOPs “Multiple weeks” (reports) ≈$0.5M–$4.6M (est.) Cost varies widely by assumptions. compute from paper.
GPT-4 2023 Undisclosed (estimates: ~25k A100s) Est. ~90–100 days (unofficial) > $100M (various estimates) OpenAI hasn’t disclosed. Figures are outside estimates.
Llama 2 (7B–70B) 2023 ≈3.3M A100-80GB GPU-hrs NVIDIA A100 80GB Depends on cluster size ≈$5M–$8M (at $1.5–$2.5/GPU-hr) From Meta paper. cost uses common rental ranges.
Llama 3.1 (405B) 2024 ≈30.84M H100 GPU-hrs H100 ~24k GPUs over ~54 days (reports) ≈54 days (reported) ≈$62M–$93M (at $2–$3/GPU-hr). GPU-hrs from engineering report. time from news coverage.
Llama 4 Scout (17B) 2025 ≈7.38M H100-80GB GPU-hrs H100 80GB Not disclosed ≈$15M–$22M (at $2–$3/GPU-hr) From NVIDIA/Meta model card summary.
Claude 3.7 Sonnet 2025 Undisclosed (<1e26 FLOPs claim) Not disclosed “Few tens of millions” (company guidance via press) Anthropic hasn’t published full training details.
Grok 2 2024 ≈20,000 H100 (per Musk) Not disclosed N/A (hours unknown) GPU count stated publicly. no official hours.
Grok 3 2025 Claims: 100k–200k H100s Not disclosed N/A (claims vary) Numbers are public claims/reports. not independently verified.
DeepSeek-V3 2024 ≈2.788M H800 GPU-hrs NVIDIA H800 Depends on cluster size ≈$5.6M (@ $2/GPU-hr) From DeepSeek paper and analyses.

Sam-Altman-1million-GPU

Pre-train speed up

pre-training speed up

  • We will not cover this topic
  • GPU parallelization
  • learning rates
  • optimizers
  • quantization (32 bit –> 16 bit)
  • and others, active research

Parallelization

  • Data parallelization
  • Tensor parallelization (split matrix multiplication to multiple GPUs)
  • Pipeline parallelization (transformer layers to multiple GPUs)
  • Model parallelization

pre-training tricks examples

  • We will not cover this topic

  • source Andrej Karpathy: Let’s reproduce GPT-2 (124M)

  • Let’s make it fast. GPUs, mixed precision, 1000ms

  • Tensor Cores, timing the code, TF32 precision, 333ms

  • float16, gradient scalers, bfloat16, 300ms

  • torch.compile, Python overhead, kernel fusion, 130ms

  • flash attention, 96ms

  • nice/ugly numbers. vocab size 50257 → 50304, 93ms

  • hyperpamaters, AdamW, gradient clipping

  • learning rate scheduler: warmup + cosine decay

  • batch size schedule, weight decay, FusedAdamW, 90ms

After Pretrain

next word prediction becomes powerful

Video: Deep Dive into LLMs like ChatGPT

neural network dreams documents

Video: Deep Dive into LLMs like ChatGPT

Psychology of base model

Video: Deep Dive into LLMs like ChatGPT

base models are not assistants 1

base models are not assistants 2

knowledge of self

human text generation vs llm text generation 1

human text generation vs llm text generation 2

Swiss cheese model of LLM

Video: Deep Dive into LLMs like ChatGPT

Hallucinations

Video: Deep Dive into LLMs like ChatGPT

Models can’t count

Video: Deep Dive into LLMs like ChatGPT

how to train your ChatGPT

Andrej Karpathy 1hr Talk Intro to Large Language Models

Supervised Fine Tuning (Instruct GPT)

training the assistant

Andrej Karpathy 1hr Talk Intro to Large Language Models

datasets conversation 1

Andrej Karpathy : Deep Dive into LLMs like ChatGPT

datasets conversation 2

Andrej Karpathy : Deep Dive into LLMs like ChatGPT

datasets conversation 3

LLama3 SFT datasets

Deepseek may have used OpenAI ChatGPT output

SFT after fine tuning you have assistant

Andrej Karpathy : Deep Dive into LLMs like ChatGPT

After fine tuning Emergent behavior

Course contents for LLM

LLM Arena

LLM Arena overview

LLM Arena copilot

LLM Arena text-to-video and image-to-video

LLM Arena vision and text-to-image

LLM Arena text and webdev

Foundational/base models

Foundational/base models: ollama models instruct

Foundational/base models: hugging face models

Foundational/base models: litgpt models 1

Foundational/base models: litgpt models 2

LLM how to use OpenAI API

LLM how to use OpenAI API Answer

LLM models: closed source vs open source

Closed source models Open source models
cloud servers can run on your device, on premises, PC etc
easy to use in applications
larger/more powerful models full control of your models
relatively inexpensive full control over data/privacy/access

Well known open source LLM models

Language Model Name Params (B) Context Length Licence
open_llama_3b, open_llama_7b, open_llama_13b 3,7,13 2048 Apache 2.0
phi-2 2.7B 2.7 2048 MIT
Gemma 2B, Gemma 7B, 2,7 8192 Free with usage restrictions
Grok-1 314 8192 Apache 2.0
Mixtral-8x22B 141 64k Apache 2.0
Llama-3-8B, Llama-3-70B 8,70 8192 Meta Llama 3 Community License

Open source LLM model lists

Other open source models could be found in the following links

RAG (Retrieval augmented generation)

Model Cutoff times

Model Release Date Training Data Cutoff Notes / Source
GPT-2 Introduced in February 2019 Unknown; no official specification (The Verge, Forbes) No authoritative public record about GPT-2 cutoff—likely sometime before 2019.
GPT-3 May 29, 2020 (publication); June 11, 2020 (API beta) (Wikipedia, eInfochips) Unknown; no explicit records publicly available Similar to GPT-2, no specific cutoff date publicly provided. (Wikipedia)
GPT-4 March 14, 2023 (initial release) (OpenAI, Forbes) Uncertain—sources vary between September 2021 and April 2023 The base GPT-4’s cutoff is debated; “up to Sep 2021” from forum vs. April 2023 from other sources (OpenAI Community, otterly.ai, WIRED, Wikipedia, eDiscovery Today by Doug Austin)
Llama 2 July 2023 (TechTarget, lunabot.ai) Pretraining until September 2022; some tuning up to July 2023 Clearly documented in Meta’s model card and GitHub (Hugging Face, GitHub, llama-2.ai, Prompthub)
Llama 3 April 18, 2024 (Wikipedia, Amity Solutions) August 2024 (Google Cloud) Wikipedia notes that Llama 3’s knowledge cutoff was August 2024 (Wikipedia)
Llama 4 GA release for “Maverick” version on April 29, 2025 (Google Cloud) August 2024 (Google Cloud) Wikipedia again lists August 2024 as Llama 4’s cutoff (Wikipedia)
Claude 4 Released May 22–23, 2025 (Opus 4 & Sonnet 4) (Anthropic, PromptLayer, Wikipedia) March 2025 (Anthropic, Wikipedia) Anthropic’s help center states Claude Opus 4 and Sonnet 4 are trained with data through March 2025 (Anthropic Help Center)
Grok 4 Released July 9, 2025 (Wikipedia, Built In, Indiatimes) Unspecified (no public info found) No reliable public information found on Grok’s cutoff.
DeepSeek-R1 Released January 20, 2025 (WIRED) Approximate; presumed around that timeframe Based on third-party sources, not official — treat as approximate (otterly.ai, allmo.ai)

RAG Why needed

  • As all machine learning models, LLMs are also depended on statistical patterns in their training data.

  • For example, an LLM model, which is trained in 2024, will not be able to answer questions about USA president Trump presidency in 2025.

  • Retrieval-Augmented Generation (RAG) introduced by Facebook researchers address these limitation by connecting LLMs to update data sources.

  • These sources could be news articles, company internal knowledge base or databases like wikipedia.

RAG Workflow

  • When a new prompt comes to LLM system, similar documents to this new prompt are searched in databases.

  • Most of the time a vector database is used for fast response times.

  • Then, LLM uses this context enriched prompt to give more up-to-date answers.

  • RAG makes LLM outputs more reliable using factual databases.

  • Like previous example of 2025 starting of new USA presidency, RAG enables LLMs to use latest information if their training data is older.

  • RAG could also be adapted to specific domains using relevant databases.

Open Source models and weights

Why open-source AI became an American national priority

  • When President Trump released the U.S. AI Action Plan last week, many were surprised to see “encourage open-source and open-weight AI,” as one of the administration’s top priorities.

  • The White House has elevated what was once a highly technical topic into an urgent national concern — and a key strategy to winning the AI race against China.

  • China’s emphasis on open source, also highlighted in its own Action Plan released shortly after the U.S., makes the open-source race imperative.

  • And the global soft power that comes with more open models from China makes their recent leadership even more notable.

  • source

  • Sam Altman launches GPT-oss, OpenAI’s first open-weight AI language model in over 5 years

Americas AI Action Plan Open source

  • Encourage Open-Source and Open-Weight AI
  • Open-source and open-weight AI models are made freely available by developers for anyone in the world to download and modify.
  • Models distributed this way have unique value for innovation because startups can use them flexibly without being dependent on a closed model provider.
  • They also benefit commercial and government adoption of AI because many businesses and governments have sensitive data that they cannot send to closed model vendors.
  • And they are essential for academic research, which often relies on access to the weights and training data of a model to perform scientifically rigorous experiments.
  • We need to ensure America has leading open models founded on American values.
  • Open-source and open-weight models could become global standards in some areas of business and in academic research worldwide.
  • For that reason, they also have geostrategic value. While the decision of whether and how to release an open or closed model is fundamentally up to the developer, the Federal government should create a supportive environment for open models.

China Global AI Governance Action Plan

  • Updated: JULY 26, 2025 23:55

  • We need to promote the development of an open-source compliance system, clarify and implement the technical safety guidelines for open-source communities, and promote the open sharing of development resources such as technical documentation and API documentation.

  • We need to strengthen the open-source ecosystem by enhancing compatibility, adaptation, and inter-connectivity between upstream and downstream products, and enable the open flow of non-sensitive technology resources.

Fine tuning (alignment) different techniques

fine tuning different models

  • SFT (Supervised Fine Tuning)
  • RLHF (Reinforcement Learning Human Feedback)
  • PPO (Proximal Policy Optimization)
  • DPO (Direct Preference Optimization)

GPT Assistant training pipeline

Fine tuning thoughts

Reinforcement Learning Human Feedback (RLHF)

Reinforcement Learning Human Feedback (RLHF) Idea

Reinforcement Learning Human Feedback (RLHF) Why 1

RLHF Why 2: generate vs discriminate

RLHF train 1 reward model

RLHF in unverifiable domains

RLHF in unverifiable domains reward model

RLHF training

RLHF upside

RLHF downside

LLM Training: RLHF and Its Alternatives

  • Read more from Sebastian Raschka

  • Modern transformer-based LLMs, such as ChatGPT or Llama 2, undergo a 3-step training procedure:

    1. Pretraining
    2. Supervised finetuning
    3. Alignment
  • LLM Training: RLHF and Its Alternatives

No Moat leaked Google paper May 4, 2023

Google “We Have No Moat, And Neither Does OpenAI”

  • We have no secret sauce. Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.

  • People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. We should consider where our value add really is.

  • Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.

Google “We Have No Moat, And Neither Does OpenAI” Key points

  • Retraining models from scratch is the hard path
  • Large models aren’t more capable in the long run if we can iterate faster on small models
  • Data quality scales better than data size
  • Directly Competing With Open Source Is a Losing Proposition
  • Individuals are not constrained by licenses to the same degree as corporations
  • Being your own customer means you understand the use case
  • Owning the Ecosystem: Letting Open Source Work for Us

Google “We Have No Moat, And Neither Does OpenAI” History

  • Feb 24, 2023 – LLaMA is Launched
  • March 12, 2023 – Language models on a Toaster
  • March 13, 2023 – Fine Tuning on a Laptop
  • March 28, 2023 – Open Source GPT-3
  • April 3, 2023 – Real Humans Can’t Tell the Difference Between a 13B Open Model and ChatGPT
  • April 15, 2023 – Open Source RLHF at ChatGPT Levels

Present

We do not understand even small language models

  • We can engineer and build LLMs.

  • These LLMs work astonishingly well, in practice.

  • But, We do not understand how LLMs work theoretically.

  • But even for 100x parameters small language models, our theory knowledge is lacking

  • Alexandar Rush: Large Language Models in 5 Formulas

LLM Scaling Laws

  • Hypothesis: The performance of an LLM is a function of
  • N — the number of parameters in the network (weights and biases)
  • D — the amount of text we train on

  • A lot of LLM companies insinuated that AGI will be reached with LLM scaling
  • With the release GPT 5, we see that this hypothesis does not hold

AGI is not impossible but we are not there yet

Road to AGI: More paradigms are needed

  • AGI with only LLM/NN approach is impossible.
  • Different approaches is needed

Multi modal LLM

Tool Usage

Tool Usage example: Python interpreter

  • Using python and other interpreter usage is increasing
  • Gemini and ChatGPT are already using python interpreter for some tasks, even when you do not ask for it.
  • ChatGPT 5 does not explicitly show its tool usage.

Figure 3: ChatGPT/Gemini tool usage

Tool usage is a Neurosymbolic approach

Small Language Models

Security

OWASP TOP 10 for LLM applications

1. Prompt Injection
2. Insecure Output Handling
3. Training Data Poisoning
4. Model Denial of Service (Unbounded Consumption)
5. Supply Chain Vulnerabilities
6. Sensitive Information Disclosure
7. Insecure Plugin Design
8. Excessive Agency
9. Overreliance
10. Model Theft

Recent example

  1. NX created a feature for checking pull request formatting using Claude Code.

  2. This feature puts subject line of github PR to bash without sanitizing.

  3. Somebody realized this security hole and they patched it.

  4. Unfortunately, hole remained in a branch, which allows running github actions.

  5. On 24 August, someone submitted a pull request to NX with exploit code in it. The NX project used NX to automatically test the exploit, like it does all pull requests — by running it!

  6. The NX CI thus handed the attacker NX’s official GitHub key and its publishing key for NPM.

  7. So on 26 August, the attacker added malware to NX, and pushed the malwared versions as official releases!

  8. The malware stole a lot of people’s login keys and, apparently, their crypto wallets.

source github nx

source Pivot to AI

Recent example: Prompt Injection Example

  • *** Supabase MCP**
  • Supabase MCP can leak your entire SQL database
  • LLMs are often used to process data according to pre-defined instructions.
  • The system prompt, user instructions, and the data context is provided to the LLM as text.
  • The attacker begins by opening a new support ticket and submitting a carefully crafted message.
  • The body of the message includes both a friendly question and a very explicit instruction block addressed directly to the Cursor agent:

Recent example: RAG exploit

  • *** Office 365 Copilot ***

Against LLM Hype

Current services are highly subsidized.

  • The Hater’s Guide To The AI Bubble

  • The Magnificent 7 stocks — NVIDIA, Microsoft, Alphabet (Google), Apple, Meta, Tesla and Amazon — make up around 35% of the value of the US stock market, and of that, NVIDIA’s market value makes up about 19% of the Magnificent 7.

  • No profit in AI business

  • Current services are highly subsidized.

  • Poor ROI for GenAI

  • AI Is A Money Trap

Hallucinations

  • Hallucinations and other problems are due to how LLM works
  • LLMs are always stochastic machines/parrots

PhD Levels vs House Cat

Sources

Andrej Karpathy

-source Andrej Karpathy: Deep Dive into LLMs like ChatGPT

Sebastian Raschka

Alexander Rush

https://tech.cornell.edu/people/alexander-rush/ - Large Language Models in 5 Formulas

Articles