Created Feb 14, 2025 by Adrienne Angles (@adrienneangles, Maintainer)

DeepSeek: The Chinese AI Model That's a Tech Breakthrough and a Security Risk


DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is problematic, and I don't buy the public numbers.

DeepSink was built on top of open-source Meta projects (PyTorch, Llama), and ClosedAI is now in danger because its valuation is outrageous.

To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but that's highly probable, so allow me to simplify.

Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.

That means fewer GPU hours and less powerful chips.

In other words, lower computational requirements and lower hardware costs.
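
To make the idea concrete, here is a minimal Python sketch of one well-known form of test-time scaling: self-consistency, i.e. sampling several answers at inference time and keeping the majority vote. Nothing here is taken from DeepSeek. The `noisy_solver` function, its 60% hit rate, and the toy addition task are assumptions standing in for a real model.

```python
import random
from collections import Counter


def noisy_solver(question, rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled model answer.

    Returns the right sum with probability p_correct, otherwise a nearby wrong value.
    """
    a, b = question
    if rng.random() < p_correct:
        return a + b
    return a + b + rng.choice([-2, -1, 1, 2])


def majority_vote(question, n_samples, rng):
    """Sample n_samples answers and return the most frequent one."""
    answers = [noisy_solver(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of toy questions answered correctly with n_samples votes each."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        question = (rng.randint(0, 99), rng.randint(0, 99))
        hits += majority_vote(question, n_samples, rng) == sum(question)
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, accuracy(n))
```

The model itself never changes between runs; only the number of samples drawn at test time does. In this toy setup, voting over more samples lifts accuracy well above the single-sample rate, which is the sense in which performance is "scaled" at test time rather than during training.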

That's why Nvidia lost almost $600 billion in market cap, the biggest one-day loss in U.S. history!

Many people and institutions who shorted American AI stocks became incredibly rich in a few hours because investors now project we will need less powerful AI chips ...

Nvidia short-sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. That is nothing compared to the market cap, but I'm looking at the single-day amount. More than 6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom earned more than $2 billion in profits in a few hours (the US stock market operates from 9:30 AM to 4:00 PM EST).

The Nvidia Short Interest Over Time data shows we had the second highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025. We have to wait for the latest data!

A tweet I saw 13 hours after publishing my article! Perfect summary.

Distilled language models

Small language models are trained on a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a larger, more complex model like the future ChatGPT 5.

Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive, which is a problem when computational power is limited or when you need speed.

The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational demands.

During distillation, the student model is trained not only on the raw data but also on the outputs or the "soft targets" (probabilities for each class rather than hard labels) produced by the teacher model.

With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.

In other words, the student model doesn't just learn from "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from data and from the teacher's predictions!

Ultimately, the student mimics the teacher's decision-making process ... all while using much less computational power!
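
The "dual learning" described above is usually written as a weighted sum of two losses: a cross-entropy term on the hard label and a KL-divergence term that pulls the student's softened distribution toward the teacher's. The pure-Python sketch below shows that classic formulation for a single example. The temperature of 2.0, the weight `alpha=0.5`, and the three-class logits are illustrative assumptions, not values from any DeepSeek model.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Loss against the hard label (the raw training data)."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of the hard-label loss and the soft-target loss.

    alpha and temperature are illustrative choices (assumptions).
    """
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # The temperature**2 factor keeps the soft term on a comparable scale.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]          # hypothetical teacher logits
    untrained_student = [0.0, 0.0, 0.0]
    print(distillation_loss(untrained_student, teacher, label=0))
    print(distillation_loss(teacher, teacher, label=0))
```

A student whose logits already match the teacher's pays no soft-target penalty, so only the hard-label term remains. An untrained student is penalized on both terms.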

But here's the twist as I understand it: DeepSeek didn't just extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.

So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: mixing different architectures and datasets to create a seriously adaptable and robust small language model!
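
How several teachers would be combined is not documented in anything I have cited, so the following is purely an illustration. One simple option is to average the teachers' softened probability distributions and use the result as the soft target for the student. The function name `ensemble_soft_targets`, the weights, and the logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits_list, temperature=2.0, weights=None):
    """Weighted average of several teachers' soft targets (hypothetical helper).

    With no weights given, every teacher counts equally.
    """
    n_teachers = len(teacher_logits_list)
    weights = weights or [1.0 / n_teachers] * n_teachers
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    n_classes = len(dists[0])
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(n_classes)]


if __name__ == "__main__":
    teacher_a = [3.0, 0.0, 0.0]   # hypothetical teacher favoring class 0
    teacher_b = [0.0, 3.0, 0.0]   # hypothetical teacher favoring class 1
    print(ensemble_soft_targets([teacher_a, teacher_b]))
```

When the teachers disagree, the averaged target spreads probability across both of their preferred classes, so the student sees the disagreement instead of a single confident label.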

DeepSeek: Less supervision

Another essential innovation: less human supervision.

The question is: how far can models go with less human-labeled data?

R1-Zero learned "reasoning" capabilities through trial and error. It evolves and has unique "reasoning behaviors", which can lead to noise, endless repetition, and language mixing.

R1-Zero was experimental: there was no initial guidance from labeled data.
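
Trial and error without labeled reasoning data still needs a reward signal. As I read the R1 paper, that signal is rule-based: one check for a correct final answer and one for the expected output format. The Python sketch below is a toy version of such a reward. The tag layout, the regex, and the 0.5 format weight are my assumptions for illustration, not the paper's implementation.

```python
import re

# Assumed layout: reasoning inside <think> tags, final answer inside <answer> tags.
COMPLETION_PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def format_reward(completion):
    """1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if COMPLETION_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion, expected):
    """1.0 if the extracted final answer equals the reference answer, else 0.0."""
    match = COMPLETION_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == expected.strip() else 0.0


def total_reward(completion, expected, format_weight=0.5):
    """Sum of both rule-based signals; format_weight is an illustrative choice."""
    return accuracy_reward(completion, expected) + format_weight * format_reward(completion)


if __name__ == "__main__":
    good = "<think>2 + 2 is 4</think><answer>4</answer>"
    wrong = "<think>2 + 2 is 5</think><answer>5</answer>"
    untagged = "The answer is 4"
    for sample in (good, wrong, untagged):
        print(total_reward(sample, "4"))
```

No human grades the reasoning itself here. Only the final answer and the format are checked, which is why this style of training needs far less human-labeled data.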

DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.

The end result? Less noise and no language mixing, unlike R1-Zero.

R1 uses human-like reasoning patterns first and then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.

My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they rely on previously trained models?

Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision ... I am not convinced yet that the traditional dependency is broken. It is "easy" to not need massive amounts of high-quality reasoning data for training when taking shortcuts ...

To be balanced and show the research, I've uploaded the DeepSeek R1 Paper (downloadable PDF, 22 pages).

My concerns regarding DeepSink?

Both the web and mobile apps collect your IP, keystroke patterns, and device details, and everything is stored on servers in China.

Keystroke pattern analysis is a behavioral biometric method used to identify and authenticate individuals based on their unique typing patterns.
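
To show why typing rhythm is identifying, here is a minimal Python sketch of the two classic keystroke features: dwell time (how long a key is held down) and flight time (the gap between releasing one key and pressing the next). The event format, the sample timings, and the `profile_distance` comparison are assumptions for illustration. This is not a description of what any particular app actually computes.

```python
from statistics import mean


def timing_features(events):
    """Summarize typing rhythm from (key, press_ms, release_ms) events.

    Needs at least two events so that one flight time exists.
    """
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {"mean_dwell": mean(dwell), "mean_flight": mean(flight)}


def profile_distance(a, b):
    """Crude similarity score between two typing profiles (smaller means more alike)."""
    return (abs(a["mean_dwell"] - b["mean_dwell"])
            + abs(a["mean_flight"] - b["mean_flight"]))


if __name__ == "__main__":
    # Hypothetical timings, in milliseconds, for two people typing the same text.
    alice = [("h", 0, 80), ("i", 120, 190), ("!", 260, 350)]
    bob = [("h", 0, 150), ("i", 400, 560), ("!", 900, 1040)]
    print(timing_features(alice))
    print(profile_distance(timing_features(alice), timing_features(bob)))
```

Note that the features never look at which keys were typed, only at when. That is what makes this a behavioral biometric rather than a keylogger, and also why it can follow a user across sessions.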

I can hear the "But 0p3n s0urc3 ...!" comments.

Yes, open source is great, but this reasoning is limited because it does NOT consider human psychology.

Regular users will never run models locally.

Most will simply want quick answers.

Technically unsophisticated users will use the web and mobile versions.

Millions have already downloaded the mobile app on their phone.

DeepSeek's models have a real edge and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.

I suggest searching for anything sensitive that does not align with the Party's propaganda on the web or mobile app, and the output will speak for itself ...

China vs America

Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.

Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We just know the $5.6M amount the media has been pushing left and right is misinformation!
