Created Feb 11, 2025 by Adell Collier (Maintainer)

DeepSeek: The Chinese AI Model That's a Tech Breakthrough and a Security Risk


DeepSeek: at this stage, the only takeaway is that open-source models can surpass proprietary ones. Everything else is problematic, and I do not buy the public numbers.

DeepSeek was built on top of open-source Meta technologies (PyTorch, Llama), and ClosedAI is now in danger because its valuation is outrageous.

To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but it's highly likely, so allow me to simplify.

Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.

That means fewer GPU hours and less powerful chips.

In other words, lower computational requirements and lower hardware costs.
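To make this concrete, here is a minimal sketch of one common test-time scaling flavor: sample the same model several times at inference and keep the majority answer (best-of-N / self-consistency). The `generate_answer` function is a hypothetical stand-in for any model call; this is an illustration of the general idea, not DeepSeek's actual method.

```python
from collections import Counter
import random

def generate_answer(question: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for a single (stochastic) model call.
    In practice this would be an API or local-inference call."""
    # Toy behavior: the "model" answers correctly most of the time.
    return "42" if random.random() < 0.7 else str(random.randint(0, 99))

def best_of_n(question: str, n: int = 16) -> str:
    """Spend more compute at inference (n samples) instead of training
    a bigger model, then keep the majority answer."""
    samples = [generate_answer(question) for _ in range(n)]
    answer, _count = Counter(samples).most_common(1)[0]
    return answer

if __name__ == "__main__":
    print(best_of_n("What is 6 * 7?", n=16))
```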

That's why Nvidia lost nearly $600 billion in market cap, the biggest one-day loss in U.S. history!

Many individuals and institutions who shorted American AI stocks became extremely rich in a few hours, because investors now forecast we will need less powerful AI chips...

Nvidia short-sellers made a single-day profit of $6.56 billion according to research from S3 Partners. Nothing compared to the market cap loss, but I'm looking at the single-day amount. More than $6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom made more than $2 billion in profits in a few hours (the US stock market runs from 9:30 AM to 4:00 PM EST).

The Nvidia Short Interest Over Time data shows we had the second-highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025. We have to wait for the latest data!

A tweet I saw 13 hours after publishing my post! Perfect summary.

Distilled language models

Small language models are trained at a smaller scale. What makes them different isn't just their capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a larger, more complex model like the future ChatGPT 5.

Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive, which is a problem when there's limited computational power or when you need speed.

The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational needs.

During distillation, the student model is trained not only on the raw data but also on the outputs, the "soft targets" (probabilities for each class rather than hard labels), produced by the teacher model.

With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.

In other words, the student model does not just learn from the "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from the data and from the teacher's predictions!

Ultimately, the student mimics the teacher's decision-making process... all while using much less computational power!
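To make the "soft targets" idea concrete, here is a minimal PyTorch sketch of the classic distillation loss: a weighted mix of cross-entropy against the hard labels and KL divergence against the teacher's temperature-softened probabilities. The tensor shapes and the `alpha`/`temperature` values are illustrative assumptions, not DeepSeek's actual recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Mix of (1) hard-label cross-entropy and (2) KL divergence between
    the student's and teacher's temperature-softened distributions."""
    # (1) Learn from the original data (hard labels).
    ce = F.cross_entropy(student_logits, labels)

    # (2) Learn from the teacher's "soft targets".
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    kl = kl * (temperature ** 2)  # standard scaling to keep gradients comparable

    return alpha * ce + (1.0 - alpha) * kl

# Toy usage: batch of 4 examples, 10-class output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```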

But here's the twist as I understand it: DeepSeek didn't just distill content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.

So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: mixing different architectures and datasets to build a seriously adaptable and robust small language model!
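If several teachers are involved, one simple way to combine them (my assumption, not something DeepSeek has documented) is to average the teachers' softened probability distributions into a single soft target, then reuse the same KL term as above:

```python
import torch
import torch.nn.functional as F

def ensemble_soft_targets(teacher_logits_list, temperature: float = 2.0):
    """Average the temperature-softened distributions of several teachers
    into one soft target per example."""
    probs = [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    return torch.stack(probs).mean(dim=0)

# Toy usage: three teachers (possibly different architectures) scoring
# the same batch of 4 examples over 10 classes.
teachers = [torch.randn(4, 10) for _ in range(3)]
soft_targets = ensemble_soft_targets(teachers)  # shape: (4, 10)
```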

DeepSeek: Less supervision

Another vital development: less human supervision/guidance.

The question is: how far can models go with less human-labeled data?

R1-Zero learned "reasoning" capabilities through trial and error; it evolves, it has distinct "reasoning behaviors" which can lead to noise, endless repetition, and language mixing.

R1-Zero was experimental: there was no initial guidance from labeled data.

DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It began with initial supervised fine-tuning, followed by RL to refine and enhance its reasoning capabilities.

The end result? Less noise and no language mixing, unlike R1-Zero.

R1 uses human-like reasoning patterns first, and it then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.
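As a rough illustration of that two-stage idea (a toy sketch, not DeepSeek's training code): stage 1 fits a tiny policy on labeled examples with plain cross-entropy, and stage 2 refines it with a REINFORCE-style update driven by a rule-based reward instead of human labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A tiny "policy": maps a 2-d input to a distribution over 2 possible answers.
policy = torch.nn.Linear(2, 2)
opt = torch.optim.Adam(policy.parameters(), lr=0.05)

# Stage 1: supervised fine-tuning on labeled examples (human-style guidance).
x_sft = torch.randn(64, 2)
y_sft = (x_sft[:, 0] > 0).long()  # toy ground-truth rule
for _ in range(200):
    loss = F.cross_entropy(policy(x_sft), y_sft)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: RL refinement with a rule-based reward, no labels required.
def reward(x: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Automatic verifier: +1 when the sampled answer satisfies the rule."""
    return ((x[:, 0] > 0).long() == action).float()

for _ in range(200):
    x = torch.randn(128, 2)
    dist = torch.distributions.Categorical(logits=policy(x))
    action = dist.sample()
    r = reward(x, action)
    advantage = r - r.mean()                           # simple mean baseline
    loss = -(advantage * dist.log_prob(action)).mean() # REINFORCE update
    opt.zero_grad()
    loss.backward()
    opt.step()
```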

My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they rely on previously trained models?

Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision... I am not convinced yet that the traditional dependency is broken. It is "easy" to not need huge quantities of high-quality reasoning data for training when taking shortcuts...

To be balanced and to show the research, I've uploaded the DeepSeek R1 paper (downloadable PDF, 22 pages).

My concerns regarding DeepSeek?

Both the web and mobile apps collect your IP, keystroke patterns, and device details, and everything is stored on servers in China.

Keystroke pattern analysis is a behavioral biometric technique used to identify and verify people based on their distinct typing patterns.

I can hear the "But 0p3n s0urc3 ...!" remarks.

Yes, open source is great, but this reasoning is limited because it does NOT take human psychology into account.

Regular users will never run models locally.

Most will just want fast responses.

Technically unsophisticated users will use the web and mobile versions.

Millions have already downloaded the mobile app on their phones.

DeepSeek's models have a real edge, and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores highly on objective benchmarks, no doubt about that.

I suggest searching for anything sensitive that does not align with the Party's propaganda, on the web or mobile app, and the output will speak for itself...

China vs America

Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship, but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.

Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We only know that the $5.6M figure the media has been pushing left and right is misinformation!
