I’m looking at people running DeepSeek’s 671B R1 locally off NVMe drives at 2 tokens a second. So why not skip the flash entirely and give me a 1TB drive with an NVMe controller and only DRAM behind it? The total transistor count on the silicon side is lower. It would be slower than typical system memory, but no worse than any NVMe with a DRAM cache. The controller logic for safe page writes in flash, plus the internal boost circuitry for pulsing each page, is around the same complexity as the memory management and constant refresh DRAM needs, along with its power stability requirements. Heck, DDR5 and up already puts power regulation on the memory module, IIRC. Anyone know why this is not a thing, or where to get this thing?

  • Shadow@lemmy.ca
    11 months ago

    Modern flash is already faster than your PCIe bus, and it’s cheaper than DRAM. Using RAM doesn’t add anything.

    It used to be a thing before modern flash chips; you’d have battery-backed DRAM PCIe cards.

    • MHLoppy@fedia.io

      Using ram doesn’t add anything.

      It would improve access latency vs flash, though, even if the difference in raw bandwidth is smaller

    • 𞋴𝛂𝛋𝛆@lemmy.worldOP

      It adds an additional memory controller on a different bus, and effectively unlimited read/write cycling for loading much larger AI models.

      • Shadow@lemmy.ca

        Memory connected via the PCIe bus to the CPU would be too slow for application use like that.

        Apple had to use soldered-in RAM for their unified memory because the length of the traces on the mobo needs to be so tightly controlled. PCIe is way too slow comparatively.

        • 𞋴𝛂𝛋𝛆@lemmy.worldOP

          Not at all. An NVMe already works, as I stated in the post. The speed is irrelevant with very large models. They are MoEs, so they get loaded and moved around in large blocks once per inference. The only issue is cycling an NVMe. They will still work; it would just be nice not to worry about the limited cycle life. I am setting up agentic toolsets where models will get loaded and offloaded a lot. I already do this regularly with 40-50GB models. I want to double or quadruple this amount.

    • 𞋴𝛂𝛋𝛆@lemmy.worldOP

      I’m talking about volatile memory instead of persistent storage. I think we’re on different pages, no pun intended (lie).

      • UberKitten@lemmy.blahaj.zone

        a RAM drive is volatile memory. You can get higher performance out of DRAM chips attached to the CPU memory controller than by putting them behind the PCIe bus as NVMe. For applications that only work with file systems, a RAM drive works around that limitation. So, why bother?

      • e0qdk@reddthat.com

        What they mean is that you can already do something like mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk to get a file system backed by regular RAM.
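        For anyone who wants to script that instead: a minimal Python sketch that drops a scratch file onto tmpfs. It assumes /dev/shm is a tmpfs mount (it is on most Linux distros) and falls back to the ordinary temp dir otherwise; the path and size are just examples.

```python
import os
import tempfile

# /dev/shm is a tmpfs (RAM-backed) mount on most Linux systems -- this is an
# assumption, so fall back to the regular temp dir if it's absent.
base = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
path = os.path.join(base, "ramdisk_demo.bin")

with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of scratch data, held in RAM on tmpfs

size = os.path.getsize(path)
os.remove(path)
print(size)  # 1048576
```

        Anything written there lives in page cache and vanishes on reboot, which is exactly the "RAM drive" behavior being discussed.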

        • solrize@lemmy.world

          There are few motherboards with enough DRAM channels to handle a TB of DRAM. That’s basically why RAM drives existed back in the day, and they are still potentially sort of relevant for the same reason. No, a TB of RAM wouldn’t fit on an M.2 card, but a box of RAM with a PCIe connector is still an interesting idea. Optane also aimed for this, but it never got enough traction to be viable, so it was scrapped.

          • e0qdk@reddthat.com

            You can get motherboards with enough slots if you’re willing to pay enterprise prices for them. I have a system with 1TB of RAM at work that I use as a fast data cache. I just mount tmpfs on it, write hundreds of gigs of data into it (overwriting it all every few hours), and it works great. Cost was somewhere in the $10-15K (US) range a few years ago, IIRC. Steep for an individual, sure, but not crazy for an organization.

            • solrize@lemmy.world

              Last I saw, you had to use super-premium, ultra-dense memory modules to get 1TB into a motherboard. Maybe that’s less true now. But the hope would be to use commodity RAM, CPUs, etc. $10K for a 1TB system is pretty good; last I looked it was a lot more.

              • e0qdk@reddthat.com

                Pretty sure you can do much better than that now (plus or minus tariff insanity) – quick check on Amazon, NewEgg, etc. suggests ballpark of $5K for 1TB RAM (Registered DDR4) + probably compatible motherboard + CPU.

        • 4am@lemm.ee

          I think OP is looking for fast, temporary storage, like a buffer or a ramdisk.

          But the PCIe bus is slower than the memory bus. Just use actual RAM for this.

        • Dudewitbow@lemmy.zip

          OP is basically looking for the hypothetical endgame of Intel Optane, something almost as fast as RAM. He doesn’t, however, know that the Optane project is functionally dead.

    • tal@lemmy.today

      because RAM drives make it easy to accomplish the same thing in software for applications that need it

      I don’t know if this is what OP is going for, but I’ve wanted to do something similar to what he’s talking about myself to exceed the maximum amount of memory that a motherboard supported. Basically, I wanted to stick more memory on a system – and I was fine with access to it being slower than to the on-motherboard memory – to act as a very large read cache.

      A RAM drive will let you use memory that your motherboard supports as a drive. But it won’t let you stick even more physical DRAM into a system, above-and-beyond what the motherboard can handle.

        • 4am@lemm.ee

          Hmmm, NVMe as a swap partition would be performant as fuck

        • tal@lemmy.today

          No, I’m not. I’m talking about providing access to more physical DRAM than a motherboard is capable of handling.

          Swap would involve taking infrequently-used stuff in a still-bounded amount of DRAM and sticking it in slower, cheaper nonvolatile storage.

      • 𞋴𝛂𝛋𝛆@lemmy.worldOP

        Yes, it is like a swap, if a person can only think in terms of the software side. DRAM can be changed at any individual location in memory. For flash, there is a minimum page size. All bits in a flash page, something like 1k bits, can be set high or low depending on the architecture, and then individual bits can be changed to the opposite charge. But if any bit on the page needs to go back to the original value, it requires erasing the entire page. In reality, it takes swapping pages around to ensure data integrity if power is lost. Every erase like this reduces some of the page’s potential to reset to the default state. If I recall correctly (unlikely), each bit is a floating-gate transistor with a capacitor-like charge storage location.

        DRAM is a single special transistor with a built-in capacitor. The main problem with DRAM is that a cell only holds its charge for a few tens of milliseconds, so all DRAM is constantly getting refreshed, row by row, many times a second. The volatile memory type that makes sense in simple terms is static RAM. This is what CPU cache is made from. It is 6 transistors per bit, and each bit is like an on/off switch with no caveat nonsense, so long as the thing is powered.

        The addressing scheme is simply serial versus parallel. So loading 512-bit-wide AVX words is going to be slower than through the address bus of the primary memory controller. Still, no one is complaining about their NVMe speed here. The microcontroller on an NVMe drive would be nearly capable of handling DRAM. The main difference is that flash page erasing takes tens of volts and a small negative power rail, while DRAM needs very stable power because there is very little voltage difference between a one and a zero. The harder part is how to manage all of the memory address spaces and any bad blocks of bits. This is what the NVMe is already doing.
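        The erase-before-rewrite behavior described above can be sketched as a toy model. The page size, the all-ones erased state, and the write API are all illustrative, not real NAND geometry or controller code:

```python
# Toy model of the flash constraint described above: programming can only
# drive erased bits (1) to 0; restoring any 0 back to 1 forces a full-page
# erase, which is what consumes the page's limited endurance. DRAM, by
# contrast, rewrites any single location with no erase cycle at all.
PAGE_SIZE = 1024  # bits per page; illustrative, not a real NAND parameter

class FlashPage:
    def __init__(self):
        self.bits = [1] * PAGE_SIZE  # erased state: all bits set
        self.erase_count = 0         # each erase eats into page endurance

    def write(self, index, value):
        if value == 0 or self.bits[index] == value:
            self.bits[index] = value  # 1 -> 0 (or no-op) programs in place
            return
        # 0 -> 1 requires erasing the whole page, then re-programming
        # every other bit that must stay 0.
        zeros = [i for i, b in enumerate(self.bits) if b == 0 and i != index]
        self.bits = [1] * PAGE_SIZE
        self.erase_count += 1
        for i in zeros:
            self.bits[i] = 0

page = FlashPage()
page.write(5, 0)          # programs in place, no erase
page.write(7, 0)          # same
page.write(5, 1)          # flipping one bit back erases the entire page
print(page.erase_count)   # 1
print(page.bits[7])       # 0 -- survived the erase via re-programming
```

        A real controller avoids this by copying live data to a fresh page instead of erasing in place, but the wear accounting is the same: every 0-to-1 transition ultimately costs a block erase somewhere.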

  • CitricBase@lemmy.world

    You are asking for 1TB of RAM. Keying it to M.2 wouldn’t make it any cheaper or better than keying it to regular DDR5. I don’t think even a tenth of that would physically fit onto an NVMe drive, even if someone wanted it to.

    Put in that context, do you begin to see now why that isn’t a thing that exists?

    • 𞋴𝛂𝛋𝛆@lemmy.worldOP

      DRAM is one transistor per bit. Flash cells have a larger overall footprint, but they are deposited in multiple stacked layers. The actual dies are not much different once the packaging is removed. Most decent-quality NVMe drives already have a large amount of DRAM cache onboard. It is trivial to make the entire drive DRAM. This is a product that should exist for the niche of AI models, and it probably does, but I’m unable to find it in the modern dystopian internet.
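      The die-count arithmetic behind this exchange can be sketched in a couple of lines (16 Gbit per die is an assumed figure, roughly a current DDR5 die; nothing here comes from a datasheet):

```python
# Back-of-envelope: how many DRAM dies would 1 TiB take?
# 16 Gibit per die is an assumption (roughly a current DDR5 die capacity).
target_bits = 1 * 1024**4 * 8   # 1 TiB expressed in bits
die_bits = 16 * 1024**3         # 16 Gibit per die (assumed)
dies = target_bits // die_bits
print(dies)  # 512
```

      Whether 512 dies (even stacked several high per package) fit in an M.2 footprint is the crux of the disagreement above.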

  • moody@lemmings.world

    M.2, not to my knowledge, but I know they make PCIe cards with RAM slots on them. LTT had a video on one some years ago; I imagine newer models have been made since then.

  • tal@lemmy.today

    I don’t know. I have, in the past, been vaguely interested in getting a physical RAM drive to throw more memory on a system than a motherboard supports for use as a read cache for slower storage, and I only ran into things from years back.

    My guess is that there just isn’t enough stuff out there where one really benefits all that much from that much more memory, so not a huge amount of demand. It might also be that the idea is to segregate desktop computing from some other environments, and Intel and similar want to put a cap on how much memory is accessible to desktop systems so that they can charge higher prices for specialized systems. I don’t know how they’d prevent someone from throwing a ton of DRAM chips in a drive, but I can imagine them having an incentive for it.