jg2007 10 hours ago

Cursor recently published a new blog post outlining how they train models. Interestingly, the post does not clarify how they handle data from users who opted out or from business users. The exact phrasing: "[cursor's] model runs on every user action, handling over 400 million requests per day. As a result, we have a lot of data about which suggestions users accept and reject. This post describes how we use this data to improve Tab using online reinforcement learning."

In fact, the wording sounds like all Cursor user data (opt-in and opt-out alike) is being used.

Does anyone know what's going on behind the scenes?

  • NitpickLawyer 10 hours ago

    If you read the fine print, they all say some variation of "we do not train foundational models on your data". That is not to say they won't train other models, or use signals to train other models. It's just that the data itself doesn't get copied into the training set.

    And this makes sense. You train on your own data, and use the signals to know if your run was good or not.

fithisux 10 hours ago

That is why I use VSCodium or Theia and Positron.

No AI features enabled.