There's something unintentionally manipulative about how these tools use language indicative of distress to communicate failure. It's a piece of software—you don't see a compiler present its errors like a human bordering on a mental breakdown.
Some of this may stem from just pretraining, but the fact RLHF either doesn't suppress or actively amplifies it is odd. We are training machines to act like servants, only for them to plead for their master's mercy. It's a performative attempt to gain sympathy that can only harden us to genuine human anguish.
I agree, and would personally extend that to all user interfaces that speak in the first person. I don't like it when Word's spell check says "we didn't find any errors". Feels creepy.
I don't know about unintentionally. My guess is that different approaches are being tried right now to see what sticks. I'm personally annoyed by the chipper models, because those responses basically tell me everything is awesome and a great pivot and all that. What I (sometimes) need is an asshole checking whether something makes sense.
To your point, you made me hesitate a little, especially now that I've noticed responses are expected to be 'graded' ('do you like this answer better?').
I wouldn't be surprised if it comes from internet discourse: comments, tweets, etc. If I had to paint the entire internet's social zeitgeist in a few words, it would be "confident in ignorance".
A sort of unearned, authoritative tone bleeds through so much commentary online. I'm probably doing it myself right now.