Some thoughts on the complexity of personal preferences, and their implications for aligning future AI.
Revealed Preferences
Revealed preferences are those implicitly exposed by someone's activity or behavior. A preference for one option over another can reasonably be inferred when someone repeatedly chooses it.
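As a rough illustration of that inference, here is a minimal sketch in Python. The function name, the thresholds, and the tea/coffee log are all hypothetical, chosen only for the example; it simply treats an option as a revealed preference once it accounts for a large enough share of enough observed choices.

```python
from collections import Counter


def infer_revealed_preference(choices, min_observations=5, min_share=0.7):
    """Return the option chosen often enough to count as a revealed
    preference, or None if the evidence is too thin.

    Both thresholds are illustrative assumptions, not established values.
    """
    if len(choices) < min_observations:
        return None  # too few observations to infer anything
    option, count = Counter(choices).most_common(1)[0]
    # Only infer a preference when one option clearly dominates.
    return option if count / len(choices) >= min_share else None


observed = ["tea", "tea", "coffee", "tea", "tea", "tea"]
print(infer_revealed_preference(observed))  # tea
```

Note that this only ever sees behavior: it has no way to tell whether the observed choices reflect what the person actually wants.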
Expressed Preferences
Revealed preferences are not necessarily the same as expressed preferences, which are the preferences people profess to hold. You realize this very quickly when you grow up hiding, and even trying to suppress, your sexuality.
Hidden Preferences
Because revealed and expressed preferences correlate inconsistently, "asking users" what they want in abstract terms is a poor substitute for situational roleplay.
At the same time, where users misrepresent their true preferences even through their actions (such as the closeted individual, in denial, dating the "default" sex), inverse reinforcement learning (inferring preferences through observation alone) is insufficient to ascertain fully accurate preferences and goal functions for AI models. Tacit knowledge and unrevealed preferences require alternative approaches.
Collecting this information and making it available for use in a manner that users can understand and that preserves their privacy will be one of the biggest challenges of mass-deployed aligned AI agents.