Srijit Kumar Bhadra

Why is Zero Trust Information one of the right paradigms for regular end users in this age of ChatGPT and the Bing chatbot?

I thank Carl for sharing "You Are Not a Parrot. And a chatbot is not a human. And a linguist named Emily M. Bender is very worried what will happen when we forget this." by Elizabeth Weil. It is worth reading to the very end, several times over.

In order to understand the above-mentioned article and Zero Trust Information, we may first have to refer to the lucid overview of large language models here by Murray Shanahan.

What are LLMs (Large Language Models)?

LLMs are generative mathematical models of the statistical distribution of tokens in the vast public corpus of human-generated text, where the tokens in question include words, parts of words, or individual characters including punctuation marks. They are generative because we can sample from them, which means we can ask them questions. But the questions are of the following very specific kind. “Here’s a fragment of text. Tell me how this fragment might go on. According to your model of the statistics of human language, what words are likely to come next?”

Suppose we give an LLM the prompt “The first person to walk on the Moon was ”, and suppose it responds with “Neil Armstrong”. What are we really asking here? In an important sense, we are not really asking who was the first person to walk on the Moon. What we are really asking the model is the following question: Given the statistical distribution of words in the vast public corpus of (English) text, what words are most likely to follow the sequence “The first person to walk on the Moon was ”? A good reply to this question is “Neil Armstrong”. Similarly, we might give an LLM the prompt “Twinkle twinkle ”, to which it will most likely respond “little star”. On one level, for sure, we are asking the model to remind us of the lyrics of a well-known nursery rhyme. But in an important sense what we are really doing is asking it the following question: Given the statistical distribution of words in the public corpus, what words are most likely to follow the sequence “Twinkle twinkle ”? To which an accurate answer is “little star”.
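To make Shanahan's description a bit more concrete, here is a minimal sketch of what "asking the model what comes next" looks like in code. It assumes the Hugging Face transformers library and the small public gpt2 checkpoint; it is only an illustration of the next-token idea, not of how ChatGPT itself is served.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small public causal language model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The first person to walk on the Moon was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# The five tokens the model considers most likely to continue the fragment.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}  p = {prob.item():.3f}")
```

Sampling repeatedly from this distribution, appending each chosen token to the prompt, is how continuations such as "Neil Armstrong" or "little star" are produced.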

A review of Harry G. Frankfurt’s concept of bullshit may also be worthwhile. Frankfurt’s idea of bullshit is that it is a form of speech that is not necessarily false, but is not grounded in truth. It is often used to create a false impression or to avoid engaging with the truth. It is also not intended to convey any meaningful information, but is instead used to create a sense of certainty or to manipulate the listener. The book Calling Bullshit by Carl T. Bergstrom and Jevin D. West may also be consulted.

Large language model size has been increasing roughly 10x every year for the last few years. This is starting to look like another Moore’s Law. With model sizes approaching a trillion parameters, the boundaries between probability and reality are going to get even more blurred, and distinguishing bullshit will be challenging for the lay human mind. Engineering tools based on large language models will be seen as advancing productivity and efficiency in areas such as virtual assistants, image annotation, content creation, and cybersecurity. As stated here, PricewaterhouseCoopers’ (PwC) third annual AI Predictions Report has highlighted the importance of focusing on the fundamentals in preparation for large-scale AI projects. PwC sees AI as a major game changer: AI could contribute up to $15.7 trillion to the global economy in 2030, more than the current output of China and India combined. Of this, $6.6 trillion is likely to come from increased productivity and $9.1 trillion from consumption-side effects.

AI and large language models are not going away soon. How do we mitigate the risks? That is a subject of deep research in itself, and it is well elaborated in the article AI might bring huge benefits — if we avoid the risks. The immediate option for ordinary mortals like me is to increase our awareness, sharpen our senses further, and heighten our mental ability and alertness. I believe this is what brilliant minds like Emily M. Bender, Timnit Gebru, and others are attempting, with the deepest honesty and sincerity, in order to strike a balance.

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. I am not yet sure about the effectiveness of tools such as AI Text Classifier and GPTZero, which aim to distinguish AI-written text from human-written text.
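For the curious, here is a toy sketch of the "green list" idea behind such watermarks: each word pseudo-randomly marks about half of the vocabulary as preferred for the next position, a watermarking generator favours those words, and a detector simply measures how often a text lands on its green lists. The function names and the hashing scheme below are my own simplification for illustration, not the implementation of any particular tool.

```python
import hashlib

def is_green(previous_word: str, word: str) -> bool:
    """Pseudo-randomly assign `word` to the 'green' half of the vocabulary,
    keyed on the word that precedes it."""
    digest = hashlib.sha256(f"{previous_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    """Fraction of tokens that landed on their green list."""
    words = text.lower().split()
    hits = sum(is_green(prev, cur) for prev, cur in zip(words, words[1:]))
    return hits / max(len(words) - 1, 1)

# Unwatermarked text should hover around 0.5; text produced by a generator
# that systematically prefers green words would score noticeably higher,
# which is the statistical signal a detector tests for.
sample = "the first person to walk on the moon was neil armstrong"
print(f"green fraction: {green_fraction(sample):.2f}")
```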

The relevant concept of Zero Trust Information was introduced in AI Homework by Ben Thompson. Ben aptly says that the real skill for students will be in verifying the answers that systems like ChatGPT churn out, i.e., learning how to be a verifier and an editor instead of a regurgitator. Even before the advent of ChatGPT, many of us were already aware that, in today’s world of fake news and information deluge, verifying and editing must be an essential skill for every individual. In his article, Ben suggests that we adhere to the concept of “Zero Trust Information”, similar to the paradigm behind Zero Trust Networking.

In “You Are Not a Parrot. And a chatbot is not a human. And a linguist named Emily M. Bender is very worried what will happen when we forget this”, Elizabeth Weil says that tech-makers who assume their reality accurately represents the world create many different kinds of problems. The training data for ChatGPT is believed to include most or all of Wikipedia, pages linked from Reddit, and a billion words grabbed off the internet. (It can’t include, say, e-book copies of everything in the Stanford library, as books are protected by copyright law.) The humans who wrote all those words online overrepresent white people. They overrepresent men. They overrepresent wealth. What’s more, we all know what’s out there on the internet: vast swamps of racism, sexism, homophobia, Islamophobia, neo-Nazism.

Against such a backdrop of bias, Ben’s suggestion of “Zero Trust Information” can become one of the necessary survival tools for common end users.

It is time to re-train our own minds again and again. I repeat Emily M. Bender’s rallying cries below.

Please do not conflate word form and meaning. Mind your own credulity.

It is high time the difference between form and meaning was well understood. The next three paragraphs are quoted from Scott Aaronson’s blog.

Form is the physical structure of something, while meaning is the interpretation or concept that is attached to that form. For example, the form of a chair is its physical structure – four legs, a seat, and a back. The meaning of a chair is that it is something you can sit on.

This distinction is important when considering whether or not an AI system can be trained to learn semantic meaning. AI systems are capable of learning and understanding the form of data, but they are not able to attach meaning to that data. In other words, AI systems can learn to identify patterns, but they cannot understand the concepts behind those patterns.

For example, an AI system might be able to learn that a certain type of data is typically associated with the concept of “chair.” However, the AI system would not be able to understand what a chair is or why it is used. In this way, we can see that an AI system trained on form can never learn semantic meaning.
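As a toy illustration of that distinction, the sketch below (corpus and names invented for illustration) "learns" about chairs purely by counting which words co-occur with the word "chair" in a tiny corpus. The statistical pattern, the form, is captured, yet nothing in the program represents what a chair is or what sitting on one means.

```python
from collections import Counter

# A tiny, invented corpus; a real system would use billions of words.
corpus = [
    "she sat on the chair by the window",
    "he pulled up a chair and sat down",
    "the wooden chair has four legs and a back",
]

def co_occurrences(target: str, window: int = 3) -> Counter:
    """Count words that appear within `window` positions of `target`."""
    counts: Counter = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            if word == target:
                neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts

# The "pattern" the model has learned about chairs: pure co-occurrence statistics.
print(co_occurrences("chair").most_common(5))
```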

#AI #OpenAI #ChatGPT #ZeroTrustInformation #LargeLanguageModels #LLM

cc: @srijit

Zero Trust Information

Shel Israels’s one line advice, regarding usage of GPT AI, reinforces my thoughts regarding Zero Trust Information as the right paradigm in this age of ChatGPT, Google Bard and Bing chatbot for regular end users.

My advice is to treat GPT AI the same way you treat a blinking yellow light on a dark street: proceed with caution.

#AI #OpenAI #ChatGPT #ZeroTrustInformation #LargeLanguageModels #LLM #GPTAI

cc: @srijit
