Thoughts on GenAI (2/4): Fake Information, Scams, and Phishing
This is the second of four posts on GenAI:
- Part 1: AI Development and Proof of Digital Identity
- Part 2: Fake Information, Scams, and Phishing (this post)
- Part 3: Safe, Secure AI
- Part 4: AI Speculation
The development of AI has significant implications for cybersecurity. In particular, the evolution of phishing attacks mirrors the advancement of AI technology: methods that once deceived people with text and photos are evolving into sophisticated techniques that use voice (DeepVoice) or video (DeepFake). A few days ago, DeepFake technology was used to impersonate a company’s CFO on a video conference call and carry out a large-scale financial fraud, showing that these concerns are already becoming reality.
Likewise, the mass production of fake information using DeepFake and similar technologies needs to be taken seriously. 2024 brings major elections in several countries, including Korea and the United States, and the likelihood of political attacks using fabricated information has risen sharply. Because technology-driven information manipulation can sway election results, preparing countermeasures is urgent.
Perhaps in response to these problems, the executive order on “Safe, Secure, and Trustworthy AI” that US President Biden signed in October 2023 includes measures to protect Americans from AI-enabled fraud and deception. The order calls for standards and best practices for detecting AI-generated content and authenticating official content.
Accordingly, since last year, big tech companies such as Microsoft, Google, and Meta have been taking measures like attaching watermarks to the GenAI outputs their products generate.
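To make the idea concrete, here is a toy sketch of invisible watermarking: hide a known bit pattern in an image’s least significant bits, then check for it later. This is only an illustration of the concept; production systems such as Google’s SynthID or C2PA content credentials are far more robust, and the tag, function names, and layout below are all made up for the example.

```python
# Toy illustration of invisible watermarking for generated images.
# Real watermarking schemes are far more robust; this only shows the idea.
import numpy as np
from PIL import Image

WATERMARK = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)  # hypothetical 8-bit tag

def embed_watermark(img: Image.Image) -> Image.Image:
    """Write the tag into the least-significant bits of the first red-channel pixels."""
    pixels = np.array(img.convert("RGB"))
    flat = pixels[..., 0].reshape(-1)
    flat[: len(WATERMARK)] = (flat[: len(WATERMARK)] & 0xFE) | WATERMARK
    pixels[..., 0] = flat.reshape(pixels[..., 0].shape)
    return Image.fromarray(pixels)

def has_watermark(img: Image.Image) -> bool:
    """Check whether the tag is present in the expected positions."""
    flat = np.array(img.convert("RGB"))[..., 0].reshape(-1)
    return bool(np.array_equal(flat[: len(WATERMARK)] & 1, WATERMARK))

if __name__ == "__main__":
    generated = Image.new("RGB", (64, 64), color=(120, 180, 200))  # stand-in for a GenAI output
    marked = embed_watermark(generated)
    print(has_watermark(marked))     # True
    print(has_watermark(generated))  # False: the unmarked LSBs don't match the tag
```

Even this toy example hints at the limitation discussed next: anyone who re-encodes, crops, or regenerates the image can destroy such a mark, which is why detection alone cannot be the whole answer.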
This situation raises important questions. Is watermarking GenAI outputs sufficient? How might the problem evolve beyond what watermarks can address? And how do we respond to direct attacks that use open-source or self-built models, which are not bound by such controls at all?
The fact that OpenAI placed model-safety content prominently on the page announcing its Sora model suggests that such efforts are ongoing. A brief look at the Sora safety section shows that they are considering model safety from several perspectives:
We’ll be taking several important safety steps ahead of making Sora available in OpenAI’s products.
We are working with red teamers - domain experts in areas like misinformation, hateful content, and bias - who will be adversarially testing the model.
We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora.
We’re leveraging the existing safety methods … applicable to Sora as well.
Our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.
Robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user.
We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology.
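Sora’s internal classifiers are not public, but the “text classifier” step quoted above can be approximated with OpenAI’s general-purpose Moderation API: screen the prompt against usage policies before any generation happens. The snippet below is an analogy, not OpenAI’s actual pipeline, and the `generate_video` call it mentions is purely hypothetical.

```python
# Sketch: screening a text-to-video prompt before generation, in the spirit of
# the text-classifier step OpenAI describes for Sora. Uses the public
# Moderation API as a stand-in for Sora's internal (non-public) classifier.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the moderation check."""
    result = client.moderations.create(input=prompt).results[0]
    if result.flagged:
        print("Prompt rejected. Flagged categories:", result.categories)
        return False
    return True

if __name__ == "__main__":
    if screen_prompt("A golden retriever surfing at sunset"):
        print("Prompt accepted - pass it on to the video model.")
        # video = generate_video(prompt)  # hypothetical generation call
```

The same pre-generation gate could be paired with post-generation checks, mirroring the frame-level image classifiers OpenAI mentions.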
Google shows a similar commitment, emphasizing the responsible use of AI models in its AI Principles:
We will not design or deploy AI in the following application areas:
Technologies that cause or are likely to cause overall harm. … will incorporate appropriate safety constraints.
Weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people.
Technologies that gather or use information for surveillance violating internationally accepted norms.
Technologies whose purpose contravenes widely accepted principles of international law and human rights.
Despite these safeguards, methods for producing content without constraints will keep emerging, such as jailbreaking existing models so they answer any query without restriction (e.g., ChatGPT’s DAN, “Do Anything Now”, mode). And compared with unconstrained text or photos, unconstrained video could cause far more serious harm. What happens if someone runs a Sora-like model in a DAN-style mode to produce illegal videos?
These concerns are complex problems that go beyond purely technical issues to include social and ethical considerations, and addressing them will require collaboration among many stakeholders.