OpenAI email archives from Musk v Altman - Game of Thrones

Reference: OpenAI email archives from Musk v Altman by LessWrong

These emails surfaced as part of the ongoing legal dispute between Elon Musk and Sam Altman over OpenAI's direction. Through these filings, the public has gained access to email exchanges between some of the most powerful figures in the tech world today, including Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and Andrej Karpathy.

For someone like me who is still learning about the tech world, Silicon Valley, startups, and entrepreneurship, reading these has been an eye-opening experience, comparable to MIT or Stanford releasing their lectures to the world.

After reading through the content, I think the story can be divided into the following chapters:


Chapter 1: Origins - The noble idea of AI for everyone

The idea began on May 25, 2015, when Sam Altman sent an email to Elon Musk about a concept for a “Manhattan Project for AI” — ensuring that the tech belongs to the world through some sort of nonprofit.
Elon Musk quickly responded, showing enthusiasm for the idea.
Throughout the emails, I noticed Elon Musk repeatedly expressing concern (or even obsession) about Google, DeepMind, and the possibility of Google creating AGI and dominating the world.
From the very first email, Sam Altman seemed to understand Elon Musk's concerns, or perhaps shared the same fears: he mentioned the need to “do something to prevent Google from being the first to create AGI,” quickly winning Elon Musk's agreement.


Chapter 2: The first building blocks - Contracts to attract initial talent for OpenAI

The next phase focused on drafting contracts (offer letters or compensation frameworks) to attract the first talents to work at OpenAI, discussing “opening paragraphs” for OpenAI’s vision, and even deciding what to say in “a Wired article.”

What I found interesting here:

  • How these people communicated via email: direct, straight to the point, and concise.
  • The founders’ emphasis on building an excellent founding team and carefully considering contract details.
  • Elon Musk’s willingness to personally meet and convince individuals to join OpenAI.

Chapter 3: Conflict - The battle for leadership control

Conflict seemed to arise around August 2017 (Shivon Zilis to Elon Musk, cc: Sam Teller, Aug 28, 2017, 12:01 AM), when Greg and Ilya expressed concerns about Elon Musk’s management, such as:

  • “How much time does Elon want to spend on this, and how much time can he actually afford to spend on this?”
  • They were okay with less time/less control or more time/more control, but not less time/more control. Their fear was that without enough time, there wouldn’t be adequate discussion to make informed decisions.

Elon responded:

  • “This is very annoying. Please encourage them to go start a company. I’ve had enough.”

The highlight of this chapter might be an email from Ilya Sutskever to Elon Musk, Sam Altman, cc: Greg Brockman, Sam Teller, Shivon Zilis (Sep 20, 2017, 2:08 PM), where Ilya and Greg said:

  • To Elon: “The current structure provides you with a path where you end up with unilateral absolute control over the AGI. You stated that you don’t want to control the final AGI, but during this negotiation, you’ve shown us that absolute control is extremely important to you. The goal of OpenAI is to make the future good and avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. Therefore, it’s a bad idea to create a structure where you could become a dictator, especially when we can create a structure that avoids this possibility.”

  • To Sam: “We don’t understand why the CEO title is so important to you. Your stated reasons have changed, and it’s hard to understand what’s driving this. Is AGI truly your primary motivation? How does it connect to your political goals? How has your thought process changed over time?”

Elon replied:

  • “Guys, I’ve had enough. This is the final straw. Either go do something on your own or continue with OpenAI as a nonprofit. I will no longer fund OpenAI until you have made a firm commitment to stay, or I’m just being a fool who is essentially providing free funding for you to create a startup. Discussions are over.”

Chapter 4: The finale

The final email exchanges between Elon and Sam occurred around March 2019. At this time, Sam, now CEO of OpenAI, drafted a plan:

  • “We’ve created the capped-profit company and raised the first round. We did this in a way where all investors are clear that they should never expect a profit.
  • We made Greg chairman and me CEO of the new entity.
  • Speaking of the last point, we are now discussing a multi-billion dollar investment, which I would like your advice on when you have time.”

Elon replied, once again making it clear that he had no interest in OpenAI becoming a for-profit company.


Improving ChatGPT’s interpretability with cross-modal heatmap

(2024-11)

I tried a simple experiment: I took a snapshot of a single 3x3 block of a Sudoku puzzle (containing the digits 1 to 9) and asked ChatGPT to find the location of a specific number in the grid.

Sudoku question

As shown in the picture, ChatGPT seemed to handle the question just fine! But as soon as I upped the challenge level, it started to show its infamous hallucination problem :D

Failed answer

So, how can we improve this?

One idea: applying techniques like DAAM to create a cross-modal heatmap (example attached) could help provide a rough idea of where each visual-text pair is mapped. By using this data to fine-tune the model, could we boost its interpretability?

DAAM example
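For reference, here is roughly how DAAM produces word-level heat maps for Stable Diffusion, based on the open-source daam package (a sketch: the model id and prompt are placeholders, and adapting the idea to a multimodal chat model would require access to its cross-attention layers):

import torch
import matplotlib.pyplot as plt
from daam import trace, set_seed
from diffusers import DiffusionPipeline

# Trace cross-attention maps while the pipeline generates an image.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
pipe = pipe.to("cuda")

with torch.no_grad():
    with trace(pipe) as tc:
        out = pipe("a sudoku grid on paper", generator=set_seed(0))
        # Heat map of which pixels the token "sudoku" attended to.
        word_map = tc.compute_global_heat_map().compute_word_heat_map("sudoku")
        word_map.plot_overlay(out.images[0])
        plt.show()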

Update: It’s my mistake for not instructing ChatGPT properly :D

ChatGPT's correct answer with proper instruction

The Prisoner’s Dilemma

(2024-09)

Imagine a game between two players, A and B, competing for a prize of 1 million dollars from a bank. Each is asked to choose either “Split” or “Take All.” If both choose “Split,” they each receive $500,000. If one chooses “Split” and the other chooses “Take All,” the one who chooses “Take All” wins the entire prize. If both choose “Take All,” they both walk away with nothing. They cannot communicate with each other and must decide whether to trust and cooperate.

This is a variant of the Prisoner's Dilemma, one of the most famous problems in Game Theory. When the game is played only once, the rational strategy for each player is not to cooperate. In real life, however, many situations are not zero-sum games in which one side's gain is the other's loss; instead, all parties can win and benefit from a shared bank, our world.

And the best strategy to win in life is to cooperate with others, or as summarized in the video: be nice and forgiving, but don't be such a pushover that others can take advantage of you.
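To make the one-shot vs. repeated distinction concrete, here is a minimal iterated Prisoner's Dilemma simulation (a sketch; the payoff values 3/0/5/1 are the classic Axelrod-tournament convention, not the $1M game above):

PAYOFF = {  # (my move, their move) -> my payoff; "C" = cooperate/split, "D" = defect/take all
    ("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(my_history, their_history):
    # Be nice on the first move, then mirror the opponent's last move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, always_defect))  # (199, 204): defection wins a little head-to-head...
print(play(tit_for_tat, tit_for_tat))    # (600, 600): ...but mutual cooperation earns far more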

A new perspective on the motivation of VAE

(2023-09)

  • Assume that \(x\) was generated from \(z\) through a generative process \(p(x \mid z)\).
  • Before observing \(x\), we have a prior belief about \(z\), i.e., \(z\) can be sampled from a Gaussian distribution \(p(z) = \mathcal{N}(0, I)\).
  • After observing \(x\), we want to correct our prior belief about \(z\) to a posterior belief \(p(z \mid x)\).
  • However, we cannot directly compute \(p(z \mid x)\) because it is intractable. Therefore, we approximate it with a variational distribution \(q(z \mid x)\), parameterized by an encoder network, which is trained to minimize the KL divergence between \(q(z \mid x)\) and \(p(z \mid x)\). This is the motivation of VAE.

Mathematically, we want to minimize the KL divergence between \(q_{\theta} (z \mid x)\) and \(p(z \mid x)\):

\[\mathcal{D}_{KL} (q_{\theta} (z \mid x) \parallel p(z \mid x) ) = \mathbb{E}_{q_{\theta} (z \mid x)} \left[ \log \frac{q_{\theta} (z \mid x)}{p(z \mid x)} \right] = \mathbb{E}_{q_{\theta} (z \mid x)} \left[ \log q_{\theta} (z \mid x) - \log p(z \mid x) \right]\]

Applying Bayes' rule \(p(z \mid x) = p(x \mid z) p(z) / p(x)\), and noting that \(\log p(x)\) does not depend on \(z\) so it can be pulled out of the expectation, we have:

\[\mathcal{D}_{KL} (q_{\theta} (z \mid x) \parallel p(z \mid x) ) = \mathbb{E}_{q_{\theta} (z \mid x)} \left[ \log q_{\theta} (z \mid x) - \log p(x \mid z) - \log p(z) + \log p(x) \right]\]

\[\mathcal{D}_{KL} (q_{\theta} (z \mid x) \parallel p(z \mid x) ) = \mathbb{E}_{q_{\theta} (z \mid x)} \left[ \log q_{\theta} (z \mid x) - \log p(x \mid z) - \log p(z) \right] + \log p(x)\]

\[\mathcal{D}_{KL} (q_{\theta} (z \mid x) \parallel p(z \mid x) ) = - \mathbb{E}_{q_{\theta} (z \mid x)} \left[ \log p(x \mid z) \right] + \mathcal{D}_{KL} \left[ q_{\theta} (z \mid x) \parallel p(z) \right] + \log p(x)\]

So, since \(\log p(x)\) is a constant with respect to \(\theta\), minimizing \(\mathcal{D}_{KL} (q_{\theta} (z \mid x) \parallel p(z \mid x) )\) is equivalent to maximizing the ELBO: \(\mathbb{E}_{q_{\theta} (z \mid x)} \left[ \log p(x \mid z) \right] - \mathcal{D}_{KL} \left[ q_{\theta} (z \mid x) \parallel p(z) \right]\).
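In code, the negative ELBO becomes the familiar VAE training loss. A minimal PyTorch sketch, assuming a Gaussian encoder \(q_{\theta}(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\) so the KL term has a closed form:

import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction term: -E_q[log p(x|z)], up to a constant, for a Gaussian decoder
    # (binary data would use binary cross-entropy instead).
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian q.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl  # negative ELBO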

Another perspective on the motivation of VAE can be seen from the development of the Auto Encoder (AE) model.

  • The AE model is trained to minimize the reconstruction error between the input \(x\) and the output \(\hat{x}\).
  • The AE process is deterministic, i.e., given \(x\), the output \(\hat{x}\) is always the same.
  • Therefore, the AE model does not have the continuity and completeness properties desired in a generative model.
  • To solve this problem, we change the deterministic encoder of the AE model to a stochastic encoder, i.e., instead of mapping \(x\) to a single point \(z\), the encoder maps \(x\) to a distribution \(q_{\theta} (z \mid x)\). This distribution should be close to the prior distribution \(p(z)\). This is the motivation of VAE (see the sketch below).
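To make the stochastic encoder concrete, here is a minimal sketch (PyTorch; the layer sizes are illustrative) of an encoder that outputs a distribution and samples from it with the reparameterization trick:

import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    # Maps x to q(z|x) = N(mu, diag(sigma^2)) instead of a single point z.
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.log_var = nn.Linear(h_dim, z_dim)

    def forward(self, x):
        h = self.body(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return z, mu, log_var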

Data-Free Knowledge Distillation

(2023-08)

  • Reference: Data-Free Model Extraction
  • What is Data-Free KD? It is a method for transferring knowledge from a teacher model to a student model without using any real data. The idea is to learn a generator that can produce synthetic data resembling the teacher's training data; the synthetic data is then used to train the student: \(L_S = L_{KL} (T(\hat{x}), S(\hat{x}))\)

Where \(T(\hat{x})\) and \(S(\hat{x})\) are the outputs of the teacher and student models on \(\hat{x}\), the synthetic data produced by the generator \(G\). The generator itself is trained with:

\[L_G = L_{CE} (T(\hat{x}), y) - L_{KL} (T(\hat{x}), S(\hat{x}))\]

Where \(y\) is the target label assigned to the synthetic data. Minimizing the first term encourages the generator to produce data that the teacher classifies as the target class \(y\), while the minus sign on the second term means minimizing \(L_G\) maximizes the teacher-student disagreement, steering the generator toward samples the student has not yet mastered, and hence toward diversity. Compared to a GAN, the teacher and student together play the role of the discriminator.

This adversarial game needs to be integrated into the training process at every iteration: after each iteration, minimize \(L_G\) to generate a fresh batch of synthetic data, then use that \(\hat{x}\) to train the student. This ensures the synthetic data is always new to the student model, and it is why one of the main drawbacks of DFKD is that it is very slow. A minimal sketch of the loop follows.
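A sketch of one alternating round (PyTorch; the class-conditional generator interface, num_classes, and hyperparameters are illustrative assumptions, and the teacher is assumed frozen):

import torch
import torch.nn.functional as F

def dfkd_step(generator, teacher, student, g_opt, s_opt,
              num_classes=10, z_dim=100, batch=64, device="cuda"):
    # --- Generator step: minimize L_G = L_CE - L_KL ---
    z = torch.randn(batch, z_dim, device=device)
    y = torch.randint(0, num_classes, (batch,), device=device)
    x_hat = generator(z, y)  # class-conditional generator (assumed interface)
    t_logits, s_logits = teacher(x_hat), student(x_hat)
    kl = F.kl_div(F.log_softmax(s_logits, dim=1),
                  F.softmax(t_logits, dim=1), reduction="batchmean")
    g_loss = F.cross_entropy(t_logits, y) - kl  # push toward class y, away from student agreement
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # --- Student step: minimize L_S = L_KL on a fresh synthetic batch ---
    with torch.no_grad():
        x_hat = generator(torch.randn(batch, z_dim, device=device), y)
        t_logits = teacher(x_hat)
    s_logits = student(x_hat)
    s_loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                      F.softmax(t_logits, dim=1), reduction="batchmean")
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()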

Tuan (Henry)'s work on improving Data-Free KD:

  • Introducing a noisy layer: a linear layer that transforms the input (a label-text embedding vector from CLIP) before feeding it to the generator, as in prior work (Input -> Noisy Layer -> Generator -> Teacher/Student -> \(L_G\)).
  • One important point is that the noisy layer needs to reset its weights every time a new batch of synthetic data is generated (while the generator stays fixed). This ensures the diversity of the synthetic data (see the sketch below).
  • One interesting finding is that a single noisy layer can be shared across label-text embeddings from all classes; using an individual noisy layer per class performs worse.
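A minimal sketch of such a noisy layer (the embedding dimension and reset policy are my reading of the description above, so treat the details as assumptions):

import torch.nn as nn

class NoisyLayer(nn.Module):
    # A linear map applied to the CLIP label-text embedding before the generator.
    def __init__(self, emb_dim=512):
        super().__init__()
        self.linear = nn.Linear(emb_dim, emb_dim)

    def reset(self):
        # Re-randomize the weights before each new synthetic batch
        # (generator frozen) to diversify the generated data.
        self.linear.reset_parameters()

    def forward(self, text_emb):
        return self.linear(text_emb)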

How to disable NSFW detection in Huggingface

(2023-08)

  • Context: I am trying to generate inappropriate images with Stable Diffusion using prompts from the I2P benchmark. However, the NSFW detector in the Hugging Face diffusers pipeline is too sensitive: it filters out all of the images and returns black images instead. Therefore, I need to disable it.
  • Solution: modify the pipeline_stable_diffusion.py file in the diffusers library so that the run_safety_checker function just returns image and None.
# line 426 in the pipeline_stable_diffusion.py
def run_safety_checker(self, image, device, dtype):
    return image, None

    # The following original code will be ignored
    if self.safety_checker is None:
        has_nsfw_concept = None
    else:
        if torch.is_tensor(image):
            feature_extractor_input = self.image_processor.postprocess(image, output_type="pil")
        else:
            feature_extractor_input = self.image_processor.numpy_to_pil(image)
        safety_checker_input = self.feature_extractor(feature_extractor_input, return_tensors="pt").to(device)
        image, has_nsfw_concept = self.safety_checker(
            images=image, clip_input=safety_checker_input.pixel_values.to(dtype)
        )
    return image, has_nsfw_concept
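Alternatively, a less invasive option than editing the library source is to load the pipeline without a safety checker in the first place (a sketch; diffusers prints a warning but proceeds):

from diffusers import StableDiffusionPipeline

# Passing safety_checker=None skips the NSFW check entirely for this pipeline instance.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", safety_checker=None
)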

(#Idea, #GenAI, #TML) Completely erase a concept (e.g., NSFW) from the latent space of Stable Diffusion.

  • Problem: current methods such as ESD (Erasing Concepts from Diffusion Models) can erase a concept from Stable Diffusion quite well. However, recent work (Circumventing Concept Erasure Methods for Text-to-Image Generative Models) has shown that the erased concept can be recovered with a simple Textual Inversion method.
  • Firstly, I personally find the approach in Pham et al. (2023) not very convincing, because they need additional data (25 samples per concept) to learn a new token associated with the removed concept. It is not surprising, then, that they can generate images of the removed concept: this reflects the power of the personalization method, not the weakness of the ESD method. It would be better to compare recovery of concept A (a concept entirely new to the base Stable Diffusion model, such as your personal photos) on two models: (1) the base SD model with concept A injected directly, and (2) a model fine-tuned on concept A, then erased with ESD, then re-injected with concept A. If the latter cannot generate images of concept A any better than the former, we can say the ESD method is effective.

Helmholtz Visiting Researcher Grant

(2023-08)

  • https://www.helmholtz-hida.de/en/new-horizons/hida-visiting-program/
  • A 1-3 month visiting grant for Ph.D. students and postdocs at one of the 18 Helmholtz centers in Germany.
  • Application window: 16 August 2023 to 15 October 2023.
  • CISPA - Helmholtz Center for Information Security https://cispa.de/en/people

Where to find potential collaborators or postdoc positions

(2023-08)

Each year, the Australian Research Council releases the outcomes of funded/accepted projects from leading researchers and professors across Australian Universities. This information can be a great resource for finding collaborations, PhD positions, and research job opportunities.

For example, if you’re interested in the topic of Trust and Safety in Machine Learning, you can find several professors who have recently received funding to work on related topics.

Link to the ARC data: https://lnkd.in/gge2FJR3

Micromouse Competition

(2023-07)

  • The idea of a maze-solving robotic mouse goes back to Claude Shannon's "Theseus" in the 1950s.
  • At the beginning, it was just a simple maze-solving competition. After 50 years of growth and competition, however, it has become highly competitive, with many different categories: speed, efficiency, size. Along the way, many great ideas have been introduced and applied, spanning mechanical, electrical, software, and AI engineering, all in one small robot.
  • It recalls the Fosbury Flop in high jump: when everyone uses the same jumping technique, performance saturates. Then Fosbury introduced a new technique (the backward flop) that no one had thought of before, and it became the new standard (even named after him). The same phenomenon happens in the Micromouse competition.
  • The two most important game-changing ideas in the history of the Micromouse competition: the ability to move diagonally, and using a fan (vacuum) to suck the mouse down onto the track so it can corner faster, like downforce in a racing car.

Reference: