AI Delivers People?

People gathered around a 1970s TV in. house located in the West at sunset

In 1973, the artists Richard Serra and Carlota Fay Schoolman bought television airtime to broadcast a short video titled “Television Delivers People.”1


The video consists of an elevator music soundtrack playing while sentences in white text on a blue background slowly scroll upward. The video’s message was the product of commercial television, which was the mass audience delivered to advertisers.  A recurring phrase ran through the video:

It is the consumer who is consumed.
You are the product of t.v.
You are delivered to the advertiser who is the customer.
He consumes you.2

When I wrote a business case for Grammarly, widely used at UTA, I started thinking about us being the product in this new age of AI. Developing business cases for potential UTA use is one of the services OIT’s Innovation Architecture team provides, which I’m on.

Currently, UTA has around 529 Premium (paid) and 11,000 Free Grammarly individual licenses.  I learned from discussions with Grammarly that the company uses client data differently depending on the license type.  Grammarly categorizes the Free and Premium accounts as self-serv since you can sign up directly from their website.

Suppose you use a Free or Premium license. In that case, some of your data is being used to train their large language model (LLM)3, where customer data training controls (CDCT) are automatically on, which means a small amount of anonymized, disassociated information trains the large language model (LLM).  Individuals can contact Grammarly to ask to turn CDCT off, and then no client data will be used to train the LLM.      

Grammarly for Education (G4e) is an enterprise license where CDCT is automatically turned off, and client data is not being used to train the LLM.4  Additional features in G4e licenses support governance and heightened data security. These features give organizations insight into how their users use Grammarly, such as an analytics dashboard that shows the types of suggestions typically taken and the # of prompts used.

Using self-serve licenses of generative AI tools such as Grammarly raises concerns about non-published UTA research data that may leak out to LLMs.  Further research is needed to determine if other AI vendors use the same business model as Grammarly.

When working with AI vendors, know the free and paid license models we describe.  Vendors such as Box have made their AI Principles clear.

Grammarly’s security rating on BitSight is 790, at the top of the technology industry’s range. Grammarly has been in business for 15 years and scaling responsibly and securely. It works with 500k apps, including the full Microsoft suite.

As we work with AI vendors, we will ask about their security rating and make that information available to the UTA Community.

We find ourselves in a back-and-forth arms race regarding student and vendor use of AI.  Awareness of how data is used on LLM for free and paid licenses is a first step toward finding a sustainable way to live with AI.

There is consensus that AI detection tools are unreliable, and we must figure out how to live with generative AI.  One path is to work with vendors and establish low-risk pilots.  UT Austin is starting to launch a Grammarly for Education Generative AI Pilot.

By the way, Richard Serra produces monumental Corten steel sculptures worth experiencing…by walking through them.


A trial account of Grammarly for Education generated all article versions below.
 
Prompt: Improve the article:

By Dana Miller, Business Architect at OIT Innovation Architecture

AI Delivers People?” by Dana Miller, a Business Architect at OIT Innovation Architecture, presents a thought-provoking exploration of the evolving relationship between technology users and the platforms they engage with, framed through the lens of a 1973 short video by artists Richard Serra and Carlota Fay Schoolman titled “Television Delivers People.” With its minimalist presentation of text scrolling over a soothing elevator music backdrop, this video delivers a potent critique of commercial television’s role in commodifying its audience for advertisers, encapsulated in the chilling assertion, “It is the consumer who is consumed.”

Drawing parallels to the current era of artificial intelligence, Miller reflects on the implications of being “the product” in the context of the widespread use of Grammarly at the University of Texas at Arlington (UTA). With UTA’s adoption of approximately 529 Premium and 11,000 Free Grammarly licenses, the conversation shifts to how these tools, emblematic of generative AI, utilize user data. Notably, the distinction between the data handling of Free and Premium individual licenses versus Grammarly for Education (G4e) licenses highlights a critical discourse on privacy, data security, and the ethical use of AI.

While Free and Premium users contribute, albeit minimally and with safeguards, to the training of Grammarly’s Large Language Models (LLMs) through Customer Data Training Controls (CDCT), G4e licenses stand apart by excluding client data from LLM training entirely. This bifurcation raises questions about the potential for unintended data leakage from non-published UTA research through self-serve licenses, prompting a call for further investigation into AI vendors’ data practices.

Miller’s narrative extends beyond data privacy concerns to encompass the broader security and ethical considerations of engaging with AI technologies, underscored by Grammarly’s impressive security rating and responsible scaling. The dialogue culminates in reflecting on the “arms race” between AI utilization and detection, advocating for informed, cautious integration of AI tools in educational settings through pilot programs like the one initiated by UT Austin.

In closing, Miller’s essay not only revisits the historical critique of media consumption but also positions the discourse within the ongoing challenge of navigating the complex landscape of AI ethics, security, and governance. Through this lens, “AI Delivers People?” resonates as a timely meditation on our role and agency in the age of AI, inviting readers to contemplate the delicate balance between technological advancement and the preservation of human values.


Prompt: Make the article sound more on-brand:

**AI and the New Paradigm of User Data**

By Dana Miller, Business Architect at OIT Innovation Architecture

In 1973, artists Richard Serra and Carlota Fay Schoolman utilized television as a medium to broadcast a thought-provoking video titled “Television Delivers People,” highlighting the consumer’s role as the actual product in commercial television delivered to advertisers. Fast forward to today, and we find ourselves in a similar discussion with AI at the forefront.

At UTA, our engagement with tools like Grammarly, widely adopted across the university, is a prime example. Our Innovation Architecture team, which I’m part of, delves into the intricacies of these technologies, evaluating their fit within our ecosystem. With Grammarly, we’ve come to understand the nuances between their Free and Premium licenses, particularly in how user data contributes to training their sophisticated algorithms. Interestingly, Grammarly differentiates its licenses by the level of data utilization, with an opt-out feature for those concerned about privacy.

Moreover, the introduction of Grammarly for Education (G4e) licenses marks a significant shift towards enhanced data protection, automatically excluding user data from algorithm training. This model, prioritizing data security and governance, offers an analytical lens into user engagement without compromising privacy.

This exploration raises important questions about the transparency and ethics of AI tools, particularly those that leverage user data to refine their models. As we navigate this terrain, engaging with AI vendors that uphold clear data usage and security principles is crucial. Grammarly’s commendable security practices and robust integration across various platforms set a benchmark in the industry.

As we forge ahead, the dynamic between student use and vendor offerings of AI tools demands our vigilant attention. Understanding the intricacies of data usage in AI and distinguishing between free and paid licenses is a critical step toward a harmonious coexistence with these technologies. Our initiative at UT Austin to pilot the Grammarly for Education Generative AI Pilot exemplifies our commitment to navigating this landscape responsibly.

In closing, the conversation around AI and user data is reminiscent of the message conveyed by Serra and Schoolman decades ago, underscoring the enduring relevance of critically examining our role within the digital ecosystem. As we continue to explore the potentials and pitfalls of AI, let us remain mindful of our position, not just as users but as stewards of our data.

Prompt: Shorten the article:

**AI: A New Era of User Data Utilization**

By Dana Miller, Business Architect at OIT Innovation Architecture

In 1973, Richard Serra and Carlota Fay Schoolman’s broadcast, “Television Delivers People,” provocatively highlighted how commercial TV turned viewers into products for advertisers, encapsulating the message “It is the consumer who is consumed.”

This notion has resurfaced with the advent of AI, prompting reflections on our role as ‘products’ in this digital age. At UTA, our exploration into Grammarly’s use unveils a nuanced approach to user data. Grammarly employs data from its Free and Premium users to refine its algorithms, albeit with measures for anonymization and an opt-out option. However, its Grammarly for Education (G4e) licenses take a stricter stance, not utilizing user data for algorithm training, offering enhanced data security and governance.

The distinction between self-serve and enterprise licenses raises essential considerations regarding data privacy and the potential exposure of sensitive information. Understanding the specifics of data usage becomes crucial as we navigate the complex landscape of AI tools. Grammarly’s high-security standards and clear differentiation in data handling practices exemplify vendor transparency.

The evolving interaction between AI tools and users underscores the need for vigilant research and responsible partnerships. UT Austin’s initiative to pilot a Grammarly for Education Generative AI Pilot represents a step toward balancing innovation with privacy. Our awareness and actions will shape a sustainable coexistence with AI technologies as we continue this journey.


Prompt: Write the article as Haiku:

In screens we trust, sold—  

Consumers consumed, now AI  

Shapes us, unseen hands.

References

  1. Quote Investigator. (2017, July 16) You’re Not the Customer; You’re the Product. https://quoteinvestigator.com/2017/07/16/product/
    ↩︎
  2. Wikipedia (2023, September 18) Television Delivers People.  https://en.wikipedia.org/wiki/Television_Delivers_People
    ↩︎
  3. Grammarly note: With G4e licenses, LLM training is always off. When off, Grammarly will not retain anonymized, de-identified, or pseudonymized user or enterprise content for product improvement. Training is on as a default for all self-service customers; this means Grammarly may retain random samples of text to help us improve the algorithms underlying our software and services. All random samples of text are de-identified, anonymized, and sanitized. Access is extremely limited and monitored to ensure data never leaves secure confines.
    ↩︎
  4. Grammarly note: We have consolidated our Enterprise Data Control behaviors into a new control for educational institutions (and enterprise business customers). CDTC ensures we will not improve our product using customer data nor train on data from education-licensed customers. ↩︎

Researcher Guidance for the Use of Artificial Intelligence in Research

robotic researcher owl running an experiment

Purpose

The use of Generative Artificial Intelligence (GAI) can be a powerful tool for helping members of the UTA research community be more effective, productive, and innovative in their work. At the same time, GAI can be used in a way that may result in unintended negative consequences or that are inappropriate to current academic norms. Uses of GAI in research may involve proposal preparation, progress reports, data/statistical analysis, graphic generation, etc. 

Many standards, regulations, and policies are being contemplated or actively developed at federal, state, or institutional levels as the use and impact of GAI evolves. This guidance is intended to help frame the known requirements in using GAI, the safeguards needed for its ethical and appropriate use, and how to utilize or properly reference GAI used in research.   


Guiding Principles – Allowable and Prohibited Use

Consistent with the UTA Advice on Generative AI, there are general and contextual understandings on the appropriate, shortcomings/potential risks, and prohibited use of GAI in research.  Fundamentally, there must be an understanding that AI is a Technology Tool and that there are safeguards or prohibitions on the use of AI as it relates to the privacy, security, and confidentiality of data GAI may use or contribute to in public domain or within a secure environment.  Therefore, in principle: 

  • Use and develop GAI tools in a manner that is ethical, transparent, and mitigates potential biases.  One should note the known biases of using AI.  For example, NIST’s report on Statistical, Human, and Systemic Biases in AI.  
    • Bias in data. Bias in data, and consequently bias in the AI system’s output, could be a major issue because Generative AI is trained on large datasets that you usually can’t access or assess, and may inadvertently learn and reproduce biases, stereotypes, and majority views present in these data. Moreover, many Generative AI models are trained with overwhelmingly English texts, Western images and other types of data. Non-Western or non-English speaking cultures, as well as work by minorities and non-English speakers are seriously underrepresented in the training data. Thus, the results created by Generative AI are definitely culturally biased. This should be a major consideration when assessing whether Generative AI is suitable for your research.
  • Use and develop GAI tools in a manner that promotes institutional and research integrity, including scientific rigor and reproducibility. Do not rely on GAI tools in the stead of your own critical thinking and sound judgment.
    • Ethical issuesData privacy is more complicated with Generative AI when you don’t know for sure what Generative AI does with your input data. Transparency and accountability about the Generative AI’s operations and decision making processes can be difficult when you operate a closed-source system.
    • Plagiarism. Generative AI can only generate new contents based on, or drawn from, the data that it is trained on. Therefore, there is a likelihood that they will produce outputs that are similar to the training data, even to the point of being regarded as plagiarism if the similarity is too high. As such, you should confirm (e.g. by using plagiarism detection tools) that Generative AI outputs are not plagiarized but instead “learned” from various sources in the manner humans learn without plagiarizing. 
  • Users of GAI are responsible and accountable for any actions or outcomes that result from their use and development of GAI tools.
    • AI hallucination. Generative AI can produce outputs that are factually inaccurate or entirely incorrect, uncorroborated, nonsensical or fabricated. These phenomena are dubbed “hallucinations”. Therefore, it is essential for you to verify Generative AI-generated output with reliable and credible sources.
Allowable Use
Recognizing AI is a Technology Tool for Research

In any use of GAI, there should be self-awareness and recognition that AI is a tool for research.  It is a tool of human design and use like any technology.  It does not supplant nor surpass human oversight or context of being an independent tool used by humans for its’ benefit.  Therefore, in principle:

  • Be alert to the potential for research misconduct (i.e., data falsification, data fabrication, and/or plagiarism) when using and developing GAI tools.  Knowing the methods and data collection use in GAI. 
  • Disclose use of GAI tools when appropriate or required (e.g., a journal that will accept a manuscript developed using GAI, provided such use is disclosed).
  • Ensure any experimental data used in connection with an GAI tool are accurate, relevant, legally obtained and, when applicable, have the consent of the individuals from whom the data were obtained.
  • Make sure you can clearly explain how any GAI tools you create were developed (e.g., describe the data and machine learning models or deep learning algorithms used to train a Large Language Model AI tool).
    • Prompt Engineering. The advent of Generative AI has created a new human activity – prompt engineering – because the quality of Generative AI responses is heavily influenced by the user input or ‘prompt’. There are courses dedicated to this concept (see our “other training” page). However, you will need to experiment with how to craft prompts that are clear, specific and appropriately structured so that Generative AI will generate the output with the desired style, quality and purpose.
    • Knowledge Cutoff Date. Many Generative AI models are trained on data up to a specific date, and are therefore unaware of any events or information produced beyond that. For example, if a Generative AI is trained on data up to March 2019, they would be unaware of COVID-19 and the impact it had on humanity, or who is the current monarch of Britain. You need to know the cutoff date of the Generative AI model that you use in order to assess what research questions are appropriate for its use.
  • Be mindful of how sampling bias in training data and difficulties in interpreting output can be significant roadblocks for the ethical and transparent usage of GAI.
  • Make sure any GAI tools you use or develop are subject to human oversight (e.g., humans are involved in the design, development, and testing of the tool).
  • Subject any GAI tools you develop with rigorous quality control measures (e.g., test for accuracy and reliability).
  • Exercise caution regarding vendor claims about GAI-enabled products, as definitions of GAI and how it is implemented may vary. GAI-enhanced products may not always outperform non-GAI alternatives.
    • Model Continuity. When you use Generative AI models developed by external entities / vendors, you need to consider the possibility that one day the vendor might discontinue the model. This might have a big impact on the reproducibility of your research. 
Privacy, Security and Prohibited Use
  • Privacy, Security, and Rules Apply to GAI Tools – be cognizant to identify and protect the privacy and security of individuals when using and developing GAI tools.
    • Security. As with any computer or online system, a Generative AI system is susceptible to security breaches and malicious attacks. For example, a new type of attack, prompt injection, deliberately feeds harmful or malicious contents into the system to manipulate the results that it generates for users. Generative AI developers are designing processes and technical solutions against such risks (for example, see OpenAI’s GPT4 System Card and disallowed usage policy. But as a user, you also need to be aware what is at risk, follow guidelines of your local IT providers, and do due diligence with the results that a Generative AI creates for you.
  • Do not use GAI tools when prohibited (e.g., a sponsor that does not allow use of GAI for peer review).
  • Do not provide or share intellectual property or confidential/sensitive data with GAI tools that incorporate users’ content into their publicly accessible models.
  • Report any potential data breaches or confidentiality lapses involving GAI tools to the appropriate UTA authority.
  • Any acquisition, deployment, or agreement involving a vendor or vendor product using GAI must follow UTA’s established legal, information security, audit, and procurement rules.
  • In general, student records subject to FERPA, health information, proprietary information, and any other information classified as Confidential or Controlled under UTA’s Data Classification Standards must not be used within public AI models.

Gen AI Tools of any sort cannot be used for any activity that would be illegal, unethical, fraudulent or violate any state or federal law or UTA or UT System policies.

Federal Agency Use of GAI and Requirements for Proposals

National Science Foundation (NSF)

The National Science Foundation recently announced in its “Notice to Research Community:  Use of Generative Artificial Intelligence Technology in the NSF Merit Review Process” that NSF reviewers are prohibited from uploading any proposal content or review records to non-approved GAI tools (they must be behind NSF’s firewall) out of concern for potential violations of confidentiality and integrity principles of the merit review process. Use of GAI in NSF proposals should be indicated in the project description. Specifically, it states: “Proposers are responsible for the accuracy and authenticity of their proposal submission in consideration for merit review, including content developed with the assistance of generative AI tools. NSF’s Proposal and Award Policies and Procedures Guide (PAPPG) addresses research misconduct, which includes fabrication, falsification, or plagiarism in proposing or performing NSF-funded research, or in reporting results funded by NSF. Generative AI tools may create these risks, and proposers and awardees are responsible for ensuring the integrity of their proposal and reporting of research results.”

National Institutes of Health (NIH)

NIH has issued a Notice: The Use of Generative Artificial Intelligence Technologies is Prohibited for NIH Peer Review Process along with a set of FAQs for the Use of Generative AI in Peer Review. Although NIH specifically prohibits GAI in the peer review process, they do not prohibit the use of GAI in grant proposals. They state an author assumes the risk of using an AI tool to help write an application, noting “[…] when we receive a grant application, it is our understanding that it is the original idea proposed by the institution and their affiliated research team.” If AI generated text includes plagiarism, fabricated citations or falsified information, the NIH “will take appropriate actions to address the non-compliance.”

Reference the Use of GAI in Proposals, Papers, and Reports

GAI should not be listed as a co-author, but the use of Generative AI should be disclosed in papers, along with a description of the places and manners of use. Typically, such disclosures will be in a “Methods” section of the paper. See the Committee on Publication Ethics’ Authorship and AI tools webpage for more information. If you rely on GAI output, you should cite it. Good citation style recommendations have been suggested by the American Psychological Association (APA) and the Chicago Manual of Style.