The prominence of Generative artificial intelligence (GenAI) throughout all aspects of research has seemingly grown exponentially in recent months – globally, not just at Iowa State University. GenAI tools have the capacity to make great contributions to research and our society, but unguided and unchecked, they run the risk of undermining the credibility and professionalism of our research enterprise.
Generative AI in Research: Guidance FAQ
Reflecting our strategic aspiration to lead a culture of research professionalism at Iowa State, the Office of the Vice President for Research has compiled the following FAQ to provide guidance to our research community on the responsible and ethical use of GenAI in research endeavors.
GenAI is a type of Artificial Intelligence that uses machine learning techniques to create or evaluate content. GenAI tools “learn” by processing large amounts of data, such as text, audio, images, and videos from the internet, and intuiting patterns from these data to generate new content.
To make the most of GenAI tools for research, it is helpful to understand generally how they work. Although the technological process of generating content is complicated – combining statistics, linear algebra, calculus, and neural networks – it is based on making predictions from existing data. Put simply, GenAI tools take large amounts of training data, such as text or images from the internet, and study those data to glean patterns from which they will generate new content. Before using these tools, you should be aware of how they gather and process data and the ethical concerns surrounding their production.
No. With a technology that is evolving as rapidly as GenAI, any formal policy is likely to already be outdated by the time it would be enacted across the institution. Therefore, we urge all principal investigators to follow the AI guidance and direction of their professional societies, funding agencies, and the publishing houses they target for their research papers.
Most GenAI tools are proprietary, which means that their creators do not share details about how the tool was created. It is, therefore, impossible for its users to know what data are being used to train the models and how those data are being cleaned and processed. The black-box nature of these tools raises important ethical considerations surrounding transparency, bias, and reliability.
Transparency: Because GenAI tools are trained through analyzing data, it is important that users are aware of the sources and technical methods used to gather, clean, and process those data. However, most GenAI companies like OpenAI and Google (the creators of ChatGPT and Bard, respectively) do not release information about their training data or methods. Also keep in mind that what is considered scientific truth is constantly changing; what may be assumed to be known today may be disproven tomorrow.
Bias: The lack of transparency also contributes to the biased nature of these tools. Because the training data and cleaning process remain inaccessible to the public, it is impossible to know what kinds of content (from which websites, for example) the model may have ingested during training and what perspectives from those sources have been encoded into the model.
Reliability: While GenAI tools are good at synthesizing data from a variety of sources, they also tend to provide inaccurate information, a phenomenon often described as “hallucinating.” The risks for this tendency are more serious when these tools are being used in engineering, legal, or financial contexts.
Researchers should seek guidance from disciplinary societies, funding agencies, or publishing organizations when considering whether to use GenAI tools in research. Current practices suggest the appropriate use of GenAI in research includes enhancing or refining a novel idea, basic analysis/computation, or summarizing data that can be verified. When defining a research project or gathering necessary background information, examples of appropriate uses may include triaging, organizing, and summarizing information; and drafting literature or improving drafts of literature reviews.
When using GenAI in research collaborations, be respectful and transparent about its use:
- Thoroughly verifying GenAI content;
- Adhering to ethical guidelines;
- Appropriately citing the GenAI tool used;
- Consulting with relevant publication policies;
- Ensuring all collaborators understand the limitations of GenAI and their responsibility in reviewing outputs carefully; and
- Always prioritizing human oversight and critical thinking in the research process.
Although the university does not have a specific policy on the use of GenAI, inappropriate use of the technology may lead to research misconduct as defined in the Iowa State Research Misconduct policy.
The default stance on using GenAI for writing research papers should generally be no – particularly for creative contributions – due to issues around authorship, copyright, and plagiarism. However, GenAI can be beneficial for editorial assistance, provided you are aware of what your target publication deems acceptable.
Generating text and images for publications in scientific journals raises issues of authorship, copyright and plagiarism, many of which are still unresolved. Because this is a rapidly evolving and controversial area, many journals and research conferences have been, and will continue to, update their policies. Again, it’s critical that you carefully review and understand the author guidelines of your targeted journal.
Here are a few examples of recent authorship guidelines.
- Springer Nature journals prohibit the use of GenAI to generate images for manuscripts; texts generated by LLM should be well documented, and AI is not granted authorship.
- Science journals require full disclosure for the use of GenAI to generate text; GenAI-generated images and multimedia can be used only with explicit permission of their editors. AI is not granted authorship.
- JAMA and the JAMA network journals do not allow GenAI to be listed as authors. However, GenAI content or assistance in writing/editing are allowed in manuscripts but should be reported in the manuscript.
- Elsevier permits the use of GenAI tools to enhance text readability but not creating or altering scientific content. Authors should provide full disclosure of the use of GenAI. It prohibits the use of GenAI to generate or alter images, unless this is part of the research method. AI authorship is not allowed.
- IEEE mandates disclosure of all AI-generated content in submissions, except for editing and grammar enhancement.
- The International Conference on Machine Learning prohibits content generated by GenAI, unless it is part of the research study being described.
While direct generation of content by GenAI is problematic, its role in the earlier stages of writing can be beneficial. For instance, non-native English speakers may use GenAI to refine the language of their writing. GenAI can also serve as a tool for providing feedback on writing – similar to a copy editor’s role – by improving voice, argument, and structure. This utility is distinct from using AI for direct writing. Editing help from GenAI is becoming increasingly recognized as acceptable in most disciplines where language is not the primary scholarly contribution as long as the human author assumes full responsibility for the final content. However, conservative editorial policies at some venues may limit the use of such techniques in the near term.
You cannot assume GenAI tools are compliant with rules and laws designed to ensure the confidentiality of private information, such as HIPAA (Health Insurance Portability and Accountability Act of 1996) and FERPA (Family Educational Rights and Privacy Act). Uploading information (e.g., research data, grant proposals, unpublished manuscripts, or analytical results) to a public AI tool is equivalent to releasing it publicly; therefore, before any information from you or another individual is uploaded to a public AI tool, appropriate steps must be taken to ensure that the disclosure of that information is consistent with all rules and laws related to the handling of private information.
As an author or creator, you are right to have concerns about your work being reused without your permission by GenAI technologies. As an Iowa State faculty member, you can protect yourself by using Microsoft Copilot. Unlike other GenAI tools, such as ChatGPT and Bard, Microsoft Copilot includes commercial data protection that complies with the Family Educational Rights and Privacy Act, Health Insurance Portability and Accountability Act, and others.
The tool is included in Iowa State’s Microsoft license at no additional cost. To access it, log in to Microsoft Copilot and authenticate your ISU credentials. Once logged in, users see a confirmation that personal and organizational data are protected in the chat.
Microsoft Copilot adds security for university data. It does not use any data entered by Iowa State employees to build the model and none of the data entered resides in any Microsoft data center after the chat is closed. You can learn more about Microsoft Copilot in this Inside Iowa State article from November 2023.
When publishing your work, review the author’s contract carefully for any clauses that permit the use of your work for text mining and language model training. Keep in mind that academic publishers are increasingly using GenAI-powered chatbots to help scholars find relevant articles, so opting out of text mining may affect the discoverability of your research and negatively affect your impact factor.
Under the Patent Statute, the term “inventor” is defined in 35 U.S.C. 100(f) as “the individual or, if a joint invention, the individuals collectively who invented or discovered the subject matter of the invention” (emphases added). The term “inventor” has been interpreted to be those natural persons who invent or discover the claimed invention. GenAI is not a natural person and, thus, cannot be an inventor. However, that does not mean that a natural person cannot use GenAI to assist in developing an invention.
The U.S. Patent and Trademark Office (USPTO) has issued guidance to help people understand whether they qualify as an inventor of (and thus may own and protect) an AI-assisted invention. The USPTO makes clear that the use of AI in developing the invention does not negate a person’s contributions as an inventor. Nevertheless, documenting the invention process of an AI-assisted invention, including keeping a record of prompts fed to the GenAI system for example, is important for determining inventorship of an GenAI-assisted invention.
Patents may not be obtained for inventions that are already in the public domain. In the U.S., an inventor has one year from publicly disclosing an invention to file for patent protection, and after that one year, the inventor’s invention will become part of the prior art, usable against the inventor in seeking patent protection.
Disclosures may come in many forms, such as publicly using an invention or publishing a paper regarding the invention. The public need not be very large, and the public does not even necessarily need to be aware that an invention was being disclosed in order for a disclosure to bar patentability. If an inventor were to upload their invention to a GenAI system or chat about an invention with a GenAI system, it is possible that such a disclosure could be considered a public disclosure if the GenAI system is operated by a third party.
OpenAI, the creator of ChatGPT, warns their system “may use Content [a user inputs to the service] to provide, maintain, develop, and improve [their] Services . . .” Further, the GenAI tool may incorporate any data provided into their model, which could trigger the release of that data to another party, potentially leading to a third-party disclosure that affects an inventor’s ability to protect his or her invention.
Publishers’ policies constantly evolve, but most require authors to document their use of AI and to properly cite the tools used. Individual journals may have more restrictive policies, so if you are writing for a specific publication, it is advisable to consult that outlet’s guidelines for authors before using GenAI as part of your writing process, especially if you are in the humanities. Bottom line, as the author, you are fully responsible for the content of your manuscript, even those parts produced by an AI tool, and are thus liable for any breach of publication ethics.
The downside risks of using GenAI in the process of writing research grants far surpasses any upside benefit. Keep in mind that you, as principal investigator, sign off on the proposal and promise to do the work if funded, so you are responsible for every part of the proposal content, even if GenAI assisted in the development of that content.
The reasoning is akin to that for writing papers, except that there usually will not be copyright or plagiarism issues. Also, not many funding agencies have well-developed policies yet in this regard.
For example, National Institutes of Health (NIH) does not specifically prohibit the use of GenAI to write grants (they do prohibit use of GenAI technology in the peer review process). However, the agency states that an author assumes the risk of using an AI tool to help write an application, noting “[…] when we receive a grant application, it is our understanding that it is the original idea proposed by the institution and their affiliated research team.” If AI generated text includes plagiarism, fabricated citations, or falsified information, the NIH “will take appropriate actions to address the non-compliance.” (Source.)
Similarly, the National Science Foundation (NSF), in its notice dated December 14, 2023, emphasized its views on the use of GenAI in grant proposal preparation and the merit review process. While NSF acknowledges the potential benefits of AI in enhancing productivity and creativity, it imposes strict guidelines to safeguard the integrity and confidentiality of proposals.
The DOE requires authors to verify any citations suggested by GenAI, due to potential inaccuracies, and does not allow AI-based chatbots like ChatGPT to be credited as authors or co-authors.
Reviewers are prohibited from uploading proposal content to non-approved AI tools, and proposers are encouraged to disclose the extent and manner of AI usage in their proposals. NSF stresses that any breach in confidentiality or authenticity, especially through unauthorized disclosure via AI, could lead to legal liabilities and erosion of trust in the agency. (Source.)
GenAI can offer multiple advantages. It can help you summarize a particular paper, saving time and enabling you to cover a much larger number of publications in the limited time you have. GenAI can also help you summarize literature around certain research questions by searching through many papers. If you do use GenAI in preparing any work that might be published, check with the editor before submitting your work.
However, you should consider several factors that may impact how much you can trust such reviews.
- When GenAI encounters a request about which it lacks information/knowledge, it sometimes “makes up” an answer. This “AI hallucination” is well documented and probably many of us have experienced it. You are responsible for verifying the summaries that GenAI gives you.
- Unlike human researchers, GenAI does not have the ability to evaluate the quality of the published work. Therefore, it will indiscriminately include publications of varying quality, perhaps also many studies that cannot be reproduced.
- A GenAI model has a knowledge cutoff date, so newer publications after the cutoff date will not be included in the responses that it gives you.
- Other types of inaccuracies: GenAI’s effectiveness is based on the training datasets. Even though enormous amounts of training data are now used for GenAI models, there is still no guarantee that the training is unbiased.
No, this is not advised. The National Institutes of Health recently announced that it prohibits the use of generative AI to analyze and formulate critiques of grant proposals. This not only applies to GenAI systems that are publicly available, but also to systems hosted locally (such as a university’s own generative AI), as long as data may be shared with multiple individuals. The main rationale is that this would constitute a breach of confidentiality, which is essential in the grant review process. To use GenAI tools to evaluate and summarize grant proposals, or even let it edit critiques, one would need to feed to the GenAI system “substantial, privileged, and detailed information.” A GenAI system should not be fed that information if you’re not certain how the tool will save, share or use it.
Additionally, expert review relies upon subject matter expertise, which a GenAI system could not be relied upon to have. So, it is unlikely that GenAI will produce a reliable and high-quality review.
In some situations, GenAI can be useful to help you draft a letter or edit your edit your letter to adopt a certain tone and carry it throughout. However, please keep the following in mind:
- You are still fully responsible for everything in the letter because you are still the author.
- You should consider the issue of confidentiality. If there is confidential information in the letter, GenAI should not be used if you’re not certain about what the system will do with the information you feed into it.
- Texts written by GPT tend to sound very generic. This can compromise the value and impact of a letter that typically outlines and reinforces the specific expertise of a colleague or collaborator. If you do use GenAI to develop the letter, be sure that it conveys to the reader the same level of support that the letter would project if you had written it entirely on you own.
The most important factor is which GenAI system (what data, what model, what computing requirements) fits well with your research questions. In addition, there are some general considerations.
Open source. “Open source” describes software that is published alongside the source code for use and exploration by anyone. This is a consideration because most GenAI models are not developed locally by the researchers themselves (as opposed to the usual Machine Learning models). Open-source GenAIs, as well as GenAI systems trained with publicly accessible data, can be advantageous for researchers who would like to fine-tune generative AI models, scrutinize the security and functionality of the system, and improve explainability and interpretability of the models.
Accuracy and precision. When outputs of GenAI can be verified (for example, if it is used in data analytics), you can gauge the efficacy of a GenAI by its precision and accuracy.
Cost. Some models require subscriptions to APIs (application programming interfaces) for research use. Other models may be able to be integrated locally, but also come with integration costs and potentially ongoing costs for maintenance and updates. When selecting otherwise free models, you might need to cover the cost for an expert to set up and maintain the model.
Adobe Firefly — AI-Powered Generative Image Creation
Adobe Firefly enhances creativity by providing new ways to imagine, experiment, and bring ideas to life. Use simple text prompts in over 100 languages to make beautiful images, transform text, play with color, and so much more.
ISU students, faculty, and staff can access Adobe Firefly by visiting firefly.adobe.com; be sure you log into Adobe Creative Cloud with your ISU email once you visit the Adobe Firefly website. If you have any issues logging into Adobe Creative Cloud, or would like to request access, please contact the Solution Center. You can also use Adobe Firefly within various Adobe Creative Cloud apps.
Microsoft Copilot — AI-Powered Chat for Work with Data Protection
Use Copilot to get work done faster, be more creative, or support customers better. All of this can be done with the confidence that individual and business data are protected and will not leak outside the organization. Microsoft Copilot is available to the ISU community with your ISU email address.
Copilot unlocks generative AI for capabilities around asking questions and generating responses from the web. Read more about Copilot in this Inside Iowa State article.
Copilot for Microsoft 365 — Your AI Assistant at Work
- Combine powerful, large language models with your work content and context to take on any task.
- Be more engaged in the meetings you attend and quickly catch up on the ones you miss.
- Summarize long email threads and quickly draft suggested replies.
- Learn more about Copilot for Microsoft 365.
You can fill out a form to request a license for Copilot for Microsoft 365. Work smarter, be more productive, boost creativity, and stay connected to the people and things in your life with Copilot—an AI companion that works everywhere you do and intelligently adapts to your needs.
Github Copilot — For Use in Visual Studio Code
You can fill out a form to request a license for GitHub Copilot for $19/month. This will enable you to utilize GitHub Copilot in Visual Studio Code on Mac, Windows, or Linux operating systems. The GitHub username you provide must be associated with your Iowa State Net-ID.
Submit a Request for AI Tools
Email aifeedback@iastate.edu to inquire about other AI tools that might be available for use at ISU. The university’s AI committee is currently reviewing licensing options for additional products. We may not be able to implement requests due to security, privacy, or accessibility compliance, in line with ISU policies.
The nature of GenAI gives rise to a number of considerations that the entire research community is trying to grapple with. Transparency and accountability about the GenAI’s operations and decision-making processes can be difficult when you operate a closed-source system.
Think about the following carefully and be aware that many other issues might arise:
Data privacy concerns. Data privacy is more complicated with GenAI when using cloud-based services, as users don’t know for certain what happens to their input data and whether it could be retained for training future AI models. One way to circumvent these privacy concerns is to use locally deployed GenAI models that run entirely on your own hardware and do not send data back to the AI provider. An example is Nvidia ChatRTX.
Bias in data. Bias in data, and consequently bias in the AI system’s output, could be a major issue because GenAI is trained on large datasets that you usually can’t access or assess, and may inadvertently learn and reproduce biases, stereotypes, and majority views present in these data. Moreover, many GenAI models are trained with overwhelmingly English texts, Western images and other types of data. Non-Western or non-English speaking cultures, as well as work by minorities and non-English speakers are seriously underrepresented in the training data. Thus, the results created by GenAI are culturally biased. This should be a major consideration when assessing whether GenAI is suitable for your research.
AI hallucination. GenAI can produce outputs that are factually inaccurate or entirely incorrect, uncorroborated, nonsensical or fabricated. These phenomena are dubbed “hallucinations”. Therefore, it is essential for you to verify GenAI-generated output with reliable and credible sources.
Plagiarism. GenAI can only generate new content based on, or drawn from, the data on which it is trained. It is likely, then, that GenAI tools will produce outputs that are similar to the training data, even to the point of being regarded as plagiarism if the similarity is too high. As such, you should confirm (e.g. by using plagiarism detection tools) that GenAI outputs are not plagiarized but instead “learned” from various sources in the manner humans learn without plagiarizing.
Prompt Engineering. The advent of GenAI has created a new human activity – prompt engineering – because the quality of GenAI responses is heavily influenced by the user input or ‘prompt’. There are courses dedicated to this concept. However, you will need to experiment with how to craft prompts that are clear, specific and appropriately structured so that GenAI will generate the output with the desired style, quality and purpose.
Knowledge Cutoff Date. Many GenAI models are trained on data up to a specific date and are therefore unaware of any events or information produced beyond that. For example, if a GenAI system is trained on data up to March 2019, it would be unaware of COVID-19 and the impact it had on humanity, or who is the current monarch of Britain. You need to know the cutoff date of the GenAI model that you use in to accurately assess what research questions are appropriate for its use.
Model Continuity. When you use GenAI models developed by external entities/vendors, you need to consider the possibility that one day the vendor might discontinue the model, which could have a big impact on your research’s reproducibility.
Security. As with any computer or online system, a GenAI system is susceptible to security breaches and attacks. We have already mentioned the issue of confidentiality and privacy as you input information or give prompts to the system. But malicious attacks could be a bigger threat. For example, a new type of attack – prompt injection – deliberately feeds harmful or malicious content into the system to manipulate the results it generates for users. GenAI developers are designing processes and technical solutions against such risks (for example, see OpenAI’s GPT4 System Card and disallowed usage policy). However, as a user, you also need to be aware what is at risk, follow guidelines of your local IT providers, and do due diligence with the results that GenAI creates for you.
Lack of Standardized Evaluations: The AI Index Report 2024 found that leading developers test their models against different responsible AI benchmarks, making it challenging to systematically compare the risks and limitations of AI models. Be wary when models tout confidence in certain evaluation measures, as the measures may not have been fully tested.
The Iowa State University Office of the Vice President for Research acknowledges these sources in the development of this guidance: University of Illinois Urbana-Champaign Generative AI Solutions Hub; University of North Carolina at Chapel Hill Office of the Provost; University of Michigan Institute for Data and AI in Society; Princeton University Library; and Penn State University Senior Vice President for Research AI guidance for researchers.
Generative AI is a rapidly evolving technology. The OVPR considers this FAQ a living document. If you have other questions relating to the use of AI in research that you would like us to consider adding, please email them to vpr@iastate.edu.