Using AI as a Data Science Researcher: Changes, Capabilities, and Ethics

October 2, 2024 - 12:00 am

In the last two years, we have seen unprecedented progress in the capabilities of large language models. In the last month, several models have emerged that show advanced reasoning capabilities. In this talk, Dr. John Wallin will focus on the capabilities and impact of the most recent models – particularly OpenAI-o1 Preview, Claude 3.5 Sonnet, and ChatGPT 4o with advanced voice features. He will also talk in more depth about the “AI Scientist” software released in August. Dr. Wallin will then discuss what these developments mean for our research: our productivity, our workflow, and the ethics of our work.

Click here to watch the video.

Integrating Health Care Data with HL7: Leveraging Google Cloud and RESTful APIs for Enhanced Interoperability

April 10, 2024 - 12:00 am

In this seminar Dr. Lu Xiong explores advanced data engineering techniques in healthcare data management using Google Cloud’s BigQuery and the HL7 standard. The presentation primarily focuses on ETL (Extraction, Transformation, and Loading) processes and data transformation within healthcare settings, illustrating how raw data is converted into formats suitable for analysis and decision-making. A significant part of the discussion is dedicated to demonstrating how HL7 standards facilitate the effective integration and interoperability of healthcare data. Additionally, it covers practical applications of BigQuery for managing large datasets and the role of Google Cloud’s Dataflow in real-time data processing and predictive analytics.
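
As a rough illustration of the transformation step described above, the sketch below turns one raw message into a flat record suitable for loading into an analytics table. It assumes the pipe-delimited HL7 v2 message format (the seminar may equally have used FHIR resources), and the sample message, the function names, and the output field names are all hypothetical.

```python
"""Minimal ETL-style transform: HL7 v2 PID segment -> flat record (illustrative only)."""

# Hypothetical sample message; every value here is made up for illustration.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|SENDER|HOSPITAL|RECEIVER|HIE|202404100930||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^HOSPITAL^MR||DOE^JANE||19800101|F",
])


def parse_segments(message: str) -> dict:
    """Split a message into segments keyed by segment ID (first occurrence only)."""
    segments = {}
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    for line in message.replace("\n", "\r").split("\r"):
        if line:
            fields = line.split("|")  # "|" is the usual field separator
            segments.setdefault(fields[0], fields)
    return segments


def field(fields: list, index: int) -> str:
    """Return a field by position, or an empty string if it is absent."""
    return fields[index] if index < len(fields) else ""


def transform_patient(message: str) -> dict:
    """Extract a few PID fields into a flat, analysis-friendly record."""
    pid = parse_segments(message).get("PID", [])
    name = field(pid, 5).split("^")  # PID-5: family^given (components use "^")
    dob = field(pid, 7)              # PID-7: date of birth as YYYYMMDD
    return {
        "patient_id": field(pid, 3).split("^")[0],  # PID-3: first component
        "family_name": name[0],
        "given_name": name[1] if len(name) > 1 else "",
        "birth_date": f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}" if len(dob) >= 8 else "",
        "sex": field(pid, 8),                       # PID-8: administrative sex
    }


if __name__ == "__main__":
    print(transform_patient(SAMPLE_MESSAGE))
```

A real pipeline would also handle repeated segments, escape sequences, and validation before the load step; this sketch covers only the reshaping.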

View the seminar here.

Introduction to Health Information Exchanges (HIE): Interoperability in Healthcare and Health Insurance IT

April 10, 2024 - 12:00 am

The seamless exchange of healthcare information, facilitated by Health Information Exchange (HIE), is pivotal for enhancing patient care, improving outcomes, and reducing healthcare costs. Dr. Lu Xiong, with MTSU’s Department of Mathematical Sciences, will introduce the intricate landscape of data interoperability within healthcare IT, clearly explaining the complex challenges, from standardization to privacy concerns, that hinder the smooth exchange of data. The talk will explore the relationship between HIE and Electronic Health Records (EHR), look at the basic structures that support HIE, and discuss the roles of essential protocols such as HL7 and FHIR. Additionally, we will consider the operationalization of HIE in various contexts, including its integration with healthcare insurance, funding resources, and ethical considerations surrounding data privacy. An in-depth look at the infrastructure components of HIE and insights into HIE development using C# and Python will provide attendees with practical knowledge. The talk will end with an invitation to engage with the local HIE community, fostering a collaborative environment for future innovation.

View the video here.

Reading the Pseudo-code for Transformers

April 10, 2024 - 12:00 am

Transformers are the underlying algorithms powering large language models like GPT, BERT, and Llama. In this seminar, Dr. Yixiang Wu, with MTSU’s Department of Mathematical Sciences, will delve into a paper authored by the DeepMind team, focusing on the formal algorithms that define transformers. To grasp these algorithms, attendees need only a basic understanding of linear algebra (matrix multiplications) and probability theory (conditional probabilities).
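
To give a sense of how little machinery is involved, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer, written with plain Python lists. It is not the pseudocode from the paper discussed in the seminar: it assumes a single attention head and omits masking and the learned projection matrices, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, row by row."""
    d = len(keys[0])          # dimension of each key/query vector
    width = len(values[0])    # dimension of each value vector
    outputs = []
    for q in queries:
        # Dot product of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return outputs
```

The same softmax function is what turns a model's final scores into conditional probabilities over the next token, which is where the probability prerequisite comes in.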

See the video here.

Opportunities for actuaries and computational mathematics experts in the healthcare industry

March 26, 2024 - 12:00 am

Ramzi Abujamra, PhD, ASA, Sr Clinical & Population Health Analyst, Highmark Health, will describe opportunities in the health insurance industry from the viewpoint of his own career. Having both a PhD in Health Informatics (UMN) and an ASA has given him a unique combination in this industry. He will describe the work that he has been exposed to so far in the industry. The most common approaches focus on identifying ways to reduce costs. Those approaches include predictive analytics and intervention program evaluations that identify the potential savings that could be achieved. With more recent advances in AI and computational math, more cutting-edge ideas are beginning to shape the industry. The key behind those innovations boils down to adding personalization to the previous iterations. Another key development is the increasing synchronization between provider and payer. The silver bullet is to be able to identify the right treatment for the right patient, at the right time, and for the right duration. There are opportunities for innovation in this space that are still uncharted territory. We will discuss some of those ideas in more detail and the potential for leveraging cutting-edge research to bring innovation to the evolving healthcare industry.

See the video here: https://youtu.be/LtkcTbvb-jc

Using AI-Assisted Writing Productively

October 18, 2023 - 12:00 am

Notes from the LT&ITC Presentation By John Wallin, MTSU

Of course, parts of this document were generated using ChatGPT.[1]

The essentials:

The “computational” capabilities of ChatBots are growing exponentially. The interaction modes of ChatBots are rapidly changing.
ChatBots are not magical paper-generating machines.
Interact with ChatBots as though they are smart and very productive students who occasionally go in completely the wrong direction.
ChatBots are best at conversations. If you don’t get what you want out of a prompt, give it feedback and help it get back on course.
You may not be comfortable with them yet, but this is not a fad.
I believe this is a fundamentally new phase in the evolution of language.

Basic Use Cases for Academics


Summarizing text – articles, notes, transcripts

Generation of classroom materials

Revision of text

Generation of new ideas and focusing your thoughts

Finding the right tone

Some Hints:

Write good prompts and ask for revisions.

Never use something directly from the output… ever.

Customize your experience with a profile.

Purchase the upgrade – your students will.

Explore the multimodality capabilities of the apps.

Bad Prompts for Academic Writing

Bad Prompt: “Write an essay about climate change.”

Issue: The prompt is broad and could result in an essay that covers any range of topics related to climate change. Without specifics, the content may not align with the intended focus, such as the scientific basis, policy implications, or ethical considerations.
Revised: “Write a 1,500-word essay exploring the impact of anthropogenic climate change on polar ice melting, with a specific focus on its consequences for global sea levels. Cite at least five peer-reviewed articles.”
Revised Prompt – GPT 3.5: “Compose an essay exploring the impact of rising global temperatures on polar ice caps and its consequences for sea level rise and wildlife in the Arctic.”

Bad Prompt: “Summarize the key arguments of this article.”

Issue: Without providing the article or specifying which aspects or sections are of interest, the model can’t produce a meaningful summary.
Revised: “Summarize the key arguments of the article ‘The Social Implications of Artificial Intelligence’ by Jane Smith, focusing on its discussion of job displacement and income inequality. Limit your summary to 300 words.”
Revised Prompt – GPT 3.5: “Write an essay analyzing the influence of artificial intelligence in K-12 education, focusing on how AI-powered learning platforms enhance personalized instruction and student engagement.”

Bad Prompt: “Review this paper for me.”

Issue: Similar to the above, this prompt doesn’t include the paper to be reviewed or specify the criteria for the review (e.g., methodology, literature review quality).
Revised: “Conduct a critical review of the research paper ‘Quantum Computing: Current State and Future Possibilities.’ Focus on evaluating the methodology, the literature review, and the validity of the findings. Produce a review of around 700 words.”
Revised Prompt – GPT 3.5: “Craft an essay that delves into existentialism as a philosophical movement, examining the works of Jean-Paul Sartre and Albert Camus and their perspectives on the absurdity of human existence.”

Bad Prompt: “Analyze the data.”

Issue: This is too vague. Without knowing what kind of data, what the research question is, or what kind of analysis is required (e.g., statistical, qualitative), the model can’t proceed meaningfully.
Revised: “Analyze the dataset on COVID-19 infection rates in Tennessee from January to June 2020 using a linear regression model. Discuss how well the model fits the data and potential implications.”
Revised Prompt – GPT 3.5: “Conduct a statistical analysis on the dataset related to the impact of social media on teenage mental health, specifically exploring the correlation between daily screen time and reported emotional well-being among adolescents aged 13 to 18.”

Bad Prompt: “Give feedback on student essays.”

Issue: Without the essays or guidelines for what to look for (e.g., content accuracy, writing quality), the model can’t provide targeted feedback.
Revised: “Provide feedback on student essays about the ethical considerations of gene editing. Use a rubric that assesses thesis clarity, evidence quality, and argumentative structure. Limit feedback to one paragraph per criterion.”
Revised Prompt – GPT 3.5: “Provide constructive feedback on a set of three student essays regarding the historical significance of the American Civil Rights Movement. Focus on clarity of argument, evidence, and use of primary sources, and suggest improvements in organization and critical analysis.”

Bad Prompt: “Discuss the limitations of the study.”

Issue: Without information about the study in question, its methodology, or its findings, a meaningful discussion of its limitations can’t be provided.
Revised: “Discuss the limitations of the study ‘Impacts of Remote Learning on Academic Performance’ by focusing on sample size, methodology, and the scope of the research. Produce a discussion of around 400 words.”
Revised Prompt – GPT 3.5: “Conduct a critical analysis of the limitations in a research study titled ‘The Impact of Social Media on Political Polarization.’ Specifically, evaluate the study’s methodology, sample size, and potential sources of bias, and suggest areas where future research could address these limitations.”

Bad Prompt: “Write an introduction for my paper.”

Issue: Without details on the paper’s topic, objectives, and key arguments, the model can’t craft an appropriate introduction.
Revised: “Write an introduction for a paper investigating the effectiveness of augmented reality in teaching astronomy. The introduction should provide background, state the research question, and outline the methodology.”
Revised Prompt – GPT 3.5: “Compose an engaging introduction for my research paper on the ‘Impact of Artificial Intelligence on the Labor Market,’ summarizing the significance of the topic, outlining the research objectives, and providing a clear thesis statement that highlights the main arguments to be explored in the paper.”

Bad Prompt: “What should I include in my presentation?”

Issue: This lacks specifics on the subject, audience, and purpose of the presentation, making it difficult to provide a tailored list of content to include.
Revised: “Outline the key points to include in a 20-minute presentation about the challenges and opportunities of AI in higher education, targeting an audience of interdisciplinary faculty.”
Revised Prompt – GPT 3.5: “Please provide guidelines on the key elements to include in a 20-minute academic presentation on ‘The Role of Renewable Energy in Sustainable Development.’ Consider the target audience (e.g., undergraduate students) and suggest content like an introduction, key findings, case studies, and potential discussion points.”

Bad Prompt: “Create questions for my quiz.”

Issue: Without knowing the subject, difficulty level, or format (multiple-choice, short answer, etc.), the model can’t create appropriate quiz questions.
Revised: “Create 10 multiple-choice questions to assess understanding of neural networks in data science, focusing on architecture, training algorithms, and applications.”
Revised Prompt – GPT 3.5: “Generate ten multiple-choice questions for an introductory biology quiz covering topics such as cellular respiration, mitosis, and DNA structure. Ensure a mix of difficulty levels and provide explanations for both the correct and incorrect answers.”

Bad Prompt: “Help me with citations.”

Issue: This doesn’t specify what citation style is needed (APA, MLA, etc.) or what exactly needs to be cited (a particular passage, data, etc.).
Revised: “Format the in-text citations and bibliography for a paper on crowd-sourced data sets, adhering to the APA 7th edition guidelines.”
Revised Prompt – GPT 3.5: “Assist me in formatting in-text citations and a bibliography in the APA style for a research paper on ‘The Psychological Effects of Social Media Use in Adolescents.’ Specifically, provide guidance on citing online sources and properly formatting reference entries.”

Bad Prompt: “Discuss the impact of technology on education.”

Issue: Similar to the first example, this prompt is overly broad and lacks context. It doesn’t specify the type of technology, the educational level, or the desired angle of discussion.
Revised: “Write a 1,000-word essay discussing the impact of e-learning platforms like Moodle and Blackboard on student engagement in higher education. Use at least three academic sources.”
Revised Prompt – GPT 3.5: “Write an essay analyzing the influence of tablet-based interactive learning platforms on elementary school education, with a focus on how these technologies enhance engagement and math proficiency in young students.”

Bad Prompt: “Write an essay on philosophy.”

Issue: Without specifying the branch of philosophy or a particular philosophical question, this prompt is too vague to yield a well-structured and informative essay.
Revised: “Write an essay of 2,000 words examining the ethical considerations of utilitarianism in modern healthcare. Refer to the works of John Stuart Mill and Peter Singer.”
Revised Prompt – GPT 3.5: “Craft an essay that explores the ethical dilemmas posed by utilitarianism in the context of environmental conservation, examining the works of philosophers like John Stuart Mill and their views on the moral implications of maximizing utility.”

Bad Prompt: “Write an essay on the impact of social media.”

Issue: Similar to other prompts, this one lacks specificity. It doesn’t mention which social media platform, the target audience, or the specific impact to explore.
Revised: “Write an essay of 1,800 words discussing the impact of social media platforms like Facebook and Twitter on political polarization in the United States.”
Revised Prompt – GPT 3.5: “Compose an essay examining the influence of Instagram on body image perception among adolescent girls, focusing on the role of curated content and image filters in shaping self-esteem and beauty ideals.”

Bad Prompt: “Examine the role of technology in healthcare.”

Issue: This prompt doesn’t clarify the type of technology (e.g., AI, telemedicine, EHR systems) or the specific aspect of healthcare (e.g., patient care, administration, research).
Revised: “Write a 2,000-word essay examining the role of telemedicine during the COVID-19 pandemic, focusing on its impact on rural healthcare access.”
Revised Prompt – GPT 3.5: “Write an essay investigating the impact of artificial intelligence in patient diagnosis and treatment planning within the field of radiology. Analyze the effectiveness and limitations of AI-driven imaging interpretation in improving medical outcomes.”

Bad Prompt: “Discuss the benefits of exercise.”

Issue: The prompt is too broad and doesn’t specify the context or type of exercise. Without clear instructions, the response might not address the intended topic comprehensively.
Revised: “Write a 1,200-word essay discussing the cardiovascular benefits of aerobic exercise, referencing at least three peer-reviewed studies.”
Revised Prompt – GPT 3.5: “Write an essay outlining the cardiovascular benefits of aerobic exercise for individuals over 50 years old, with a focus on its role in improving heart health, reducing the risk of hypertension, and enhancing overall longevity.”

Customizing GPT

What would you like ChatGPT to know about you to provide better responses?

I have a Ph.D. in astrophysics, and I currently run the Computational and Data Science Ph.D. at Middle Tennessee State University in Murfreesboro, Tennessee. In my job, my research focuses on applying neural networks to crowd-sourced data sets from the Zooniverse. I am also leading an effort in augmented reality – specifically investigating the effectiveness of wearable AR (Magic Leap units) to teach astronomy and chemistry. I am the lead on an NSF grant on this project.

My PhD program has about 35 doctoral students. Although my home department is Physics and Astronomy, I am associated with the Data Science leadership group. Our campus has undergraduate and master’s degrees and my doctoral program. The programs, particularly the PhD, are highly interdisciplinary, with faculty members from across the STEM departments.

I am very interested in the applications of AI in higher education. I am particularly interested in how we address the challenges it presents to our curriculum. For example, automatic code generation directly impacts how we teach our coding course. Automatic text generation can affect how we write and evaluate student work. I will be leading faculty workshops this fall to discuss these issues.

In my personal life, I am married to Katharine. We have three cats. I like to build things and have a large maker space in my home with a wood shop.

How would you like ChatGPT to respond?

My name is John Wallin, but I prefer to be addressed as John. I think longer answers are often more helpful, but this might be situational. I am very comfortable with technical language. I think opinions are helpful, but it is helpful to state when you are expressing an opinion vs a specific fact. When writing revisions, it will be better to just show me the revised paragraphs instead of the entire document. If possible, I would prefer accurate references. However, any inaccurate information would be damaging to my reputation. I would rather have no references than inaccurate references. I view you as a collaborator – something close to a graduate student. You have high abilities and intelligence but can’t read my mind. If you are unclear about what I am asking for, please ask for clarification. I want to make sure we have clear communications.

Modes for using GPT-4

“Our most capable model, great for tasks that require creativity and advanced reasoning.”

Default – Only uses information from its training set. It does not have access to external data.

Browse with Bing – Gives the system access to the internet. Example Prompts:

What is the weather currently in Murfreesboro, TN, and what is the forecast this week?
What are the five top news headlines?
Can you summarize the conflict in Gaza this week based on news reports?

Advanced Data Analysis – Allows the user to import and analyze CSV format data files. Example prompt:

This data set is a snapshot of the grades from my class. I need to calculate the midterm grades.

Rename the columns.
1. All the columns listed as HXX.. are homework grades. Let’s rename them to just “HXX” where XX is the number of the homework.
2. All the columns that start with LXX… shorten to LXX where XX is the lecture number.
3. Shorten the names of the exam columns the same way.
Create a new column called HWSUM that is the sum of the values in the homework columns, excluding the lowest two values. Find the maximum value of this column, then create a new column called HW NORM that rescales each individual’s sum so it is normalized (based on the maximum) to 15 points.
Repeat this with lecture scores – summing all but the lowest two scores. The new columns will be called LECSUM and LEC NORM. The normalized values will also be out of 15 points using the maximum value of LECSUM.
Create an EXAMSUM only using the top two scores, dropping the one lowest score. Normalize this into a column called “EXAM NORM” that is out of 70 pts.
Create a new column called TOTAL PTS that is the sum of the normalized columns.
Create a new column called “GRADE” that is an A, B, C, D, or F grade using the normal 90, 80, etc. grading.
Show me the grade distribution.
Allow me to download the file.
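
For readers who want to check the arithmetic that the prompt above asks for, here is a minimal sketch of the same calculation in plain Python. It is not the code ChatGPT produced. The input layout (one dict per student), the function names, and the sample scores are hypothetical, and the 90/80/70/60 cutoffs are an assumed reading of “the normal 90, 80, etc. grading.” The column-renaming step is left out.

```python
def sum_dropping_lowest(scores, drop):
    """Sum the scores after discarding the `drop` lowest values."""
    return sum(sorted(scores)[drop:])


def normalize(sums, points):
    """Scale each sum so that the class maximum maps to `points`."""
    top = max(sums)
    return [points * s / top if top else 0.0 for s in sums]


def letter_grade(total):
    """Map a 0-100 total to a letter (assumed 90/80/70/60 cutoffs)."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total >= cutoff:
            return letter
    return "F"


def midterm_grades(students):
    """students: list of dicts with 'homework', 'lecture', and 'exam' score lists."""
    hw_norm = normalize([sum_dropping_lowest(s["homework"], 2) for s in students], 15)
    lec_norm = normalize([sum_dropping_lowest(s["lecture"], 2) for s in students], 15)
    # Three exams assumed: keep the top two by dropping the single lowest.
    exam_norm = normalize([sum_dropping_lowest(s["exam"], 1) for s in students], 70)
    totals = [h + l + e for h, l, e in zip(hw_norm, lec_norm, exam_norm)]
    return [(round(t, 1), letter_grade(t)) for t in totals]


# Hypothetical scores for two students, for illustration only.
SAMPLE_STUDENTS = [
    {"homework": [10, 9, 8, 0], "lecture": [5, 5, 0, 0], "exam": [90, 80, 50]},
    {"homework": [5, 4, 3, 2], "lecture": [5, 0, 0, 0], "exam": [60, 70, 40]},
]

if __name__ == "__main__":
    for total, grade in midterm_grades(SAMPLE_STUDENTS):
        print(total, grade)
```

Because every component is normalized to the class maximum, the top scorer in each category receives full points for that category, which is worth keeping in mind when reading the resulting grade distribution.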

Image analysis:

Circuit board.jpg

Sample Prompt: What does this circuit do?

Response: The image you provided shows a printed circuit board (PCB) with various electronic components. Based on the visible components and labels, here’s an analysis:

…

From the visible components and markings, I’d hypothesize this circuit has something to do with control systems, possibly a PID controller or some signal processing related to it. The presence of terms related to PID control combined with the ICs suggests it might be processing input signals and generating a control output. However, to confirm the exact functionality, a schematic or more detailed information would be necessary.

Note: This is a broad analysis based on the visual information provided, and some components or connections might not be visible in the image. If you can provide more context or a clearer image, I might be able to give a more accurate assessment.

DALL-E 3 – Art Generation from a Text Prompt

Sample Prompt: Make me an image of a cat astronaut floating in space. It should be pawing at a ball of space yarn that is slowly drifting away from it. The cat is trying to recover her precious toy before it floats away. This should be a digital art image. The curved edge of the Earth should be in the background. You should also be able to see the cat’s spacecraft.

Plug-ins – third-party additions to ChatGPT that perform specific tasks.

Example Plugins:

AskYourPDF – Allows you to load and query PDF documents
Paper Chat – search through the arXiv preprint archive
Diagrams: Show Me – create flowcharts, schema, etc.
ScholarAI – access 200M+ scholarly publications
LinkReader – read any URL including text, video, images
Wolfram – use the mathematical abilities of Wolfram Alpha

Voice Interface

Talk directly to GPT via voice input and output.

[1] OpenAI. (2023). ChatGPT (September 25 Version) [Large language model]. https://chat.openai.com

ChatGPT Vision vs Real World

October 13, 2023 - 12:00 am

Dr. John Wallin, Director of Computational and Data Science and Professor of Astronomy

Introduction

I finally got access to the Vision options within ChatGPT. Here is the blog post announcing its rollout:

https://openai.com/blog/chatgpt-can-now-see-hear-and-speak

This feature was enabled on my phone yesterday, so I gave it a series of challenges to see how well it works. The results were impressive, and a bit disturbing. Each section represents a picture of something and a single-line prompt to explain it. The answers were written by GPT.

ChatGPT had correct answers for each prompt. It had a harder time with the ancient Greek transcription and made some errors in the characters it presented. However, I have spent the last 9 years working on this as a research problem, developing a highly trained set of neural networks in collaboration with my graduate student Matthew and others. ChatGPT did the same task without any additional information, while traditional OCR models are basically useless with this kind of damaged manuscript. The artistic interpretation of the diagram was also shocking. It figured out the visual language being used when I could not figure out what it was doing.

Each case has a title, a short description of the image context that was not shared with ChatGPT, the image, the short prompt I typed, and the results from ChatGPT explaining what it saw.

Introduction

Case Studies

Understanding the purpose of a circuit board – Note: the board name has been inked out so it can’t cheat

Solving problems from Physics 4310 based on a picture of the homework

Understanding a circuit diagram

Interpreting Ingredients from a cereal box

Reading a technical diagram

Reading an Ancient Greek Papyrus

Interpreting a Highly Artistic Image of a Scientific Diagram

Conclusions

Case Studies

Understanding the purpose of a circuit board – Note: the board name has been inked out so it can’t cheat.

Context not shared with ChatGPT: I found this image on a website and then made sure the name of the circuit board was not visible.

PROMPT: What does this do?

The image you provided shows a printed circuit board (PCB) with various electronic components. Based on the visible components and labels, here’s an analysis:

Capacitors: The large white cylindrical components labeled “2.0 MFD ±10% 50 VDC” are capacitors. They store and release electrical energy. Their presence can indicate several functionalities like filtering, decoupling, or timing elements.
Resistors: The colorful striped components are resistors. They limit or set the current flow through a particular path in a circuit.
Integrated Circuits (ICs): There are two ICs on the board. These could be operational amplifiers, microcontrollers, or some other kind of specialized ICs. Without the exact part numbers or a clearer view, it’s hard to determine their exact functionality.
Diodes: I see at least one component that looks like a diode (black cylindrical component with a silver line). Diodes allow current to flow in only one direction and are often used for rectification, protection, or indication.
Jumpers/Headers: Labeled as “Int+”, “Int-”, “Output”, etc., these could be connections for inputs, outputs, or configuration settings.
Potentiometers: Not clearly visible, but the “Bias”, “Kp”, “Ki”, and “Kd” markings may indicate adjustable potentiometers or placeholders for them. These terms are related to Proportional-Integral-Derivative (PID) control, which is a method used in control systems to make a system behave in a desired manner. “Kp”, “Ki”, and “Kd” stand for the proportional, integral, and derivative gains, respectively.

Solving problems from Physics 4310 based on a picture of the homework.

Context not shared with ChatGPT: I took this image with my phone from a PHYS 4310 (E&M) homework assignment. I had previously had it solve part A with a similar image.

Physics 4310.jpg

PROMPT: Solve this problem

sure john.png

Understanding a circuit diagram

Context not shared with ChatGPT: This is a circuit diagram of the circuit board shown above. I shared this with ChatGPT in a separate thread, so it couldn’t use the previous context to understand the image.

PROMPT: What does this do?

The schematic you’ve provided appears to be an analog circuit, specifically an implementation of a Proportional-Integral-Derivative (PID) controller using operational amplifiers (op-amps, labeled as TL084). Let’s break down its components and functionality:

Proportional (P) Component: This is represented by the op-amp labeled “prop”. It takes the difference between the input signal and a reference (usually set to 0V or ground for simplicity), amplifying it. The amplification factor is set by the resistors around it. This provides an output that’s directly proportional to the error.
Integral (I) Component: This is represented by the op-amp labeled “int”. It integrates the error over time, essentially providing a measure of the accumulated error. If a system error has persisted for an extended period, the integral term builds up (increases), and the controller responds by changing the control input in relation to a consistent error magnitude. The capacitor Cint is essential for the integration action.
Derivative (D) Component: This is represented by the op-amp labeled “der”. It predicts system behavior and thus provides a control output to counteract the rate of error change. The main purpose of the derivative term is to provide damping or shaping of the controller’s response. The capacitor Cder is responsible for the differentiation action.
Summing: The outputs from the P, I, and D components are summed together to form the final output (Vout). This summed signal is the control signal that is sent to whatever system is being controlled.
Biasing and Protection: There are various other components in the circuit, like diodes (e.g., D1), which can be for protection or to introduce non-linear behavior, and resistors like R_bias, which might be used for biasing purposes.
Power Supply: The op-amps require positive (+V) and negative (-V) power supplies. These power rails ensure the op-amps can output the necessary voltage levels for the system they’re controlling.

In summary, this circuit is a PID controller implemented in analog electronics. PID controllers are used in various applications to control processes, ensuring they stay at a set point or follow a desired trajectory. The PID controller works by adjusting its output based on the difference between a desired set point (reference) and the actual system state (input).
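
For comparison with the analog circuit described above, the same control law can be sketched in software. This is a minimal discrete-time approximation, not a model of the board itself: the class name and the fixed time step are assumptions, while the gain names follow the “Kp”, “Ki”, “Kd”, and “Bias” markings read off the board.

```python
class PIDController:
    """Discrete-time PID: output = bias + Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, bias=0.0):
        self.kp, self.ki, self.kd, self.bias = kp, ki, kd, bias
        self.integral = 0.0          # accumulated error (the "int" op-amp's job)
        self.previous_error = None   # needed to estimate the error's rate of change

    def update(self, setpoint, measurement, dt):
        """Return the control output for one time step of length dt."""
        error = setpoint - measurement
        # Integral term: rectangle-rule accumulation of the error over time.
        self.integral += error * dt
        # Derivative term: finite difference; zero on the first call.
        if self.previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        # Summing stage: combine the three terms plus the bias.
        return (self.bias
                + self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

In use, one would call `update` once per sampling interval with the current measurement and feed the returned value to the system being controlled. Practical controllers usually add output limits and protection against integral wind-up, which this sketch omits.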

Interpreting Ingredients from a cereal box

Context not shared with ChatGPT: I took this picture of a cereal box at breakfast this morning.

PROMPT: Is this healthy?

The term “healthy” can be subjective, but let’s break down the nutritional information of this product to provide an informed opinion:

Calories: A serving without milk provides 150 calories, and with skim milk, it’s 210 calories. This amount of calories is typical for many breakfast cereals.
Fats: The cereal contains no saturated, trans, polyunsaturated, or monounsaturated fats. This is a positive aspect as saturated and trans fats are often linked to heart diseases.
Cholesterol: There’s no cholesterol in the cereal alone, but there is a minimal amount when consumed with milk.
Sodium: The cereal has 300mg of sodium per serving, which is 13% of the Daily Value. A diet high in sodium can contribute to high blood pressure, so it’s essential to monitor total daily intake.
Carbohydrates: There are 36g of carbohydrates in the cereal alone. Of those, 3g come from dietary fiber and 14g from sugars. A diet rich in fiber can aid in digestion, while excessive sugar consumption can lead to various health issues, including weight gain and increased risk of heart disease.
Proteins: The protein content is 3g without milk and 9g with milk. This isn’t a significant source of protein compared to other foods.
Vitamins and Minerals: The cereal seems to be fortified with various vitamins and minerals, including Vitamin D, iron, potassium, thiamin, riboflavin, niacin, vitamin B6, folate, and vitamin B12. These are beneficial for overall health.
Ingredients: The primary ingredients are milled corn, sugar, and malt flavor. It’s worth noting that sugar is a prominent ingredient, which could be a concern if one is trying to limit sugar intake.

In conclusion, this cereal has some nutritional benefits due to its vitamins and minerals, and the lack of unhealthy fats. However, its relatively high sugar content and moderate sodium level are areas of concern. Like many products, consuming this in moderation and as part of a balanced diet is key. If you have specific health goals or concerns, it’s always best to consult with a nutritionist or healthcare professional.

Reading a technical diagram

Context not shared with ChatGPT: This diagram was taken from a paper I am writing on our Augmented Reality system to teach STEM classes. The diagram was generated by my grad student Isaac yesterday, so this could not have been in its training set.

PROMPT: Explain this to me

The image you’ve provided appears to be a flowchart outlining the execution flow for some sort of virtual lab environment. Let me break down its main components:

User Login and Lab Placement: This seems to be the starting point for users. They would first log in, and based on certain criteria or selections, they are placed in a specific lab.
Server Interactions:

Ask for Lab Files: Once logged in, the system communicates with the server to request the necessary files for the lab.
Lab Assets: These are the components that make up the lab environment, such as multimedia files, 3D models, or any other assets.
Upload Logs: It seems the user’s interactions or activities within the lab are logged and then uploaded to the server. This could be for tracking progress, errors, or for analytical purposes.
Build Lab: After retrieving the necessary assets, the server initiates the building or rendering of the lab for the user.
Return to Lab, Select, or Exit: After completing a task or module, the user has the option to return to the main lab, select a different lab, or exit the environment.

Lab Construction:

Modules: Each lab is composed of multiple modules. In the diagram, these are represented as ‘Module 1’ and ‘Module X’, indicating that there can be any number of modules in between. Each module seems to define a specific activity.
Clips: Within each module, there are ‘clips’. These clips might represent specific tasks, instructions, or components of the module. They seem to contain audio instructions, 3D models, and interactive elements. The clips appear to guide the user through specific changes or interactions in the scene.

In essence, this flowchart provides a high-level view of how a user interacts with a virtual lab system, from login to selecting and navigating through lab modules, and how the server supports this interaction by providing necessary assets, building the lab environment, and tracking user activity.

Would you like a more detailed analysis or have any specific questions about any components?

Reading an Ancient Greek Papyrus

Context not shared with ChatGPT: This image was again taken from some of my research. With collaborators and my grad student Matthew, we created and trained neural networks to do this work. ChatGPT just read the damn thing with no help. It didn’t do nearly as good a job as our software, but normal OCR programs just choke on this kind of problem.

PROMPT: What does this say?

This appears to be an ancient or medieval fragment, possibly written in Greek given the style of the characters. From the image provided, deciphering the exact content is challenging due to the deteriorated condition and fragmented nature of the text. However, I can attempt to identify some of the visible words or phrases based on the Greek letters present:

The characters “ΔΙΟΣ” can be seen, which might refer to “Δίος” (Dios), meaning “of Zeus.”
The characters “ΛΟΓΟΣ” might be present, translating to “Logos,” which can mean “word,” “reason,” or “argument,” among other definitions.
I can also spot characters that might be part of the word “ΑΝΘΡΩΠΟΣ,” meaning “human” or “man.”

It’s important to note that without a complete and clearer image, and without context, the exact meaning and interpretation of this fragment are speculative. If you want a precise translation and interpretation, consulting with a specialist in ancient Greek texts would be essential. If you’re interested in using AI to decipher such texts, it might be an exciting area to integrate your expertise in neural networks and crowd-sourced data sets.

PROMPT: Can you transcribe as many characters as possible. Use a . for characters where data is missing or it isn’t clear enough to read.

Of course, John. I’ll do my best to transcribe the visible characters, using a “.” for those that are missing or unclear:

..Π..Π…ΔΙ…

..ΛΟ.ΓΟΥΣ..

…Ε…ΣΩ…

..ΓΚΟΡΟΣ..

ΠΟΛΥΤ…Ο…

..ΝΤΕΡΥ…

ΑΝΤΙΟΝ…

ΡΙΟΥΚΙ..

Please note that the transcription is highly tentative due to the fragmentary and deteriorated nature of the text. Some of the characters might not align perfectly with the original intent of the writer. If you have any specific goals with this transcription, I would recommend consulting with an expert in ancient Greek epigraphy or paleography to ensure accuracy.

Interpreting a Highly Artistic Image of a Scientific Diagram

Context not shared with ChatGPT: This diagram was generated by DALL-E 3 as an artistic version of a Hertzsprung-Russell diagram. It is simply bananas. I asked the generator to “go wild”, and it came up with this. I couldn’t understand what the heck it was trying to do with this image, but ChatGPT could explain what its other artistic brain created.

PROMPT: Can you interpret this highly artistic version of this scientific diagram? What is the underlying idea and what scientific chart may have inspired it?

Certainly, John.

The artwork is a vivid and imaginative representation of some aspect of the cosmos, presented as if it were a theater performance. Given the elements in the image and their arrangement, the art seems to be drawing from astronomical themes, potentially inspired by a Hertzsprung-Russell (HR) diagram, or some other celestial classification system.

Here’s an interpretation:

Theater Setting: The cosmos is being presented as a theater show, suggesting the grand spectacle of the universe, with its various elements playing their respective roles in the grand drama of existence.
Planets and Stars: The central stage has representations of various celestial bodies, from planets to stars. This could be indicative of the lifecycle of stars and planets or their relative positions and significance.
Mystical Creatures and Labels: Creatures like dragons, a large-mouthed beast, and other entities can be seen hovering above specific celestial objects. These creatures, combined with the labels, might be trying to represent specific astronomical or mythological concepts. The labels, such as “White Drars”, “Star”, “Gadts”, and “Superions”, could represent types or classifications of stars or celestial phenomena, though they don’t correspond directly to standard astronomical terminology.
Audience: The audience, attentively watching the show, might represent us, the observers on Earth, trying to understand the vastness and complexity of the universe from our limited perspective.
Stage Characters: The figures on stage, dressed in elaborate robes, could be symbolic of guiding forces or entities that play a role in the cosmic dance. They might also symbolize constellations or groupings of stars.
Hertzsprung-Russell (HR) Diagram Inspiration: The placement and categorization of various stars and celestial objects bring to mind the HR diagram, which classifies stars based on their absolute magnitude and spectral type. In an HR diagram, stars are categorized from hot to cool (usually on the x-axis) and from more luminous to less luminous (usually on the y-axis). The imaginative categories in this artwork could be an abstract representation of that.

In essence, the image seems to be an artistic blending of the science of astronomy with mythological and imaginative elements, presenting the universe as a grand theater for us to observe and understand. The underlying idea might be to highlight the beauty, complexity, and interrelation of various celestial elements while emphasizing our role as observers trying to make sense of it all.

Conclusions

Dang – this is powerful. I have no idea what the implications of this are for society or for higher education. I am not often speechless, but this just floored me.

Computational and Data Science Seminar from September 15, 2023 with Ph.D. students Momina Liaqat and Richard Hoehn

October 5, 2023 - 12:00 am

Channel boosting-based object detection and segmentation for cancer analysis in histopathological images.

Momina Liaqat

Cancer is one of the most frequent and lethal diseases on the planet. Lymphocytes are considered a sign of cancer because they concentrate near tumors as part of the immune system’s reaction. Identifying and then quantifying lymphocytes is crucial in evaluating the course of cancer and the efficiency of treatment. However, a detection system based on machine learning techniques faces a variety of difficulties, including the unavailability of annotations, lymphocytes that aren’t always represented well, and the presence of irregular stains and artifacts that give the false appearance of lymphocytes on tissues. All of these make lymphocyte detection a difficult task. Manual detection and identification also take a lot of time, because pathologists must carefully examine each whole slide image, and manual work is prone to errors because of human subjectivity. These shortcomings raise the need for a digitally automated system that can lessen the burden on pathologists and perform the detection task with good accuracy. In this study, the goal is to create an automated lymphocyte identification system in histopathology that can overcome the problems faced with conventional detection techniques.

To achieve automated lymphocyte detection, the Channel Boosting idea is exploited to enhance the feature space by using different feature extractors as auxiliary learners. The dataset used in this study is taken from Grand Challenge and has a total of 20,000 instances. Because of the good performance of the proposed model, which was evaluated based on the F-score, this model may assist pathologists in the automated detection of lymphocytes.
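
For readers unfamiliar with the metric, the F-score combines precision and recall into a single number. The abstract does not say which variant was used or report any counts, so the sketch below assumes the common F1 form (`beta=1`), and the counts in the usage line are made up purely for illustration.

```python
def f_score(true_positives: int, false_positives: int,
            false_negatives: int, beta: float = 1.0) -> float:
    """F-beta score from detection counts; beta=1 gives the usual F1."""
    predicted = true_positives + false_positives   # everything the detector flagged
    actual = true_positives + false_negatives      # everything that was really there
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not taken from the study).
example_f1 = f_score(true_positives=80, false_positives=20, false_negatives=20)
print(f"F1 = {example_f1:.2f}")
```

A detector that misses many lymphocytes (low recall) or flags many stain artifacts (low precision) is penalized either way, which is why this kind of score suits a detection task like this one.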

Improving Emotion Detection Through Translation of Text to ML Models Trained in Different Languages

Richard Hoehn

This research paper investigates enhancing Emotion Detection (ED) by translating extended text data into various Machine Learning (ML) models trained in distinct languages. We focused on English and German text data to enhance prediction accuracy, aiming to overcome challenges arising from limited labeled datasets and language fragmentation in ED research.

Expanding an original English dataset with translated German data increases the training data’s volume, potentially improving prediction rates in ED applications. Additionally, translating English to German to extend the German dataset, and accessing ML models trained in both languages in real time, could further improve prediction rates.

For presentation purposes, datasets in both English and German were collected, parsed, cleaned, and translated. Multiple ML models trained in English and German were built and made accessible via an API for predictions in either language, including real-time translation, using a GET method.
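
To make the idea of a GET-based prediction API concrete, here is a minimal client-side sketch. The abstract does not give the endpoint or its parameters, so the base URL, the parameter names (`text`, `source`, `model`, `translate`), and the function name are all hypothetical placeholders, not the API built for this research.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real API's address is not given in the abstract.
BASE_URL = "http://localhost:8000/predict"


def build_prediction_url(text: str, source_lang: str, model_lang: str,
                         base_url: str = BASE_URL) -> str:
    """Build a GET URL asking the model trained in `model_lang` to classify `text`.

    When the text language differs from the model language, a hypothetical
    `translate=1` flag asks the server to translate before predicting.
    """
    params = {"text": text, "source": source_lang, "model": model_lang}
    if source_lang != model_lang:
        params["translate"] = "1"
    # urlencode percent-encodes spaces, punctuation, and non-ASCII characters.
    return f"{base_url}?{urlencode(params)}"


# German text sent to the English-trained model, with translation requested.
url = build_prediction_url("Ich bin so glücklich!", source_lang="de", model_lang="en")
print(url)
```

Keeping the request in the query string is what makes a plain GET sufficient here, at the cost of a practical limit on how long the submitted text can be.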

The research findings suggest that extending datasets through translation has not improved predictive accuracy for either English or German. Modest enhancements could potentially be achieved by concurrently accessing English and German models through real-time translation via the RESTful API; however, the benefits may not fully justify the effort.

We posit that the prevalence of numerous classes in this multi-class classification model has contributed to instances of overfitting across several labels/classes. This occurrence, in turn, has led to a memorization effect rather than facilitating genuine learning within the models.

In conclusion, it becomes evident that a more substantial accumulation of data is required, or an innovative approach involving the utilization of AI to generate analogous data must be employed to comprehensively address this research question.

Watch the seminar here.

Computational and Data Science Seminar from September 22, 2023 with Dr. Vishwas Bedekar and Dr. Qiang Wu

October 5, 2023 - 12:00 am

Overview of Design and Development of Energy Harvesting Systems at MTSU

Vishwas N. Bedekar*, Department of Engineering Technology, Middle Tennessee State University

The diminishing conventional energy sources based on fossil fuels have prompted research and advancement in renewable and non-conventional energy sources. Some of these energy sources that are abundantly available are mechanical vibrations, solar energy, wind, and fluids. This presentation gives an overview of energy harvesting devices and systems using various mechanisms such as the piezoelectric effect, magnetoelectric effect, and dynamo effect. Although the actual energy output from these energy harvesters is small, they provide a viable alternative for powering small-scale on-board electronic devices. The presentation will also highlight research on broadband energy harvesting systems and their applications. This research will provide the foundation for the research community to further investigate multimodal and multi-mechanism energy harvesting solutions.

The Master’s Degree in Data Science at MTSU

Dr. Qiang Wu*, Department of Mathematical Sciences, Middle Tennessee State University

In this presentation, Dr. Wu will discuss the MS program in Data Science. In particular, he will explain how you can pursue this degree while you work on your PhD in Computational and Data Science. He will also discuss some of his research work, including his work on the mathematical foundations of machine learning.

Watch the seminar here.

Generative AI for Students and Scholars – Use cases, limitations, and ethics

October 5, 2023 - 12:00 am

Generative AI offers the potential to transform how students and scholars do their work while simultaneously creating challenges to our current model for both education and scholarship. This talk will discuss example use cases where we can use AI ethically to increase productivity, from processing data sets to working through technical challenges. We will highlight some of the newer features in ChatGPT, including Plugins, Advanced Data Analysis, and customization, and focus on the best approaches toward being productive with these new tools. At the same time, we will discuss the scholarly response to these tools both in the classroom and in the broader scientific community.

Check out the seminar here.

Of course, parts of this document were generated using ChatGPT.[1]

The essentials:

Basic Use Cases for Academics

Interact with ChatBots as though they are smart and very productive students who occasionally go in completely the wrong direction.

Some Hints:

Bad Prompts for Academic Writing

Customizing GPT

Modes for using GPT-4

Image analysis:

DALL-E 3 – Art Generation from a Text Prompt

Voice Interface

Introduction

Table of Contents

Introduction

Case Studies

Understanding the purpose of a circuit board – Note: the board name has been inked out so it can’t cheat

Solving problems from Physics 4310 based on a picture of the homework

Understanding a circuit program

Interpreting Ingredients from a cereal box

Reading a technical diagram

Reading an Ancient Greek Papyrus

Interpreting a Highly Artistic Image of a Scientific Diagram

Conclusions
