2023-10-03
Research topic 1: Ecology and evolution of ferns
Research topic 2: Development of software for data science
Born and raised in California
Fourth-generation Japanese-American
First came to Japan as a high school exchange student
Answer the question: “Why are you interested in data analysis?”
Introduce yourself and discuss with your neighbor
https://www.odelama.com/data-analysis/
Obtaining insight from data
Important for many careers (academic and industry)
Employment of data scientists is projected to grow 35% from 2022 to 2032, much faster than the average for all occupations.
Who has used Excel? Who has used a programming language?
What are the advantages and disadvantages of each for data analysis?
Analyzing data with code takes some time to get used to, but eventually you will feel more comfortable with it because you can retrace your steps and have confidence in your results.
When might you want to repeat an analysis? Why?
If new data comes in and you need to update an analysis
If you want to double-check your own results
If you want to repeat somebody else’s analysis
If you switch between different projects and can’t remember exactly what you were doing
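The first case above (new data comes in) can be sketched in a few lines of code. This is only an illustration, written in Python; the same idea applies in any scripting language. The column names `species` and `height_cm`, the function name `summarize`, and the data values are all made up for this example.

```python
import csv
import io
from statistics import mean


def summarize(csv_text):
    """Return the mean of the 'height_cm' column for each species."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["species"], []).append(float(row["height_cm"]))
    return {species: mean(values) for species, values in groups.items()}


# Hypothetical data: the original measurements, then one new row arrives
original = "species,height_cm\nfern_a,10\nfern_a,14\nfern_b,30\n"
updated = original + "fern_b,34\n"

# The analysis is a function, so repeating it is a single call
print(summarize(original))
print(summarize(updated))
```

Because every step is written down in the script, updating the analysis means running the same code again on the new data instead of redoing each step by hand.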
The goal of this class is to learn the fundamentals of reproducible data analysis by doing it yourself.
By the end of the course, you will be able to:
I expect you to participate in discussions
I expect you to ask questions
This class is conducted in English
But you can ask questions in Japanese, and I will explain in Japanese if needed
R for Data Science. https://r4ds.had.co.nz/
Happy Git with R. https://happygitwithr.com/
Introduction to Reproducible Publications with RStudio. https://ucsbcarpentry.github.io/Reproducible-Publications-with-RStudio-Quarto/index.html
There will be a homework assignment on GitHub for each class, starting next week.
You submit the assignment by making a commit in Git (more about this on Day 2)
You will need to analyze a dataset of your own choosing for your final project, due 2023-11-20
The last homework assignment is due 2023-11-06, so you have at least 2 weeks to work on the final project
No late submissions allowed (exceptions may be made for medical emergencies)
Assignments (GitHub Classroom repos) will be posted on Moodle
Check Moodle every week
By appointment: contact me at joelnitta@chiba-u.jp
Who has used ChatGPT before?
You may use ChatGPT for your homework and final project
But first you need to know how to use it
ChatGPT makes statistical predictions about words based on training data (it does not “think”)
ChatGPT is designed to produce sentences that sound as natural as possible
ChatGPT may lie to you or make up facts (called “hallucination”; this is especially common when it lacks adequate training data)
Do try by yourself first (without ChatGPT)
Do ask it detailed, specific questions (prompts)
Do double-check the results: does ChatGPT’s code produce the expected result?
Do make sure you understand the code that ChatGPT provides
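One way to double-check generated code is to run it on small inputs whose answers you can work out by hand. The sketch below is an illustration only, written in Python; the `median` function stands in for code that ChatGPT might provide and is hypothetical.

```python
def median(values):
    """Stand-in for a function provided by ChatGPT (hypothetical)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: take the middle one
        return ordered[middle]
    # Even number of values: average the two middle ones
    return (ordered[middle - 1] + ordered[middle]) / 2


# Known answers worked out by hand before running the code
assert median([3, 1, 2]) == 2
assert median([4, 1, 3, 2]) == 2.5
assert median([7]) == 7
```

If any of these checks fails, the code does not do what you expected, and you should not use it until you understand why.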
We will follow the instructions for Day 2 to set up Git