Classwork 1

Babynames
Filtering
Author

Diya Bijoy

Published

September 15, 2025

This is my first Quarto document.

Loading libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(babynames)
library(ggformula)
Loading required package: scales

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

Loading required package: ggridges

New to ggformula?  Try the tutorials: 
    learnr::run_tutorial("introduction", package = "ggformula")
    learnr::run_tutorial("refining", package = "ggformula")
babynames
# A tibble: 1,924,665 × 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# ℹ 1,924,655 more rows
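Each row of `babynames` is one combination of year, sex, and name. A name recorded for both sexes in the same year therefore has two rows for that year. The sketch below is a quick check of the year range and the sex codes before filtering. It uses only dplyr verbs, and the object name `overview` is an illustrative choice.

```r
library(dplyr)
library(babynames)

# Summarise the whole table: earliest and latest year, and how many sex codes appear
overview <- babynames %>%
  summarise(
    first_year = min(year),
    last_year  = max(year),
    n_sexes    = n_distinct(sex)
  )

overview
```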

Filtering baby names

babynames %>% filter(name == "Diya")
# A tibble: 27 × 5
    year sex   name      n       prop
   <dbl> <chr> <chr> <int>      <dbl>
 1  1991 F     Diya      5 0.00000246
 2  1992 F     Diya      5 0.00000249
 3  1995 F     Diya      6 0.00000312
 4  1996 F     Diya      7 0.00000365
 5  1997 F     Diya     12 0.00000629
 6  1998 F     Diya     12 0.00000619
 7  1999 F     Diya     14 0.00000719
 8  2000 F     Diya     18 0.00000902
 9  2001 F     Diya     54 0.0000273 
10  2002 F     Diya     92 0.0000466 
# ℹ 17 more rows
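The plots below compare three spellings. Filtering with `%in%` and a character vector is a compact alternative to chaining `==` tests with `|`. The sketch below also totals each spelling across all years and both sexes. The names `spellings` and `spelling_totals` are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(babynames)

spellings <- c("Diya", "Dia", "Deeya")

# One row per spelling: total babies recorded and the first year the name appears
spelling_totals <- babynames %>%
  filter(name %in% spellings) %>%
  group_by(name) %>%
  summarise(total = sum(n), first_year = min(year), .groups = "drop") %>%
  arrange(desc(total))

spelling_totals
```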

Area plot

babynames %>% filter(name %in% c("Diya", "Dia", "Deeya")) %>%
  gf_area(n ~ year, fill = ~ name) %>% 
  gf_labs(title = "Number of babies named Diya, Dia, or Deeya in the USA",
         x = "Year",
         y = "Number of babies") %>% gf_theme(theme_light())
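A name can be recorded for both sexes in the same year, so a spelling may contribute two rows to a single year. An area plot draws those rows awkwardly. One option is to sum over `sex` first and then map `fill` so that each spelling's area is shaded by name. The sketch below assumes you want a single combined count per spelling per year. The object names `yearly` and `area_plot` are illustrative.

```r
library(dplyr)
library(babynames)
library(ggformula)  # also attaches ggplot2, which provides theme_light()

# Collapse the sex column: one row per year and spelling
yearly <- babynames %>%
  filter(name %in% c("Diya", "Dia", "Deeya")) %>%
  group_by(year, name) %>%
  summarise(n = sum(n), .groups = "drop")

area_plot <- yearly %>%
  gf_area(n ~ year, fill = ~ name) %>%
  gf_labs(title = "Babies named Diya, Dia, or Deeya in the USA (both sexes combined)",
          x = "Year",
          y = "Number of babies") %>%
  gf_theme(theme_light())

area_plot
```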

Point plot

babynames %>% filter(name %in% c("Diya", "Dia", "Deeya")) %>%
  gf_point(n ~ year, color = ~ name) %>% 
  gf_labs(title = "Number of babies named Diya, Dia, or Deeya in the USA",
         x = "Year",
         y = "Number of babies") %>% gf_theme(theme_light())
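To see where the two sexes contribute separate rows, the point plot can also map `sex` to point shape. This is a small variation added for illustration, not something the original analysis did. The names `three_names` and `point_plot` are illustrative.

```r
library(dplyr)
library(babynames)
library(ggformula)

three_names <- babynames %>%
  filter(name %in% c("Diya", "Dia", "Deeya"))

# Colour shows the spelling; shape shows whether the row is recorded as F or M
point_plot <- three_names %>%
  gf_point(n ~ year, color = ~ name, shape = ~ sex) %>%
  gf_labs(title = "Babies named Diya, Dia, or Deeya in the USA, by sex",
          x = "Year",
          y = "Number of babies",
          shape = "Sex") %>%
  gf_theme(theme_light())

point_plot
```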

Bar plot

babynames %>% filter(name %in% c("Diya", "Dia", "Deeya")) %>%
  gf_col(n ~ year, fill = ~ name) %>% 
  gf_labs(title = "Number of babies named Diya, Dia, or Deeya in the USA",
         x = "Year",
         y = "Number of babies") %>% gf_theme(theme_light())
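`gf_bar()` and `gf_col()` draw different kinds of bar chart. `gf_bar()` counts rows for each x value, so it expects a one-sided formula such as `~ year`. `gf_col()` uses a y variable already in the data as the bar height, which is what `n ~ year` describes. Passing `n ~ year` to the counting version is a likely source of aesthetic warnings. The sketch below builds both versions and compares the heights that ggplot2 computes. It uses `fill` because bars are filled shapes, and the object names are illustrative.

```r
library(dplyr)
library(babynames)
library(ggformula)

three_names <- babynames %>%
  filter(name %in% c("Diya", "Dia", "Deeya"))

# Bar heights are the counts stored in n (stacked by spelling within each year)
col_plot <- three_names %>%
  gf_col(n ~ year, fill = ~ name)

# Bar heights are the number of rows per year and spelling, not the number of babies
bar_plot <- three_names %>%
  gf_bar(~ year, fill = ~ name)

# The data ggplot2 computes for each layer, including the bar extents
col_heights <- ggplot2::layer_data(col_plot)
bar_heights <- ggplot2::layer_data(bar_plot)
```

In the `gf_col()` version the stacked bars add up to the number of babies. In the `gf_bar()` version they only add up to the number of table rows.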

babynames %>% filter(name %in% c("Diya", "Dia", "Deeya")) %>% filter(year %in% c(1989, 2010)) %>%
  gf_point(n ~ year, color = ~ name) %>% 
  gf_labs(title = "Number of babies named Diya, Dia, or Deeya in the USA, 1989 and 2010",
         x = "Year",
         y = "Number of babies") %>% gf_theme(theme_light())
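The same two-year comparison can also be read off a small table. The sketch below sums over sex and fills in 0 for any spelling that has no row in one of the two years. Treat that 0 with care: the data leaves out very rare names, so it means "not listed that year" rather than "no babies". The table is then reshaped to one column per year. The tidyr functions `complete()` and `pivot_wider()` do the filling and reshaping. The names `two_years` and `change` are illustrative.

```r
library(dplyr)
library(tidyr)
library(babynames)

two_years <- babynames %>%
  filter(name %in% c("Diya", "Dia", "Deeya"), year %in% c(1989, 2010)) %>%
  # Combine F and M rows so each spelling has one count per year
  group_by(name, year) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  # Add explicit zero rows for spelling/year pairs that are not listed
  complete(name, year = c(1989, 2010), fill = list(n = 0L)) %>%
  # One row per spelling, one column per year (n_1989, n_2010)
  pivot_wider(names_from = year, values_from = n, names_prefix = "n_") %>%
  mutate(change = n_2010 - n_1989)

two_years
```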