Package 'lgrdata' reference manual

Title:	Example Datasets for a Learning Guide to R
Description:	A largish collection of example datasets, including several classics. Many of these datasets are well suited for regression, classification, and visualization.
Authors:	Remko Duursma [aut, cre], Jeff Powell [ctb]
Maintainer:	Remko Duursma <[email protected]>
License:	CC0
Version:	0.1.1
Built:	2025-02-16 04:45:54 UTC
Source:	https://github.com/remkoduursma/lgrdata

Allometry

Description

This dataset contains measurements of tree dimensions and biomass. Data kindly provided by John Marshall, University of Idaho.

Usage

allometry

Format

A data frame with 63 rows and 5 variables:

species: factor The tree species (PSME = Douglas fir, PIMO = Western white pine, PIPO = Ponderosa pine).
diameter: double Tree diameter at 1.3m above ground (cm).
height: double Tree height (m).
leafarea: double Total leaf area (m^2).
branchmass: double Total (oven-dry) mass of branches (kg).

Examples

data(allometry)
with(allometry, plot(diameter, height, pch=19, col=species))

Child anthropometry

Description

Data include measurements of age, foot length, and height for 3898 children. These data are a small subset of many dozens of measurements on the same children, described in detail by Snyder (1977).

Usage

anthropometry

Format

A data frame with 3898 rows and 4 variables:

age: double Age in years
gender: integer "female" or "male"
foot_length: integer Total foot length (mm)
height: double Total height (cm)

Source

<http://mreed.umtri.umich.edu/mreed/downloads.html>.

Examples

data(anthropometry)
with(anthropometry, plot(age, foot_length, pch=16, cex=0.5, col=gender))

Cars data

Description

Fuel efficiency, weight, acceleration, and other measurements on 398 cars. Most of the data come from American cars (n = 249), with some European (n = 70) and Japanese (n = 79) cars. Not to be confused with the car datasets provided by base R; see cars and mtcars.

Usage

automobiles

Format

A data frame with 398 rows and 9 variables:

car_name: character Make and model
origin: factor 'American', 'European' or 'Japanese'
build_year: double Year car was built
fuel_efficiency: double Liters / 100km
cylinders: integer Nr. of cylinders
engine_volume: double Engine volume ('displacement') in liters.
horsepower: integer Engine power (hp)
weight: double Car weight in kg
acceleration: double Time to accelerate to 60mph

Source

Data originally hosted on <http://lib.stat.cmu.edu/datasets/>, also used in ISLR (as the 'Auto' dataset). Converted to metric units for use in this package.
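
Examples

The Source note says the data were converted to metric units. The sketch below shows the standard conversions that such a step involves. The function names are hypothetical and the input values are illustrative; whether the package used exactly these factors or rounding is an assumption.

```r
# Convert US miles per gallon to liters per 100 km, and back.
# 235.215 is approximately 100 * 3.785411784 (L per US gallon) / 1.609344 (km per mile)
mpg_to_l100km <- function(mpg) 235.215 / mpg
l100km_to_mpg <- function(l100km) 235.215 / l100km

# Pounds to kilograms (1 lb = 0.45359237 kg)
lb_to_kg <- function(lb) lb * 0.45359237

# Cubic inches to liters (1 cu in = 0.016387064 L)
cuin_to_l <- function(cuin) cuin * 0.016387064

# Illustrative inputs (not taken from the package)
mpg_to_l100km(c(18, 30))
lb_to_kg(3504)
cuin_to_l(307)
```

Note that the conversion between mpg and L/100km is reciprocal, so a linear relationship in one unit becomes curved in the other.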

Berkeley admissions data, 1973

Description

A well-known dataset, often used as an example of Simpson's paradox. The Wikipedia page (see Source) describes it as follows: "The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance. But when examining the individual departments, it appeared that six out of 85 departments were significantly biased against men, whereas only four were significantly biased against women. In fact, the pooled and corrected data showed a 'small but statistically significant bias in favor of women.'"

Usage

berkeley

Format

A data frame with 6 rows and 5 variables:

Department: integer University Department, A-F
Admitted_Male: integer Number of admitted male applicants.
Denied_Male: integer Number of denied male applicants.
Admitted_Female: integer Number of admitted female applicants.
Denied_Female: integer Number of denied female applicants.

Source

<https://en.wikipedia.org/wiki/Simpson
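
Examples

A minimal sketch of the paradox described above, comparing pooled admission rates with per-department rates. To keep it runnable without this package, it rebuilds the documented column layout from base R's UCBAdmissions table; that the two hold identical figures is an assumption.

```r
# Assumption: base R's UCBAdmissions table holds the same six-department
# figures; reshape it into the column layout documented above.
adm <- UCBAdmissions
berk <- data.frame(
  Department      = dimnames(adm)$Dept,
  Admitted_Male   = as.vector(adm["Admitted", "Male", ]),
  Denied_Male     = as.vector(adm["Rejected", "Male", ]),
  Admitted_Female = as.vector(adm["Admitted", "Female", ]),
  Denied_Female   = as.vector(adm["Rejected", "Female", ])
)

# Pooled admission rates over all departments
pooled <- with(berk, c(
  male   = sum(Admitted_Male) / sum(Admitted_Male + Denied_Male),
  female = sum(Admitted_Female) / sum(Admitted_Female + Denied_Female)
))

# Admission rates within each department
berk$rate_male   <- with(berk, Admitted_Male / (Admitted_Male + Denied_Male))
berk$rate_female <- with(berk, Admitted_Female / (Admitted_Female + Denied_Female))

round(pooled, 2)
berk[, c("Department", "rate_male", "rate_female")]
```

The pooled rate is higher for men, yet within most departments the rate for women is the higher one, because women applied more often to departments with low admission rates.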

A Baboon Named Brunhilda

Description

The observed responses are Geiger counter counts (times 10^-4) used to measure the amount of radioactively tagged sulfate drug in the blood of a baboon named Brunhilda after an injection of the drug.

Usage

brunhild

Format

A data frame with 21 rows and 2 variables:

Hours: integer Hours after drug injection
Sulfate: double Tagged sulfate concentration in blood

Source

<http://www.statsci.org/data/general/brunhild.html>

Cavitation resistance for Callitris branches

Description

Measurements of so-called 'percent loss conductivity' (PLC) curves on terminal twigs of Callitris trees (a member of the Cupressaceae in Australia). Twigs are subjected to increasingly negative xylem pressure (Psi, included as a positive pressure in MPa), and the loss in conductivity (i.e. the conductivity of water transport in the xylem) is measured.

Usage

callitrishydraulic

Format

A data frame with 31 rows and 3 variables:

Rep: integer Replicate - four branches are included.
Psi: double Xylem water pressure, sign-reversed so that values are positive (MPa)
PLC: double Percent loss conductivity (sometimes < 0)

Examples

data(callitrishydraulic)
with(callitrishydraulic, plot(Psi, PLC, pch=Rep))

Cereal nutrition data - small subset nr1

Description

Small subset nr1 of the Cereals data to practice merging, see cereals (available are cereal1, cereal2 and cereal3).

Usage

cereal1

Format

An object of class data.frame with 10 rows and 2 columns.

Cereal nutrition data - small subset nr2

Description

Small subset nr2 of the Cereals data to practice merging, see cereals (available are cereal1, cereal2 and cereal3).

Usage

cereal2

Format

An object of class data.frame with 8 rows and 2 columns.

Cereal nutrition data - small subset nr3

Description

Small subset nr3 of the Cereals data to practice merging, see cereals (available are cereal1, cereal2 and cereal3).

Usage

cereal3

Format

An object of class data.frame with 6 rows and 2 columns.
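
Examples

The three subsets are meant for practising merges. The sketch below shows the three common merge types with base R's merge(). It uses hypothetical stand-in data frames because the column names of cereal1, cereal2 and cereal3 are not documented here; 'Cereal.name', 'calories' and 'protein' are assumptions.

```r
# Hypothetical stand-ins for two of the subsets: the real column names
# are not documented here, so 'Cereal.name', 'calories' and 'protein'
# are assumptions made for illustration only.
a <- data.frame(Cereal.name = c("Alpha", "Bravo", "Charlie"),
                calories = c(110, 90, 120))
b <- data.frame(Cereal.name = c("Bravo", "Charlie", "Delta"),
                protein = c(3, 2, 4))

inner <- merge(a, b, by = "Cereal.name")                # rows present in both
left  <- merge(a, b, by = "Cereal.name", all.x = TRUE)  # keep all rows of a
full  <- merge(a, b, by = "Cereal.name", all = TRUE)    # rows present in either

nrow(inner); nrow(left); nrow(full)
```

Rows without a match in the other data frame get NA in the columns that come from it, which is a quick way to spot keys that failed to match.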

Cereal nutrition data

Description

This dataset summarizes 77 different brands of breakfast cereals, including calories, proteins, fats, and so on, and gives a 'rating' that indicates the overall nutritional value of the cereal.

Usage

cereals

Format

A data frame with 77 rows and 13 variables:

Cereal.name: character Cereal name
Manufacturer: factor Cereal manufacturer (letter code)
Cold.or.Hot: factor 'C' or 'H'
calories: integer
protein: integer
fat: integer
sodium: integer
fiber: double
carbo: double
sugars: integer
potass: integer
vitamins: integer
rating: double Health rating of the cereal (unknown calculation method).

Source

<https://dasl.datadescription.com/datafile/cereals/> (Originally at Statlib CMU).

Choat's Plant Drought Tolerance

Description

Data include a measure of plant drought tolerance (P50, more negative values indicate plant stems can tolerate lower water contents), and mean annual precipitation of the location where the sample was taken. Data are for 115 individual species (species name not included). Data from the original source were simplified for the purpose of this book.

Usage

choat_precipp50

Format

A data frame with 115 rows and 2 variables:

annualprecip: integer Annual rainfall (mm) where the plant was sampled.
P50: double The negative water pressure in the xylem at which 50% of stem conductivity is lost. More negative indicates higher tolerance to drought.

Source

Choat B. et al., 2012, Global convergence in the vulnerability of forests to drought, Nature 491, pages 752–755 <https://www.nature.com/articles/nature11688>.

Coweeta tree data

Description

Tree measurements in the Coweeta LTER.

Usage

coweeta

Format

A data frame with 87 rows and 9 variables:

species: integer One of 10 tree species
site: integer Site abbreviation
elev: integer Elevation (m asl)
age: integer Tree age (yr)
DBH: double Diameter at breast height (cm)
height: double Tree height (m)
folmass: double Foliage mass (kg)
SLA: double Specific leaf area (index of leaf thinness) (cm^2 g^-1)
biomass: double Total tree biomass

Source

Martin J.G., et al., 1998, Aboveground biomass and nitrogen allocation of ten deciduous southern Appalachian tree species, Canadian Journal of Forest Research 28, 1648-1659.

Dutch election data

Description

Polls for the 12 leading political parties in the Netherlands, leading up to the general election on 12 Sept. 2012. Data are in 'wide' format, with a column for each party. Values are in percentages.

Usage

dutchelection

Format

A data frame with 22 rows and 12 variables:

Date: factor Date of poll (NOTE: has not been converted to Date class)
VVD: double Vote for this party in percentage.
PvdA: double Vote for this party in percentage.
PVV: double Vote for this party in percentage.
CDA: double Vote for this party in percentage.
SP: double Vote for this party in percentage.
D66: double Vote for this party in percentage.
GL: double Vote for this party in percentage.
CU: double Vote for this party in percentage.
SGP: double Vote for this party in percentage.
PvdD: double Vote for this party in percentage.
FiftyPlus: double Vote for this party in percentage.

Source

<http://en.wikipedia.org/wiki/Dutch_general_election,_2012>
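
Examples

The data are in wide format and the Date column is still a factor. The sketch below shows one way to convert the date and reshape to long format with base R. It runs on a hypothetical three-row miniature; the poll values and the date format are assumptions, so check the real format string before applying this to dutchelection.

```r
# Hypothetical miniature of the wide layout; the poll values and the
# date format ("%Y-%m-%d") are assumptions for illustration.
polls <- data.frame(
  Date = factor(c("2012-08-01", "2012-08-15", "2012-09-01")),
  VVD  = c(22.0, 23.1, 24.3),
  PvdA = c(17.5, 19.0, 23.8),
  SP   = c(21.0, 19.2, 14.6)
)

# Convert the factor to a proper Date via its character labels
polls$Date <- as.Date(as.character(polls$Date), format = "%Y-%m-%d")

# Wide -> long: one row per poll date and party
parties <- setdiff(names(polls), "Date")
long <- data.frame(
  Date  = rep(polls$Date, times = length(parties)),
  party = rep(parties, each = nrow(polls)),
  vote  = unlist(polls[parties], use.names = FALSE)
)
long
```

The long layout is the one most plotting and modelling functions expect when party is to be used as a grouping variable.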

Leaf gas exchange at the EucFACE

Description

Measurements of leaf net photosynthesis at the EucFACE experiment, on leaves of different trees growing in ambient and elevated CO2 concentrations. Measurements were repeated four times during 2013 (labelled as Date=A,B,C,D).

Usage

eucface_gasexchange

Format

A data frame with 84 rows and 7 variables:

Date: factor Date label (A-D)
CO2: integer CO2 treatment, Amb=ambient, Ele=elevated
Ring: integer One of six plots ('rings') where treatment was applied
Tree: integer Tree number
Photo: double Rate of leaf photosynthesis (mu mol m^-2 s^-1)
Trmmol: double Rate of leaf transpiration (mmol m^-2 s^-1)
VpdL: double Vapour pressure deficit (kPa)

Source

Gimeno T.E., 2015, Conserved stomatal behaviour under elevated CO2 and varying water availability in a mature woodland. Functional Ecology <https://doi.org/10.1111/1365-2435.12532>

EucFACE ground cover data

Description

This file contains estimates of plant and litter cover within the rings of the EucFACE experiment, evaluating forest ecosystem responses to elevated CO2, on two dates. Within each ring are four plots and within each plot are four 1m by 1m subplots. Values represent counts along a grid of 16 points within each subplot.

Usage

eucfacegc

Format

A data frame with 192 rows and 8 variables:

Date: integer Date of measurement (d/m/y, not yet converted to Date class)
Ring: integer The identity of the EucFACE Ring, the level at which the experimental treatment is applied.
Plot: integer A total of four plots, nested within each level of Ring.
Sub: integer A total of four subplots, nested within each level of Plot.
Forbes: integer Number of points where dicot plants are observed.
Grass: integer Number of points where grass is observed.
Litter: integer Number of points where leaf litter is observed.
Trt: integer The experimental treatment: ctrl for ambient levels of atmospheric carbon dioxide, elev for ambient plus 150ppm.

Source

Jeff Powell

Fluxtower data

Description

This dataset contains measurements of CO2 and H2O fluxes (and related variables) over a pine forest in Quintos de Mora, Spain. The site is a mixture of Pinus pinaster and Pinus pinea, and was planted in the 1960s.

Data need to be cleaned to some extent (the purpose of this example dataset).

Usage

fluxtower

Format

A data frame with 244 rows and 8 variables:

TIMESTAMP: factor Date and time
FCO2: double Canopy CO2 flux (mu mol m^-2 s^-1)
FH2O: double Canopy H2O flux (mmol m^-2 s^-1)
ustar: double Friction velocity (m s^-1)
Tair: double Air temperature (degrees C)
RH: double Relative humidity (%)
Tsoil: double Soil temperature (degrees C)
Rain: integer Rainfall (mm half hour^-1)

Source

Data kindly provided by Victor Resco de Dios (in 2011), and simplified somewhat.

Seed germination as affected by fire

Description

Two datasets on the germination success of seeds of four Melaleuca species, when subjected to temperature, fire cue, and dehydration treatments. Seeds were collected from a number of sites and subjected to 6 temperature treatments and fire cues (in the fire germination data), or to a range of dehydration levels (in the water germination data).

This dataset contains the fire treatment data.

Usage

germination_fire

Format

A data frame with 576 rows and 7 variables:

species: factor One of four Melaleuca species
temp: integer Temperature treatment (C)
fire.cues: integer Fire cue treatment (yes or no)
site: integer Coding for the site where the seed was collected
cabinet: integer ID for the cabinet where seeds were treated
germ: integer Number of germinated seeds
n: integer Number of seeds tested (20 for all rows)

Source

Data are from Hewitt et al. 2015 (Austral Ecology 40(6):661-671), shared by Charles Morris, and simplified for the purpose of this book.
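
Examples

Counts of germinated seeds out of a fixed number tested are naturally modelled with a binomial GLM. The sketch below fits one on simulated stand-in data that follows the documented layout (germ out of n = 20). The germination probabilities are invented, and the model is an illustration, not the analysis of the original paper.

```r
# Simulated stand-in data: the germination probabilities below are
# invented; only the column layout (germ out of n = 20) follows the
# Format section.
set.seed(1)
toy <- expand.grid(temp = c(10, 20, 30), fire.cues = c("no", "yes"),
                   rep = 1:8)
toy$n <- 20
p <- plogis(-2 + 0.08 * toy$temp + 0.7 * (toy$fire.cues == "yes"))
toy$germ <- rbinom(nrow(toy), size = toy$n, prob = p)

# Binomial GLM: successes and failures as a two-column response
fit <- glm(cbind(germ, n - germ) ~ temp + fire.cues,
           family = binomial, data = toy)
coef(fit)

# Predicted germination proportion at 20 C without and with fire cues
newd <- data.frame(temp = 20,
                   fire.cues = factor(c("no", "yes"),
                                      levels = levels(toy$fire.cues)))
pred <- predict(fit, newdata = newd, type = "response")
pred
```

With the real data, site and cabinet could be added as further terms; how to treat them is a modelling choice left open here.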

Seed germination as affected by water

Description

This dataset contains the water treatment data.

Usage

germination_water

Format

A data frame with 352 rows and 5 variables:

species: factor One of four Melaleuca species
site: integer Coding for the site where the seed was collected
water.potential: double Water potential of the seed (MPa) after incubation (lower values are drier)
germ: integer Number of germinated seeds
n: integer Number of seeds tested (25 for all rows)

Source

Data are from Hewitt et al. 2015 (Austral Ecology 40(6):661-671), shared by Charles Morris, and simplified for the purpose of this package.

Examples

data(germination_water)
with(germination_water,
  plot(jitter(water.potential), germ/n,
    pch=21, bg=terrain.colors(4)[species])
)

I x F at the HFE - tree observations

Description

Heights and stem diameters of trees growing in a fertilization x irrigation experiment in Richmond, New South Wales, Australia, as part of the Hawkesbury Forest Experiment (HFE). A total of 16 plots, each with 72 Eucalyptus saligna trees, were remeasured 17 times between 2008 and 2012. Treatments to the plots were either control (C), applied with fertilizer (F), irrigation (I), or irrigation+fertilization (IF).

This dataset contains the tree-level observations, see hfeifplotmeans for averaged data.

Usage

hfeifbytree

Format

A data frame with 9592 rows and 6 variables:

ID: integer A unique identifier for each tree.
plotnr: integer A total of sixteen plots (four treatments).
treat: integer One of four treatments (I - irrigated, F - dry fertilized, IL - Liquid fertilizer plus irrigation, C - control)
Date: factor The date of measurement (YYYY-MM-DD)
height: double Mean height for the sample trees (m).
diameter: double Mean diameter for the sample trees (cm).

Source

Data courtesy of Craig Barton and Burhan Amiji, from Western Sydney University.

Examples

# Variable sample sizes over time. On many occasions, subsamples were measured.
data(hfeifbytree)
ftable(xtabs(~Date+treat, data=hfeifbytree))

I x F at the HFE - plot-level observations

Description

This dataset contains the plot-level means, see hfeifbytree for tree-level measurements.

Usage

hfeifplotmeans

Format

A data frame with 320 rows and 5 variables:

plotnr: integer A total of sixteen plots (four treatments).
Date: factor The date of measurement (YYYY-MM-DD)
diameter: double Mean diameter for the sample trees (cm).
height: double Mean height for the sample trees (m).
treat: integer One of four treatments (I - irrigated, F - dry fertilized, IL - Liquid fertilizer plus irrigation, C - control)
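
Examples

Plot-level means like these can be derived from tree-level rows with aggregate(). The sketch below uses hypothetical rows in the hfeifbytree layout (all values invented); whether the packaged plot means were computed in exactly this way is an assumption.

```r
# Hypothetical tree-level rows in the hfeifbytree layout (values invented).
trees <- data.frame(
  ID       = 1:6,
  plotnr   = c(1, 1, 1, 2, 2, 2),
  treat    = c("C", "C", "C", "I", "I", "I"),
  Date     = "2008-02-20",
  height   = c(4.0, 4.4, 4.2, 5.0, 5.4, 5.2),
  diameter = c(3.0, 3.4, 3.2, 4.1, 4.3, 4.5)
)

# Mean height and diameter for every plot, date and treatment
plotmeans <- aggregate(cbind(diameter, height) ~ plotnr + Date + treat,
                       data = trees, FUN = mean)
plotmeans
```

Keeping treat on the right-hand side of the formula carries the treatment label through to the aggregated result.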

Weather data at the Hawkesbury Forest Experiment

Description

Data for the weather station at the Hawkesbury Forest Experiment (HFE) for the year 2008. The HFE is in Richmond, New South Wales (in western Sydney), Australia.

Data are at a 30-minute timestep.

Usage

hfemet2008

Format

A data frame with 17568 rows and 9 variables:

DateTime: integer Date Time (half-hourly steps)
Tair: double Air temperature (degrees C)
AirPress: double Air pressure (kPa)
RH: double Relative humidity (%)
VPD: double Vapour pressure deficit (kPa)
PAR: double Photosynthetically active radiation (mu mol m^-2 s^-1)
Rain: double Precipitation (mm)
wind: double Wind speed (m s^-1)
winddirection: double Wind direction (degrees)

Source

Data courtesy of Craig Barton at Western Sydney University.
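
Examples

The row count matches a full leap year of half-hourly steps (366 days x 48 = 17568). A common first step with such data is aggregation to daily values. The sketch below does this on a synthetic three-day series; the values are invented, and no assumption is made about the timestamp format of the real DateTime column, which needs converting to POSIXct first.

```r
# Synthetic half-hourly series for three days (values invented).
met <- data.frame(
  DateTime = seq(as.POSIXct("2008-01-01 00:00", tz = "UTC"),
                 by = "30 min", length.out = 3 * 48)
)
met$Tair <- 20 + 5 * sin(2 * pi * (seq_len(nrow(met)) - 1) / 48)
met$Rain <- 0
met$Rain[c(10, 60, 61)] <- c(1.2, 0.4, 2.0)

# Calendar date of each half-hourly record
met$Date <- as.Date(met$DateTime, tz = "UTC")

# Daily mean temperature and daily rainfall total
daily <- merge(
  aggregate(Tair ~ Date, data = met, FUN = mean),
  aggregate(Rain ~ Date, data = met, FUN = sum)
)
daily
```

Temperature is averaged while rainfall is summed, since rainfall is an amount per timestep rather than a state.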

Howell height, age and weight data

Description

These data were also used by McElreath (2016, "Statistical Rethinking", CRC Press). Data include measurements of height, age and weight on Khoisan people.

Usage

howell

Format

A data frame with 783 rows and 4 variables:

sex: factor male or female
age: double Age (years)
weight: double Body weight (kg)
height: double Total height (cm)

Source

<https://tspace.library.utoronto.ca/handle/1807/17996>, subsetted for non-missing data and one outlier removed.

Examples

data(howell)
with(howell, plot(age, height, pch=19, col=sex))

Hydro dam storage data

Description

This dataset describes the storage of the hydro dam on the Derwent river in Tasmania (Lake King William & Lake St. Clair), in equivalent of energy stored.

Usage

hydro

Format

A data frame with 314 rows and 2 variables:

Date: factor The date of the bi-weekly reading (d/m/yyyy)
storage: integer Total water stored, in energy equivalent (GWh).

Icecream sales and temperature

Description

A synthetic dataset on weekly ice cream sales in two locations in Amsterdam, along with air temperature. The idea is that the ice cream salesman first sold ice cream in 'Oosterpark', and decided to move shop to the 'Dappermarkt' the year after. Did sales improve? This dataset can be used to show that naive conclusions from simple linear model fits can be misleading, and that the use of covariates (here, air temperature) can change conclusions about effects.

Usage

icecream

Format

A data frame with 40 rows and 3 variables:

temperature: double Air temperature (C)
sales: double Icecream sales per week (in local currency)
location: factor Either 'Dappermarkt' or 'Oosterpark'

Examples

data(icecream)

# Linear model, temperature as covariate
fit_ice <- lm(sales ~ temperature*location, data=icecream)

# Try to guess from coefficients where the sales were higher:
summary(fit_ice)

# What about now?
with(icecream, plot(temperature, sales, pch=19, col=location))
legend("topleft", levels(icecream$location), fill=palette())

Genetically modified soybean litter decomposition

Description

Soybean litter decomposition as a function of time (date), type of litter (variety), herbicides applied (herbicide), and where in the soil profile it is placed (profile). masslost refers to the proportion of the litter that was lost from the bag (decomposed) relative to the start of the experiment. Herbicide treatments were applied at the level of whole plots, with both treatments represented within each of four blocks. Both levels of variety and profile were each represented within each plot, with six replicates of each treatment added to each plot.

Usage

masslost

Format

A data frame with 246 rows and 8 variables:

plot: integer A total of eight plots.
block: integer A total of four blocks.
variety: integer Soybean variety is genetically modified ('gm') or not ('nongm'); manipulated at the subplot level.
herbicide: integer Herbicide applied is glyphosate ('gly') or conventional program ('conv'); manipulated at plot level.
profile: integer Whether litter was 'buried' in the soil or placed at the soil 'surface'; manipulated at the subplot level.
date: integer Date at which litter bags were recovered.
sample: integer Factor representing timing of sampling ('incrop1', 'incrop2', 'postharvest').
masslost: double The proportion of the initial mass that was lost from each litter bag during field incubation. Some values are lower than zero due to insufficient washing of dirt and biota from litter prior to weighing.

Source

Jeff Powell

Memory of words dataset

Description

A dataset on the number of words remembered from a list, for various learning techniques, and in two age groups.

Usage

memory

Format

A data frame with 100 rows and 3 variables:

Age: integer Age of person tested (yr)
Process: factor One of five methods used to memorize the words.
Words: double Number of words recalled.

Details

Description taken from source: "Why do older people often seem not to remember things as well as younger people? Do they not pay attention? Do they just not process the material as thoroughly? One theory regarding memory is that verbal material is remembered as a function of the degree to which is was processed when it was initially presented. Eysenck (1974) randomly assigned 50 younger subjects and 50 older (between 55 and 65 years old) to one of five learning groups. The Counting group was asked to read through a list of words and count the number of letters in each word. This involved the lowest level of processing. The Rhyming group was asked to read each word and think of a word that rhymed with it. The Adjective group was asked to give an adjective that could reasonably be used to modify each word in the list. The Imagery group was instructed to form vivid images of each word, and this was assumed to require the deepest level of processing. None of these four groups was told they would later be asked to recall the items. Finally, the Intentional group was asked to memorize the words for later recall. After the subjects had gone through the list of 27 items three times they were asked to write down all the words they could remember."

Source

<http://www.statsci.org/data/general/eysenck.html>.

Crude oil production

Description

Crude oil production for the top 8 oil-producing countries (minus Russia, for which understandably no data were available pre-1990), for the period 1971-2017.

Usage

oil

Format

A data frame with 376 rows and 3 variables:

country: factor Country code
year: integer 1971 - 2017
production: double Annual crude oil production in TOE.

Pulse Rates before and after Exercise

Description

Pulse rates measured on 110 participating students. Half of the students ran in place for one minute, before their pulse rate was measured again.

Usage

pulse

Format

A data frame with 110 rows and 11 variables:

Height: integer Height (cm)
Weight: double Weight (kg)
Age: integer Age (years)
Gender: integer Sex (1 = male, 2 = female)
Smokes: integer Regular smoker? (1 = yes, 2 = no)
Alcohol: integer Regular drinker? (1 = yes, 2 = no)
Exercise: integer Frequency of exercise (1 = high, 2 = moderate, 3 = low)
Ran: integer Whether the student ran or sat between the first and second pulse measurements (1 = ran, 2 = sat)
Pulse1: integer First pulse measurement (rate per minute)
Pulse2: integer Second pulse measurement (rate per minute)
Year: integer Year of class (93 - 98)

Details

Description taken from source: "Students in an introductory statistics class (MS212 taught by Professor John Eccleston and Dr Richard Wilson at The University of Queensland) participated in a simple experiment. The students took their own pulse rate. They were then asked to flip a coin. If the coin came up heads, they were to run in place for one minute. Otherwise they sat for one minute. Then everyone took their pulse again. The pulse rates and other physiological and lifestyle data are given in the data. Five class groups between 1993 and 1998 participated in the experiment. The lecturer, Richard Wilson, was concerned that some students would choose the less strenuous option of sitting rather than running even if their coin came up heads, so in the years 1995-1998 a different method of random assignment was used. In these years, data forms were handed out to the class before the experiment. The forms were pre-assigned to either running or non-running and there were an equal number of each. In 1995 and 1998 not all of the forms were returned so the numbers running and sitting was still not entirely controlled."

Source

<http://www.statsci.org/data/oz/ms212.html>

Examples

data(pulse)
with(pulse, plot(Weight, Pulse2-Pulse1,
  pch=19, col=c("red2", "dimgrey")[Ran]))
abline(h=0, lty=5)
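
Several columns are stored as integer codes (for example Ran: 1 = ran, 2 = sat). Converting them to labelled factors makes summaries and plot legends easier to read. The sketch below uses hypothetical rows in the pulse layout (values invented); the labels follow the Format section.

```r
# Hypothetical rows in the pulse layout (values invented).
toy <- data.frame(
  Ran    = c(1, 2, 1, 2),
  Gender = c(1, 1, 2, 2),
  Pulse1 = c(70, 68, 80, 74),
  Pulse2 = c(130, 70, 142, 73)
)

# Attach the labels given in the Format section to the integer codes
toy$Ran    <- factor(toy$Ran,    levels = 1:2, labels = c("ran", "sat"))
toy$Gender <- factor(toy$Gender, levels = 1:2, labels = c("male", "female"))

# Mean change in pulse rate per group
chg <- tapply(toy$Pulse2 - toy$Pulse1, toy$Ran, mean)
chg
```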

Pupae data

Description

This dataset is from an experiment where larvae were left to feed on Eucalyptus leaves, in a glasshouse that was controlled at two different levels of temperature and CO2 concentration. After the larvae pupated (that is, turned into pupae), the body weight was measured, as well as the cumulative 'frass' (larvae excrement) over the entire time it took to pupate.

Usage

pupae

Format

A data frame with 84 rows and 5 variables:

T_treatment: integer Temperature treatments ('ambient' and 'elevated')
CO2_treatment: integer CO2 treatment (280 or 400 ppm).
Gender: integer The gender of the pupae : 0 (male), 1 (female)
PupalWeight: double Weight of the pupae (g)
Frass: double Frass produced (g)

Source

Data courtesy of Tara Murray, and simplified for the purpose of this package.

Rain data

Description

This dataset contains ten years (1995-2006) of daily rainfall amounts as measured at the Richmond RAAF base.

Usage

rain

Format

A data frame with 3653 rows and 3 variables:

Year: integer Year
DOY: integer Day of year (1-366)
Rain: double Daily rainfall amount (mm)

Source

<http://www.bom.gov.au/climate/data/>, simplified and adjusted for this package.
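
Examples

The date is stored as two columns, Year and DOY. A minimal sketch of combining them into a Date, and then totalling rainfall by month, is given below on hypothetical rows in the rain layout (rainfall values invented).

```r
# Hypothetical rows in the rain layout (rainfall values invented).
toy <- data.frame(Year = c(1996, 1996, 1997),
                  DOY  = c(1, 366, 59),
                  Rain = c(0, 12.4, 3.1))

# Day 1 is 1 January, so add DOY - 1 days to the first day of the year
toy$Date <- as.Date(paste0(toy$Year, "-01-01")) + (toy$DOY - 1)
toy

# Monthly rainfall totals
toy$Month <- format(toy$Date, "%Y-%m")
monthly <- aggregate(Rain ~ Month, data = toy, FUN = sum)
monthly
```

Adding days to 1 January handles leap years automatically, so day 366 of a leap year maps to 31 December.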

Sydney to Hobart winning times

Description

Winning times for the Sydney to Hobart Yacht Race, an annual sailing race over 1170 km from Sydney Harbour to Hobart in Tasmania. The race is infamous for its rough conditions, long distance, and the large number of dropouts in some years. The data include the winning time, the number of yachts starting, and the number of yachts reaching the finish.

Usage

sydney_hobart_times

Format

A data frame with 72 rows and 5 variables:

Year: integer Year race was held
Time: double Total time (days)
fleet_start: integer Number of yachts at start
fleet_finish: integer Number of yachts at finish
Time_record: double Record winning time up to this year

Source

<https://en.wikipedia.org/wiki/Sydney_to_Hobart_Yacht_Race>

Examples

data(sydney_hobart_times)
with(sydney_hobart_times, {
    plot(Year, Time)
    lines(Year, Time_record, type='s', col="red")
})
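
A column such as Time_record can be derived as the running minimum of the winning time. The sketch below shows this with cummin() on hypothetical winning times (values invented); that the packaged column was computed this way is an assumption.

```r
# Hypothetical winning times in days (values invented).
toy <- data.frame(Year = 2001:2006,
                  Time = c(2.9, 3.1, 2.5, 2.7, 1.8, 2.0))

# The record up to each year is the running minimum of the winning time
toy$Time_record <- cummin(toy$Time)
toy

# Years in which the record was broken
record_years <- toy$Year[c(FALSE, diff(toy$Time_record) < 0)]
record_years
```

cummin() propagates missing values, so any NA in Time would need handling first.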

Passengers on the Titanic

Description

Survival status of passengers on the Titanic, together with their names, age, sex and passenger class. Not to be confused with the dataset Titanic, provided with R, which lists only tables of passengers. This dataset on the other hand provides one row per passenger.

Usage

titanic

Format

A data frame with 1313 rows and 5 variables:

Name: integer Recorded name of passenger
PClass: integer Passenger class: 1st, 2nd or 3rd
Age: double Age in years (many missing)
Sex: integer male or female
Survived: integer 1 = Yes, 0 = No

Source

<http://www.statsci.org/data/general/titanic.html>

Tree canopy gradients in the Priest River Experimental Forest (PREF)

Description

Leaves of two pine species (35 trees in total) were sampled throughout their canopy; usually 8 samples were taken at various heights. The height is expressed as the 'distance from top', i.e. the distance to the apex of the tree. Leaves (conifer needles) were analysed for nitrogen content (narea), and an index of leaf thickness, the 'leaf mass per area'. The data show the usual pattern of higher leaf thickness (higher LMA) toward the top of the trees, but individual trees show a lot of variation in LMA.

Usage

treecanopy

Format

A data frame with 249 rows and 7 variables:

ID: integer ID of the individual tree
species: integer Pinus ponderosa or Pinus monticola
dfromtop: double Distance from top of tree (where leaf sample was taken) (m)
totheight: double Total height of the tree (m)
height: double Height from the ground (where sample was taken) (m)
LMA: double Leaf mass per area (g m^-2)
narea: double Nitrogen per area (gN m^-2)

Source

Marshall, J.D., Monserud, R.A. 2003. Foliage height influences specific leaf area of three conifer species. Can J For Res 33:164-170

Examples

data(treecanopy)
if(require(ggplot2)){
 ggplot(treecanopy, aes(dfromtop,LMA,group=ID,col=species)) +
   geom_point() +
   stat_smooth(method="lm",se=FALSE) +
   theme_minimal()
}

Xylem vessel diameters

Description

Measurements of diameters of xylem (wood) vessels on a single Eucalyptus saligna tree grown at the Hawkesbury Forest Experiment.

Usage

vessel

Format

A data frame with 550 rows and 3 variables:

position: integer Either 'base' or 'apex' : the tree was sampled at stem base and near the top of the tree.
imagenr: integer At the stem base, six images were analyzed (and all vessels measured in that image). At apex, three images.
vesseldiam: double Diameter of individual water-conducting vessels (mu m).

Source

Sebastian Pfautsch

Weight loss data

Description

This dataset contains weight measurements of Jeremy Zawodny over a period of about 3 months while he was trying to lose weight. This is an example of an irregular time series dataset (intervals between measurements vary).

Usage

weightloss

Format

A data frame with 67 rows and 2 variables:

Date: factor Date, d/m/yy
Weight: double Weight, in pounds

Source

<http://jeremy.zawodny.com/blog/archives/006851.html>
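
Examples

Because the intervals between measurements vary, rates of change have to be computed per day, not per row. The sketch below parses dates in the documented d/m/yy form and computes the gaps, using hypothetical rows in the weightloss layout (dates and weights invented).

```r
# Hypothetical rows in the weightloss layout (values invented).
toy <- data.frame(Date = factor(c("30/3/06", "31/3/06", "4/4/06", "10/4/06")),
                  Weight = c(231.0, 230.5, 229.0, 227.5))

# Parse d/m/yy; %d and %m accept single-digit day and month on input
toy$Date <- as.Date(as.character(toy$Date), format = "%d/%m/%y")

# Days between consecutive measurements: not constant, hence 'irregular'
gaps <- as.numeric(diff(toy$Date))
gaps

# Average rate of change between measurements (pounds per day)
rate <- diff(toy$Weight) / gaps
rate
```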

Mouse metabolism

Description

Wild mice were placed in a device where the metabolic rate (energy used by the animal) can be measured directly, and continuously. Measurements were made at varying temperature (15, 20 and 31C), mice were provided with food or not, and were able to exercise (with a treadmill) or not.

Usage

wildmousemetabolism

Format

A data frame with 864 rows and 9 variables:

id: integer Individual number
run: integer The experiment was repeated three times (run = 1,2,3)
day: integer Day of experiment (1-6)
temp: integer Temperature (deg C)
food: integer Whether food was provided ('Yes') or not ('No')
bm: double Body mass (g)
wheel: integer Whether the mouse could use an exercise wheel ('Yes') or not ('No')
rmr: double Resting metabolic rate (minimum rate of a running average over 12min) (kC hour^-1)
sex: integer Male or Female

Source

Christopher Turbill
