| Title: | Summary Table and Means Plots |
|---|---|
| Description: | Optimized for handling complex datasets in environmental and ecological research, this package offers functionality that is not fully met by general-purpose packages. It provides two key functions, 'summarize_data()', which summarizes datasets, and 'plot_means()', which creates plots with error bars. The 'plot_means()' function incorporates error bars by default, allowing quick visualization of uncertainties, crucial in ecological studies. It also streamlines workflows for grouped datasets (e.g., by species or treatment), making it particularly user-friendly and reducing the complexity and time required for data summarization and visualization. |
| Authors: | Oswald Omuron [aut, cre] |
| Maintainer: | Oswald Omuron <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-05-26 07:23:22 UTC |
| Source: | https://github.com/oswaldomuron/kifidi |
The Kifidi package provides tools for summarizing and visualizing grouped numerical data, especially for environmental and ecological datasets. It includes functions for generating statistical summaries, plotting means with error bars, performing grouped regression analysis, and generating frequency counts by group.
| Function | Description |
|---|---|
| summarize_data | Provides statistical summaries (mean, SD, N, etc.) of numeric data grouped by one or two categorical variables. |
| plot_means | Creates bar plots of means with optional error bars. |
| counts | Generates frequency tables or counts of observations by grouping variables. |
| plot_group_regressions | Performs and plots linear regressions grouped by a factor variable. |
| plot_lmm_regressions | Plots group-level and fixed-effect regression lines from a linear mixed-effects model fitted with lmer() from the lme4 package. |
| plot_lme_regressions | Plots group-level lines from a linear mixed-effects model fitted with lme() from the nlme package. |
| generate_random_points | Generates random (x, y) sampling coordinates within a rectangular plot area and optionally exports them as CSV. |
Oswald Omuron
Computes the Pearson correlation matrix for a set of numeric variables and calculates Variance Inflation Factors (VIFs) to assess multicollinearity. All variables are included in the VIF calculation using a dummy response variable in an additive linear model.
    cor_vif_table(data, vars)

| Argument | Description |
|---|---|
| data | A data frame containing the variables of interest. |
| vars | A character vector specifying the names of the numeric variables to include. All specified variables must exist in |
The correlation matrix shows pairwise linear associations between variables.
VIFs are computed using a linear model with all variables as predictors and a dummy response.
The VIF calculation assumes an additive linear model: each variable is included as a main effect only, and no interaction terms or higher-order terms are included.
The function automatically removes rows with missing values (NA) in the selected variables.
VIFs reflect multicollinearity of each variable with respect to all other variables in the set.
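The dummy-response approach described above works because a VIF depends only on the predictors, not on the response. The following is a minimal base R sketch of that idea, not the package's implementation; the simulated data and the names `x1`, `x2`, `x3`, `vif_manual` and `vif_from_cor` are assumptions made for illustration.

```r
# Illustrative data only: x2 is deliberately correlated with x1.
set.seed(1)
X <- data.frame(x1 = rnorm(50))
X$x2 <- X$x1 + rnorm(50, sd = 0.5)
X$x3 <- rnorm(50)

# Drop incomplete rows first, mirroring the NA handling described above.
X <- X[complete.cases(X), ]

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing variable j
# on all the other variables (main effects only, no interactions).
vif_manual <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vif_from_cor <- diag(solve(cor(X)))

round(vif_manual, 3)
```

Both routes give the same values, so the correlation matrix and the VIFs can be derived from the same set of complete rows.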
A list with two elements:
correlations: A numeric matrix of pairwise Pearson correlations among the selected variables.
VIF: A data frame with columns Variable and GVIF, giving the variance inflation factor for each variable.
    # Create example data frame
    set.seed(123)
    Z <- data.frame(
      L.AREA = rnorm(20, mean = 50, sd = 10),
      L.DIST = rnorm(20, mean = 30, sd = 5),
      L.LDIST = rnorm(20, mean = 15, sd = 3),
      YR.ISOL = rnorm(20, mean = 10, sd = 2),
      ALT = rnorm(20, mean = 100, sd = 20),
      GRAZE = rnorm(20, mean = 5, sd = 1)
    )

    # Select variables to analyze
    vars <- c("L.AREA", "L.DIST", "L.LDIST", "YR.ISOL", "ALT", "GRAZE")

    # Run the correlation and VIF function
    result <- cor_vif_table(Z, vars)

    # View the correlation matrix
    result$correlations

    # View the variance inflation factors (VIFs)
    result$VIF
This function calculates the frequency of each unique value in a given column of data, excluding NA values.
    counts(column_data)

| Argument | Description |
|---|---|
| column_data | A vector of data (numeric, character, or factor) from which the unique groups and their frequencies are calculated. |
The function first removes any NA values from the input data and identifies the unique groups.
It then counts the occurrences of each unique value using a loop and returns the results as a data frame with two columns:
group (the unique values) and counts (their respective frequencies).
A data frame with two columns:
group: The unique values from the input data.
counts: The frequency of each unique value.
This implementation uses a loop and may be slower for very large datasets. For faster performance, consider using table() or dplyr::count().
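As a rough illustration of the trade-off noted above, the sketch below tallies values with an explicit loop and then with `table()`. It is a hypothetical reconstruction of the described behaviour, not the package source; the object names `loop_result` and `table_result` are invented here.

```r
x <- c("A", "B", "A", "C", "B", "B", NA, "A", "C")

# Loop-based tally: drop NAs, then count each unique value in turn.
x_clean <- x[!is.na(x)]
groups <- unique(x_clean)
n <- integer(length(groups))
for (i in seq_along(groups)) {
  n[i] <- sum(x_clean == groups[i])
}
loop_result <- data.frame(group = groups, counts = n)

# Vectorised alternative: table() ignores NA by default.
tab <- table(x)
table_result <- data.frame(group = names(tab), counts = as.vector(tab))

loop_result
table_result
```

Note that `unique()` keeps values in order of first appearance, while `table()` sorts them, so the row order of the two results can differ for other inputs.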
Oswald Omuron
    data <- c("A", "B", "A", "C", "B", "B", NA, "A", "C")
    result <- counts(data)
    print(result)
This function generates random (x, y) coordinates within a rectangular plot area for biomass or other sampling. It can plot the points and optionally export them as CSV.
    generate_random_points(
      plot_length = 3,
      plot_width = 3,
      n_points = 5,
      seed = NULL,
      export_csv = FALSE,
      filename = "random_coordinates.csv"
    )

| Argument | Description |
|---|---|
| plot_length | Numeric. Length of the plot in meters. Default is 3. |
| plot_width | Numeric. Width of the plot in meters. Default is 3. |
| n_points | Integer. Number of random points to generate. Default is 5. |
| seed | Integer or NULL. Seed for random number generator to reproduce results. Default is NULL (random every run). |
| export_csv | Logical. Whether to export the coordinates as a CSV file. Default is FALSE. |
| filename | Character. Name of the CSV file to export if |
A data.frame with columns: Point, X_meters, Y_meters.
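One way to produce a data frame of this shape is to draw uniform coordinates inside the rectangle. The helper below, `random_points_sketch()`, is hypothetical and only approximates the documented behaviour; in particular, mapping length to the x axis and width to the y axis is an assumption.

```r
# Hypothetical helper, not the package's code.
random_points_sketch <- function(plot_length = 3, plot_width = 3,
                                 n_points = 5, seed = NULL) {
  # Fix the random number stream only when a seed is supplied.
  if (!is.null(seed)) set.seed(seed)
  data.frame(
    Point = seq_len(n_points),
    X_meters = runif(n_points, min = 0, max = plot_length),
    Y_meters = runif(n_points, min = 0, max = plot_width)
  )
}

pts <- random_points_sketch(seed = 42)
pts

# Optional export, left commented out so the sketch writes no files:
# write.csv(pts, "points.csv", row.names = FALSE)
```

Calling the helper twice with the same seed returns identical coordinates, which is what makes a sampling layout reproducible in the field.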
    # Generate 5 random points in a 3x3 m plot, plot and export csv
    generate_random_points(plot_length = 3, plot_width = 3, n_points = 5,
                           seed = 42, export_csv = TRUE, filename = "points.csv")

    # Generate random points without fixed seed (different each run)
    generate_random_points(n_points = 10)
This function plots x vs y and fits linear models, either by group or for all data.
    plot_group_regressions(
      x,
      y,
      group = NULL,
      colors = NULL,
      main = NULL,
      xlab = NULL,
      ylab = NULL,
      legend = TRUE,
      legend_position = "topright",
      return_models = FALSE,
      conf.int = FALSE,
      label_equations = FALSE,
      draw_lm = TRUE,
      add = FALSE,
      theme = "default",
      lty = 1,
      lwd = 2,
      pch = 16,
      ...
    )

| Argument | Description |
|---|---|
| x | A numeric vector for the x-axis. |
| y | A numeric vector for the y-axis. |
| group | Optional factor for grouping. If |
| colors | Named vector of colors for groups or a vector matching number of groups. |
| main | Main title of the plot. |
| xlab | Label for x-axis. |
| ylab | Label for y-axis. |
| legend | Logical; whether to show the legend. |
| legend_position | Position of the legend (e.g., "topright"). |
| return_models | Logical; return list of lm models. |
| conf.int | Logical; whether to draw confidence intervals. |
| label_equations | Logical; whether to label each group with its regression equation. |
| draw_lm | Logical; whether to draw the regression line(s). |
| add | Logical; whether to add to an existing plot. |
| theme | Plot theme (currently unused). |
| lty | Line type(s) for regression line. |
| lwd | Line width(s) for regression line. |
| pch | Plotting character(s) for points. |
| ... | Additional plotting parameters passed to |
Optionally returns a list of lm models if return_models = TRUE.
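The per-group fitting that this function performs can be approximated in base R by splitting the data and calling `lm()` once per group. The sketch below uses the built-in `iris` data and is an illustration under that assumption, not the package's code; `models`, `coefs` and `cols` are names chosen here.

```r
# One lm() per level of the grouping factor.
models <- lapply(split(iris, iris$Species), function(d) {
  lm(Sepal.Length ~ Sepal.Width, data = d)
})

# Intercept and slope for each group, one row per species.
coefs <- t(sapply(models, coef))
coefs

# Scatter plot coloured by group, with one fitted line per group.
cols <- c(setosa = "tomato", versicolor = "steelblue", virginica = "darkgreen")
plot(iris$Sepal.Width, iris$Sepal.Length,
     col = cols[as.character(iris$Species)], pch = 16,
     xlab = "Sepal width", ylab = "Sepal length")
for (g in names(models)) {
  abline(models[[g]], col = cols[g], lwd = 2)
}
legend("topright", legend = names(models), col = cols, lwd = 2, pch = 16)
```

Keeping the fitted models in a named list is what makes a `return_models`-style option cheap to offer: the same list drives both the plot and the return value.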
Fits a linear mixed-effects model using nlme and plots the observed data and regression lines for each group, including fixed and random effects. Optionally plots the overall fixed effect regression line and displays model statistics (R² values and AIC).
    plot_lme_regressions(
      model_or_formula,
      random = NULL,
      data = NULL,
      legend = TRUE,
      legend_position = "right",
      inset = 0,
      return_model = FALSE,
      lty = NULL,
      pch = 16,
      lwd = 2,
      axes = TRUE,
      ann = TRUE,
      xlim = NULL,
      ylim = NULL,
      main = NULL,
      xlab = NULL,
      ylab = NULL,
      col = NULL,
      oma = c(0, 0, 0, 0),
      mar = c(5, 4, 4, 2),
      draw_fixed_effects = FALSE,
      fixed_col = "black",
      fixed_lty = 2,
      fixed_lwd = 3,
      fixed_confi = FALSE,
      ...
    )

| Argument | Description |
|---|---|
| model_or_formula | Either a fitted |
| random | A random effects formula, e.g. |
| data | A data frame containing the variables in the model. Required only if a formula (not a model) is supplied. |
| legend | Logical, whether to display a legend (default = TRUE). |
| legend_position | Position of the legend ("right", "topright", etc.). |
| inset | Inset for the legend. |
| return_model | Logical, if TRUE returns the fitted model (default = FALSE). |
| lty | Line type for group-specific regression lines. |
| pch | Plotting character for data points. |
| lwd | Line width for group-specific regression lines (default = 2). |
| axes | Logical, whether to draw axes (default = TRUE). |
| ann | Logical, whether to include plot annotations (default = TRUE). |
| xlim, ylim | Axis limits for the plot. |
| main | Plot title. |
| xlab, ylab | Axis labels. |
| col | Colors for groups. Defaults to distinct colors for each group. |
| oma | Outer margin areas. |
| mar | Margins of the plot. |
| draw_fixed_effects | Logical, whether to plot the fixed-effect regression line (default = FALSE). |
| fixed_col | Color of the fixed-effect regression line (default = "black"). |
| fixed_lty | Line type of the fixed-effect regression line (default = 2). |
| fixed_lwd | Line width of the fixed-effect regression line (default = 3). |
| ... | Additional arguments passed to |
The function automatically computes and plots regression lines for each
grouping level based on both fixed and random effects. If draw_fixed_effects = TRUE,
the overall fixed-effect regression line is drawn across the full x-range.
The plot legend includes regression equations for each group and, optionally,
the fixed effect line. Model performance metrics, including marginal R² (R²m),
conditional R² (R²c), and AIC, are displayed in the legend panel.
Invisibly returns NULL unless return_model = TRUE, in which case
it returns the fitted nlme::lme model object.
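To make the "fixed plus random effects" description concrete, the sketch below fits the same kind of model directly with nlme and rebuilds each group's intercept and slope by hand. It assumes the nlme package is installed and is not taken from the package source; `group_coefs` and `rebuilt` are names chosen for this example.

```r
library(nlme)

# Random intercept and slope for age within Subject (Orthodont ships with nlme).
fit <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

fixed <- fixef(fit)        # population-level intercept and slope
group_coefs <- coef(fit)   # fixed + random effects, one row per Subject

# Each group's line is the fixed effect shifted by that group's random effect.
rebuilt <- sweep(as.matrix(ranef(fit)), 2, fixed, "+")

# Observed data, one line per subject, and the overall fixed-effect line.
plot(Orthodont$age, Orthodont$distance, pch = 16, col = "grey50",
     xlab = "age", ylab = "distance")
for (i in seq_len(nrow(group_coefs))) {
  abline(a = group_coefs[i, 1], b = group_coefs[i, 2], col = "grey70")
}
abline(a = fixed[1], b = fixed[2], lty = 2, lwd = 3)
```

The dashed line corresponds to what `draw_fixed_effects = TRUE` is described as adding; the grey lines correspond to the group-level regressions.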
    ## Not run:
    library(nlme)
    data(Orthodont)
    plot_lme_regressions(distance ~ age, random = ~ age | Subject,
                         data = Orthodont,
                         draw_fixed_effects = TRUE, fixed_col = "red")
    ## End(Not run)
This function fits a linear mixed-effects model using lmer() and plots group-level
regression lines and optionally the fixed effect regression line. It includes group-specific
points and regression lines, and can display model statistics such as Nakagawa R² values and AIC.
    plot_lmm_regressions(
      formula,
      data,
      colors = NULL,
      lty = 1,
      lwd = 2,
      pch = 16,
      xlab = NULL,
      ylab = NULL,
      main = NULL,
      draw_fixed_line = FALSE,
      draw_group_lines = TRUE,
      label_equations = FALSE,
      legend_position = "topright",
      inset = 0,
      xpd = TRUE,
      ann = TRUE,
      axes = TRUE,
      legend = TRUE,
      return_model = FALSE,
      mar = c(5, 4, 4, 15),
      oma = c(0, 0, 0, 4),
      xlim = NULL,
      ylim = NULL,
      ...
    )

| Argument | Description |
|---|---|
| formula | A formula specifying the model (e.g., |
| data | A data frame containing the variables in the model. |
| colors | A vector of colors for each group. Defaults to rainbow colors if not specified. |
| lty | Line type(s) for regression lines. Can be a single value or vector. |
| lwd | Line width(s) for regression lines. Can be a single value or vector. |
| pch | Point character(s) for data points. Can be a single value or vector. |
| xlab | Label for the x-axis. If |
| ylab | Label for the y-axis. If |
| main | Plot title. |
| draw_fixed_line | Logical; if |
| draw_group_lines | Logical; if |
| label_equations | Logical; if |
| legend_position | Position of the legend (default is "topright"). |
| inset | Inset spacing for the legend; default is 0. |
| xpd | Logical; whether to allow plotting outside the plot region. Defaults to TRUE. |
| ann | Logical; whether to annotate the axes (titles, labels). |
| axes | Logical; whether to draw axes. |
| legend | Logical; whether to display the legend. |
| return_model | Logical; if |
| ... | Additional graphical parameters passed to |
This function plots both the individual group-level data and their corresponding regression lines from a linear mixed model. It optionally adds the fixed-effect regression line (representing population-level trend), and can annotate the plot with R² statistics (marginal and conditional) and AIC.
It uses lme4::lmer() to fit the model and MuMIn::r.squaredGLMM() to compute Nakagawa's R².
If return_model = TRUE, the fitted lmer model object is returned. Otherwise, the function returns NULL invisibly.
    ## Not run:
    library(lme4)
    library(MuMIn)
    data(sleepstudy)
    plot_lmm_regressions(Reaction ~ Days + (Days | Subject),
                         data = sleepstudy,
                         draw_fixed_line = TRUE,
                         label_equations = TRUE,
                         show_aic = TRUE)
    ## End(Not run)
Creates a bar plot of mean values from a summary data frame with optional error bars showing standard errors.
    plot_means(
      summary_df,
      main_title = "Mean Values by Group",
      ylab = NULL,
      xlab = NULL,
      bar_color = "skyblue",
      error_bar_color = "red",
      bar_width = 0.7,
      error_bar_length = 0.1,
      axes = TRUE,
      space = NULL,
      density = NULL,
      angle = 45,
      col = NULL,
      names_arg = NULL,
      xlab_custom = NULL,
      ylab_custom = NULL,
      ann = TRUE,
      xlim = NULL,
      ylim = NULL,
      xaxt = "s",
      las = NULL
    )

| Argument | Description |
|---|---|
| summary_df | A data frame containing summary statistics including means, standard errors, and group identifiers. |
| main_title | Main title for the plot. Default is "Mean Values by Group". |
| ylab | Deprecated. Label for the y-axis. |
| xlab | Deprecated. Label for the x-axis. |
| bar_color | Color for the bars. Default is "skyblue". |
| error_bar_color | Color for the error bars. Default is "red". |
| bar_width | Width of the bars. Default is 0.7. |
| error_bar_length | Length of the error bar end caps. Default is 0.1. |
| axes | Logical indicating whether axes are drawn. Default is TRUE. |
| space | Numeric or vector indicating spacing between bars. |
| density | Numeric vector for shading density lines on bars. |
| angle | Angle of shading lines on bars. |
| col | Optional colors for shading lines (overrides bar_color). |
| names_arg | Character vector specifying names for x-axis labels. Defaults to group labels in summary_df. |
| xlab_custom | Custom label for the x-axis. Defaults to "Groups". |
| ylab_custom | Custom label for the y-axis. Defaults to "Mean". |
| ann | Logical indicating whether to draw axis labels and titles. Default is TRUE. |
| xlim | Numeric vector of length 2 defining x-axis limits. |
| ylim | Numeric vector of length 2 defining y-axis limits. |
| xaxt | Character specifying x-axis type; "s" for standard, "n" for none. Default is "s". |
| las | Numeric controlling orientation of axis labels. |
If the input data frame contains two grouping variables (e.g., Group1 and Group2), these are combined with a hyphen to create the x-axis labels. The function draws a bar plot of the means with error bars representing Mean ± SE.
Invisibly returns the midpoints of the bars (as from barplot).
This function uses base R graphics and does not depend on external packages.
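For readers who want to see how a Mean ± SE bar plot can be assembled from `barplot()` and `arrows()`, a minimal sketch follows. The summary table `s` and its column names (`group`, `mean`, `se`) are assumptions for illustration and may not match the columns that summarize_data() returns.

```r
# Illustrative summary table; the column names are assumptions.
s <- data.frame(group = c("A", "B", "C"),
                mean = c(10, 14, 9),
                se = c(1.2, 0.8, 1.5))

# barplot() returns the bar midpoints, which give the x positions
# needed to centre the error bars on the bars.
mids <- barplot(s$mean, names.arg = s$group, col = "skyblue",
                ylim = c(0, max(s$mean + s$se) * 1.1),
                xlab = "Groups", ylab = "Mean",
                main = "Mean Values by Group")

# angle = 90 with code = 3 draws flat caps at both ends of each bar.
arrows(x0 = mids, y0 = s$mean - s$se, x1 = mids, y1 = s$mean + s$se,
       angle = 90, code = 3, length = 0.1, col = "red")
```

Returning the midpoints, as the function above is documented to do, lets callers add further annotations (for example significance letters) at the right x positions.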
Oswald Omuron
See barplot and arrows in base R for details.
See summarize_data for creating summary data frames.
    example_data <- c(
      445, 372, 284, 247, 328, 98.8, 108.7, 100.8, 123.6, 129.9,
      133.3, 130.1, 123.1, 186.6, 215, 19.4, 19.3, 27.8, 26, 22,
      30.9, 19.8, 16.5, 20.2, 31, 21.1, 16.5, 19.7, 18.9, 27,
      161.8, 117, 94.6, 97.5, 142.7, 109.9, 118.3, 111.4, 96.5, 109,
      114.1, 114.9, 101.2, 112.7, 111.1, 194.8, 169.9, 159.1, 100.8, 130.8,
      93.6, 105.7, 178.4, 203, 172.2, 127.3, 128.3, 110.9, 124.1, 179.1,
      293, 197.5, 139.1, 98.1, 84.6, 81.4, 87.2, 71.1, 70.3, 120.4,
      194.5, 167.5, 121, 86.5, 81.7
    )
    example_group1 <- c(
      rep("Palm", 15), rep("Papyrus", 10), rep("Typha", 15),
      rep("Eucalyptus", 15), rep("Rice farm", 20)
    )
    example_group2 <- rep(c(50, 40, 30, 20, 10), 15)
    example_df <- data.frame(
      Vegetation_types = example_group1,
      Depth_revised = example_group2,
      EC_uS_cm = example_data
    )
    summary_one_group <- summarize_data(
      example_df$EC_uS_cm,
      example_df$Vegetation_types
    )
    summary_two_groups <- summarize_data(
      example_df$EC_uS_cm,
      example_df$Vegetation_types,
      example_df$Depth_revised
    )
    plot_means(
      summary_two_groups,
      ylim = c(0, 350),
      las = 2,
      space = c(0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0)
    )
This function summarizes a numeric vector by one or two grouping variables. It calculates mean, standard deviation, sample size, min, max, median, and standard error.
    summarize_data(column_data, group_var1, group_var2 = NULL)

| Argument | Description |
|---|---|
| column_data | A numeric vector containing the values to summarize. |
| group_var1 | A factor or vector to group by (required). |
| group_var2 | An optional second grouping factor or vector. |
A data frame containing summary statistics by group(s):
The first grouping variable.
The second grouping variable (if provided).
Group mean.
Standard deviation.
Sample size.
Minimum value.
Maximum value.
Median value.
Standard error of the mean.
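The statistics listed above can be reproduced for a single grouping variable with a few lines of base R. The helper `summarise_sketch()` and its output column names are hypothetical and shown only to make the calculations explicit, in particular the standard error as SD divided by the square root of N.

```r
values <- c(10, 20, 30, 40, 50, 60)
group <- c("A", "A", "B", "B", "C", "C")

# Hypothetical helper: one row of statistics per group.
summarise_sketch <- function(x, g) {
  rows <- lapply(split(x, g), function(v) {
    data.frame(Mean = mean(v), SD = sd(v), N = length(v),
               Min = min(v), Max = max(v), Median = median(v),
               SE = sd(v) / sqrt(length(v)))
  })
  out <- do.call(rbind, rows)
  data.frame(Group = names(rows), out, row.names = NULL)
}

res <- summarise_sketch(values, group)
res
```

A second grouping variable could be handled the same way by splitting on `interaction(g1, g2, drop = TRUE)`, although how the package itself combines two groups is not shown here.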
Oswald Omuron
    data <- c(10, 20, 30, 40, 50, 60)
    group1 <- c("A", "A", "B", "B", "C", "C")
    group2 <- c(1, 1, 2, 2, 3, 3)
    summarize_data(data, group1)
    summarize_data(data, group1, group2)