{ "cells": [ { "cell_type": "markdown", "id": "9695f80e", "metadata": {}, "source": [ "# Plot dot plot\n", "\n", "This notebook will help you generate \"Prism-style\" dot plots in Python, inspect the distribution of your data, and run two-sample statistics.\n", "
" ] }, { "cell_type": "markdown", "id": "98bd8fba", "metadata": {}, "source": [ "## Import Data\n", "For this notebook, we can either import our data from a CSV file, or by manually entering the values. \n", "\n", "If you'd like to import your data from a CSV file, you will need to follow the instructions for uploading data to Colab on [the home page](https://bipn145.github.io/intro.html). If you are using this option, comment out the lines of code under Option 2.\n", "\n", "> **Task**: \n", "> 1. Change `data_1` and `data_2` to be your two groups of data. *Make sure you leave these as lists, with brackets on each end, and each data point separated by a comma.*\n", "> 2. *Optional*: Rename `Condition_1` and `Condition_2`. *Make sure you keep these in single quotes, so Python recognizes them as a string!*" ] }, { "cell_type": "code", "execution_count": 1, "id": "b6becc60", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Condition_1</th><th>Condition_2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>3</td></tr>\n",
"    <tr><th>1</th><td>3</td><td>4</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>5</td></tr>\n",
"    <tr><th>3</th><td>2</td><td>3</td></tr>\n",
"    <tr><th>4</th><td>1</td><td>2</td></tr>\n",
"    <tr><th>5</th><td>2</td><td>6</td></tr>\n",
"    <tr><th>6</th><td>4</td><td>7</td></tr>\n",
"    <tr><th>7</th><td>2</td><td>8</td></tr>\n",
"    <tr><th>8</th><td>5</td><td>3</td></tr>\n",
"  </tbody>\n",
"</table>\n", "
" ], "text/plain": [ " Condition_1 Condition_2\n", "0 1 3\n", "1 3 4\n", "2 3 5\n", "3 2 3\n", "4 1 2\n", "5 2 6\n", "6 4 7\n", "7 2 8\n", "8 5 3" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Option 1: Import a CSV file as a Pandas dataframe\n", "import pandas as pd\n", "#filename = ...\n", "#data = pd.read_csv(filename)\n", "\n", "# Option 2: Import your data as two lists and generate a dataframe from it\n", "data_1 = [1,3,3,2,1,2,4,2,5]\n", "data_2 = [3,4,5,3,2,6,7,8,3]\n", "data = pd.DataFrame(data={'Condition_1':data_1,'Condition_2':data_2})\n", "\n", "# Show the data\n", "data" ] }, { "cell_type": "markdown", "id": "c2c81a8d", "metadata": {}, "source": [ "## Plot Data\n", "Below, we'll use a seaborn plotting function called [swarmplot](https://seaborn.pydata.org/generated/seaborn.swarmplot.html) to plot each of our data points. \n", "\n", "### Notes \n", "* This will draw a **dotted gray** line for the mean, and a **solid black line** for the median. \n", "* Change the `plt.ylabel` line to add your own label." 
] }, { "cell_type": "code", "execution_count": 1, "id": "eeb3090c", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "image/png": { "height": 252, "width": 268 }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Import needed packages\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline \n", "%config InlineBackend.figure_format = 'retina'\n", "\n", "# Set up the plot\n", "fig,ax = plt.subplots(1,1,figsize=(4,4))\n", "\n", "# plot the mean line\n", "sns.boxplot(data=data, showmeans=True,meanline=True,\n", " meanprops={'color': 'gray', 'ls': '--', 'lw': 2},\n", " medianprops={'visible': True,'color': 'black', 'ls': '-', 'lw': 2},\n", " whiskerprops={'visible': False},\n", " showfliers=False,showbox=False,showcaps=False)\n", "\n", "# plot individual data points\n", "sns.swarmplot(data=data,s=8)\n", "\n", "plt.ylabel('Thing we\\'re measuring')\n", "\n", "# Make the axes look nice!\n", "ax.spines[['right', 'top']].set_visible(False)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "73046e36", "metadata": {}, "source": [ "## Check to see how skewed the data is\n", "\n", "Before we run any hypoothesis tests, we need to know if our data is skewed or not. To test for skewness, we can use [`stats.skewtest`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewtest.html#scipy.stats.skewtest) to test. This method implements the D'Agostino-Pearson skewness test, one of many different tests (e.g., the Kolmogorov-Smirov test) that can be used to check the normality of a distribution. **If the skew test gives us a p-value of less than 0.05, the population is skewed.**\n", "\n", ">**Task**: Run the cell below, but then change the `sample` to `data_2` (or create a separate cell for `data_2` to test your second group of data points." 
] }, { "cell_type": "code", "execution_count": 3, "id": "68ef3272", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The skewtest p-value is 0.346991576561619385898893597187\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "image/png": { "height": 432, "width": 567 } }, "output_type": "display_data" } ], "source": [ "from scipy import stats\n", "\n", "sample = data_1 # Choose which data to use\n", "\n", "stat,pvalue = stats.skewtest(sample) # Run the skew test\n", "\n", "# Print the p value of the skew test up to 30 decimal points\n", "print('The skewtest p-value is ' + '%.30f' % pvalue) \n", "\n", "plt.hist(sample) # Create a histogram\n", "plt.ylabel('Observations')\n", "plt.xlabel('Thing we\\'re measuring')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "38e616cb", "metadata": {}, "source": [ "## Run two-sample statistics\n", "\n", "### *Inferential statistics* generalize from observed data to the world at large\n", "Most often, the goal of our hypothesis testing is to test whether or not two distributions are different, or if a distribution has a different mean than the underlying population distribution.\n", "\n", "The SciPy stats package has [many hypothesis testing tools](https://docs.scipy.org/doc/scipy/reference/stats.html). For many simple cases in biology or neuroscience research, we'd like to test whether two or more distributions are different from eachother.\n", "\n", "If we know our distributions are normal (they're generated from a normal distribution!) we can use **parametric statistics** to test our hypothesis. To test for differences between normal populations, we can use the independent t-test in our stats package: `stats.ttest_ind()`.\n", "\n", "If we had paired samples, we would use a dependent t-test [as seen here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html#scipy.stats.ttest_rel).\n", "\n", "If one of our populations is skewed, however, we **cannot use a t-test**. A t-test assumes that the populations are normally distributed. 
For skewed populations, we can use either the [Mann-Whitney U test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu) (for independent samples, `stats.mannwhitneyu()`) or the [Wilcoxon signed-rank test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html#scipy.stats.wilcoxon) (for dependent/paired samples, `stats.wilcoxon()`).\n", "\n", "Below, there is sample code to run four different statistical tests. **You should use *only* the one that is most appropriate for your data: uncomment that line and comment out the others.**" ] }, { "cell_type": "code", "execution_count": 4, "id": "80e78e69", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TtestResult(statistic=-2.4382276613229465, pvalue=0.026796307428331737, df=16.0)\n", "TtestResult(statistic=-2.6832815729997477, pvalue=0.027784351010990083, df=8)\n" ] } ], "source": [ "print(stats.ttest_ind(data_1,data_2)) # to run an independent t-test\n", "# print(stats.ttest_rel(data_1,data_2)) # to run a dependent (paired) t-test\n", "# print(stats.mannwhitneyu(data_1,data_2)) # to run a Mann-Whitney U test\n", "# print(stats.wilcoxon(data_1,data_2)) # to run a Wilcoxon signed-rank test" ] }, { "cell_type": "markdown", "id": "2c2bb7de", "metadata": {}, "source": [ "That's it for this notebook! You can adapt this code for lots of different projects (including your final project!)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 5 }