Metadata-Version: 2.1
Name: pysumstats
Version: 0.3
Summary: Package for working with GWAS summary statistics
Home-page: https://github.com/matthijsz/pysumstats
Author: Matthijs D. van der Zee
Author-email: m.d.vander.zee@vu.nl
License: MIT
Description: # Patch notes
        
        ##### 12-05-2020 (v0.3)
         - Added `fig` and `ax` arguments to `pysumstats.plot.qqplot` and `pysumstats.plot.manhattan` to enable plotting to existing figure and axis.
         - Added `pysumstats.plot.pzplot`, to visually compare Z-values from `B/SE` to Z-values calculated from the P-value.
         - Added `pysumstats.plot.afplot`, to plot allele frequency differences between summary statistics.
         - Added `pysumstats.plot.zzplot`, to plot differences in Z-values between summary statistics.
         - Added `qqplot`, `manhattan`, `pzplot`, `afplot`, `zzplot` functions to MergedSumStats object.
         - Added `pzplot` function to SumStats object.
         - Added `plot_all` functions to SumStats and MergedSumStats objects to automatically generate all possible plots for the object.
        
        ##### 11-05-2020 (v0.2.3)
        
         - Added `return` statement to `MergedSumStats.merge()` when `inplace=False` and merging with another MergedSumStats.
         - Added docstrings to base, mergedsumstats, sumstats and utils.
         - Added [docs](https://pysumstats.readthedocs.io/en/latest/)
         - Fixed import errors and added `manhattan` and `qq` function to `SumStats` class
        
        ##### 08-05-2020 (v0.2)
        
         - Added `plot` subpackage with `qqplot` and `manhattan`, from my initial [Python-QQMan module](https://github.com/matthijsz/qqman).
        
        ##### 08-05-2020 (v0.1)
        
         - Adapted to be a package rather than a module.
         - Added `low_ram` argument to SumStats to read/write data to disk rather than RAM, in case of memory issues.  
        
        # Description
        
        A package for working with GWAS summary statistics data in Python. <br/>
        This package is designed to make it easy to read summary statistics, perform QC, merge summary statistics and perform meta-analysis.<br/>
        Meta-analysis can be performed with `.meta_analyze()` on merged summary statistics, using inverse-variance weighted or sample-size weighted methods.<br/>
        GWAMA as described in [Baselmans, et al. (2019)](https://www.nature.com/articles/s41588-018-0320-8) can be performed using the `.gwama()` function in merged summary statistics. <br/>
        The plotting subpackage uses matplotlib.pyplot for generating figures, so the functions are generally compatible with matplotlib.pyplot colors, and with Figure and Axes objects. <br/>
        Warning: merging with low_memory enabled is still highly experimental. <br/>
        
        # Reference
        
        Using the pysumstats package for a publication, or something similar? That is **awesome**! <br/>
        There is no publication attached to this package, and I am not going to force anyone to reference me or make me a co-author. I want this to remain easily accessible. <br/>
        But I would greatly appreciate it if you add a link to this GitHub repository, or mention it in the acknowledgements. <br/>
        If you have any questions, want to help add methods or want to let me know you are planning a publication with this, you can get in touch via the [pypi website of this project](https://pypi.org/project/pysumstats/).
        
        # Installation
        
        This package was made for Python 3.7. Clone the package directly from this GitHub repository, or install it with
        
        `pip3 install pysumstats`
        
        
        # Usage
        
        `import pysumstats as sumstats`
        ###### Reading files
        `s1 = sumstats.SumStats("sumstats1.csv.gz", phenotype='GWASsummary1')`
        ###### Reading data without a sample size column: you have to specify the GWAS sample size manually
        `s2 = sumstats.SumStats("sumstats2.txt.gz", phenotype='GWASsummary2', gwas_n=350492)`
        ###### Reading data with column names not automatically recognized:
        ```
        s3 = sumstats.SumStats("sumstats3.csv", phenotype='GWASsummary3',
                                      column_names={
                                            'rsid': 'weird_name_for_rsid',
                                            'chr': 'weird_name_for_chr',
                                            'bp': 'weird_name_for_bp',
                                            'ea': 'weird_name_for_ea',
                                            'oa': 'weird_name_for_oa',
                                            'maf': 'weird_name_for_maf',
                                            'b': 'weird_name_for_b',
                                            'se': 'weird_name_for_se',
                                            'p': 'weird_name_for_p',
                                            'hwe': 'weird_name_for_p_hwe',
                                            'info': 'weird_name_for_info',
                                            'n': 'weird_name_for_n',
                                            'eaf': 'weird_name_for_eaf',
                                            'oaf': 'weird_name_for_oaf'})
        ```
        ###### Performing QC
        ```
        s1.qc(maf=.01)
        s2.qc(maf=.01, hwe=1e-6, info=.9)
        s3.qc()  # MAF .01 is the default
        ```
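        What the thresholds mean can be sketched without the package. The snippet below is not the pysumstats implementation: `passes_qc` is a hypothetical helper, the records are made up, and it assumes that SNPs are dropped when their value falls below the given threshold and that filters left as `None` are skipped.
        
        ```python
        def passes_qc(snp, maf=0.01, hwe=None, info=None):
            """Return True if a SNP record (a dict) passes the requested thresholds."""
            if snp["maf"] < maf:
                return False  # minor allele too rare
            if hwe is not None and snp["hwe"] < hwe:
                return False  # Hardy-Weinberg P-value too small
            if info is not None and snp["info"] < info:
                return False  # imputation quality too low
            return True
        
        
        # Hypothetical records, keyed by the standardized column names
        snps = [
            {"rsid": "rs1", "maf": 0.20, "hwe": 0.50, "info": 0.98},
            {"rsid": "rs2", "maf": 0.005, "hwe": 0.50, "info": 0.98},  # too rare
            {"rsid": "rs3", "maf": 0.20, "hwe": 1e-9, "info": 0.98},   # fails HWE
            {"rsid": "rs4", "maf": 0.20, "hwe": 0.50, "info": 0.50},   # poorly imputed
        ]
        
        kept_strict = [s["rsid"] for s in snps if passes_qc(s, maf=0.01, hwe=1e-6, info=0.9)]
        kept_maf_only = [s["rsid"] for s in snps if passes_qc(s)]
        ```
        
        With all three filters only `rs1` survives; with the MAF filter alone, `rs3` and `rs4` are kept as well.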
        ###### Merging sumstats; the low_memory option is still experimental, so be careful with it
        `merge1 = s1.merge(s2)`
        
        ###### Meta analysis
        ```
        n_weighted_meta = merge1.meta_analyze(name='meta1', method='samplesize')  # N-weighted meta analysis
        ivw_meta = merge1.meta_analyze(name='meta1', method='ivw')  # Standard inverse-variance weighted meta analysis
        gwama = merge1.gwama(name='meta1', method='ivw')  # GWAMA as described in Baselmans, et al. (2019)
        ```
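        For reference, the two `method` values correspond to standard fixed-effect weighting schemes. The sketch below shows the usual textbook formulas for a single SNP; it is not the pysumstats implementation, and the function names (`ivw_meta`, `samplesize_meta`, `two_sided_p`) and input values are made up for illustration.
        
        ```python
        import math
        from statistics import NormalDist
        
        
        def ivw_meta(betas, ses):
            """Inverse-variance weighted estimate: each study is weighted by 1 / se^2."""
            weights = [1.0 / se ** 2 for se in ses]
            total = sum(weights)
            b = sum(w * b_i for w, b_i in zip(weights, betas)) / total
            se = math.sqrt(1.0 / total)
            return b, se, b / se
        
        
        def samplesize_meta(betas, ses, ns):
            """Sample-size weighted Z: each study's Z-value is weighted by sqrt(N)."""
            zs = [b_i / se for b_i, se in zip(betas, ses)]
            weights = [math.sqrt(n) for n in ns]
            numerator = sum(w * z for w, z in zip(weights, zs))
            return numerator / math.sqrt(sum(w ** 2 for w in weights))
        
        
        def two_sided_p(z):
            """Two-sided P-value for a standard-normal Z-value."""
            return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        
        
        # Hypothetical effect sizes for one SNP in two studies
        b, se, z_ivw = ivw_meta(betas=[0.10, 0.20], ses=[0.05, 0.05])
        z_n = samplesize_meta(betas=[0.10, 0.20], ses=[0.05, 0.05], ns=[1000, 1000])
        p_ivw = two_sided_p(z_ivw)
        ```
        
        When the studies have equal standard errors and equal sample sizes, as in this toy input, both schemes give the same Z-value; they diverge once the weights differ.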
        ###### Additionally supports adding SNP heritabilities as weights
        `exc_meta = exc.gwama(h2_snp={'ntr_exc': .01, 'ukb_ssoe': .02}, name='exc', method='ivw')`
        ###### And your own covariance matrix (called cov_Z in most R scripts)
        ```
        # Either read it from a file:
        import pandas as pd
        cov_z = pd.read_csv('my_cov_z.csv', index_col=0)  # Note: it should be a pandas DataFrame with column names and index names equal to your phenotypes (index_col=0 assumes the first column holds the phenotype names)
        
        # Or generate it from a phenotype file yourself:
        phenotypes = pd.read_csv('my_phenotype_file.csv')
        cov_z = sumstats.cov_matrix_from_phenotype_file(phenotypes, phenotypes=['GWASsummary1', 'GWASsummary2'])
        
        gwama = merge1.gwama(cov_matrix=cov_z, h2_snp={'GWASsummary1': .01, 'GWASsummary2': .02}, name='meta1', method='ivw')
        ```
        ###### See a summary of the result
        `gwama.describe()`
        ###### See head of the data
        `gwama.head()`
        ###### See head of all chromosomes
        `gwama.head(n_chromosomes=23)`
        
        ###### QQ and Manhattan plots of the result
        ```
        gwama.manhattan(filename='meta_manhattan.png')
        gwama.qqplot(filename='meta_qq.png')
        ``` 
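        
        To make clear what a QQ plot shows, here is a minimal sketch of the underlying numbers, independent of pysumstats. The helper names (`qq_points`, `genomic_inflation`) are hypothetical, and the plotting position `(i + 0.5) / n` is one common convention, assumed here rather than taken from the package.
        
        ```python
        import math
        from statistics import NormalDist, median
        
        
        def qq_points(pvalues):
            """Return (expected, observed) -log10(P) pairs, largest values first."""
            n = len(pvalues)
            observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
            # Under the null, P-values are uniform on (0, 1)
            expected = [-math.log10((i + 0.5) / n) for i in range(n)]
            return list(zip(expected, observed))
        
        
        def genomic_inflation(pvalues):
            """Lambda GC: median observed 1-df chi-square over its expected median."""
            nd = NormalDist()
            # A two-sided P-value maps to Z^2, which is chi-square with 1 df
            chisq = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
            expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
            return median(chisq) / expected_median
        
        
        # Hypothetical P-values
        points = qq_points([0.1, 0.01])
        lambda_gc = genomic_inflation([0.2, 0.5, 0.8])
        ```
        
        Points that rise well above the diagonal across the whole range, or a lambda far above 1, usually indicate inflation of the test statistics.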
        
        ###### Save the result as csv
        `exc.save('exc_sumstats.csv')`
        ###### Save the result as a pickle file (way faster to save and load back into Python)
        `exc.save('exc_sumstats.pickle')`
        
        ###### Merge gwama results with another file:
        `merged = gwama.merge(s3)`
        ###### Save prepped files for MR analysis in R:
        ```
        merged.prep_for_mr(exposure='GWASsummary3', outcome='meta1',
                           filename=['GWAS3-Meta.csv', 'Meta-GWAS3.csv'],
                           p_cutoff=5e-8, bidirectional=True, index=False)
        ```
        The resulting files will have the following column names, per specification of the MendelianRandomization package in R:
        
        `rsid	chr	bp	exposure.A1	exposure.A2	outcome.A1	outcome.A2	exposure.se	exposure.b	outcome.se	outcome.b`
        
        ###### Some other stuff:
        ```
        # See column names of the file
        exc.columns
        
        # SumStats support for standard indexing is growing:
        exc[0]  # Get the full output of the first SNP
        exc[:10]  # Get the full output of the first 10 SNPs
        exc[:10, 'p']  # Get the p-values of the first 10 SNPs
        exc['p']  # Get the p values of all SNPs
        exc['rs78948828']  # Get the full output of 1 specific rsid
        exc[['rs78948828', 'rs6057089', 'rs55957973']]  # Get the full output of multiple specific rsids
        exc[['rs78948828', 'rs6057089', 'rs55957973'], 'p']  # Get the p-value for specific rsids
        
        # If for whatever reason you want to do stuff with each SNP individually, you can also loop over the entire file
        for snp_output in exc:
            if snp_output['p'] < 5e-8:  # test this SNP's own p-value, not the whole column
                print('Yay significant SNP!')
            # do something
        
        
        # If you only want to loop over some specific columns, you can do that too
        for rsid, b, se, p in exc[['rsid', 'b', 'se', 'p']].values:
            if p < 5e-8:
                print('Yay significant SNP!')
        
        
        ```
        
        
Keywords: gwas summary statistics genetics
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Description-Content-Type: text/markdown
