Snakemake

2025-11-03

rule clean_reads:
	input:
		"data/raw_reads/Sample1.fastq.gz"
	output:
		"data/clean_reads/Sample1_clean.fastq.gz" #snakemake made this new folder by itself
	conda: #can use conda envs to run the tool 
		"Users/ginnyli/python3/envs/snakemake" #conda env here
	shell:
		"fastp -i {input} -o {output}" #the command must be a quoted string; {input} and {output} are filled in from the paths above

Snakemake has a rule checker. It does a dry run (-n) and prints the shell commands (-p) to show which rules would have to run to produce the requested output, without actually running anything.

snakemake -np [desired output]
snakemake -np data/clean_reads/Sample1_clean.fastq.gz

snakemake -np data/clean_reads/{Sample1,Sample2,Sample3}_clean.fastq.gz #no spaces inside the braces; the shell expands the list into three targets, and a {sample} wildcard in the rule matches each one
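
To see how a requested path ends up tied to a sample, here is a rough Python sketch of the matching idea. It assumes an output pattern with a {sample} wildcard, as in the clean_reads rule further down; match_target is a made-up helper for illustration and is not part of Snakemake, which does this matching internally.

```python
import re

# Output pattern with a {sample} wildcard, as in the clean_reads rule.
OUTPUT_PATTERN = "data/clean_reads/{sample}_clean.fastq.gz"


def match_target(target, pattern=OUTPUT_PATTERN):
    """Return the sample name if target fits the pattern, else None.

    Hypothetical helper: it only imitates the idea of wildcard matching.
    """
    # Split around the wildcard and escape the literal parts of the path.
    prefix, suffix = pattern.split("{sample}")
    regex = re.escape(prefix) + r"(?P<sample>.+)" + re.escape(suffix)
    found = re.fullmatch(regex, target)
    return found.group("sample") if found else None


# The three paths the shell produces from the brace list.
targets = [
    "data/clean_reads/%s_clean.fastq.gz" % name
    for name in ("Sample1", "Sample2", "Sample3")
]

for target in targets:
    print(target, "-> sample =", match_target(target))
```

A path that does not fit the pattern gives None, which is roughly the situation where Snakemake complains that no rule can produce the requested file.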

File statistics: just add another rule whose input is the previous rule's output. Snakemake chains the rules together through the file names, here to run a seqkit command.

rule clean_reads:
	input:
		"data/raw_reads/{sample}.fastq.gz"
	output:
		"data/clean_reads/{sample}_clean.fastq.gz" #snakemake made this new folder by itself
	conda: #can use conda envs to run the tool 
		"Users/ginnyli/python3/envs/snakemake" #conda env here
	shell:
		"fastp -i {input} -o {output}"

rule fastqstats:
	input:
		"data/clean_reads/{sample}_clean.fastq.gz"
	output: 
		"data/stats/{sample}_clean.txt"
	conda:
		"Users/ginnyli/python3/envs/snakemake"
	shell:
		"seqkit stats {input} > {output}"
		
rule taxonomy:
	input: 
		"data/stats/{sample}_clean.txt"
	output: 
		"data/stats/{sample}_taxonomy.txt"
	conda:
		"Users/ginnyli/python3/envs/snakemake"
	shell: 
		"whatever the script is"
	
rule ncbi:
	input: 
		"data/stats/{sample}_taxonomy.txt"
	output: 
		"data/stats/{sample}_graphical.txt"
	conda: 
		"Users/ginnyli/python3/envs/snakemake"
	shell:
		"whatever the script is"
		
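rule all: #run all the rules together; put this rule first in the file so it is the default target
	input:
		expand("data/stats/{sample}_graphical.txt", sample=["Sample1", "Sample2", "Sample3"]) #final outputs; sample names taken from the dry-run example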
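
A rule all usually just lists the final files as its input, so asking for it pulls in every rule upstream. A small Python sketch of how that list of final files could be built from sample names. The sample names are the ones from the dry-run example, and expand_targets is a made-up stand-in written for these notes, not Snakemake's own expand helper.

```python
def expand_targets(pattern, samples):
    """Fill the {sample} placeholder once per sample name (illustrative only)."""
    return [pattern.format(sample=name) for name in samples]


# Assumed sample names, taken from the dry-run example in these notes.
SAMPLES = ["Sample1", "Sample2", "Sample3"]

# Output pattern of the last rule in the chain (rule ncbi).
FINAL_OUTPUT = "data/stats/{sample}_graphical.txt"

final_targets = expand_targets(FINAL_OUTPUT, SAMPLES)
for path in final_targets:
    print(path)
```

Listing only the last file per sample is enough, because each earlier file is needed to make it.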

Can ask for a graphical visualization of the steps (the dot command comes from Graphviz, so that needs to be installed)

snakemake [command request here] --dag | dot -Tsvg > dag.svg
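
The picture is just the chain of rules drawn as a graph. A rough Python sketch of that chain, using the rule names from these notes and the standard library graphlib module (Python 3.9 or later). The dependency dictionary is written out by hand here as an assumption; Snakemake works it out from the input and output file names. The to_dot function is a made-up helper that writes the kind of text the dot command reads.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules whose output it needs (hand-written here).
DEPENDS_ON = {
    "clean_reads": set(),
    "fastqstats": {"clean_reads"},
    "taxonomy": {"fastqstats"},
    "ncbi": {"taxonomy"},
}


def run_order(graph):
    """Return the rules in an order where every rule comes after its inputs."""
    return list(TopologicalSorter(graph).static_order())


def to_dot(graph):
    """Write the graph as DOT text (illustrative helper)."""
    lines = ["digraph rules {"]
    for rule, parents in graph.items():
        for parent in sorted(parents):
            lines.append(f'    "{parent}" -> "{rule}";')
    lines.append("}")
    return "\n".join(lines)


order = run_order(DEPENDS_ON)
print(" -> ".join(order))
print(to_dot(DEPENDS_ON))
```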

Snakemake goes through the steps one by one in the order they depend on each other. If a step fails, it stops at that step and tells you which rule failed.
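
A toy Python sketch of stopping at the first failed step, assuming each step is a plain function that raises an error when it fails. The step names mirror the rules in these notes and run_pipeline is invented for illustration; real Snakemake runs shell commands and reports far more detail.

```python
def run_pipeline(steps):
    """Run steps in order and stop at the first failure.

    Returns (names of finished steps, name of the failed step or None).
    """
    finished = []
    for name, action in steps:
        try:
            action()
        except Exception as error:
            print(f"Error in step {name}: {error}")
            return finished, name
        finished.append(name)
    return finished, None


def ok():
    """Stand-in for a step that succeeds."""


def broken():
    """Stand-in for a step whose command fails."""
    raise RuntimeError("command exited with an error")


steps = [("clean_reads", ok), ("fastqstats", broken), ("taxonomy", ok)]
finished, failed = run_pipeline(steps)
print("finished:", finished, "| stopped at:", failed)
```

The taxonomy step never runs here, because its input would have come from the step that failed.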

BiologyMsc