# 2021-06-18

STAR resource utilization, measured with `/usr/bin/time -v ...`, for indexing and mapping a human genome sample:

```
Command being timed: "make mapping"
        User time (seconds): 95696.46
        System time (seconds): 186.39
        Percent of CPU this job got: 2717%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 58:47.95
        ...
        Maximum resident set size (kbytes): 8569896
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 105
        Minor (reclaiming a frame) page faults: 41258618
        Voluntary context switches: 105982
        Involuntary context switches: 170863
        Swaps: 0
        File system inputs: 61438504
        File system outputs: 101940632
        ...
        Page size (bytes): 4096
```

- resident set size (RSS): portion of the process's memory that is held in RAM, excluding swapped-out pages
- [page fault](https://en.wikipedia.org/wiki/Page_fault)
  - minor: the page is already in RAM (e.g., loaded by another process) but still has to be mapped by the MMU into the process's address space
  - major: the page has to be loaded from disk into RAM
  - (invalid): the address cannot be accessed (segmentation fault), e.g., dereferencing a null pointer
- context switches
  - voluntary: the process has nothing to do (e.g., it waits for I/O), so the CPU can switch to another process
  - involuntary: the scheduler forces a switch to another process even though the process could continue running
  - source: https://unix.stackexchange.com/questions/442969/what-exactly-are-voluntary-context-switches
- file system inputs/outputs: number of 512-byte blocks read/written, i.e., the total number of bytes read/written divided by 512
  - source: https://stackoverflow.com/questions/42126913/what-does-file-system-outputs-mean-with-time-v
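The definitions above can be applied to the measured values. The following is a minimal Python sketch, not part of the original measurement: the function and constant names are made up for illustration, and it assumes that `kbytes` means 1024 bytes and that file system inputs/outputs are 512-byte blocks, as stated in the list.

```python
# Values copied from the `/usr/bin/time -v` output above.
USER_S = 95696.46
SYSTEM_S = 186.39
ELAPSED = "58:47.95"      # format: h:mm:ss or m:ss
MAX_RSS_KB = 8569896      # assumption: 1 kbyte = 1024 bytes
FS_INPUTS = 61438504      # assumption: 512-byte blocks
FS_OUTPUTS = 101940632    # assumption: 512-byte blocks

KIB_PER_GIB = 1024 ** 2


def elapsed_to_seconds(text: str) -> float:
    """Convert 'h:mm:ss' or 'm:ss' (with optional fraction) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def average_cores(user_s: float, system_s: float, elapsed_s: float) -> float:
    """CPU time divided by wall-clock time: average number of busy cores."""
    return (user_s + system_s) / elapsed_s


def blocks_to_bytes(blocks: int, block_size: int = 512) -> int:
    """Convert a block count to bytes."""
    return blocks * block_size


elapsed_s = elapsed_to_seconds(ELAPSED)
print(f"average busy cores: {average_cores(USER_S, SYSTEM_S, elapsed_s):.1f}")
print(f"peak RSS: {MAX_RSS_KB / KIB_PER_GIB:.1f} GiB")
print(f"read: {blocks_to_bytes(FS_INPUTS) / 1024 ** 3:.1f} GiB")
print(f"written: {blocks_to_bytes(FS_OUTPUTS) / 1024 ** 3:.1f} GiB")
```

Under these assumptions the run kept roughly 27 cores busy on average, which matches the reported 2717% CPU, with a peak RSS of about 8.2 GiB, about 29 GiB read, and about 49 GiB written.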

# 2021-06-16

- with Kubernetes, one can start different Jupyter profiles that reserve different resources
- some programs may require more than 1024 open files (e.g., STAR)
  - configurable through https://wiki.archlinux.org/title/Limits.conf
- Slurm `srun` works
- c0 has 32 CPUs, c1 has 48
- FASTQ samples available on https://www.internationalgenome.org/data-portal/sample
- ideas
  - a sensible Slurm installation
  - how to do the partitioning
  - NFS
  - Nextcloud integration
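
The open-file limit noted above can be inspected and, within the hard limit, raised from inside a process. The following is a minimal Python sketch using the standard `resource` module on Linux; the helper name `raise_nofile_limit` and the target value are hypothetical, and a persistent setting would still go through `limits.conf` as linked above.

```python
import resource


def raise_nofile_limit(target: int) -> int:
    """Raise this process's soft open-file limit towards `target`.

    Without extra privileges the soft limit can only be raised up to
    the hard limit, so the result may be lower than `target`.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to do
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


# Hypothetical target; the value a given STAR run needs is not known here.
print("soft limit now:", raise_nofile_limit(4096))
```

Child processes inherit the limit, so a program started from this process (e.g., through `subprocess`) would run with the raised value.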

# 2021-06-11

- BLAST finds regions of local similarity between sequences: https://github.com/ncbi/blast_plus_docs
- STAR is an RNA-seq aligner; tutorial: https://github.com/hbctraining/Intro-to-rnaseq-hpc-salmon-flipped/blob/main/lessons/2day_rnaseq_workflow.md
- human genome reference: https://hbctraining.github.io/Accessing_public_genomic_data/lessons/accessing_genome_reference_data.html
  - DNA sequence (chromosome 1): http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz
  - annotated genes: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz
  - 104 is the current Ensembl release
  - created a makefile
- Kubernetes on Arch Linux: https://dnaeon.github.io/install-and-configure-k8s-on-arch-linux/
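
The two Ensembl URLs above share the release number, so they can be built from it. The following is a minimal Python sketch; the function name is hypothetical, and whether other releases or chromosomes follow exactly the same path pattern is an assumption that has not been checked here.

```python
ENSEMBL_FTP = "http://ftp.ensembl.org/pub"


def reference_urls(release: int = 104, chromosome: str = "1") -> dict:
    """Build the FASTA and GTF URLs for a human GRCh38 reference.

    Assumption: the path layout of release 104 (taken from the notes
    above) also holds for other releases and chromosomes.
    """
    base = f"{ENSEMBL_FTP}/release-{release}"
    return {
        "fasta": (
            f"{base}/fasta/homo_sapiens/dna/"
            f"Homo_sapiens.GRCh38.dna.chromosome.{chromosome}.fa.gz"
        ),
        "gtf": f"{base}/gtf/homo_sapiens/Homo_sapiens.GRCh38.{release}.gtf.gz",
    }


for kind, url in reference_urls().items():
    print(kind, url)
```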

# 2021-06-02

- STAR supports only OpenMP, not Open MPI
- Unix as the host, because clustering using the kernel is easier
- server model: S2600CP

## Goal

- specs for hardware

## use case

priority:

- the students should be able to reserve nodes and run STAR by themselves
- students can use the shell

other:

- 3ds Max for animation rendering (Windows support)

## requirements


- at least 512G RAM and enough CPUs (>16) available to a single program
- tools
  - the ability to start VirtualBox VMs, or
  - the required software tools are already available
- a share for frequently needed data, e.g., gene sequences
- JupyterHub
- remote desktop, so that the data can be analyzed remotely later
- a user-friendly way for students to reserve compute resources (example with Slurm (`srun`) [1])
- automatic integration with Nextcloud@th-deg
- network storage (NFS) for users
- backup
- connection to the existing HPC infrastructure [2]

[1] https://hbctraining.github.io/Accessing_public_genomic_data/lessons/accessing_genome_reference_data.html
[2] https://intranet.th-deg.de/en/rz/hpc-cluster

There are two tracks for users:
- batch queue
- interactive
  - supports Xpra for remote desktop

Slurm can already differentiate between these tracks.
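
As a rough illustration of the two tracks, the following Python sketch builds one command line per track: `sbatch` for the batch queue and `srun --pty` for an interactive shell. The helper names and the resource values are hypothetical, partitions and the Xpra setup are left out, and `make mapping` is only borrowed from the timing run in these notes.

```python
import shlex


def interactive_command(cpus: int, mem: str, shell: str = "bash") -> list:
    """Interactive track: `srun --pty` opens a shell on an allocated node."""
    return ["srun", f"--cpus-per-task={cpus}", f"--mem={mem}", "--pty", shell]


def batch_command(cpus: int, mem: str, command: str) -> list:
    """Batch track: `sbatch --wrap` queues a single shell command as a job."""
    return ["sbatch", f"--cpus-per-task={cpus}", f"--mem={mem}", f"--wrap={command}"]


# Hypothetical resource values for illustration only.
print(shlex.join(interactive_command(16, "64G")))
print(shlex.join(batch_command(32, "64G", "make mapping")))
```

The lists can be passed to `subprocess.run` unchanged; printing them with `shlex.join` only shows what would be typed in a shell.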