I need help with a small project involving string handling. I have data (protein sequences) in a plain text file like the example below. A, B, C and D are names. The name list could be very long (A - Z, ...); each name is unique and could contain multiple letters. Each name is followed by one or more spaces and a string, and the same name appears on several lines. My objective is to join the strings that share a name, in the order they appear, so that each name ends up on a single new line. Take A for example, the result should be: A kldie1243kfhj9jjfklkjhgsd. After that, I need to check whether all lines (A, B, C and D) reach the same target length. If a line is shorter, an * and a certain number of ? (question marks) are added to the end of the line.
Thanks for your attention.
A kldie1243
B jgkfdk3k09d
C khgopoeoprd
D ljkgho0ja8
A kfhj9jjfk
B lkgkgd947dfs
C okvhlasdf9kl
D lakdfadkl
A lkjhgsd
B lkjlkdsf
C asdfgf
D asdfhetrhsr
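
One possible way to do this, as a minimal Python sketch rather than a definitive solution. The function names `join_by_name` and `pad_to_target` and the `SAMPLE` list are made up for illustration. Two points are not stated in the post and are assumptions here: the target length is taken to be the length of the longest joined string unless one is passed in, and a short line gets a single `*` followed by one `?` for each missing character.

```python
# Sample data copied from the post; in practice the lines would come from the text file.
SAMPLE = [
    "A kldie1243", "B jgkfdk3k09d", "C khgopoeoprd", "D ljkgho0ja8",
    "A kfhj9jjfk", "B lkgkgd947dfs", "C okvhlasdf9kl", "D lakdfadkl",
    "A lkjhgsd", "B lkjlkdsf", "C asdfgf", "D asdfhetrhsr",
]


def join_by_name(lines):
    """Concatenate the strings that share a name, in order of appearance."""
    joined = {}  # dicts keep insertion order, so names stay in first-seen order
    for line in lines:
        parts = line.split()  # splits on any run of spaces
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, fragment = parts
        joined[name] = joined.get(name, "") + fragment
    return joined


def pad_to_target(joined, target=None):
    """Append '*' plus one '?' per missing character to every short line.

    Assumption: if no target is given, the longest joined string sets it.
    """
    if target is None:
        target = max((len(seq) for seq in joined.values()), default=0)
    padded = {}
    for name, seq in joined.items():
        missing = target - len(seq)
        if missing > 0:
            padded[name] = seq + "*" + "?" * missing
        else:
            padded[name] = seq  # already at (or beyond) the target length
    return padded


joined = join_by_name(SAMPLE)
for name, seq in pad_to_target(joined).items():
    print(name, seq)
```

To run it on a real file, the `SAMPLE` list can be replaced with the lines of the file, for example `with open("sequences.txt") as fh: joined = join_by_name(fh)`, where `sequences.txt` is a placeholder file name. If the `*` is meant to count toward the target length, or the target is a fixed number, only `pad_to_target` needs to change.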