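This is one person's record of working through the NLP 100 Knock 2020 (自然言語処理100本ノック2020).
For the other problems, see: 自然言語処理100本ノック2020から逃げるな まとめ ("Don't run away from the NLP 100 Knock 2020: summary")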
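17. Distinct strings in the first column
Find the set of distinct strings in the first column (i.e., each value counted only once). Check the result with the cut, sort, and uniq commands.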
Python code
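```python
import sys

path = sys.argv[1]  # path to popular-names.txt, given on the command line

with open(path) as f:
    s = set()  # a set keeps each name only once
    for line in f:
        # the name is the first tab-separated field
        s.add(line.split('\t')[0])

# set iteration order is arbitrary, so sort the output before comparing
for e in s:
    print(e)
```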
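The script above prints the names in arbitrary set order and leaves sorting to the shell. If you would rather have Python produce output directly comparable to `cut -f 1 | sort | uniq`, a minimal sketch might look like the following. The function name `distinct_first_column` is hypothetical, and the order matches `sort` only under byte-order collation (e.g. `LC_ALL=C sort`); other locales may order some strings differently.

```python
from typing import Iterable, List


def distinct_first_column(lines: Iterable[str]) -> List[str]:
    """Return the unique first-column values of tab-separated lines, sorted.

    Roughly mirrors `cut -f 1 | sort | uniq` using Python's code-point order.
    """
    # rstrip('\n') so a line with only one column doesn't keep its newline
    firsts = {line.rstrip('\n').split('\t')[0] for line in lines}
    return sorted(firsts)
```

Usage: `with open('../popular-names.txt') as f: print('\n'.join(distinct_first_column(f)))`.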
UNIX commands for checking
My environment is a little unusual (I use the fish shell), so these commands might not work as-is for you.
```
python NPL100_17.py ../popular-names.txt | sort > output_python.txt
cut -f 1 ../popular-names.txt | sort | uniq > output_unix.txt
diff -s output_python.txt output_unix.txt
```
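In the pipeline above, `sort` has to come before `uniq` because `uniq` only collapses duplicate lines that are adjacent (`sort -u` does both steps at once). The sketch below mimics that behaviour in Python with `itertools.groupby`, which likewise groups only consecutive equal items. The function name `uniq` and the sample names are just for illustration.

```python
from itertools import groupby
from typing import Iterable, List


def uniq(lines: Iterable[str]) -> List[str]:
    """Collapse runs of *adjacent* identical lines, like the uniq command."""
    # groupby starts a new group whenever the value changes,
    # so only consecutive duplicates are merged
    return [key for key, _ in groupby(lines)]


names = ["Mary", "Anna", "Mary", "Mary", "John"]
print(uniq(names))          # ['Mary', 'Anna', 'Mary', 'John']
print(uniq(sorted(names)))  # ['Anna', 'John', 'Mary']
```

Without sorting first, the non-adjacent "Mary" survives twice, which is exactly why the shell version pipes through `sort` before `uniq`.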
Output
Patricia Harper Emily Brittany Steven Judith William Jessica Kathleen Robert Lucas Karen Charlotte Kelly Ethan Frank John Chloe Jayden Helen Dorothy Lori Noah Elijah Julie Anthony Jacob Rebecca Shirley Angela Andrew Ava Thomas Alice Laura Edward Ethel Sophia Benjamin Nicole Susan Nicholas Pamela Austin Sharon Abigail Mildred Frances Evelyn Cynthia Alexander Crystal Alexis Charles Megan Harry Tammy Nancy Lauren Betty Aiden Hannah Sarah Amelia David Carol Samantha Brandon Kimberly Ashley Minnie Richard Barbara Amanda Christopher Annie Virginia Liam Matthew Carolyn Bessie Joshua Marie Michael Justin Michelle Donald Brian Ida Scott Mason Anna Emma Taylor Lisa Lillian Olivia Deborah Melissa Ruth Madison Elizabeth Joseph Margaret Mark Donna Joan Tyler Larry Logan Jeffrey Clara Oliver Debra Mary Linda Daniel Gary Isabella Mia Heather Bertha Sandra Doris Florence Walter Jason Henry Tracy George Stephanie Ronald Jennifer Rachel Amy James