[Python] Data Visualization Starting with a Collection of English Words

2022. 9. 21. 02:02

1️⃣ Working with Files

Background

1) Opening / closing a file

# Open a file: open()
file = open('filename')

# Read the file: read()
content = file.read()

# Close the file: close()
file.close()

- Setting the file mode

(If no mode is given, open() opens the file in read mode ('r').)

# Open the file in write ('w') mode
with open('filename', 'w') as file:
    file.write('Hello')
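The mode string decides what happens to existing content. A minimal sketch, assuming a throwaway file in a temporary directory (the name modes_demo.txt is hypothetical): 'w' empties the file before writing, 'a' appends to the end, and 'r' (the default) reads.

```python
import os
import tempfile

# Use a temporary directory so the example never touches real files
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

# 'w' (write): creates the file, or empties it if it already exists
with open(path, 'w') as file:
    file.write('Hello\n')

# 'w' again: the previous content 'Hello' is discarded
with open(path, 'w') as file:
    file.write('Hi\n')

# 'a' (append): keeps the existing content and adds to the end
with open(path, 'a') as file:
    file.write('Bonjour\n')

# 'r' (read) is the default mode, so it can be omitted
with open(path) as file:
    content = file.read()

print(repr(content))  # 'Hi\nBonjour\n'
```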

2) Closing a file automatically

with open('filename') as file:
    content = file.read()

# file.close() is not needed
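One way to confirm that the with block really closes the file is the file object's closed attribute. A minimal sketch, assuming a small temporary file with made-up contents; the with statement also closes the file if an exception is raised inside the block.

```python
import os
import tempfile

# Create a small temporary file to read (contents are made up for the example)
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as file:
    file.write('apple\nbanana\n')

with open(path) as file:
    content = file.read()
    inside = file.closed   # False: the file is still open inside the block

outside = file.closed      # True: leaving the with block closed the file
print(inside, outside)     # False True
```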

3) Reading a file

file.read()

- Reading line by line

contents = []

with open('filename') as file:
    for line in file:
        contents.append(line)
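Each line produced by iterating over a file still ends with its newline character, which is why line-based code usually strips it. A minimal sketch, assuming a temporary file holding three made-up words:

```python
import os
import tempfile

# Temporary file with three lines (made-up sample words)
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w') as file:
    file.write('life\nlove\nfaith\n')

raw_lines = []
clean_lines = []
with open(path) as file:
    for line in file:
        raw_lines.append(line)                  # keeps the trailing '\n'
        clean_lines.append(line.rstrip('\n'))   # newline removed

print(raw_lines)    # ['life\n', 'love\n', 'faith\n']
print(clean_lines)  # ['life', 'love', 'faith']
```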

▶ Read the file that stores the English word data into Python for analysis (print each line of the file together with its line number)

filename = 'corpus.txt'

def print_lines(filename):
    with open(filename) as file:
        line_number = 1

        for line in file:
            # each line already ends with '\n'; strip it so print() does not add blank lines
            print(str(line_number) + ' ' + line.rstrip('\n'))
            line_number += 1

print_lines(filename)
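The manual counter can also be replaced by the built-in enumerate(), which yields (number, line) pairs. A minimal sketch of that variant; numbered_lines is a hypothetical helper that returns the strings instead of printing them, and the sample corpus.txt contents below are made up because the real file is not shown.

```python
import os
import tempfile

def numbered_lines(filename):
    """Hypothetical helper: return each line prefixed with its 1-based line number."""
    with open(filename) as file:
        # enumerate(..., start=1) does the counting that line_number did by hand
        return [f'{number} {line.rstrip()}' for number, line in enumerate(file, start=1)]

# Made-up stand-in for corpus.txt
path = os.path.join(tempfile.mkdtemp(), 'corpus.txt')
with open(path, 'w') as file:
    file.write('the,15\ntime,8\n')

for text in numbered_lines(path):
    print(text)
# 1 the,15
# 2 time,8
```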

2️⃣ Working with Data Structures

Background

1) Tuples

hello = ('안녕하세요', 'hello', 'bonjour')
apple = ('사과', 'apple', 'pomme') 
red = ('빨갛다', 'red', 'rouge')

Comparing tuples and lists

Similarities

- Both are ordered collections of elements.

Differences (what a tuple cannot do)

- The value of an element cannot be modified.

- The number of elements cannot be changed.

# List
hello = ['안녕하세요', 'hello', 'bonjour']
hello[0] = '안녕'          # ['안녕', 'hello', 'bonjour']
hello.append('ni hao')     # ['안녕', 'hello', 'bonjour', 'ni hao']

# Tuple
hello = ('안녕하세요', 'hello', 'bonjour')
hello[0] = '안녕'          # TypeError: tuples do not support item assignment
hello.append('ni hao')     # AttributeError: tuples have no append() method
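The errors above can be observed safely by catching them. A minimal sketch; the variable names are illustrative, and the last step shows the usual workaround of building a new tuple rather than modifying the old one.

```python
hello_tuple = ('안녕하세요', 'hello', 'bonjour')

try:
    hello_tuple[0] = '안녕'
except TypeError as error:
    item_error = str(error)    # "'tuple' object does not support item assignment"

try:
    hello_tuple.append('ni hao')
except AttributeError as error:
    append_error = str(error)  # "'tuple' object has no attribute 'append'"

# To "change" a tuple, build a new one from pieces of the old one
hello_tuple = ('안녕',) + hello_tuple[1:]
print(hello_tuple)  # ('안녕', 'hello', 'bonjour')
```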

2) Building a list from a list (List Comprehension)

words = ['life', 'love', 'faith']

# Method 1 - for loop
first_letters = []
for word in words:
    first_letters.append(word[0])

# Method 2 - List Comprehension
first_letters = [word[0] for word in words]    # ['l', 'l', 'f']

numbers = [1, 3, 5, 7]

# Method 1 - for loop
new_numbers = []
for n in numbers:
    new_numbers.append(n + 1)

# Method 2 - List Comprehension
new_numbers = [n + 1 for n in numbers]    # [2, 4, 6, 8]
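A list comprehension can produce any kind of element, including tuples, which ties this section back to the previous one. A minimal sketch pairing each word with its length, with the equivalent for loop for comparison:

```python
words = ['life', 'love', 'faith']

# Build (word, length) tuples in one expression
word_lengths = [(word, len(word)) for word in words]
print(word_lengths)  # [('life', 4), ('love', 4), ('faith', 5)]

# The same result with a for loop, to compare the two styles
word_lengths_loop = []
for word in words:
    word_lengths_loop.append((word, len(word)))

print(word_lengths == word_lengths_loop)  # True
```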

- Filtering specific elements

numbers = [1, 3, 4, 5, 6, 7]

# Method 1 - for loop
even = []
for n in numbers:
    if n % 2 == 0:
        even.append(n)

# Method 2 - List Comprehension
even = [n for n in numbers if n % 2 == 0]    # [4, 6]
odd = [n for n in numbers if n % 2 == 1]     # [1, 3, 5, 7]
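Filtering and transforming can be combined, and it helps to know where the if goes. A minimal sketch: an if after the for clause drops elements, while an if ... else before the for clause keeps every element and chooses a value for each.

```python
numbers = [1, 3, 4, 5, 6, 7]

# Filter and transform at once: keep even numbers, then multiply them by 10
even_times_ten = [n * 10 for n in numbers if n % 2 == 0]
print(even_times_ten)  # [40, 60]

# 'if ... else' placed BEFORE 'for' does not filter; it picks a value for every element
labels = ['even' if n % 2 == 0 else 'odd' for n in numbers]
print(labels)  # ['odd', 'odd', 'even', 'odd', 'even', 'odd']
```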

3) Sorting data

# Sort by absolute value
numbers = [-1, 3, -4, 5, 6, 100]
sort_by_abs = sorted(numbers, key=abs)    # [-1, 3, -4, 5, 6, 100]

fruits = ['cherry', 'apple', 'banana']

# Without a key, values are sorted in their default order (alphabetical for strings, ascending for numbers)
sort_by_alphabet = sorted(fruits)    # ['apple', 'banana', 'cherry']

# Sort by the last letter of each word (compare the words spelled backwards)
def reverse(word):
    return word[::-1]    # 'cherry' -> 'yrrehc', 'apple' -> 'elppa', 'banana' -> 'ananab'

sort_by_last = sorted(fruits, key=reverse)    # ['banana', 'apple', 'cherry']
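A few more options of the built-in sorting tools are worth knowing. A minimal sketch: reverse=True sorts in descending order, a lambda can stand in for a small named key function like reverse(), sorted() is stable (ties keep their original order), and list.sort() sorts in place instead of returning a new list. The word list for the length example is made up.

```python
fruits = ['cherry', 'apple', 'banana']

# reverse=True sorts in descending order
descending = sorted(fruits, reverse=True)
print(descending)  # ['cherry', 'banana', 'apple']

# A lambda can replace a small named key function
by_last_letter = sorted(fruits, key=lambda word: word[::-1])
print(by_last_letter)  # ['banana', 'apple', 'cherry']

# sorted() is stable: words of equal length keep their original relative order
by_length = sorted(['kiwi', 'fig', 'pear', 'yam'], key=len)
print(by_length)  # ['fig', 'yam', 'kiwi', 'pear']

# list.sort() changes the list itself and returns None
numbers = [3, 1, 2]
result = numbers.sort()
print(numbers, result)  # [1, 2, 3] None
```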

▶ Read corpus.txt and return a list of (word, frequency) tuples

filename = 'corpus.txt'

def import_as_tuple(filename):
    tuples = []
    with open(filename) as file:
        for line in file:
            split = line.strip().split(',')
            word = split[0]
            freq = int(split[1])    # store the count as a number, not a string
            new_tuple = (word, freq)
            tuples.append(new_tuple)

    return tuples

print(import_as_tuple(filename))
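Two details matter when parsing word,count lines. Counts read from a file are strings, and strings compare character by character ('15' sorts before '8'), so converting them with int() is what makes sorting by frequency work. The standard csv module can also do the splitting. A minimal sketch, assuming each line has the form word,frequency; import_as_tuple_csv is a hypothetical name and the sample data is made up.

```python
import csv
import os
import tempfile

def import_as_tuple_csv(filename):
    """Hypothetical variant: parse 'word,frequency' lines with the csv module."""
    tuples = []
    with open(filename, newline='') as file:
        for row in csv.reader(file):
            if len(row) < 2:          # skip blank or malformed lines
                continue
            word, freq = row[0], int(row[1])  # int() so counts compare numerically
            tuples.append((word, freq))
    return tuples

# Made-up sample data standing in for corpus.txt
path = os.path.join(tempfile.mkdtemp(), 'corpus.txt')
with open(path, 'w', newline='') as file:
    file.write('time,8\nthe,15\n\nturbo,1\n')

print(import_as_tuple_csv(path))  # [('time', 8), ('the', 15), ('turbo', 1)]
```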

▶ From the word list words, return a list containing only the words that start with prefix

words = [
    'apple',
    'banana',
    'alpha',
    'bravo',
    'cherry',
    'charlie',
]

def filter_by_prefix(words, prefix):
    return [word for word in words if word.startswith(prefix)]

a_words = filter_by_prefix(words, 'a')
print(a_words)
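str.startswith() is a bit more flexible than a single prefix. A minimal sketch: it also accepts a tuple of prefixes and returns True if any of them matches, and lower-casing the word first gives case-insensitive matching. The mixed-case word list is made up.

```python
words = ['apple', 'banana', 'alpha', 'bravo', 'cherry', 'charlie']

# A tuple of prefixes: True if the word starts with any one of them
a_or_b_words = [word for word in words if word.startswith(('a', 'b'))]
print(a_or_b_words)  # ['apple', 'banana', 'alpha', 'bravo']

# Case-insensitive matching: lower-case the word before comparing
mixed = ['Apple', 'avocado', 'Banana']
a_words_any_case = [word for word in mixed if word.lower().startswith('a')]
print(a_words_any_case)  # ['Apple', 'avocado']
```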

▶ Sort the words by frequency so that how often each word is used is easy to see

pairs = [
    ('time', 8),
    ('the', 15),
    ('turbo', 1),
]
 
def get_freq(pair):
    return pair[1]

def sort_by_frequency(pairs):
    return sorted(pairs, key=get_freq)    # ascending: least frequent word first

print(sort_by_frequency(pairs))
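For frequency lists the most common words are usually wanted first. A minimal sketch using the standard library's operator.itemgetter, which does the same job as get_freq, together with reverse=True and a slice to keep the top N (N = 2 here is arbitrary):

```python
from operator import itemgetter

pairs = [('time', 8), ('the', 15), ('turbo', 1)]

# itemgetter(1) returns pair[1], like get_freq; reverse=True puts the largest count first
most_frequent_first = sorted(pairs, key=itemgetter(1), reverse=True)
print(most_frequent_first)  # [('the', 15), ('time', 8), ('turbo', 1)]

# Slicing the sorted list keeps the top-N words
top_two = most_frequent_first[:2]
print(top_two)  # [('the', 15), ('time', 8)]
```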

3️⃣ Working with Graphs

Background

1) matplotlib

- Mathematical Plot Library

- A library for drawing graphs in Python

- Supports line charts, bar charts, and many other chart types.

▶ Draw a bar chart of recent average temperatures with matplotlib's bar() function

# Import pyplot, which is part of matplotlib
import matplotlib.pyplot as plt

# Import the library needed to display charts on the Elice platform
from elice_utils import EliceUtils
elice_utils = EliceUtils()

# Declare the yearly average temperatures
years = [2013, 2014, 2015, 2016, 2017]
temperatures = [5, 10, 15, 20, 17]

# Draw the bar chart
def draw_graph():
    # pos decides where each bar is placed on the x-axis
    pos = range(len(years))  # [0, 1, 2, 3, 4]

    # Draw bars whose heights are the temperatures, each centered on its position
    plt.bar(pos, temperatures, align='center')

    # Label each bar with its year
    plt.xticks(pos, years)

    # Show the chart on the Elice platform
    plt.savefig('graph.png')
    elice_utils.send_image('graph.png')

print('Drawing the bar chart.')
draw_graph()
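elice_utils only exists on the Elice platform. Outside it, a minimal sketch (assuming matplotlib is installed locally) is to save the chart to graph.png and open the image yourself. The 'Agg' backend renders straight to a file without needing a display window, and plt.close() frees the figure once it has been saved.

```python
import os

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: draw to a file, no window needed
import matplotlib.pyplot as plt

years = [2013, 2014, 2015, 2016, 2017]
temperatures = [5, 10, 15, 20, 17]

def draw_graph(filename='graph.png'):
    pos = range(len(years))                        # x positions 0..4
    plt.bar(pos, temperatures, align='center')     # bar heights are the temperatures
    plt.xticks(pos, years)                         # label each bar with its year
    plt.savefig(filename)                          # write the image instead of sending it to Elice
    plt.close()                                    # release the figure
    return filename

output_path = draw_graph()
print(os.path.exists(output_path))  # True
```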

This post summarizes what I learned while taking Elice's AI Track (5th cohort) course.

Developer 삐롱히
