[Python] Data Processing, Starting with President Trump's Tweets

2022. 9. 20. 21:54

1️⃣ Iterating over a list

Theory you need

1) for loops

2) Indexing

3) String indexing

▶ Pull the elements out of a list one at a time so the tweets can be cleaned up

trump_tweets = [
    'Will be leaving Florida for Washington (D.C.) today at 4:00 P.M. Much work to be done, but it will be a great New Year!',
    'Companies are giving big bonuses to their workers because of the Tax Cut Bill. Really great!',
    'MAKE AMERICA GREAT AGAIN!'
]

def date_tweet(tweet):
    # index starts at 0, so add 1 to get the day of the month
    for index in range(len(tweet)):
        print('January ' + str(index+1) + ', 2017: ' + tweet[index])

date_tweet(trump_tweets)
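
As a side note, the same kind of numbering can be done without range(len(...)). Below is a minimal sketch using enumerate, which is my own variation and not the lecture's code. The names sample_tweets and date_tweet_enumerate, and the "Day N" label, are made up for illustration.

```python
# Made-up stand-in data, not real tweets
sample_tweets = ['first tweet', 'second tweet', 'third tweet']

def date_tweet_enumerate(tweets):
    lines = []
    # start=1 makes the counter begin at 1, so no index + 1 is needed
    for day, tweet in enumerate(tweets, start=1):
        lines.append('Day ' + str(day) + ': ' + tweet)
    return lines

for line in date_tweet_enumerate(sample_tweets):
    print(line)
```

enumerate hands back the counter and the element together, so there is no separate indexing step.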

▶ Print every string that starts with k from text, a list of strings

trump_tweets = ['thank', 'you', 'to', 'president', 'moon', 'of', 'south', 'korea', 'for', 'the', 'beautiful', 'welcoming', 'ceremony', 'it', 'will', 'always', 'be', 'remembered']

def print_k(text):
    for word in text:
        if word[0] == 'k':
            print(word)

print_k(trump_tweets)
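
One thing to watch with string indexing: word[0] raises an IndexError if the list happens to contain an empty string. The sketch below assumes such a list (words is made-up data, starts_with_k is a hypothetical helper) and uses the slice word[:1] instead, which simply returns '' for an empty string.

```python
# Made-up list that includes an empty string
words = ['korea', '', 'kind', 'moon']

def starts_with_k(text):
    result = []
    for word in text:
        # word[0] would fail on ''; the slice word[:1] returns '' instead
        if word[:1] == 'k':
            result.append(word)
    return result

print(starts_with_k(words))
```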

2️⃣ String functions

Theory you need

1) startswith()

2) split()

3) append() - a list function

4) upper(), lower()

5) replace()

▶ To find hashtags and mentions, check whether a string starts with # or @ (practiced here with the letter k)

trump_tweets = ['thank', 'you', 'to', 'president', 'moon', 'of', 'south', 'korea', 'for', 'the', 'beautiful', 'welcoming', 'ceremony', 'it', 'will', 'always', 'be', 'remembered']

def print_k(tweet):
    for word in tweet:
        if word.startswith('k'):
            print(word)
    
print_k(trump_tweets)
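
The exercise above checks for k, so here is a rough sketch of the actual goal, picking out hashtags and mentions. The tweet text, the @some_user handle, and the function name find_tags_and_mentions are all made up for illustration.

```python
# Made-up tweet text; the handle and hashtags are not real data
sample_tweet = "thank you @some_user for the welcome #korea #ceremony"

def find_tags_and_mentions(tweet):
    hashtags = []
    mentions = []
    for word in tweet.split():
        if word.startswith('#'):
            hashtags.append(word)
        elif word.startswith('@'):
            mentions.append(word)
    return hashtags, mentions

hashtags, mentions = find_tags_and_mentions(sample_tweet)
print(hashtags)
print(mentions)
```

If the two kinds do not need to be told apart, startswith() also accepts a tuple, so word.startswith(('#', '@')) checks both prefixes at once.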

▶ To look at the words in a tweet one by one, split the string on whitespace and turn it into a list

trump_tweets = "thank you to president moon of south korea for the beautiful welcoming ceremony it will always be remembered"

def break_into_words(text):
    words = text.split()
    
    return words

print(break_into_words(trump_tweets))

💡 Common whitespace characters

" " : Space (space bar)

"\t" : Tab (Tab key)

"\n" : Newline (Enter key)

💡 Difference between split() and split(' ')

numbers = '   1  2  3   '
print(numbers.split()) 
>>> ['1', '2', '3']

print(numbers.split(' '))
>>> ['', '', '', '1', '', '2', '', '3', '', '', '']
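
To tie this back to the whitespace characters listed earlier, here is a small sketch on a made-up string that mixes a tab, a newline, and a double space. The variable names are hypothetical.

```python
# Made-up string containing a tab, a newline, and two spaces in a row
text = "make\tamerica\ngreat  again"

# split() with no argument treats any run of whitespace as one separator
words = text.split()
print(words)

# split(' ') only splits on a single space, so the tab and newline stay in
pieces = text.split(' ')
print(pieces)
```

So split() is usually the safer choice for breaking a tweet into words, while split(' ') is for when each single space really matters.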

▶ Store the list elements that start with b in an empty list, new_list

trump_tweets = ['america', 'is', 'back', 'and', 'we', 'are', 'coming', 'back', 'bigger', 'and', 'better', 'and', 'stronger', 'than', 'ever', 'before']

def make_new_list(text):
    new_list = []
    
    for word in text:
        if word.startswith('b'):
            new_list.append(word)
    
    return new_list

new_list = make_new_list(trump_tweets)
print(new_list)
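
For reference, the empty-list-plus-append() pattern can also be written as a list comprehension. This is a minimal sketch of that alternative, not the lecture's version; make_new_list_short and sample_words are made-up names.

```python
# Made-up sample data
sample_words = ['back', 'and', 'bigger', 'better']

def make_new_list_short(text):
    # Same filter as the append() loop, written in one expression
    return [word for word in text if word.startswith('b')]

print(make_new_list_short(sample_words))
```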

▶ FAKE NEWS and Fake News differ only in case, so convert everything to lowercase to count accurately how many times the phrase was used

trump_tweets = [
    "FAKE NEWS - A TOTAL POLITICAL WITCH HUNT!",
    "Any negative polls are fake news, just like the CNN, ABC, NBC polls in the election.",
    "The Fake News media is officially out of control.",
]
 
def lowercase_all_characters(text):
    processed_text = []
    
    for sentence in text:
        processed_text.append(sentence.lower())
    
    return processed_text

print('\n'.join(lowercase_all_characters(trump_tweets)))
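
Once everything is lowercase, the counting itself could look like the sketch below. It assumes the built-in str.count() is good enough for this purpose; the tweets in sample and the helper name count_phrase are made up.

```python
# Made-up tweets with different capitalizations of the same phrase
sample = [
    "FAKE NEWS again",
    "this is fake news",
    "The Fake News media",
    "no match here",
]

def count_phrase(tweets, phrase):
    total = 0
    for tweet in tweets:
        # Lowercase first so every capitalization is counted
        total += tweet.lower().count(phrase)
    return total

print(count_phrase(sample, 'fake news'))
```

Without the lower() call only the second tweet would match, so the count would come out as 1 instead of 3.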

▶ If the lowercased tweets in trump_tweets are split on whitespace, christmas' and christmas, and christmas!!! come out as different words, so remove the special characters to count accurately how many times christmas was used

trump_tweets = [
    "i hope everyone is having a great christmas, then tomorrow it’s back to work in order to make america great again.",
    "7 of 10 americans prefer 'merry christmas' over 'happy holidays'.",
    "merry christmas!!!",
]

def remove_special_characters(text):
    processed_text = []

    for sentence in text:
        sentence = sentence.replace(",", "").replace("'", "").replace("!", "")
        processed_text.append(sentence)
      
    return processed_text


print('\n'.join(remove_special_characters(trump_tweets)))
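
Chaining replace() works when you know exactly which characters to drop. As an alternative sketch (my own swap, not what the lecture uses), str.translate() with string.punctuation removes every ASCII punctuation character in one pass, after which the word can be counted. The tweets in sample and the name count_word are made up.

```python
import string

# Made-up tweets with punctuation stuck to the target word
sample = [
    "great christmas, then back to work",
    "'merry christmas' over 'happy holidays'",
    "merry christmas!!!",
]

def count_word(tweets, target):
    # Translation table that deletes every ASCII punctuation character
    table = str.maketrans('', '', string.punctuation)
    count = 0
    for tweet in tweets:
        for word in tweet.translate(table).split():
            if word == target:
                count += 1
    return count

print(count_word(sample, 'christmas'))
```

Note that string.punctuation only covers ASCII, so a curly apostrophe like the one in it’s would still be left in.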

This post is a summary of what I learned while taking the lectures of Elice's AI Track, 5th cohort.


Developer 삐롱히

The development and study log of frontend developer 삐롱히
