Scraping MLB Player Data (Udemy Project 11)


PlintAn 2023. 7. 1. 10:00

Hello!

In this post, we will scrape 2023 MLB player data from the MLB site and save it to a CSV file.

First, here is the MLB hitting stats page:

https://www.mlb.com/stats/


Shohei Ohtani is in first place!

But let's fetch the data for every other player as well.

import csv
import time
import requests
from bs4 import BeautifulSoup

def scrape_and_save(url, output_file):
    page = 1  # starting page
    has_next_page = True  # whether a next page exists

    with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Player', 'Team', 'G', 'AB', 'R', 'H', '2B', '3B', 'HR', 'RBI', 'BB', 'SO', 'SB', 'CS', 'AVG', 'OBP', 'SLG', 'OPS'])  # write the CSV header row

        while has_next_page:
            if page > 1:
                time.sleep(1)  # wait one second between page requests
            response = requests.get(url + f'?page={page}', timeout=10)
            html = response.text
            soup = BeautifulSoup(html, 'html.parser')

            players = soup.select('tbody tr')  # select the player rows
            for player in players:
                try:
                    player_name = player.select_one('th a').text.strip()
                    team = player.select_one('td:nth-child(2)').text.strip()
                    g = player.select_one('td:nth-child(3)').text.strip()
                    ab = player.select_one('td:nth-child(4)').text.strip()
                    r = player.select_one('td:nth-child(5)').text.strip()
                    h = player.select_one('td:nth-child(6)').text.strip()
                    double = player.select_one('td:nth-child(7)').text.strip()
                    triple = player.select_one('td:nth-child(8)').text.strip()
                    hr = player.select_one('td:nth-child(9)').text.strip()
                    rbi = player.select_one('td:nth-child(10)').text.strip()
                    bb = player.select_one('td:nth-child(11)').text.strip()
                    so = player.select_one('td:nth-child(12)').text.strip()
                    sb = player.select_one('td:nth-child(13)').text.strip()
                    cs = player.select_one('td:nth-child(14)').text.strip()
                    avg = player.select_one('td:nth-child(15)').text.strip()
                    obp = player.select_one('td:nth-child(16)').text.strip()
                    slg = player.select_one('td:nth-child(17)').text.strip()
                    ops = player.select_one('td:nth-child(18)').text.strip()

                    writer.writerow([player_name, team, g, ab, r, h, double, triple, hr, rbi, bb, so, sb, cs, avg, obp, slg, ops])
                except AttributeError:
                    continue

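            # This selector is tied to the page layout and may need updating if the site changes.
            next_button = soup.select_one('#stats-app-root > section > section > div:nth-child(3) > div:nth-child(2) > div > div > div > div > button')
            # Treat a missing button, a disabled attribute, or a disabled class as "no next page".
            is_disabled = next_button is None or next_button.has_attr('disabled') or 'disabled' in next_button.get('class', [])
            # Also stop when the current page had no rows, so the loop cannot run forever.
            if players and not is_disabled:
                page += 1
            else:
                has_next_page = False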

    print("Data scraped and saved successfully.")

url = 'https://www.mlb.com/stats/'  # URL of the web page to scrape
output_file = ''  # enter the path and file name of the CSV file to save

scrape_and_save(url, output_file)

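That is the complete code. It uses CSS selectors to read the value at each column position in a row, and once every value on the page has been collected, it moves on to the next page and repeats.

The loop continues until there is no next page, and then it stops.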
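
The "collect every row, then move to the next page until nothing is left" loop can also be sketched without any network code. This is only a minimal illustration, not the code used above: `row_to_record`, `iter_records`, and the `fetch_rows` callback are hypothetical names, and `fetch_rows(page)` is assumed to return a list of rows, each row being a list of cell strings (for example, text pulled from each `tbody tr`).

```python
from typing import Callable, Dict, Iterator, List, Optional

# Same column order as the CSV header written by scrape_and_save.
HEADER = ['Player', 'Team', 'G', 'AB', 'R', 'H', '2B', '3B', 'HR', 'RBI',
          'BB', 'SO', 'SB', 'CS', 'AVG', 'OBP', 'SLG', 'OPS']


def row_to_record(cells: List[str]) -> Optional[Dict[str, str]]:
    """Map one row of cell strings onto the header; skip malformed rows."""
    if len(cells) != len(HEADER):
        return None
    return dict(zip(HEADER, (cell.strip() for cell in cells)))


def iter_records(fetch_rows: Callable[[int], List[List[str]]],
                 max_pages: int = 50) -> Iterator[Dict[str, str]]:
    """Yield records page by page until a page comes back empty.

    fetch_rows is a hypothetical callback supplied by the caller.
    max_pages is a safety cap so the loop always terminates.
    """
    for page in range(1, max_pages + 1):
        rows = fetch_rows(page)
        if not rows:
            return  # no rows means there is no next page
        for cells in rows:
            record = row_to_record(cells)
            if record is not None:
                yield record
```

Stopping on an empty page, plus a hard cap on the page count, avoids depending on the exact markup of the pagination button.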

It works! The CSV contains 155 hitters, and Shohei Ohtani is in first place.
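
One way to double-check a result like this is to read the CSV back and count the rows. The helper below is a small sketch; `summarize_csv` is a hypothetical name, and it assumes the file has the `Player` column written by the header row above.

```python
import csv
from typing import Optional, Tuple


def summarize_csv(path: str) -> Tuple[int, Optional[str]]:
    """Return (number of data rows, Player value of the first data row)."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        rows = list(csv.DictReader(csvfile))
    first_player = rows[0]['Player'] if rows else None
    return len(rows), first_player
```

Calling `summarize_csv` on the saved file should give a row count that matches the number of hitters reported above, along with whoever is listed first.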

My code waits one second between pages, so the whole run took about 6 to 7 seconds. Because scraping can overload the server, it is best to avoid excessive scraping.
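
The one-second spacing can be pulled out into a small helper so the delay is easy to adjust. This is a sketch under assumptions: `fetch_politely` is a hypothetical name, `fetch` stands for whatever function downloads a single page, and `sleep` is injectable only so the delay can be replaced in a test.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar('T')


def fetch_politely(pages: Iterable[int],
                   fetch: Callable[[int], T],
                   delay_seconds: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call fetch(page) for each page, pausing between consecutive requests."""
    results: List[T] = []
    for index, page in enumerate(pages):
        if index > 0:
            sleep(delay_seconds)  # no pause is needed before the first request
        results.append(fetch(page))
    return results
```

With seven pages and a one-second delay, the pauses alone add up to six seconds, which is in line with the run time mentioned above.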
