[February, Week 3 - 2/18 (2)] Crawling MelOn, Bugs, and Genie, Saving to Excel Files, and Merging 🎵

rubii 2025. 2. 18. 13:48

This post shows how to use web crawling to collect real-time music chart data from MelOn, Bugs, and Genie, save each chart to its own Excel file, and then merge those files into one.

1️⃣ Crawling the MelOn Chart 🎶

Crawl MelOn's real-time chart and save the result to an Excel file.

from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

# Launch Chrome and load the chart page
driver = webdriver.Chrome()
url = 'http://www.melon.com/chart/index.htm'
driver.get(url)
html = driver.page_source
driver.quit()  # close the browser once the HTML is captured
soup = BeautifulSoup(html, 'html.parser')

# List that collects one row per song
song_data = []
rank = 1

# Extract the title and artist from each chart row
songs = soup.select('table > tbody > tr')
for song in songs:
    title = song.select('div.rank01 > span > a')[0].text
    singer = song.select('div.rank02 > span > a')[0].text
    song_data.append(['Melon', rank, title, singer])
    rank += 1

# Build a DataFrame and save it (columns: service, rank, title, artist)
# The ./files directory must already exist
columns = ['서비스', '순위', '타이틀', '가수']
pd_data = pd.DataFrame(song_data, columns=columns)
pd_data.to_excel('./files/melon.xlsx', index=False)
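
`to_excel` writes to `./files/melon.xlsx` but does not create the `./files` directory, so the call fails if that folder is missing. Below is a minimal sketch of creating it up front with the standard library. The helper name `ensure_output_dir` is hypothetical and not part of the original code; the default path is taken from the paths used in this post.

```python
from pathlib import Path


def ensure_output_dir(path="./files"):
    """Create the output directory if it is missing and return it as a Path.

    Hypothetical helper: the name and default path are assumptions
    based on the './files/...' paths used in this post.
    """
    out_dir = Path(path)
    # parents=True creates intermediate folders; exist_ok=True makes repeat calls safe
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling `ensure_output_dir()` once before the first `to_excel` call should be enough for all three crawlers and the merge step.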

2️⃣ Crawling the Bugs Chart 🎵

Crawl Bugs' real-time chart and save the result to an Excel file.

from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

# Launch Chrome and load the chart page
driver = webdriver.Chrome()
url = 'http://music.bugs.co.kr/chart'
driver.get(url)
html = driver.page_source
driver.quit()  # close the browser once the HTML is captured
soup = BeautifulSoup(html, 'html.parser')

# List that collects one row per song
song_data = []
rank = 1

# Extract the title and artist from each chart row
songs = soup.select('table.byChart > tbody > tr')
for song in songs:
    title = song.select('p.title > a')[0].text
    singer = song.select('p.artist > a')[0].text
    song_data.append(['Bugs', rank, title, singer])
    rank += 1

# Build a DataFrame and save it (columns: service, rank, title, artist)
columns = ['서비스', '순위', '타이틀', '가수']
pd_data = pd.DataFrame(song_data, columns=columns)
pd_data.to_excel('./files/bugs.xlsx', index=False)
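
The MelOn and Bugs loops differ only in the service name and the CSS selectors, so the shared part could be factored out. The following is a sketch, not the author's code: `parse_chart` and its parameters are hypothetical names, and the inline HTML is a made-up fragment that only mimics the Bugs selectors used above. Unlike the original loops, it strips whitespace from the text and skips rows that lack a title or artist link instead of raising `IndexError`.

```python
from bs4 import BeautifulSoup


def parse_chart(html, service, row_sel, title_sel, singer_sel):
    """Return [service, rank, title, singer] rows parsed from chart HTML.

    Hypothetical helper; the selectors are supplied by the caller.
    Rows without a title or artist element are skipped and do not
    consume a rank.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    rank = 1
    for tr in soup.select(row_sel):
        title = tr.select_one(title_sel)
        singer = tr.select_one(singer_sel)
        if title is None or singer is None:
            continue
        rows.append(
            [service, rank, title.get_text(strip=True), singer.get_text(strip=True)]
        )
        rank += 1
    return rows


# Made-up HTML fragment shaped like the Bugs chart table (illustration only)
SAMPLE_HTML = """
<table class="byChart"><tbody>
  <tr><td><p class="title"><a>Song A</a></p><p class="artist"><a>Artist A</a></p></td></tr>
  <tr><td>row without song links</td></tr>
  <tr><td><p class="title"><a> Song B </a></p><p class="artist"><a>Artist B</a></p></td></tr>
</tbody></table>
"""

rows = parse_chart(
    SAMPLE_HTML, "Bugs", "table.byChart > tbody > tr", "p.title > a", "p.artist > a"
)
print(rows)
```

With the real pages, the `html` argument would be `driver.page_source` and the selectors would be the ones already used in each section.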

3️⃣ Crawling the Genie Chart 🎼

Crawl Genie's real-time chart and save the result to an Excel file.

from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

# Launch Chrome
driver = webdriver.Chrome()

# Genie chart URLs (ranks 1-50 and 51-100 are on separate pages)
url1 = 'http://www.genie.co.kr/chart/top200'
driver.get(url1)
html1 = driver.page_source
soup1 = BeautifulSoup(html1, 'html.parser')

# Note: ymd=20250218 and hh=11 are hard-coded in this URL
url2 = 'https://www.genie.co.kr/chart/top200?ditc=D&ymd=20250218&hh=11&rtm=Y&pg=2'
driver.get(url2)
html2 = driver.page_source
driver.quit()  # close the browser once both pages are captured
soup2 = BeautifulSoup(html2, 'html.parser')

# List that collects one row per song
song_data = []
rank = 1

# Crawl ranks 1-50
songs1 = soup1.select('table > tbody > tr')
for song in songs1:
    title = song.select('td.info > a.title')[0].text.strip()
    singer = song.select('td.info > a.artist')[0].text
    song_data.append(['Genie', rank, title, singer])
    rank += 1

# Crawl ranks 51-100 (rank keeps counting from the first page)
songs2 = soup2.select('table > tbody > tr')
for song in songs2:
    title = song.select('td.info > a.title')[0].text.strip()
    singer = song.select('td.info > a.artist')[0].text
    song_data.append(['Genie', rank, title, singer])
    rank += 1

# Build a DataFrame and save it (columns: service, rank, title, artist)
columns = ['서비스', '순위', '타이틀', '가수']
pd_data = pd.DataFrame(song_data, columns=columns)
pd_data.to_excel('./files/genie.xlsx', index=False)
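
The second Genie URL above hard-codes `ymd=20250218` and `hh=11`. Below is a sketch of building the same query string for an arbitrary date and hour using only the standard library. The parameter names come from that URL, but their meanings (date, hour, page) are inferred from their values rather than from any documentation, and `genie_chart_url` is a hypothetical helper.

```python
from datetime import datetime
from urllib.parse import urlencode

GENIE_CHART = "https://www.genie.co.kr/chart/top200"


def genie_chart_url(page, now=None):
    """Build a Genie chart URL for the given page.

    Assumption: ymd is the chart date, hh the hour, and pg the page number,
    inferred from the URL used in this post.
    """
    now = now or datetime.now()
    params = {
        "ditc": "D",
        "ymd": now.strftime("%Y%m%d"),
        "hh": now.strftime("%H"),
        "rtm": "Y",
        "pg": page,
    }
    return f"{GENIE_CHART}?{urlencode(params)}"


# Rebuilds the hard-coded URL from this post when given the same date and hour
print(genie_chart_url(2, datetime(2025, 2, 18, 11)))
```

Calling `genie_chart_url(2)` without a timestamp uses the machine's local clock, so this assumes the local time zone matches the one the chart uses.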

4️⃣ Merging the Crawled Excel Files 📊

Combine the data collected from MelOn, Bugs, and Genie into a single Excel file.

import pandas as pd

# Excel files produced by the three crawlers
excel_names = ['./files/melon.xlsx', './files/bugs.xlsx', './files/genie.xlsx']

# Read each file, then concatenate everything into one DataFrame
frames = [pd.read_excel(name) for name in excel_names]
appended_data = pd.concat(frames, ignore_index=True)

# Show a random sample of 5 rows
appended_data.sample(5)
'''
       서비스  순위          타이틀          가수
3    Melon   4     Whiplash       aespa
274  Genie  75  미안해 미워해 사랑해       Crush
66   Melon  67     미치게 그리워서         황가람
272  Genie  73   earthquake  지수 (JISOO)
101   Bugs   2     ATTITUDE   IVE (아이브)
'''

# Save the merged data to Excel
appended_data.to_excel('./files/total.xlsx', index=False)
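
Before relying on `total.xlsx`, it can help to confirm that each service contributed the expected rows. The sketch below runs on a small in-memory stand-in for `appended_data`: the rows are made up and only reuse the column names from this post, and `check_ranks` is a hypothetical helper.

```python
import pandas as pd


def check_ranks(df):
    """Return {service: bool}, True when that service's ranks run 1..N without gaps.

    Hypothetical helper; expects the '서비스' (service) and '순위' (rank) columns.
    """
    result = {}
    for service, group in df.groupby("서비스"):
        ranks = sorted(group["순위"].tolist())
        result[service] = ranks == list(range(1, len(ranks) + 1))
    return result


# Made-up stand-in for appended_data (illustration only)
demo = pd.DataFrame(
    [
        ["Melon", 1, "Song A", "Artist A"],
        ["Melon", 2, "Song B", "Artist B"],
        ["Bugs", 1, "Song A", "Artist A"],
        ["Bugs", 3, "Song C", "Artist C"],  # rank 2 is missing on purpose
    ],
    columns=["서비스", "순위", "타이틀", "가수"],
)

# Row count per service, then the gap check
print(demo["서비스"].value_counts().to_dict())
print(check_ranks(demo))
```

On the real data, passing `appended_data` to `check_ranks` would flag any service whose crawl skipped or duplicated a rank.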

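🎯 Summary 📌

✅ Crawled the real-time chart data from MelOn, Bugs, and Genie
✅ Converted each chart to a DataFrame and saved it as an Excel file
✅ Merged the individual Excel files into one final file

The chart data from all three services can now be analyzed and used from a single file! 🚀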