[2월 4주차-2/26(2)]MySQL 연동하기: Python으로 MySQL 데이터 가져오기+실습

Why Not SW CAMP 5기/수업 기록 2025. 2. 26. 17:18

Python에서 MySQL 데이터를 활용하려면 pymysql 라이브러리를 사용하여 데이터베이스에 연결할 수 있습니다. 이 글에서는 Python과 MySQL을 연동하여 데이터를 가져오는 방법을 정리합니다.

📌 1. 필요한 라이브러리 설치

먼저, pymysql 라이브러리를 설치해야 합니다. 터미널이나 명령 프롬프트에서 다음 명령어를 실행하세요.

pip install pymysql

📌 2. MySQL 데이터베이스 연결

Python 코드에서 MySQL 데이터베이스에 연결하는 방법을 살펴보겠습니다.

import pymysql
import pandas as pd

# MySQL 연결 정보
host = 'localhost'   # MySQL 서버 주소
user = 'root'        # MySQL 사용자 계정
password = '******'    # MySQL 비밀번호
db_name = 'sakila'   # 사용할 데이터베이스명

# MySQL 연결 설정
conn = pymysql.connect(host=host,
                       user=user,
                       password=password,
                       db=db_name,
                       charset='utf8')

위 코드에서는 pymysql.connect()를 사용하여 데이터베이스에 연결합니다.

host: MySQL 서버 주소 (localhost는 로컬 PC에서 실행 중인 MySQL을 의미함)
user: MySQL 접속 계정
password: 해당 계정의 비밀번호
db: 사용할 데이터베이스 이름
charset: utf8을 사용하여 한글 데이터도 처리 가능하도록 설정

📌 3. 데이터 조회하기

MySQL에서 원하는 데이터를 가져오기 위해 SQL 쿼리를 실행합니다.

# 커서 생성 (딕셔너리 형태로 결과 반환)
cursor = conn.cursor(pymysql.cursors.DictCursor)

# 실행할 SQL 쿼리 지정
query = """ SELECT * FROM customer; """

# 쿼리 실행
cursor.execute(query)

# 결과를 모두 가져오기
execute_result = cursor.fetchall()

# 결과를 pandas DataFrame으로 변환
execute_df = pd.DataFrame(execute_result)

# 출력 확인
print(execute_df.head())

cursor.execute(query): SQL 쿼리를 실행합니다.
cursor.fetchall(): 실행 결과를 모두 가져옵니다.
pd.DataFrame(execute_result): 가져온 데이터를 Pandas DataFrame으로 변환하여 분석할 수 있도록 합니다.

📌 4. MySQL 연결 종료

데이터 조회가 끝나면 MySQL 연결을 종료해야 합니다.

conn.close()

이렇게 하면 MySQL 연결이 안전하게 종료됩니다.

📌 5. 실습


import pymysql
from matplotlib import font_manager, rc
import platform
if platform.system() == 'Windows':
    path = 'C:/Windows/Fonts/malgun.ttf'  # Windows: 맑은 고딕 폰트 경로
    font_name = font_manager.FontProperties(fname=path).get_name()
    rc('font', family=font_name)
from matplotlib import font_manager, rc
import platform
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np


# connect python to mysql
host = 'localhost'
user = 'root'
password = '******'
db_name = 'sakila'

# pymysql.connect()
conn = pymysql.connect(host=host,
                       user=user,
                       password=password,
                       db=db_name,
                       charset='utf8')

# cursor 생성 : query를 실행하는 excute()
cursor = conn.cursor(pymysql.cursors.DictCursor)

query2 = """
SELECT f.film_id, c.name as category_name, ac.first_name, ac.last_name
FROM rental r
JOIN inventory i ON r.inventory_id = i.inventory_id
JOIN film f ON i.film_id = f.film_id
JOIN film_category fc ON f.film_id = fc.film_id
JOIN film_actor fa ON f.film_id = fa.film_id
JOIN actor ac ON ac.actor_id = fa.actor_id
JOIN category c ON fc.category_id = c.category_id
JOIN customer cu ON r.customer_id = cu.customer_id
JOIN address a ON cu.address_id = a.address_id
JOIN city ci ON a.city_id = ci.city_id
JOIN country co ON ci.country_id = co.country_id
WHERE co.country = 'United States';
"""

cursor.execute(query2)
actor_result = cursor.fetchall()
actor_df = pd.DataFrame(actor_result)

# db 연결 종료
conn.close()

##### 어떤 배우를 써야 가장 인기가 많은지 시각화 하기 ####
# full_name 칼럼 생성
actor_df["full_name"] = actor_df["first_name"] + " " + actor_df["last_name"]

# top5 카테고리 중 가장 많이 출연한 배우 상위 10명
target = ["Documentary", "Sports", "Drama", "Animation", "Family"]
filtered_df = actor_df[actor_df["category_name"].isin(target)]
top_10 = filtered_df["full_name"].value_counts().nlargest(10)

plt.figure(figsize=(10, 5))
sns.barplot(x=top_10.values, y=top_10.index, palette="viridis")
plt.xlabel("Count")
plt.ylabel("Full Name")
plt.title("Top 10 Most Frequent Full Names in Categories top5")
plt.show()

# top5 카테고리 별 출연한 배우 상위 5명 씩
category_top_5 = (
    filtered_df.groupby("category_name")["full_name"]
    .value_counts()
    .groupby(level=0, group_keys=False)  # category_name 그룹 유지
    .nlargest(5)  # 각 그룹에서 상위 5명 선택
    .reset_index(name="count")  # 컬럼 이름 설정
)

plt.figure(figsize=(12, 6))
sns.barplot(
    data=category_top_5,
    x="full_name",
    y="count",
    hue="category_name",
    dodge=False  # 카테고리별로 같은 축에서 보기 쉽게 설정
)

plt.xticks(rotation=45, ha="right")
plt.xlabel("Actor Name")
plt.ylabel("Movie Count")
plt.title("Top 5 Actors by Category")
plt.legend(title="Category", bbox_to_anchor=(1.05, 1), loc="upper left")  # 범례 위치 조정
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()

이제 MySQL과 Python을 연동하여 데이터를 조회하고 분석하는 데 활용해보세요! 🚀

'Why Not SW CAMP 5기 > 수업 기록' 카테고리의 다른 글

[2월 4주차-2/27(2)]📌2024년 국내 항공 노선 이용률 분석 🛫 (1)	2025.02.27
[2월 4주차-2/27(1)]다양한 SQL 함수 정리 (0)	2025.02.27
[2월 4주차-2/26(1)]SQL JOIN과 서브쿼리 정리 (0)	2025.02.26
[2월 4주차-2/25(2)]🎓학사 관리 테이블 실습 - SQL 문제 풀이 (0)	2025.02.25
[2월 4주차-2/25(1)]MySQL 데이터 그룹화, 테이블 조작, 외래키 설정 및 데이터 유형 정리 (0)	2025.02.25

ABOUT ME

rubii rubii

📌 1. 필요한 라이브러리 설치

📌 2. MySQL 데이터베이스 연결

📌 3. 데이터 조회하기

📌 4. MySQL 연결 종료

📌 5. 실습

'Why Not SW CAMP 5기 > 수업 기록' 카테고리의 다른 글

티스토리툴바

ABOUT ME

📌 1. 필요한 라이브러리 설치

📌 2. MySQL 데이터베이스 연결

📌 3. 데이터 조회하기

📌 4. MySQL 연결 종료

📌 5. 실습

'Why Not SW CAMP 5기 > 수업 기록' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바