启动 Notebook
1、打开终端(Terminal)
2、进入你想存文章的文件夹,例如:
cd ~/Documents/article_project
3、启动 Jupyter
jupyter notebook
会自动打开浏览器,显示文件列表
点击 New → Python 3 新建 Notebook
Excel相关操作
读取
1 2 3 4 5 6
| import pandas as pd
df = pd.read_excel("your_file.xlsx") df.head()
|
数据
单条件筛选
1 2 3
| df_beijing = df[df["地区"] == "北京"] df_beijing.head()
|
多条件筛选
1 2 3
| df_filtered = df[(df["地区"] == "北京") & (df["销售额"] > 1000)] df_filtered.head()
|
筛选包含指定值
1 2 3 4
| df_tech = df[df["客户名称"].str.contains("科技", case=False)] df_tech.head()
|
去重
去重(默认按所有列)
1 2 3
| df_unique = df.drop_duplicates() df_unique.head()
|
去重(指定列)
1 2 3
| df_unique = df.drop_duplicates(subset=["客户ID"], keep="first") df_unique.head()
|
去除空值
1 2 3
| df_clean = df.dropna() df_clean.head()
|
去除特定列的空值
1 2
| df_no_na_col = df.dropna(subset=["销售额"]) df_no_na_col.head()
|
填充空值
1 2 3
| df_filled = df.fillna({"销售额": 0}) df_filled.head()
|
排序
1 2 3 4 5
| df.sort_values(by="销售额", ascending=True, inplace=True)
df.sort_values(by=["地区","销售额"], ascending=[True, False], inplace=True)
|
添加新列
1 2 3 4 5
| df["折后金额"] = df["销售额"] * 0.9
df["高销售"] = df["销售额"].apply(lambda x: "是" if x > 1000 else "否")
|
筛选 Top N / Bottom N
1 2 3 4 5
| df.nlargest(5, "销售额")
df.nsmallest(3, "销售额")
|
实现Vslookup
1 2 3 4 5 6 7 8 9 10 11 12
| df1 = pd.DataFrame({ "客户ID":[1,2,3], "客户名称":["A","B","C"] })
df2 = pd.DataFrame({ "客户ID":[1,2,3], "地区":["北京","上海","广州"] })
|
用 merge 实现 VLOOKUP
1 2
| df_merged = df1.merge(df2, on="客户ID", how="left") df_merged
|
写入
1 2
| df_merged.to_excel("processed.xlsx", index=False)
|
小技巧
链式操作:去空值 + 去重 + 筛选可以一行搞定
1 2 3 4
| df_clean = df.dropna(subset=["销售额"]) \ .drop_duplicates(subset=["客户ID"]) \ .query("销售额 > 1000")
|
快速查看数据类型
1 2 3 4
| df.head() df.tail(10) df.info() df.describe()
|
列选择
Notebook 融合 AI
安装和导入
这里以 OpenAI 免费额度的 GPT-3.5 为例:
pip install openai
1 2
| from openai import OpenAI client = OpenAI(api_key="")
|
提问
1 2 3 4 5 6 7 8 9 10 11 12
| data_text = df.head(20).to_string() prompt = f"你是数据分析师,请分析以下数据:\n{data_text}\n用中文总结趋势、异常和建议。"
response = client.chat.completions.create( model="gpt-4", messages=[{"role":"user", "content": prompt}], temperature=0.7 )
analysis = response.choices[0].message.content print(analysis)
|